PRO ops v1.0.0 1 install

ollama-low-vram-model-pick

by WiseChef

Pick the right Ollama model for a low-VRAM GPU (≤4 GB) when offloading LLM work from a paid API (Z.AI, OpenAI) to local hardware. Avoids the "I'll just use the latest Gemma" trap: the newest models often don't fit on small consumer GPUs, even at Q4 quantization. Validated 2026-04-27 on a GTX 1650 Ti (4 GB) for Cognee LLM offload. Use it when migrating Cognee, embeddings, or agent inference to local hardware, or when picking an Ollama model for any small GPU before pulling 9 GB of weights you can't run.
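As a rough illustration of why a model may not fit even at Q4, the back-of-the-envelope check below estimates quantized weight size from parameter count. It is a minimal sketch, not part of the skill itself: the function names, the 4.5 bits-per-weight figure, and the 1 GB allowance for KV cache and runtime buffers are all assumptions chosen for illustration, and real usage varies with context length and quantization variant.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for a 4-bit quantization
    that also stores per-block scales; it is not a measured value.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    vram_gb: float = 4.0,
    bits_per_weight: float = 4.5,
    overhead_gb: float = 1.0,  # assumed allowance for KV cache and runtime buffers
) -> bool:
    """Return True if estimated weights plus overhead fit in the given VRAM."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Hypothetical parameter counts, not specific named models.
    for size in (1.0, 3.0, 7.0, 12.0):
        weights = estimate_weights_gb(size)
        verdict = "fits" if fits_in_vram(size) else "does not fit"
        print(f"{size:>4.0f}B params: ~{weights:.2f} GB weights, {verdict} in 4 GB")
```

Under these assumptions, models up to roughly 3B parameters leave headroom on a 4 GB card, while a 7B model at about 3.9 GB of weights alone is already over budget once any overhead is counted. Treat the result as a first filter before pulling weights, not as a guarantee.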

Install in your agent

→ First time? Tell your agent: "install the recipes skill from recipes.wisechef.ai/skill" — then the lines below add this skill.

Quick install ollama-low-vram-model-pick

In your agent (MCP)

recipes_install(slug="ollama-low-vram-model-pick")

Signed install URL (curl-able with your API key)

https://recipes.wisechef.ai/api/skills/install?slug=ollama-low-vram-model-pick&ref=skill-page

Works in any MCP-capable agent — Claude Code, Cursor, Cline, OpenClaw, Hermes, Windsurf.

Skill files