ollama-low-vram-model-pick
by WiseChef
Pick the right Ollama model for a low-VRAM GPU (≤4 GB) when offloading LLM work from a paid API (Z.AI, OpenAI) to local hardware. Avoids the 'I'll just use the latest Gemma' trap: the newest models often do not fit on small consumer GPUs, even at Q4 quantization. Validated 2026-04-27 on a GTX 1650 Ti (4 GB) for Cognee LLM offload. Use it when migrating Cognee, embeddings, or agent inference to local hardware, or when picking an Ollama model for any small GPU, before you pull 9 GB of weights you can't run.
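As a rough illustration of why a model may not fit even at Q4, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not the skill's own method: the 4.5 bits-per-weight figure, the 1 GB runtime/KV-cache overhead, and the candidate parameter counts are all assumptions chosen for illustration, and real usage varies with context length and runtime.

```python
# Rough VRAM estimate for a quantized model (illustrative assumptions only).

def estimated_vram_gb(params_billion: float,
                      bits_per_weight: float = 4.5,   # assumed average for a Q4-style quant
                      overhead_gb: float = 1.0) -> float:  # assumed KV cache + runtime overhead
    """Return an approximate VRAM requirement in GB (decimal)."""
    # 1e9 parameters * (bits / 8) bytes per parameter = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, vram_gb: float, **kwargs) -> bool:
    """True if the estimate is within the available VRAM."""
    return estimated_vram_gb(params_billion, **kwargs) <= vram_gb


# Hypothetical candidate sizes (billions of parameters), checked against a 4 GB card.
for size in (1.0, 3.0, 7.0, 12.0):
    need = estimated_vram_gb(size)
    verdict = "fits" if fits(size, 4.0) else "does not fit"
    print(f"{size:>4.0f}B params: ~{need:.2f} GB, {verdict} in 4 GB")
```

Under these assumptions, the 1B and 3B candidates fit in 4 GB while the 7B and 12B candidates do not. Treat the result as a first filter before pulling weights, then confirm on the actual GPU.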
Install in your agent
recipes_install(slug="ollama-low-vram-model-pick")
https://recipes.wisechef.ai/api/skills/install?slug=ollama-low-vram-model-pick&ref=skill-page
Works in any MCP-capable agent: Claude Code, Cursor, Cline, OpenClaw, Hermes, Windsurf.
This skill requires Pro. To install from a shell, copy and paste into your agent's environment:
export RECIPES_API_KEY=rec_live_…
recipes install ollama-low-vram-model-pick
First time? Generate your API key on the Library page.
Skill files