Published Apr 17, 2026

Run Your Own Coding Agent on Your Laptop (for Free)

advanced
Step 1

Check your hardware and ask an LLM what to run

Before you pull anything, check what your machine can handle.

  • On Mac: Apple menu > About This Mac, then screenshot the specs panel.
  • On Windows: Settings > System > About, then screenshot the specs panel.

Drop that screenshot into Claude, ChatGPT, or any LLM, and ask:

Prompt
Which Ollama coder models can I realistically run on this machine with Claude Code or Codex?

It will read the RAM and chip from your screenshot and give you a short list.

Use this rough guide for what fits where:

  • 8 GB RAM: 3B parameters or smaller, such as qwen3-coder:3b.
  • 12 GB RAM: 4–7B parameters, such as gemma4:e2b or qwen3-coder:7b.
  • 16 GB RAM: 7–12B parameters, such as qwen3-coder:7b or gemma4:e4b.
  • 32 GB+ RAM or GPU: 20B+ parameters, such as qwen3-coder:32b, gpt-oss:20b, or gemma4:26b.
Pro tip: Download the biggest version you can reasonably fit. Bigger models make more reliable tool calls, and flaky tool calls are what actually break small models inside Claude Code.
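If you'd rather skip the screenshot, total RAM is one terminal command away. A minimal sketch for macOS and Linux (Windows users can run systeminfo instead):

```shell
# Print total RAM in GB: sysctl on macOS, /proc/meminfo on Linux.
if [ "$(uname)" = "Darwin" ]; then
  bytes=$(sysctl -n hw.memsize)
else
  bytes=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) * 1024 ))
fi
echo "$(( bytes / 1024 / 1024 / 1024 )) GB RAM"
```

Match the number against the guide above to pick a parameter range.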
Step 2

Pick a coder model that supports agentic tools

Browse ollama.com/search?q=coder and open the page for a model your LLM recommended.

On the model page, scroll to the Applications section and confirm it lists Claude Code, Codex, OpenCode, or OpenClaw. If it lists none of them, the model does not support the tool calls agentic coding requires, so skip it.

The three models that held up best in testing were:

  • qwen3-coder: the purpose-built coder pick, with strong raw code generation at its size.
  • gemma4: trained explicitly for tool use and thinking, with more reliable multi-step tool chains.
  • gpt-oss: OpenAI’s open-weights MoE model with strong agentic support.
Pro tip: If you cannot choose between two candidates, pull both. Disk is cheap, and you can swap between them with the --model flag. Keep the one that holds up in your workflow.
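Trying both candidates looks like this in practice. The model tags below are the examples from this guide; swap in whichever pair your LLM suggested:

```shell
# Pull both candidates up front; disk is cheap.
ollama pull qwen3-coder:7b
ollama pull gemma4:e4b

# Run a real task through each one using the --model flag...
ollama launch claude --model qwen3-coder:7b
ollama launch claude --model gemma4:e4b

# ...then delete the loser to reclaim the disk space.
ollama rm qwen3-coder:7b
```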
Step 3

Launch your coding agent with the local model

From the model’s Ollama page, copy the launch command. It looks like:

ollama launch claude --model gemma4:e4b


Open a terminal in your actual project folder, paste the command, and hit Enter. Confirm the download when prompted, then wait for the weights to pull the first time.

After that, Ollama drops you into Claude Code pointed at the local model instead of Anthropic’s API.

Inside the session, type /model to confirm which model is wired in. Every response from here on costs zero API tokens.

Pro tip: Run ollama ps in a second terminal to see what is actually running. It shows the active model, RAM in use, and GPU utilization. 100% GPU means you are fully accelerated. Anything lower means part of the model is spilling to CPU, and responses will be slower.
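The PROCESSOR column is the one to watch. A sketch of pulling that figure out with grep, using a hypothetical ollama ps output (your model name, ID, and size will differ):

```shell
# Hypothetical `ollama ps` output; the columns are NAME, ID, SIZE, PROCESSOR, UNTIL.
sample='NAME          ID            SIZE      PROCESSOR    UNTIL
gemma4:e4b    abc123def456  5.6 GB    100% GPU     4 minutes from now'

# Extract the PROCESSOR figure; "100% GPU" is what you want to see.
echo "$sample" | grep -Eo '[0-9]+% (GPU|CPU)'
```

A split like 48%/52% CPU/GPU in that column means part of the model spilled to CPU and generation will crawl.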
Step 4

Bump the context window before serious work

This is the single most important setting in the setup. By default, Ollama allocates only a 4K-token context window per model, which is far too small for agentic coding. Claude Code can read one file, fill the buffer, and immediately start forgetting the rest of the conversation.

Fix it once:

  • Open the Ollama app.
  • Go to Ollama menu > Settings.
  • Find the Context slider.
  • Bump it to 32K to start, or higher if your specs allow.
Pro tip: Ask your LLM what context size your specs can safely handle. Maxing it out can push the model past your GPU’s limits and crash things. Start at 32K, verify with ollama ps, then raise if there is headroom.
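If you run the Ollama server from a terminal instead of the app, recent releases expose the same setting as an environment variable. A sketch, assuming your Ollama version honors OLLAMA_CONTEXT_LENGTH (check your release notes if the slider and this disagree):

```shell
# Set the default context window to 32K tokens before starting the server.
export OLLAMA_CONTEXT_LENGTH=32768

# Then restart the server so the new default takes effect:
# ollama serve
```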
Step 5

Try OpenCode if Claude Code feels too heavy

Claude Code is a sophisticated harness with a lot of tools exposed at once. Small local models sometimes get confused by the volume of choices. OpenCode is a lighter-weight coding agent built for this case, and it uses the same ollama launch pattern.

Install it on Mac with one line:

curl -fsSL https://opencode.ai/install | bash

Then launch it the same way:

ollama launch opencode --model gemma4:e4b

Pro tip: Smaller models work better when you ask them to think less. Hit Shift+Tab inside Claude Code or OpenCode to toggle plan mode, where the agent writes its approach before touching files. Lower the reasoning effort in Settings if the agent keeps over-thinking simple tasks.