Published Apr 16, 2026

Run an LLM on Your Laptop for Free with Ollama

Beginner
Step 1

Install Ollama

Go to ollama.com/download and download the installer for your operating system. No account, sign-in, or prior setup is required.

  • Mac: open the downloaded file and drag the Ollama icon into your Applications folder.
  • Windows: run the installer and click through the wizard.
  • Linux: open a terminal and run curl -fsSL https://ollama.com/install.sh | sh

Once it is installed, open Ollama from your Applications folder or Start menu.
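If you want to confirm the install from the command line, the ollama CLI can do a quick sanity check. A minimal check, assuming the installer put the ollama command on your PATH:

  # Print the installed version to confirm the CLI is reachable
  ollama --version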

Pro tip: Mac users who prefer Homebrew can run brew install ollama instead of using the .dmg installer.
Step 2

Pick a Model That Fits Your Machine

Click New Chat in the Ollama app, then click the model dropdown at the bottom of the chat window. The right model depends on how much RAM your machine has; as a reference point, gemma3:4b ran comfortably on a 16 GB MacBook in testing.

  • 4 GB RAM: gemma3:1b, qwen3:0.6b, or tinyllama. These are tiny models: fast, but with limited reasoning, best for short rewrites and simple Q&A.
  • 8 GB RAM: gemma3:4b, llama3.2:3b, or phi4-mini. This is the everyday tier for chat, drafting, and summarizing.
  • 16 GB RAM: gemma3:12b, llama3.1:8b, or qwen3:8b. This is the sweet spot for most laptops and is noticeably smarter for real writing work.
  • 32 GB+ RAM or a discrete GPU: gemma3:27b, gpt-oss:20b, or qwen3:32b. These are heavyweight models and the closest you will get to cloud quality on a single machine.

Apple Silicon Macs share memory between CPU and GPU, so a 16 GB M-series Mac generally runs models that a 16 GB PC struggles with. When you pick a model, Ollama downloads it the first time you select it and keeps it cached after that.
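You can also grab a model ahead of time from the terminal so the download is finished before you open a chat. A short sketch, using gemma3:4b as the example; swap in whichever model fits your RAM tier:

  # Download the model without starting a chat
  ollama pull gemma3:4b

  # See every model on disk, with its size, so you know what you have room for
  ollama list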

Pro tip: Browse ollama.com/search to see every model Ollama supports, including specialized models for coding, vision, and tool use.
Step 3

Start Chatting in the App

Once the model finishes downloading, type a prompt and hit enter. You are now chatting with an AI model running entirely on your laptop, with no internet call, API key, or per-token cost.

The first response is usually slower because the model has to load into memory. After that, replies stream in faster.

Pro tip: If you prefer the terminal, run ollama run gemma3:4b instead. It is the same model and the same chat, without the GUI.
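Behind both the app and the terminal is a small local server that Ollama runs in the background, and other tools on your machine can talk to it too. A minimal sketch with curl, assuming the default port 11434 and that gemma3:4b is already downloaded:

  # Send one prompt to the local Ollama server and get back a single JSON response
  curl http://localhost:11434/api/generate -d '{
    "model": "gemma3:4b",
    "prompt": "Explain what a context window is in one sentence.",
    "stream": false
  }'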
Step 4

Test That It Runs Locally

Turn on airplane mode or unplug your Ethernet cable, then send another prompt. It should still work, because the model is running locally instead of hitting a server.

This is the point of running local AI: your prompts never hit anyone else's server, nothing is stored outside your machine, and you are not paying for each response. If you handle client NDAs, internal financials, or anything you would not paste into a hosted AI tool, this distinction matters.
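If you want to see exactly what is loaded while you chat offline, the CLI can show you. A quick check, assuming you are still running gemma3:4b:

  # List the models currently loaded in memory and whether they are running on CPU or GPU
  ollama ps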