
Ollama + OpenClaw: Run Your AI Agent Locally for Free (2026 Setup Guide)

How to run OpenClaw with Ollama for $0 in API costs. Covers setup, recommended models, hardware requirements, common issues, and when local LLMs beat cloud APIs.


OpenClaw is free software. But as we've covered before, the API costs behind it are anything but free. Users routinely report bills of $200-500 per month — and one power user hit $3,600 in a single month by burning through 180 million tokens.

Ollama changes that equation entirely. It lets you run large language models on your own hardware, which means zero API costs, forever.

OpenClaw has an official Ollama integration documented at docs.openclaw.ai/providers/ollama. It uses Ollama's native /api/chat endpoint — not the OpenAI-compatible /v1 endpoint that most tutorials incorrectly suggest. That distinction matters more than you'd think, and getting it wrong is the #1 reason people fail to get Ollama working with OpenClaw.

This guide walks through the full setup, recommends specific models that actually work, and is honest about when you should skip Ollama entirely.

What You Need: Hardware Requirements

Running a local LLM is not like running a Docker container. The model weights need to fit in memory (ideally VRAM), and inference speed depends heavily on your hardware. Here's what you actually need:

| Use Case | RAM | GPU / VRAM | CPU | Notes |
| --- | --- | --- | --- | --- |
| OpenClaw gateway only (remote Ollama) | 2 GB | None | 1 vCPU | OpenClaw itself is lightweight |
| Local 7-8B models | 16 GB | 8 GB VRAM | 4+ cores | Minimum for usable speed |
| Local 13B+ models | 32 GB | 16-24 GB VRAM | 8+ cores | Required for reliable agentic tasks |

Real Hardware That Works

Mac Mini M4 with 32GB unified memory — This is probably the best bang-for-buck Ollama machine right now. The unified memory architecture means the full 32GB is available to the model without a discrete GPU. Users report running 13B parameter models comfortably.

NVIDIA RTX 4090 (24GB VRAM) — The community-validated workhorse for Ollama + OpenClaw. Can run 20B parameter models entirely in VRAM. Multiple community members have confirmed stable operation with gpt-oss:20b.

NVIDIA RTX 5090 (32GB VRAM) — The current top-end option. Users have verified running qwen3.5:35b-a3b with a 64K context window on this card.

What does NOT work well: Any CPU-only setup. You'll get 2-4 tokens per second, which means OpenClaw will time out waiting for responses. If you don't have a GPU or Apple Silicon, Ollama is probably not the right choice for OpenClaw.

Step-by-Step: Setting Up Ollama with OpenClaw

1. Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS: install via Homebrew, or download the desktop app from ollama.com/download
brew install ollama

# Verify it's running
ollama --version

On macOS, the desktop app runs in the background and starts automatically at login. On Linux, the install script sets up a systemd service.
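
If you want to sanity-check the install before going further: on Linux you can ask systemd whether the service is active, and on any platform the HTTP API should answer on its default port, 11434.

# Linux: confirm the systemd service is running
systemctl status ollama

# Any platform: the API should respond on port 11434 once the server is up
curl http://localhost:11434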

2. Pull the Recommended Default Model

OpenClaw's documentation recommends glm-4.7-flash as the default model for Ollama:

ollama pull glm-4.7-flash

This is a good starting point — it's small enough to run on modest hardware while being capable enough for basic agentic tasks. We'll cover better options in the model recommendations section below.

3. Configure OpenClaw for Ollama

The easiest path is to use OpenClaw's onboarding wizard:

openclaw onboard

When prompted for your LLM provider, select Ollama. The wizard will detect your local Ollama instance and configure everything automatically.

Manual Configuration (Remote Ollama or Custom Setup)

If you're running Ollama on a different machine (like a GPU server on your network), or if you need to configure things manually, edit your OpenClaw config file directly:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://192.168.1.100:11434",
      "apiKey": "ollama-local",
      "api": "ollama"
    }
  }
}

Key points about this config:

  • baseUrl: Point this at your Ollama server. Use http://localhost:11434 for local, or your network IP for remote.
  • apiKey: Set to "ollama-local". Ollama doesn't require authentication, but OpenClaw's config schema expects a value here.
  • api: This must be "ollama". Not "openai", not "openai-compatible". This tells OpenClaw to use the native /api/chat endpoint. More on why this matters in the critical warning section below.
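
One extra step for remote setups: out of the box, the Ollama server only listens on localhost, so a second machine can't reach it. On the GPU box, bind it to all interfaces with the standard OLLAMA_HOST environment variable (how you make that permanent depends on your install), then confirm reachability from the OpenClaw machine:

# On the Ollama server: listen on all interfaces instead of 127.0.0.1
OLLAMA_HOST=0.0.0.0 ollama serve

# From the OpenClaw machine: the API should respond at the server's network IP
curl http://192.168.1.100:11434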

4. Set Your Default Model

Tell OpenClaw to use your Ollama model by default:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/glm-4.7-flash"
      }
    }
  }
}

The format is ollama/<model-name>, where the model name must exactly match what appears in ollama list.

5. Verify Everything Works

# Check Ollama has your model loaded
ollama list

# Check OpenClaw can see the model
openclaw models list

# You should see your Ollama model in the output

If openclaw models list shows your Ollama model, you're good. If it doesn't, double-check that the baseUrl is reachable and the api field is set to "ollama".
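
If the model still doesn't show up, rule out the network layer by querying the Ollama API directly. The /api/tags endpoint returns the same model list as ollama list, so if this call succeeds but openclaw models list still comes up empty, the problem is on the OpenClaw side rather than in Ollama:

# List installed models straight from the Ollama API
curl http://localhost:11434/api/tags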

Best Models for OpenClaw + Ollama

Not every Ollama model works well with OpenClaw. The agent framework relies heavily on tool calling (function calling), and most small models handle this poorly or not at all. Here's what the community has actually tested:

| Model | Size | Minimum Hardware | Tool Calling | Notes |
| --- | --- | --- | --- | --- |
| glm-4.7-flash | ~8B | 16GB RAM, 8GB VRAM | Good | Official OpenClaw default. Good balance of speed and capability |
| gpt-oss:20b | 20B | 24GB VRAM | Good | Community validated on RTX 4090. Solid all-rounder |
| qwen3.5:35b-a3b | 35B | 32GB VRAM | Very Good | Verified on RTX 5090. Sweet spot at 64K context window |
| lfm2.5-thinking | ~8B | 16GB RAM | Acceptable | Only model confirmed working on Intel Mac |
| devstral2 | ~22B | 24GB VRAM | Excellent | Recommended specifically for agentic/tool-calling tasks |

Models to Avoid

deepseek-coder:6.7b — Hard error. This model does not support tool calling at all. OpenClaw will throw an error immediately, not just produce bad results.

Old llama variants (llama2, codellama) — These predate reliable tool-calling support. They'll run but will constantly fail to properly format tool calls, causing your agent to break mid-task.

Any 7B model on CPU-only — Even if the model supports tool calling, CPU inference at 2-4 tokens/sec means OpenClaw will time out before the model finishes generating. You need a GPU or Apple Silicon for models of any size.

The Community Consensus on Model Size

There's an interesting tension in the OpenClaw + Ollama space. The official docs recommend glm-4.7-flash, a relatively small model. But the community consensus from users actually running this in production is that models under ~30B parameters are unreliable for complex agentic tasks — multi-step tool chains, long reasoning sequences, and tasks that require maintaining context across many turns.

The practical takeaway: glm-4.7-flash is fine for simple, single-tool tasks (send a message, check the weather, run a quick search). For anything involving multi-step reasoning or chaining several tools together, you'll want gpt-oss:20b or larger.

The /v1 Trap: The Most Common Ollama + OpenClaw Mistake

This deserves its own section because it trips up almost everyone.

Ollama exposes two API endpoints:

  1. http://localhost:11434/api/chat — Ollama's native API
  2. http://localhost:11434/v1/chat/completions — An OpenAI-compatible endpoint

Most LLM tutorials tell you to use the /v1 endpoint because it's "compatible with any OpenAI client." That advice is wrong for OpenClaw.

Do NOT use the /v1 OpenAI-compatible URL with OpenClaw. It breaks tool calling.

The OpenAI-compatible endpoint doesn't properly translate Ollama's native tool-calling format. When OpenClaw sends a tool call request through /v1, the response comes back malformed — tools either don't execute, or execute with garbled parameters.

The correct setup:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "apiKey": "ollama-local",
      "api": "ollama"
    }
  }
}

Notice: the baseUrl is just http://localhost:11434 — no /v1 suffix. And api is "ollama", which tells OpenClaw to use the native endpoint. If you've been struggling with tool calling failures, this is almost certainly why.
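
If you want to confirm that tool calling works at the Ollama level before involving OpenClaw at all, you can hit the native /api/chat endpoint directly. The get_weather tool below is purely illustrative; any function schema will do:

# Smoke-test native tool calling against Ollama (get_weather is just an example tool)
curl http://localhost:11434/api/chat -d '{
  "model": "glm-4.7-flash",
  "messages": [{"role": "user", "content": "What is the weather in Berlin right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}'

A model with working tool support should come back with a tool_calls entry in the response message. If you get plain prose instead, the model itself is the problem, not your OpenClaw config.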

Common Issues and Fixes

Model Name Mismatch

OpenClaw requires the model name to exactly match what ollama list outputs. If Ollama shows glm-4.7-flash:latest but you configured glm-4.7-flash, it might work — but if it doesn't, try the full name with the tag.

# Check exact model names
ollama list

# Use exactly what appears in the NAME column

Ollama Drops Models from VRAM

Ollama unloads models from VRAM after a period of inactivity to free up resources. When OpenClaw sends a request to an unloaded model, there's a delay while Ollama reloads it. This can cause OpenClaw to time out on the first request after an idle period.

The fix is to increase Ollama's keep-alive time:

# Set to 60 minutes (default is 5)
OLLAMA_KEEP_ALIVE=60m ollama serve

Or set it permanently in your Ollama configuration.
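
What "permanently" looks like depends on how Ollama was installed. On a Linux systemd install, the usual approach is a service override; on the macOS app, setting the variable with launchctl does the same job. Both follow Ollama's documented environment-variable mechanism:

# Linux (systemd): add the variable to the service, then restart it
sudo systemctl edit ollama.service
#   [Service]
#   Environment="OLLAMA_KEEP_ALIVE=60m"
sudo systemctl restart ollama

# macOS app: set it for your login session, then restart Ollama
launchctl setenv OLLAMA_KEEP_ALIVE 60m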

CPU-Only Inference is Too Slow

If you're seeing responses that take 30+ seconds for a single turn, you're running on CPU. At 2-4 tokens per second, a typical OpenClaw response (200-500 tokens) takes 1-4 minutes. Multi-step tool chains that require several LLM calls will take 10+ minutes for a simple task.

There's no real fix for this. CPU-only inference is fundamentally too slow for agentic AI. Either get a GPU, use Apple Silicon, or switch to a cloud API.

Context Window Defaults to 4096

Ollama defaults to a 4096-token context window unless you explicitly override it. For OpenClaw, this is far too small — a single conversation with tool calls can easily exceed 4096 tokens.

Set a larger context window when running the model:

# Set 32K context window
ollama run glm-4.7-flash --ctx-size 32768

Or configure it in a Modelfile:

FROM glm-4.7-flash
PARAMETER num_ctx 32768

For models that support it (like qwen3.5:35b-a3b), you can push this to 64K — but only if you have enough VRAM to handle the increased KV cache.
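
To actually use the Modelfile version, build it under a new name with ollama create and point OpenClaw's primary model at that name instead (the -32k suffix is just a naming convention, not anything Ollama requires):

# Build a variant with the larger context window baked in
ollama create glm-4.7-flash-32k -f Modelfile

# Then reference it in your OpenClaw config as ollama/glm-4.7-flash-32k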

OpenClaw Timeouts on Long Responses

By default, OpenClaw may time out waiting for slow Ollama responses. If you're running larger models or have modest hardware, you might need to increase the timeout in your OpenClaw configuration. Check the OpenClaw docs for the specific timeout setting for your version.

Cost Comparison: Ollama vs. Cloud APIs

Let's put real numbers on this.

| Approach | Per-Token Cost | Monthly Cost (Typical) | Monthly Cost (Heavy Use) | Hardware Investment |
| --- | --- | --- | --- | --- |
| Ollama (local) | $0.00 | $0 (electricity only) | $0 (electricity only) | $600-2,500 (one-time) |
| Cloud API (Claude Sonnet) | ~$0.02/1K tokens | $10-50 | $200-500+ | $0 |
| Cloud API (GPT-4) | ~$0.03/1K tokens | $15-75 | $300-750+ | $0 |

The math is simple: if you're spending more than $50/month on API calls, a dedicated GPU will pay for itself within a year. One user reported their OpenClaw instance sending 200,000 tokens per request for complex tasks — at cloud API rates, that's a wallet explosion waiting to happen.
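
To make that concrete: 200,000 tokens per request at roughly $0.02 per 1K tokens works out to about $4 per request. If an agent like that fires 20 times a day (an illustrative workload, not a measured one), that's around $2,400 a month in API spend, which is roughly the top of the hardware range in the table above, paid every single month.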

The Hybrid Strategy

The smartest approach is to use both:

  • Ollama for simple, frequent tasks — cron jobs, quick lookups, single-tool operations, anything that runs on a schedule
  • Cloud API for complex reasoning — multi-step chains, tasks that require large context, anything where accuracy matters more than cost

OpenClaw supports this natively. You can configure different models for different agents, so your daily digest agent runs on local glm-4.7-flash while your research agent uses Claude Sonnet via API.
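
The exact per-agent keys depend on your OpenClaw version, so treat the following as a sketch rather than copy-paste config: it assumes named agent entries sit alongside defaults and accept the same model block, and the anthropic/claude-sonnet identifier is a placeholder for whatever your cloud provider configuration expects.

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/glm-4.7-flash"
      }
    },
    "daily-digest": {
      "model": {
        "primary": "ollama/glm-4.7-flash"
      }
    },
    "research": {
      "model": {
        "primary": "anthropic/claude-sonnet"
      }
    }
  }
}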

When NOT to Use Ollama with OpenClaw

Honest take: Ollama + OpenClaw is not for everyone. Skip it if:

You need reliable tool calling on complex tasks. Even the best local models are noticeably worse at multi-step tool chains compared to Claude Sonnet or GPT-4. If your agent needs to reliably chain 5+ tool calls without errors, you'll have a frustrating experience with local models.

You don't have a GPU or Apple Silicon. CPU-only inference is not viable for agentic AI. Full stop. Don't waste your time trying to make it work.

You're not comfortable troubleshooting. Ollama + OpenClaw is a two-system stack. When something breaks, you need to figure out whether the problem is in Ollama, in OpenClaw, in the model, or in the connection between them. It's not plug-and-play.

You want zero maintenance. Models need updating. Ollama needs updating. VRAM management needs babysitting. Context windows need tuning. It's a hobby project level of maintenance.

The Alternative: Skip All of This

If you want OpenClaw running without fighting Docker, Ollama, VRAM limits, or model compatibility — ClawdHost handles all of it.

One plan: $29/month, BYOK (bring your own API key). You get a dedicated Hetzner VPS with OpenClaw pre-configured and running. One-click deploy, 60-second setup, support for Discord, Telegram, WhatsApp, and Slack.

You still pay your own API costs (that's the BYOK part), but you skip every infrastructure headache covered in this guide. No Docker. No port conflicts. No VRAM management. No model compatibility issues.

For users who want the Ollama cost savings, you can point your ClawdHost instance at a remote Ollama server on your network — best of both worlds.

Final Setup Checklist

Before you call it done, run through this:

  • Ollama installed and running (ollama --version)
  • Model pulled (ollama pull glm-4.7-flash)
  • OpenClaw configured with api: "ollama" (not "openai")
  • baseUrl set to http://localhost:11434 (no /v1 suffix)
  • Context window increased beyond 4096 default
  • Model name in OpenClaw matches ollama list output exactly
  • openclaw models list shows your Ollama model
  • Test a simple tool-calling task before running complex agents
  • Ollama keep-alive increased if you're seeing cold-start timeouts

Get all of those green and you'll have a working, zero-cost OpenClaw setup. It won't match Claude Sonnet on complex tasks — but for everyday automation, it's hard to beat free.
