Ollama + OpenClaw: Run Your AI Agent Locally for Free (2026 Setup Guide)
How to run OpenClaw with Ollama for $0 in API costs. Covers setup, recommended models, hardware requirements, common issues, and when local LLMs beat cloud APIs.
OpenClaw is free software. But as we've covered before, the API costs behind it are anything but free. Users routinely report bills of $200-500 per month — and one power user hit $3,600 in a single month by burning through 180 million tokens.
Ollama changes that equation entirely. It lets you run large language models on your own hardware, which means zero API costs, forever.
OpenClaw has an official Ollama integration documented at docs.openclaw.ai/providers/ollama. It uses Ollama's native /api/chat endpoint — not the OpenAI-compatible /v1 endpoint that most tutorials incorrectly suggest. That distinction matters more than you'd think, and getting it wrong is the #1 reason people fail to get Ollama working with OpenClaw.
This guide walks through the full setup, recommends specific models that actually work, and is honest about when you should skip Ollama entirely.
What You Need: Hardware Requirements
Running a local LLM is not like running a Docker container. The model weights need to fit in memory (ideally VRAM), and inference speed depends heavily on your hardware. Here's what you actually need:
| Use Case | RAM | GPU / VRAM | CPU | Notes |
|---|---|---|---|---|
| OpenClaw gateway only (remote Ollama) | 2 GB | None | 1 vCPU | OpenClaw itself is lightweight |
| Local 7-8B models | 16 GB | 8 GB VRAM | 4+ cores | Minimum for usable speed |
| Local 13B+ models | 32 GB | 16-24 GB VRAM | 8+ cores | Required for reliable agentic tasks |
Real Hardware That Works
Mac Mini M4 with 32GB unified memory — This is probably the best bang-for-buck Ollama machine right now. The unified memory architecture means the full 32GB is available to the model without a discrete GPU. Users report running 13B parameter models comfortably.
NVIDIA RTX 4090 (24GB VRAM) — The community-validated workhorse for Ollama + OpenClaw. Can run 20B parameter models entirely in VRAM. Multiple community members have confirmed stable operation with gpt-oss:20b.
NVIDIA RTX 5090 (32GB VRAM) — The current top-end option. Users have verified running qwen3.5:35b-a3b with a 64K context window on this card.
What does NOT work well: Any CPU-only setup. You'll get 2-4 tokens per second, which means OpenClaw will time out waiting for responses. If you don't have a GPU or Apple Silicon, Ollama is probably not the right choice for OpenClaw.
Step-by-Step: Setting Up Ollama with OpenClaw
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Verify it's running
ollama --version
On macOS, Ollama installs as a system service and starts automatically. On Linux, the install script sets up a systemd service.
2. Pull the Recommended Default Model
OpenClaw's documentation recommends glm-4.7-flash as the default model for Ollama:
ollama pull glm-4.7-flash
This is a good starting point — it's small enough to run on modest hardware while being capable enough for basic agentic tasks. We'll cover better options in the model recommendations section below.
3. Configure OpenClaw for Ollama
The easiest path is to use OpenClaw's onboarding wizard:
openclaw onboard
When prompted for your LLM provider, select Ollama. The wizard will detect your local Ollama instance and configure everything automatically.
Manual Configuration (Remote Ollama or Custom Setup)
If you're running Ollama on a different machine (like a GPU server on your network), or if you need to configure things manually, edit your OpenClaw config file directly:
{
"providers": {
"ollama": {
"baseUrl": "http://192.168.1.100:11434",
"apiKey": "ollama-local",
"api": "ollama"
}
}
}
Key points about this config:
- baseUrl: Point this at your Ollama server. Use http://localhost:11434 for local, or your network IP for remote.
- apiKey: Set to "ollama-local". Ollama doesn't require authentication, but OpenClaw's config schema expects a value here.
- api: This must be "ollama". Not "openai", not "openai-compatible". This tells OpenClaw to use the native /api/chat endpoint. More on why this matters in the critical warning section below.
4. Set Your Default Model
Tell OpenClaw to use your Ollama model by default:
{
"agents": {
"defaults": {
"model": {
"primary": "ollama/glm-4.7-flash"
}
}
}
}
The format is ollama/<model-name>, where the model name must exactly match what appears in ollama list.
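The provider/model convention is easy to get wrong when a model tag is involved, so here's a small sketch of how such a reference splits apart. This is illustrative only — OpenClaw's actual parsing may differ:

```python
def parse_model_ref(ref: str) -> tuple[str, str]:
    """Split an OpenClaw-style model reference like 'ollama/glm-4.7-flash'
    into (provider, model). The model part may itself carry a tag such as
    ':latest', so split only on the first slash, never on colons."""
    provider, _, model = ref.partition("/")
    if not provider or not model:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return provider, model

print(parse_model_ref("ollama/glm-4.7-flash"))  # → ('ollama', 'glm-4.7-flash')
print(parse_model_ref("ollama/gpt-oss:20b"))    # → ('ollama', 'gpt-oss:20b')
```

Note that the tag (everything after the colon) stays with the model name — it has to match what ollama list shows.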
5. Verify Everything Works
# Check Ollama has your model loaded
ollama list
# Check OpenClaw can see the model
openclaw models list
# You should see your Ollama model in the output
If openclaw models list shows your Ollama model, you're good. If it doesn't, double-check that the baseUrl is reachable and the api field is set to "ollama".
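When openclaw models list comes up empty, it helps to confirm what Ollama itself reports. Ollama's GET /api/tags endpoint returns the installed models as JSON; this sketch parses that response offline on a sample payload (fetch the real one with curl, or swap in a urllib request against your baseUrl):

```python
import json

def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from Ollama's GET /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

# Example response shape (trimmed). Get the real thing with:
#   curl http://localhost:11434/api/tags
sample = json.loads('{"models": [{"name": "glm-4.7-flash:latest"}]}')
print(installed_models(sample))  # → ['glm-4.7-flash:latest']
```

If a model shows up here but not in openclaw models list, the problem is on the OpenClaw side of the connection.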
Best Models for OpenClaw + Ollama
Not every Ollama model works well with OpenClaw. The agent framework relies heavily on tool calling (function calling), and most small models handle this poorly or not at all. Here's what the community has actually tested:
| Model | Size | Minimum Hardware | Tool Calling | Notes |
|---|---|---|---|---|
| glm-4.7-flash | ~8B | 16GB RAM, 8GB VRAM | Good | Official OpenClaw default. Good balance of speed and capability |
| gpt-oss:20b | 20B | 24GB VRAM | Good | Community validated on RTX 4090. Solid all-rounder |
| qwen3.5:35b-a3b | 35B | 32GB VRAM | Very Good | Verified on RTX 5090. Sweet spot at 64K context window |
| lfm2.5-thinking | ~8B | 16GB RAM | Acceptable | Only model confirmed working on Intel Mac |
| devstral2 | ~22B | 24GB VRAM | Excellent | Recommended specifically for agentic/tool-calling tasks |
Models to Avoid
deepseek-coder:6.7b — Hard error. This model does not support tool calling at all. OpenClaw will throw an error immediately, not just produce bad results.
Old llama variants (llama2, codellama) — These predate reliable tool-calling support. They'll run but will constantly fail to properly format tool calls, causing your agent to break mid-task.
Any 7B model on CPU-only — Even if the model supports tool calling, CPU inference at 2-4 tokens/sec means OpenClaw will timeout before the model finishes generating. You need a GPU or Apple Silicon for models of any size.
The Community Consensus on Model Size
There's an interesting tension in the OpenClaw + Ollama space. The official docs recommend glm-4.7-flash, a relatively small model. But the community consensus from users actually running this in production is that models under ~30B parameters are unreliable for complex agentic tasks — multi-step tool chains, long reasoning sequences, and tasks that require maintaining context across many turns.
The practical takeaway: glm-4.7-flash is fine for simple, single-tool tasks (send a message, check the weather, run a quick search). For anything involving multi-step reasoning or chaining several tools together, you'll want gpt-oss:20b or larger.
The /v1 Trap: The Most Common Ollama + OpenClaw Mistake
This deserves its own section because it trips up almost everyone.
Ollama exposes two API endpoints:
- http://localhost:11434/api/chat — Ollama's native API
- http://localhost:11434/v1/chat/completions — an OpenAI-compatible endpoint
Most LLM tutorials tell you to use the /v1 endpoint because it's "compatible with any OpenAI client." That advice is wrong for OpenClaw.
Do NOT use the /v1 OpenAI-compatible URL with OpenClaw. It breaks tool calling.
The OpenAI-compatible endpoint doesn't properly translate Ollama's native tool-calling format. When OpenClaw sends a tool call request through /v1, the response comes back malformed — tools either don't execute, or execute with garbled parameters.
The correct setup:
{
"providers": {
"ollama": {
"baseUrl": "http://localhost:11434",
"apiKey": "ollama-local",
"api": "ollama"
}
}
}
Notice: the baseUrl is just http://localhost:11434 — no /v1 suffix. And api is "ollama", which tells OpenClaw to use the native endpoint. If you've been struggling with tool calling failures, this is almost certainly why.
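To make the difference concrete, here is the general shape of a tool-calling request against the native /api/chat endpoint. The payload structure follows Ollama's API; the get_weather tool is a made-up example for illustration:

```python
import json

# A native Ollama /api/chat request carrying a tool definition.
# POST this to http://localhost:11434/api/chat — note: no /v1 prefix.
payload = {
    "model": "glm-4.7-flash",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "stream": False,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
```

A model that supports tool calling answers with tool calls in the response message; the /v1 compatibility layer is where that round-trip gets mangled.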
Common Issues and Fixes
Model Name Mismatch
OpenClaw requires the model name to exactly match what ollama list outputs. If Ollama shows glm-4.7-flash:latest but you configured glm-4.7-flash, it might work — but if it doesn't, try the full name with the tag.
# Check exact model names
ollama list
# Use exactly what appears in the NAME column
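A defensive habit is to normalize the name yourself before writing the config — this sketch mirrors how ollama list displays untagged pulls, though OpenClaw may or may not apply the same rule:

```python
def normalize_model_name(name: str) -> str:
    """Append the ':latest' tag when none is given, matching how
    'ollama list' displays models pulled without an explicit tag."""
    return name if ":" in name else f"{name}:latest"

print(normalize_model_name("glm-4.7-flash"))  # → glm-4.7-flash:latest
print(normalize_model_name("gpt-oss:20b"))    # → gpt-oss:20b
```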
Ollama Drops Models from VRAM
Ollama unloads models from VRAM after a period of inactivity to free up resources. When OpenClaw sends a request to an unloaded model, there's a delay while Ollama reloads it. This can cause OpenClaw to time out on the first request after an idle period.
The fix is to increase Ollama's keep-alive time:
# Set to 60 minutes (default is 5)
OLLAMA_KEEP_ALIVE=60m ollama serve
Or set it permanently in your Ollama configuration.
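If restarting the server with new environment variables isn't convenient, Ollama's native endpoints also accept a per-request keep_alive field that overrides the server default for that model. A sketch of the request body:

```python
# keep_alive on an individual /api/chat (or /api/generate) request
# keeps the model resident after this call completes.
payload = {
    "model": "glm-4.7-flash",
    "messages": [{"role": "user", "content": "ping"}],
    "keep_alive": "60m",  # hold the model in VRAM for an hour
    "stream": False,
}
print(payload["keep_alive"])  # → 60m
```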
CPU-Only Inference is Too Slow
If you're seeing responses that take 30+ seconds for a single turn, you're running on CPU. At 2-4 tokens per second, a typical OpenClaw response (200-500 tokens) takes 1-4 minutes. Multi-step tool chains that require several LLM calls will take 10+ minutes for a simple task.
There's no real fix for this. CPU-only inference is fundamentally too slow for agentic AI. Either get a GPU, use Apple Silicon, or switch to a cloud API.
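The arithmetic above, made explicit (the tokens-per-second figures are the article's rough ranges, not benchmarks):

```python
def response_latency_s(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to generate a response at a given decode speed."""
    return tokens / tokens_per_sec

# A typical 300-token OpenClaw turn on CPU-only inference (~3 tok/s):
print(response_latency_s(300, 3))   # → 100.0 seconds
# The same turn on a GPU decoding at ~60 tok/s:
print(response_latency_s(300, 60))  # → 5.0 seconds
```

Multiply the CPU number by the five or six LLM calls in a multi-step tool chain and the 10-minutes-per-task figure follows directly.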
Context Window Defaults to 4096
Ollama defaults to a 4096-token context window unless you explicitly override it. For OpenClaw, this is far too small — a single conversation with tool calls can easily exceed 4096 tokens.
Set a larger context window for the model. Inside an interactive ollama run session:

ollama run glm-4.7-flash
/set parameter num_ctx 32768
Or make it permanent with a Modelfile and register the variant under a new name (the name below is just an example):

FROM glm-4.7-flash
PARAMETER num_ctx 32768

ollama create glm-4.7-flash-32k -f Modelfile

Then reference whatever name you chose in your OpenClaw config.
For models that support it (like qwen3.5:35b-a3b), you can push this to 64K — but only if you have enough VRAM to handle the increased KV cache.
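Why a bigger context window costs VRAM: the KV cache grows linearly with context length. A back-of-the-envelope estimate, using illustrative (assumed, not published) dimensions for an ~8B model:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_val: int = 2) -> int:
    """KV cache size: keys + values (factor of 2), per layer, per KV head,
    per token. bytes_per_val=2 assumes fp16/bf16 cache entries."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val

# Assumed dims for an ~8B GQA model: 32 layers, 8 KV heads, head_dim 128
gib = 1024 ** 3
print(kv_cache_bytes(32, 8, 128, 4096) / gib)   # → 0.5  (GiB at 4K context)
print(kv_cache_bytes(32, 8, 128, 32768) / gib)  # → 4.0  (GiB at 32K context)
```

Eight times the context means eight times the cache — which is why a 64K window is only realistic on cards with VRAM to spare.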
OpenClaw Timeouts on Long Responses
By default, OpenClaw may time out waiting for slow Ollama responses. If you're running larger models or have modest hardware, you might need to increase the timeout in your OpenClaw configuration. Check the OpenClaw docs for the specific timeout setting for your version.
Cost Comparison: Ollama vs. Cloud APIs
Let's put real numbers on this.
| Approach | Per-Token Cost | Monthly Cost (Typical) | Monthly Cost (Heavy Use) | Hardware Investment |
|---|---|---|---|---|
| Ollama (local) | $0.00 | $0 (electricity only) | $0 (electricity only) | $600-2,500 (one-time) |
| Cloud API (Claude Sonnet) | ~$0.02/1K tokens | $10-50 | $200-500+ | $0 |
| Cloud API (GPT-4) | ~$0.03/1K tokens | $15-75 | $300-750+ | $0 |
The math is simple: if you're spending more than $50/month on API calls, the low end of that hardware range pays for itself within a year, and heavier spenders recoup even a high-end GPU in months. One user reported their OpenClaw instance sending 200,000 tokens per request for complex tasks — at cloud API rates, that's a wallet explosion waiting to happen.
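The break-even arithmetic as code (the $600 figure is the table's low-end hardware cost; $1,600 is an illustrative RTX 4090 build price, not a quote):

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months until a one-time hardware purchase beats a recurring API bill."""
    return hardware_cost / monthly_api_spend

print(breakeven_months(600, 50))    # entry-level box at $50/mo  → 12.0 months
print(breakeven_months(1600, 200))  # RTX 4090 build at $200/mo  → 8.0 months
```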
The Hybrid Strategy
The smartest approach is to use both:
- Ollama for simple, frequent tasks — cron jobs, quick lookups, single-tool operations, anything that runs on a schedule
- Cloud API for complex reasoning — multi-step chains, tasks that require large context, anything where accuracy matters more than cost
OpenClaw supports this natively. You can configure different models for different agents, so your daily digest agent runs on local glm-4.7-flash while your research agent uses Claude Sonnet via API.
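A sketch of what that split might look like in config. The defaults block matches the schema shown earlier; the per-agent override keys and the "anthropic/claude-sonnet" reference are illustrative, so check your version's docs for the exact names:

```json
{
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/glm-4.7-flash" }
    },
    "research": {
      "model": { "primary": "anthropic/claude-sonnet" }
    }
  }
}
```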
When NOT to Use Ollama with OpenClaw
Honest take: Ollama + OpenClaw is not for everyone. Skip it if:
You need reliable tool calling on complex tasks. Even the best local models are noticeably worse at multi-step tool chains compared to Claude Sonnet or GPT-4. If your agent needs to reliably chain 5+ tool calls without errors, you'll have a frustrating experience with local models.
You don't have a GPU or Apple Silicon. CPU-only inference is not viable for agentic AI. Full stop. Don't waste your time trying to make it work.
You're not comfortable troubleshooting. Ollama + OpenClaw is a two-system stack. When something breaks, you need to figure out whether the problem is in Ollama, in OpenClaw, in the model, or in the connection between them. It's not plug-and-play.
You want zero maintenance. Models need updating. Ollama needs updating. VRAM management needs babysitting. Context windows need tuning. It's a hobby project level of maintenance.
The Alternative: Skip All of This
If you want OpenClaw running without fighting Docker, Ollama, VRAM limits, or model compatibility — ClawdHost handles all of it.
One plan: $29/month, BYOK (bring your own API key). You get a dedicated Hetzner VPS with OpenClaw pre-configured and running. One-click deploy, 60-second setup, support for Discord, Telegram, WhatsApp, and Slack.
You still pay your own API costs (that's the BYOK part), but you skip every infrastructure headache covered in this guide. No Docker. No port conflicts. No VRAM management. No model compatibility issues.
For users who want the Ollama cost savings, you can point your ClawdHost instance at a remote Ollama server on your network — best of both worlds.
Final Setup Checklist
Before you call it done, run through this:
- Ollama installed and running (ollama --version)
- Model pulled (ollama pull glm-4.7-flash)
- OpenClaw configured with api: "ollama" (not "openai")
- baseUrl set to http://localhost:11434 (no /v1 suffix)
- Context window increased beyond the 4096 default
- Model name in OpenClaw matches ollama list output exactly
- openclaw models list shows your Ollama model
- Test a simple tool-calling task before running complex agents
- Ollama keep-alive increased if you're seeing cold-start timeouts
Get all of those green and you'll have a working, zero-cost OpenClaw setup. It won't match Claude Sonnet on complex tasks — but for everyday automation, it's hard to beat free.
Related Articles
OpenClaw Use Cases: 15 Things People Actually Build With It (2026)
Real OpenClaw use cases from real users — from morning briefings and smart home control to autonomous QA agents earning $3,840/mo. What 247K+ GitHub stars actually looks like in practice.
OpenClaw Dashboard: Complete Guide to the Control UI (2026)
Everything about the OpenClaw Dashboard (Control UI) — what it does, how to access it, every feature explained, community alternatives, and remote access setup.
Is OpenClaw Free? Real Pricing & Costs Explained (2026)
OpenClaw is free and open-source, but running it isn't free. Here's what it actually costs — software ($0), API usage ($10-500+/mo), hosting ($5-29/mo), and the hidden cost nobody mentions.