Hi, I am Richard. On this blog, I share thoughts, personal stories — and what I am working on. I hope this article brings you some value.
Local AI Model Limitations: Why I Switched from Ollama to Claude for Autonomous Agents
I have been writing about AI since early 2023. Over that time, I have watched it change how I code, how I think about content, and how I think about the future of work.
This is a story about going one level deeper — from using AI as a tool to trying to build something autonomous on top of it. It did not work the way I expected.
WHY I TRIED RUNNING AI LOCALLY
Before I had any real experience with it, local AI seemed like the most interesting move I could make. Not just because of flexibility or security — although both mattered — but because it felt like the most honest way to approach the technology.
In the middle of everything happening around AI, actually running a model locally, configuring it, connecting it to data, and seeing where it breaks felt fundamentally different from using a polished cloud interface. It felt like the difference between using a tool and understanding how that tool actually works.
At the same time, I was not approaching it as a purely technical experiment. I had a clear use case in mind from the beginning.
The first area I wanted to apply this to was SEO. SEO is a documented, relatively exact discipline. It has structure, rules, patterns, and measurable outcomes. In theory, that makes it ideal for automation. An agent can scan hundreds of subpages in minutes, identify structural issues, detect missing elements, and if it also has access to search trend data, it can produce meaningful content recommendations.
That is not an abstract idea. That is a real workflow with clear business value.
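A basic version of that audit does not even need a model. As a rough sketch (the element list and function names are my own illustration, not a specific tool), a structural check for missing on-page elements can be done with the standard library:

```python
from html.parser import HTMLParser


class SEOAudit(HTMLParser):
    """Collects the on-page elements a basic SEO check cares about."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.found.add(tag)
        # A meta description tag looks like <meta name="description" content="...">
        if tag == "meta" and dict(attrs).get("name") == "description":
            self.found.add("meta description")


def audit_page(html):
    """Return the structural elements missing from one page."""
    required = {"title", "h1", "meta description"}
    parser = SEOAudit()
    parser.feed(html)
    return sorted(required - parser.found)


print(audit_page("<html><head><title>Home</title></head><body><p>Hi</p></body></html>"))
# → ['h1', 'meta description']
```

Run that across a few hundred fetched subpages and you already have the "detect missing elements" half of the workflow; the model only becomes necessary for the interpretation and recommendation layer on top.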
The broader vision was more ambitious. I wanted to build an agent that retrieves data based on configured automations, proposes steps based on what it finds, sends those proposals somewhere for review, and through that feedback loop gradually improves. At a certain point, once its proposed steps consistently match what I consider good decisions, it would start executing those actions autonomously.
Not just assisting. Acting.
That was the goal.
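The feedback loop I had in mind can be sketched roughly as follows. The structure (the 20-decision window, the 95% approval threshold, the function names) is my own illustration of the idea, not an implementation I shipped:

```python
def agent_step(fetch_data, propose_actions, review, execute, approval_history):
    """One cycle: fetch, propose, and gate execution on human review.

    Once the trailing approval rate is high enough, the agent stops
    asking and acts on its own proposals.
    """
    data = fetch_data()
    proposals = propose_actions(data)

    trailing = approval_history[-20:]
    autonomous = len(trailing) >= 20 and sum(trailing) / len(trailing) >= 0.95

    for action in proposals:
        if autonomous:
            execute(action)
        else:
            approved = review(action)  # human-in-the-loop gate
            approval_history.append(approved)
            if approved:
                execute(action)
    return approval_history
```

The important design point is that autonomy is earned from the approval history rather than switched on by hand: the same loop serves as both the training-wheels phase and the autonomous phase.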
MAC MINI, OLLAMA, N8N
The setup itself was straightforward. I used a Mac Mini, ran a local model through Ollama, and handled basic orchestration via n8n.
Getting Ollama running was much simpler than I expected. Within a short time, I had a model up, responding, and behaving like a chatbot. From a purely technical perspective, the barrier to entry was low.
Within a few hours, I had a basic pipeline in place. The model could retrieve data and run a basic marketing analysis, and I had a clear path toward automating alerts into Slack based on the output. At that stage, everything felt promising. The system was working, and it was working locally.
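Under the hood, the n8n workflow was just making HTTP calls against the Ollama server. A minimal sketch of that call, using only the standard library (the model name "llama3" is a placeholder for whichever model you have pulled):

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama3"):
    """Build a non-streaming generate request for Ollama's REST API."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_model(prompt, model="llama3"):
    """Send one prompt to a locally running Ollama server and return its reply."""
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Anything that can make an HTTP request can drive the model, which is exactly why wiring it into n8n and then into Slack felt so easy at this stage.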
What I did not yet fully understand was how quickly I would run into its limits.
Then I tested it on representative sample data designed to simulate real-world conditions.
THE CONTEXT WINDOW
This is where the real limitation became obvious.
The model could handle a few pages of text. It could process a small table or a dataset of a few kilobytes. Within that range, it behaved in a way that looked functional.
But the moment I gave it representative SEO data — the kind of volume you actually need to analyse if you want meaningful output — the system broke down.
It processed what fit into its context window and ignored the rest. It produced output that, on the surface, looked structured, but when you looked closer, it had almost no value. It would pick up a number somewhere in the data and repeat it back. It did not combine signals. It did not prioritise correctly. It did not understand relationships across the dataset.
And the reason was simple. It could not see enough of it.
I noticed this immediately during the first real analysis. The quality of the output was roughly comparable to what cloud models were producing in 2023. That is not a criticism of the model itself. It is a reflection of the constraints.
The problem was not configuration. It was not prompting. It was not lack of effort.
The hardware determined which model I could run. And the model I could run simply could not hold the amount of information required for the task.
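You can see the problem coming with simple arithmetic before the model ever fails. A crude back-of-the-envelope check (the ~4 characters-per-token heuristic and the 8K window are illustrative assumptions, not measurements of any specific model):

```python
def rough_token_count(text):
    """Crude heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4


def fits_in_context(text, context_window=8192, reserve_for_output=1024):
    """Check whether a prompt plausibly fits a model's context window,
    leaving room for the model's own response."""
    return rough_token_count(text) <= context_window - reserve_for_output
```

A few kilobytes of table passes this check easily. A crawl of a few hundred subpages with search data attached does not come close, and everything past the cutoff is simply invisible to the model.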
WHAT AUTONOMOUS ACTUALLY MEANS
At this point, it became clear what "autonomous" actually requires in practice — and where the system was falling short.
An autonomous agent is not just a loop that calls a model repeatedly. It requires the ability to reason across a large amount of context, maintain coherence across multiple steps, and produce outputs that are precise enough to act on without constant supervision.
That means it needs to hold not just the current input, but the accumulated state of the entire workflow. What data was retrieved, what actions were proposed, what decisions were made, what failed, what succeeded, and what the overall objective is.
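As a rough sketch, that accumulated state might look like the structure below. The field names are my own illustration; the point is that all of it has to be serialised back into the prompt on every step, which is precisely what a small context window cannot absorb:

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Accumulated state an autonomous agent must carry between steps."""

    objective: str
    retrieved: list = field(default_factory=list)   # data gathered so far
    proposals: list = field(default_factory=list)   # actions proposed for review
    decisions: list = field(default_factory=list)   # approved / rejected outcomes
    failures: list = field(default_factory=list)    # what went wrong and why

    def as_prompt_context(self):
        """Serialise state for the model — this is what must fit in the window."""
        return "\n".join([
            f"Objective: {self.objective}",
            f"Retrieved: {len(self.retrieved)} items",
            f"Proposals: {self.proposals}",
            f"Decisions: {self.decisions}",
            f"Failures: {self.failures}",
        ])
```

Every cycle this context grows, so even a workflow that starts out fitting comfortably will eventually not.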
This is where the limitation becomes structural.
A model with a constrained context window cannot maintain that state. It cannot connect decisions across time. It cannot evaluate its own outputs in a meaningful way because it lacks visibility into the full process.
The vision of the system was not the problem.
The infrastructure underneath it was.
SWITCHING TO CLAUDE CODE
At that point, I moved to a cloud-based solution and started working with Claude Code from Anthropic.
If you have any thoughts, questions, or feedback, feel free to drop me a message at mail@richardgolian.com.