Richard Golian

1995-born. Charles University alum. Head of Performance at Mixit. 10+ years in marketing and data.


Hi, I am Richard. On this blog, I share thoughts, personal stories — and what I am working on. I hope this article brings you some value.

Local AI Model Limitations: Why I Switched from Ollama to Claude for Autonomous Agents

Local AI agent: setup, limits, lessons

By Richard Golian

I have been writing about AI since early 2023. Over that time, I have watched it change how I code, how I think about content, and how I think about the future of work.

This is a story about going one level deeper — from using AI as a tool to trying to build something autonomous on top of it. It did not work the way I expected.

WHY I TRIED RUNNING AI LOCALLY

Before I had any real experience with it, local AI seemed like the most interesting move I could make. Not just because of flexibility or security — although both mattered — but because it felt like the most honest way to approach the technology.

In the middle of everything happening around AI, actually running a model locally, configuring it, connecting it to data, and seeing where it breaks felt fundamentally different from using a polished cloud interface. It felt like the difference between using a tool and understanding how that tool actually works.

At the same time, I was not approaching it as a purely technical experiment. I had a clear use case in mind from the beginning.

The first area I wanted to apply this to was SEO. SEO is a documented, relatively exact discipline. It has structure, rules, patterns, and measurable outcomes. In theory, that makes it ideal for automation. An agent can scan hundreds of subpages in minutes, identify structural issues, detect missing elements, and if it also has access to search trend data, it can produce meaningful content recommendations.

That is not an abstract idea. That is a real workflow with clear business value.
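The structural part of that workflow is easy to make concrete. A minimal Python sketch of a per-page audit, using only the standard library, might check for a few of the missing elements mentioned above. The specific checks and the sample page are hypothetical illustrations, not the actual agent:

```python
from html.parser import HTMLParser

class SEOAudit(HTMLParser):
    """Collects the structural elements a basic SEO check cares about."""
    def __init__(self):
        super().__init__()
        self.tags_seen = set()
        self.meta_names = set()

    def handle_starttag(self, tag, attrs):
        self.tags_seen.add(tag)
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs:
                self.meta_names.add(attrs["name"])

def audit(html: str) -> list[str]:
    """Return a list of structural issues found on one page."""
    parser = SEOAudit()
    parser.feed(html)
    issues = []
    if "title" not in parser.tags_seen:
        issues.append("missing <title>")
    if "h1" not in parser.tags_seen:
        issues.append("missing <h1>")
    if "description" not in parser.meta_names:
        issues.append("missing meta description")
    return issues

# A page with a title but no <h1> and no meta description:
page = "<html><head><title>Shop</title></head><body><p>Hi</p></body></html>"
print(audit(page))  # ['missing <h1>', 'missing meta description']
```

Run across hundreds of subpages, a loop over `audit` is exactly the "scan, detect missing elements" step; the hard part the agent was meant to add is interpreting the results.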

The broader vision was more ambitious. I wanted to build an agent that retrieves data based on configured automations, proposes steps based on what it finds, sends those proposals somewhere for review, and through that feedback loop gradually improves. At a certain point, once its proposed steps consistently match what I consider good decisions, it would start executing those actions autonomously.

Not just assisting. Acting.

That was the goal.
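The progression from assisting to acting can be sketched as a simple gate: the agent proposes, a human approves or rejects, and only once the approval rate over a recent window clears a threshold does it start executing on its own. The threshold and window size below are arbitrary illustrations, not values from the actual system:

```python
from collections import deque

class AutonomyGate:
    """Tracks review feedback and decides when proposals may auto-execute."""
    def __init__(self, threshold=0.9, window=20):
        self.threshold = threshold           # required approval rate
        self.history = deque(maxlen=window)  # recent review outcomes

    def record_review(self, approved: bool):
        self.history.append(approved)

    def may_act_autonomously(self) -> bool:
        # Require a full window of feedback before granting autonomy.
        if len(self.history) < self.history.maxlen:
            return False
        return sum(self.history) / len(self.history) >= self.threshold

gate = AutonomyGate(threshold=0.9, window=10)
for _ in range(9):
    gate.record_review(True)
print(gate.may_act_autonomously())  # False: window not yet full
gate.record_review(True)
print(gate.may_act_autonomously())  # True: 10/10 approvals
```

The gate is trivial; the difficulty described in the rest of this article is producing proposals good enough to ever clear it.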

MAC MINI, OLLAMA, N8N

The setup itself was straightforward. I used a Mac Mini, ran a local model through Ollama, and handled basic orchestration via n8n.

Getting Ollama running was much simpler than I expected. Within a short time, I had a model up, responding, and behaving like a chatbot. From a purely technical perspective, the barrier to entry was low.

Within a few hours, I had a basic pipeline in place. The model could retrieve data and run a basic marketing analysis, and I had a clear path toward automating Slack alerts based on the output. At that stage, everything felt promising. The system was working, and it was working locally.
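The pipeline had three stages: fetch data, ask the model for an analysis, and format an alert. A minimal Python sketch of that shape, with the model call stubbed out (the real version would POST the prompt to Ollama's local `/api/generate` endpoint and read the generated text from the JSON response), looks roughly like this. The metrics, threshold, and channel name are hypothetical:

```python
def fetch_metrics():
    # Stand-in for the real data-retrieval step (hypothetical numbers).
    return {"sessions": 1240, "bounce_rate": 0.71, "top_query": "running shoes"}

def analyse(metrics: dict) -> str:
    # Stub for the model call. The real pipeline sent the metrics to the
    # local Ollama server (http://localhost:11434/api/generate) instead
    # of using this hard-coded rule.
    if metrics["bounce_rate"] > 0.6:
        return "Bounce rate is high; review landing pages for the top query."
    return "No issues detected."

def to_slack_alert(analysis: str) -> dict:
    # n8n would forward a payload like this to a Slack webhook.
    return {"channel": "#marketing", "text": analysis}

alert = to_slack_alert(analyse(fetch_metrics()))
print(alert["text"])
```

n8n's role was orchestration: triggering the fetch on a schedule and routing the final payload to Slack.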

What I did not yet fully understand was how quickly I would run into its limits.

Then I tested it on representative sample data designed to simulate real-world conditions.

THE CONTEXT WINDOW

This is where the real limitation became obvious.

The model could handle a few pages of text. It could process a small table, or a dataset of a few kilobytes. Within that range, it behaved in a way that looked functional.

But the moment I gave it representative SEO data — the kind of volume you actually need to analyse if you want meaningful output — the system broke down.

It processed what fit into its context window and ignored the rest. It produced output that, on the surface, looked structured, but when you looked closer, it had almost no value. It would pick up a number somewhere in the data and repeat it back. It did not combine signals. It did not prioritise correctly. It did not understand relationships across the dataset.

And the reason was simple. It could not see enough of it.
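The failure mode is easy to reproduce in miniature. If you approximate the context window as a fixed token budget and truncate the input to fit, everything past the cut-off simply never reaches the model. The budget and dataset below are toy numbers; real windows are measured in thousands of tokens:

```python
def truncate_to_window(text: str, budget: int) -> str:
    """Crude context-window simulation: keep only the first `budget` words."""
    words = text.split()
    return " ".join(words[:budget])

# 500 rows of SEO data, but the issue we care about is in row 400.
rows = [f"page-{i} status=ok" for i in range(500)]
rows[400] = "page-400 status=missing-title"
dataset = "\n".join(rows)

# Each row is 2 words, so a 600-word budget sees only the first 300 rows.
visible = truncate_to_window(dataset, budget=600)
print("missing-title" in visible)  # False: the issue was cut off
```

Whatever analysis runs on `visible` will confidently report on the first 300 rows and silently miss the problem in row 400, which is exactly the "structured but valueless" output described above.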

I noticed this immediately during the first real analysis. The quality of the output was roughly comparable to what cloud models were producing in 2023. That is not a criticism of the model itself. It is a reflection of the constraints.

The problem was not configuration. It was not prompting. It was not lack of effort.

The hardware determined which model I could run. And the model I could run simply could not hold the amount of information required for the task.

WHAT AUTONOMOUS ACTUALLY MEANS

At this point, it became clear what "autonomous" actually requires in practice — and where the system was falling short.

An autonomous agent is not just a loop that calls a model repeatedly. It requires the ability to reason across a large amount of context, maintain coherence across multiple steps, and produce outputs that are precise enough to act on without constant supervision.

That means it needs to hold not just the current input, but the accumulated state of the entire workflow. What data was retrieved, what actions were proposed, what decisions were made, what failed, what succeeded, and what the overall objective is.

This is where the limitation becomes structural.

A model with a constrained context window cannot maintain that state. It cannot connect decisions across time. It cannot evaluate its own outputs in a meaningful way because it lacks visibility into the full process.
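One way to see why this is structural rather than a tuning problem: even a modest workflow state, serialized so the model can read it, quickly dwarfs a small context window. The numbers below are illustrative assumptions (a rough 4-characters-per-token rule of thumb and a 4,096-token window), not measurements:

```python
import json

# Accumulated state a multi-step agent would need to carry between calls.
state = {
    "objective": "improve organic traffic for the product pages",
    "retrieved": [f"page-{i}: audit row" for i in range(2000)],
    "proposed": ["rewrite 40 meta descriptions"],
    "decisions": ["approved: rewrite metas"],
    "failed": [],
    "succeeded": ["crawl completed"],
}

serialized = json.dumps(state)
approx_tokens = len(serialized) // 4  # rough heuristic: ~4 chars per token
small_window = 4096                   # a typical small local-model window

# The workflow state alone overflows the window, before any instructions,
# new input, or room for the model's answer.
print(approx_tokens > small_window)
```

Everything that does not fit has to be summarized or dropped, and each summarization step loses exactly the cross-dataset relationships the agent was supposed to reason about.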

The vision of the system was not the problem.

The infrastructure underneath it was.

SWITCHING TO CLAUDE CODE

At that point, I moved to a cloud-based solution and started working with Claude Code from Anthropic.


Summary

I tried building an autonomous AI agent locally — Mac Mini, Ollama, n8n. The context window limitations made meaningful analysis impossible. This is what I learned about local vs cloud AI, and why I switched to Claude Code.

If you have any thoughts, questions, or feedback, feel free to drop me a message at mail@richardgolian.com.


Common questions on this article's topic

What is the difference between running AI locally and using cloud AI?
Local AI runs on your own hardware — giving you full control over data and no recurring API costs, but with significant limitations in processing power and context window size. Cloud AI (like Claude or GPT-4) runs on remote servers with far larger models, longer context windows, and better reasoning capabilities, but requires sending data externally and paying per usage. In the article, local AI was initially chosen for privacy and control, but its limitations forced a switch to cloud.
What is a context window and why does it matter?
The context window is the amount of text an AI model can process in a single interaction — analogous to how much of a document it can see at once. Local models typically have much smaller context windows than cloud models. In the article, this was the critical limitation: when given real-world SEO data volumes, the local model could only process what fit in its window and ignored the rest, producing output that looked structured but had almost no analytical value.
What is Ollama and how easy is it to set up?
Ollama is an open-source tool that allows users to run large language models locally on their own hardware. In the article, setup is described as surprisingly simple — within a short time, a model was running and responding on a Mac Mini. The barrier to entry was low from a technical perspective. The problems emerged only when the model was tasked with processing real-world data volumes that exceeded its context window capacity.
Can local AI models handle real business data analysis?
In the article, the answer is not yet — at least not for complex, multi-dimensional analysis. The local model could handle small datasets and simple queries. But when given representative SEO data at production scale, it broke down: processing only what fit in its context window, picking up isolated numbers without understanding relationships, and producing output comparable to cloud models from 2023. The gap between local and cloud capability remains significant.
What is an autonomous AI agent?
An autonomous AI agent is a system that retrieves data, proposes actions based on what it finds, learns from feedback, and eventually executes decisions independently. In the article, the goal was to build such an agent for SEO: scanning subpages, identifying issues, proposing content recommendations, and gradually improving through a feedback loop until it could act without human intervention. The vision was not just AI assisting — but AI acting.
Should developers start with local AI or cloud AI?
In the article, starting locally provided valuable hands-on understanding of how models actually work — the difference between using a polished interface and understanding the underlying technology. However, for production use cases requiring complex reasoning and large data volumes, cloud AI was necessary. The practical recommendation is: experiment locally to build understanding, but use cloud models for real business applications where quality and context capacity matter.