Richard Golian

1995-born. Charles University alum. Head of Performance at Mixit. 10+ years in marketing and data.



Hi, I am Richard. On this blog, I share thoughts, personal stories — and what I am working on. I hope this article brings you some value.

Training an AI Agent That Learns Between Sessions

How AI agents learn between sessions

By Richard Golian

The goal I set myself

I wanted to build an agent that does not just assist. One that acts.

The idea was straightforward: configure automations to retrieve data, let the agent analyse what it finds, have it propose next steps, send those proposals somewhere for review, and through that feedback loop — gradually improve. At a certain point, once its proposed steps consistently matched what I considered good decisions, it would stop waiting for approval and start executing on its own.

Not a chatbot. Not a co-pilot. An autonomous system that earns its authority through demonstrated accuracy.

That was the goal. I wrote about parts of it in my previous article on local AI models. This is the next chapter.
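The review-then-autonomy gate described above could be sketched roughly like this. It is a minimal illustration, not my actual implementation: the threshold, the minimum sample size, and the function names are all assumptions.

```python
APPROVAL_THRESHOLD = 0.95   # assumed bar; the article doesn't name a number
MIN_REVIEWED_RUNS = 20      # assumed minimum sample before trusting the rate

def approval_rate(history):
    """Share of past proposals the human reviewer approved (1) vs rejected (0)."""
    return sum(history) / len(history) if history else 0.0

def may_act_alone(history):
    """Autonomy is earned: enough reviewed runs, consistently approved."""
    return (len(history) >= MIN_REVIEWED_RUNS
            and approval_rate(history) >= APPROVAL_THRESHOLD)
```

The point of the gate is that the agent never receives authority by default; it accumulates a track record first.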

WHAT I BUILT — AND WHAT IT COULD NOT DO

The first version was simple by design. But the interesting part was not what it did. The interesting part was what it could not do.

The agent runs on a schedule. It retrieves data, analyses it, and sends a report to Slack. To make sure the output was consistent, I created a schema — an approved format the agent checks itself against before sending anything. If something does not match, it corrects itself. It loops until the output passes. If something prevents it from completing the process — such as a failed LLM call — it does not send a degraded output. It sends an alert to Slack instead.
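The check-correct-or-alert loop can be sketched as follows. This is a simplified stand-in, assuming a retry budget and a schema reduced to a set of required fields; the real schema check is richer.

```python
MAX_ATTEMPTS = 3  # assumed retry budget; the article doesn't state one

def schema_errors(report, required_fields):
    """Return problems found; an empty list means the output passes."""
    return [f"missing field: {f}" for f in sorted(required_fields - report.keys())]

def produce_report(generate, required_fields, alert):
    """Self-correction loop: regenerate until the report matches the
    approved format; on failure, alert instead of sending degraded output."""
    for _ in range(MAX_ATTEMPTS):
        try:
            report = generate()              # one LLM call per attempt
        except Exception as err:             # e.g. a failed LLM call
            alert(f"run aborted: {err}")     # never send a degraded report
            return None
        if not schema_errors(report, required_fields):
            return report                    # passes; safe to send to Slack
    alert("schema check failed after all retries")
    return None
```

Whatever the validation details, the design choice is the same: a degraded report is worse than no report plus an alert.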

I also added positive examples. Approved outputs from previous runs that the agent can reference when producing the next one.

This felt like a solid system. And for a while, I thought it was.

THE THING THAT KEPT BOTHERING ME

Every session starts from zero.

The schema is there. The examples are there. But the agent does not know what it struggled with yesterday. It does not know which rule it keeps violating. It does not know what it has already figured out.

And that changes everything.

The self-correction loop works within a single session. Between sessions, nothing accumulates. So the inconsistency I was seeing was not a configuration problem. It was not a prompting problem.

The problem was not technical. It was structural.

SELF-CORRECTION VS SELF-IMPROVEMENT

This is where I realised something important.

Self-correction means the agent catches its own errors before sending output. It happens inside one run, against a fixed schema. The session ends, and whatever the agent learned — disappears.

Self-improvement means the agent builds something across runs. Each session leaves a trace that the next session can use. Errors become rules. Rules become context. Context shapes the next output before generation even starts.

The first is a quality filter. The second is something closer to learning.

And this distinction is not just about AI agents. It is the difference between systems that repeat and systems that evolve. Between people who fix mistakes and people who stop making the same ones. Most organisations have self-correction. Very few have genuine self-improvement. The mechanism looks similar from the outside. The architecture underneath is completely different.

What I had was a good quality filter. What I was missing was the accumulation layer underneath it.
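The accumulation layer could look something like this minimal sketch. The file name, the JSON format, and the prompt wording are assumptions; what matters is the shape: errors become rules, and rules are loaded before the next generation starts.

```python
import json
from pathlib import Path

RULES_FILE = Path("learned_rules.json")  # hypothetical store, name assumed

def load_rules():
    return json.loads(RULES_FILE.read_text()) if RULES_FILE.exists() else []

def record_error(rule):
    """A mistake caught in this run becomes a standing rule for future runs."""
    rules = load_rules()
    if rule not in rules:                 # keep the list deduplicated
        rules.append(rule)
        RULES_FILE.write_text(json.dumps(rules))

def build_context(task):
    """Accumulated rules shape the prompt before generation even starts."""
    bullet_rules = "\n".join(f"- {r}" for r in load_rules())
    return f"Rules learned from past runs:\n{bullet_rules}\n\nTask: {task}"
```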

DOES CLAUDE CODE ALREADY HAVE PERSISTENT MEMORY?

This is a fair question — and one I had to work through myself.

Claude Code has a file called CLAUDE.md. It loads automatically at the start of every session. When you tell the agent to remember something for future runs, it can write it there. And next time, it will be there. That is real persistence. It is not an illusion.

So when Claude Code confirms it will remember something — it is not lying.

The problem is what "there" actually means in practice.
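Mechanically, that persistence amounts to appending to a plain markdown file that gets loaded at session start. A rough sketch of what "remember this" boils down to, with the entry format assumed:

```python
from pathlib import Path

CLAUDE_MD = Path("CLAUDE.md")  # loaded automatically at session start

def remember(note: str):
    """Append a note so the next session starts with it already in context."""
    existing = CLAUDE_MD.read_text() if CLAUDE_MD.exists() else ""
    if note not in existing:               # avoid duplicate entries
        CLAUDE_MD.write_text(existing + f"- {note}\n")
```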


Summary

I wanted to build an autonomous AI agent that improves over time — not just one that corrects itself within a single session. The distinction is between self-correction and self-improvement. Claude Code's built-in memory has limits for agents that run daily. A structured memory layer changes what is possible.

If you have any thoughts, questions, or feedback, feel free to drop me a message at mail@richardgolian.com.


Common questions on this article's topic

What is the difference between AI self-correction and self-improvement?
Self-correction means the agent catches errors within a single session — checking output against a schema and looping until it passes. When the session ends, everything learned is lost. Self-improvement means the agent builds knowledge across sessions: errors become rules, rules become context, and context shapes future output before generation even starts. In the article, this distinction is identified as the critical gap in current AI agent architectures — and the key to building systems that genuinely evolve.
What is CLAUDE.md and what are its limitations for AI agents?
CLAUDE.md is a file that Claude Code loads automatically at the start of every session, providing persistent memory across runs. When the agent is told to remember something, it writes to this file. The persistence is real — not an illusion. However, in the article, the limitation is identified: CLAUDE.md is a static, unstructured file. It does not organise itself, distinguish relevant from outdated entries, or manage its own growth. For an agent running daily over weeks, the file becomes noise rather than signal.
Why does every AI agent session start from zero?
Because current AI models have no built-in mechanism for accumulating experience between sessions. The context window is populated fresh each time. In the article, this is identified as the structural — not technical — problem: the agent does not know what it struggled with yesterday, which rules it keeps violating, or what it has already figured out. The self-correction loop works within a session. Between sessions, nothing persists unless explicitly stored.
What is a structured memory layer for AI agents?
A structured memory layer sits alongside static memory files and organises accumulated experience into categories the agent can reference selectively. Instead of loading everything into the context window every time, the agent retrieves only what is relevant to the current task. In the article, this is the solution being built: a system where errors become rules, rules become context, and the agent's behaviour improves measurably across sessions rather than resetting each time.
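The selective retrieval described here could be sketched like this. The category names and entries are illustrative assumptions, not the real taxonomy:

```python
MEMORY = {  # hypothetical categories of accumulated experience
    "formatting_rules": ["dates in ISO 8601", "amounts with currency codes"],
    "data_quirks": ["Tuesday exports arrive several hours late"],
    "resolved_issues": ["field 'cost' was renamed to 'spend' in week 3"],
}

def retrieve(categories):
    """Pull only the categories relevant to the current task,
    instead of loading the whole memory into the context window."""
    return [entry for c in categories for entry in MEMORY.get(c, [])]
```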
Can you run autonomous AI agents locally?
Yes, but with significant limitations. In the article and its predecessor on local AI models, the setup used a Mac Mini with Ollama and n8n. Basic pipelines worked: data retrieval, simple analysis, Slack alerts. But the context window limitations of local models made complex analysis impossible. For autonomous agents that need to process real-world data volumes and maintain quality over time, cloud models with larger context windows proved necessary.
What does it take to build an AI agent that earns autonomy?
In the article, the principle is that autonomy must be earned through demonstrated accuracy — not granted by default. The architecture starts with human review of every proposed action. As the agent's proposals consistently match good decisions, it gradually gains permission to act independently. This requires not just good single-session performance but genuine improvement over time — which is why the structured memory layer is essential. Without cross-session learning, the agent cannot build the track record needed to justify autonomous action.