Hi, I am Richard. On this blog, I share thoughts, personal stories — and what I am working on. I hope this article brings you some value.
Training an AI Agent That Learns Between Sessions
How AI agents learn between sessions
THE GOAL I SET MYSELF
I wanted to build an agent that does not just assist. One that acts.
The idea was straightforward: configure automations to retrieve data, let the agent analyse what it finds, have it propose next steps, send those proposals somewhere for review, and let it gradually improve through that feedback loop. At a certain point, once its proposed steps consistently matched what I considered good decisions, it would stop waiting for approval and start executing on its own.
Not a chatbot. Not a co-pilot. An autonomous system that earns its authority through demonstrated accuracy.
That was the goal. I wrote about parts of it in my previous article on local AI models. This is the next chapter.
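To make "earns its authority through demonstrated accuracy" concrete, here is a minimal sketch of an approval gate. Everything in it is an assumption for illustration: the `AutonomyGate` name, the window of 20 reviewed proposals, and the 95% threshold are hypothetical values, not settings from the system described here.

```python
from collections import deque


class AutonomyGate:
    """Tracks recent review outcomes and decides when approval can be skipped.

    Hypothetical helper: window size and threshold are illustrative assumptions.
    """

    def __init__(self, window: int = 20, threshold: float = 0.95) -> None:
        # Only the most recent `window` review outcomes are kept.
        self.decisions: deque = deque(maxlen=window)
        self.threshold = threshold

    def record(self, approved: bool) -> None:
        """Store whether the reviewer approved the latest proposal."""
        self.decisions.append(approved)

    def can_act_alone(self) -> bool:
        """True only once the window is full and the approval rate is high enough."""
        if len(self.decisions) < self.decisions.maxlen:
            return False  # not enough history yet to trust the agent
        return sum(self.decisions) / len(self.decisions) >= self.threshold
```

Because the window is rolling, a run of rejected proposals pushes the agent back into review mode, so authority can be lost as well as earned.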
WHAT I BUILT — AND WHAT IT COULD NOT DO
The first version was simple by design. But the interesting part was not what it did. The interesting part was what it could not do.
The agent runs on a schedule. It retrieves data, analyses it, and sends a report to Slack. To make sure the output was consistent, I created a schema — an approved format the agent checks itself against before sending anything. If something does not match, it corrects itself. It loops until the output passes. If something prevents it from completing the process — such as a failed LLM call — it does not send a degraded output. It sends an alert to Slack instead.
I also added positive examples. Approved outputs from previous runs that the agent can reference when producing the next one.
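The run described above (generate, check against the schema, retry with feedback, alert instead of sending degraded output) can be sketched roughly as follows. This is a sketch under stated assumptions, not the actual implementation: the field names in `REQUIRED_FIELDS` are invented, `generate`, `send_report` and `send_alert` are placeholder callables standing in for the LLM call and the Slack integration, and the retry cap `MAX_ATTEMPTS` is an added safety limit where the original simply loops until the output passes.

```python
from typing import Callable

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"summary": str, "findings": list, "next_steps": list}
MAX_ATTEMPTS = 3  # assumed cap so a stuck run ends in an alert, not an endless loop


def validate(report: dict) -> list[str]:
    """Return schema violations; an empty list means the report passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in report:
            errors.append(f"missing field: {field}")
        elif not isinstance(report[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


def run_session(
    generate: Callable[[list[str]], dict],
    send_report: Callable[[dict], None],
    send_alert: Callable[[str], None],
) -> bool:
    """Run one scheduled session; return True if a valid report was sent."""
    feedback: list[str] = []
    for _ in range(MAX_ATTEMPTS):
        try:
            # The previous attempt's violations are passed back as feedback.
            report = generate(feedback)
        except Exception as exc:  # e.g. a failed LLM call
            send_alert(f"run aborted: {exc}")
            return False
        feedback = validate(report)
        if not feedback:
            send_report(report)
            return True
    send_alert(f"output still failing schema after {MAX_ATTEMPTS} attempts: {feedback}")
    return False
```

Note that `feedback` lives only inside `run_session`. When the function returns, the list of violations is gone.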
This felt like a solid system. And for a while, I thought it was.
THE THING THAT KEPT BOTHERING ME
Every session starts from zero.
The schema is there. The examples are there. But the agent does not know what it struggled with yesterday. It does not know which rule it keeps violating. It does not know what it has already figured out.
And that changes everything.
The self-correction loop works within a single session. Between sessions, nothing accumulates. So the inconsistency I was seeing was not a configuration problem. It was not a prompting problem.
It was structural: nothing in the design carried what one session had learned into the next.
SELF-CORRECTION VS SELF-IMPROVEMENT
This is where I realised something important.
Self-correction means the agent catches its own errors before sending output. It happens inside one run, against a fixed schema. The session ends, and whatever the agent learned disappears.
Self-improvement means the agent builds something across runs. Each session leaves a trace that the next session can use. Errors become rules. Rules become context. Context shapes the next output before generation even starts.
The first is a quality filter. The second is something closer to learning.
And this distinction is not just about AI agents. It is the difference between systems that repeat and systems that evolve. Between people who fix mistakes and people who stop making the same ones. Most organisations have self-correction. Very few have genuine self-improvement. The mechanism looks similar from the outside. The architecture underneath is completely different.
What I had was a good quality filter. What I was missing was the accumulation layer underneath it.
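One possible shape for such an accumulation layer, offered as a sketch rather than a description of the final design: each session writes the errors it hit to a small JSON file, and the next session turns recurring errors into rules that are placed in the prompt before generation starts. The file format, the function names, and the "seen at least twice" threshold are all assumptions made for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_lessons(path: Path) -> Counter:
    """Load error counts left behind by earlier sessions (empty if none yet)."""
    if not path.exists():
        return Counter()
    return Counter(json.loads(path.read_text(encoding="utf-8")))


def record_errors(path: Path, errors: list[str]) -> Counter:
    """Add this session's errors to the persistent counts and save them."""
    lessons = load_lessons(path)
    lessons.update(errors)
    path.write_text(json.dumps(lessons, indent=2), encoding="utf-8")
    return lessons


def build_context(lessons: Counter, min_count: int = 2) -> str:
    """Turn recurring errors into rules to prepend to the next session's prompt."""
    recurring = [(err, n) for err, n in lessons.most_common() if n >= min_count]
    if not recurring:
        return ""
    lines = [f"- Avoid: {err} (seen {n} times)" for err, n in recurring]
    return "Rules learned from previous sessions:\n" + "\n".join(lines)
```

A session would call `build_context(load_lessons(path))` before generating and `record_errors(path, errors)` after validating, so that errors become rules and rules become context.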
DOES CLAUDE CODE ALREADY HAVE PERSISTENT MEMORY?
This is a fair question — and one I had to work through myself.
Claude Code has a file called CLAUDE.md. It loads automatically at the start of every session. When you tell the agent to remember something for future runs, it can write it there. And next time, it will be there. That is real persistence. It is not an illusion.
So when Claude Code confirms it will remember something, it is not lying.
The problem is what "there" actually means in practice.
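At the file level, "there" is simply text in a Markdown file on disk. The snippet below is only an illustration of that idea, assuming a hypothetical `remember` helper that appends dated notes. It is not how Claude Code itself writes to CLAUDE.md, and the note format is invented.

```python
from datetime import date
from pathlib import Path


def remember(note: str, memory_file: Path = Path("CLAUDE.md")) -> None:
    """Append a dated bullet to the memory file, creating the file if needed."""
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {date.today().isoformat()}: {note}\n")
```

Every note added this way becomes part of what is loaded at the start of the next session, so the file only ever grows unless something prunes it.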
If you have any thoughts, questions, or feedback, feel free to drop me a message at mail@richardgolian.com.