Training an AI Agent That Learns Between Sessions
THE GOAL I SET MYSELF
I wanted to build an agent that does not just assist. One that acts.
The idea was straightforward: configure automations to retrieve data, let the agent analyse what it finds, have it propose next steps, send those proposals somewhere for review, and, through that feedback loop, gradually improve. Once its proposed steps consistently matched what I considered good decisions, it would stop waiting for approval and start executing on its own.
Not a chatbot. Not a co-pilot. An autonomous system that earns its authority through demonstrated accuracy.
That was the goal. I wrote about parts of it in my previous article on local AI models. This is the next chapter.
WHAT I BUILT — AND WHAT IT COULD NOT DO
The first version was simple by design. But the interesting part was not what it did. The interesting part was what it could not do.
The agent runs on a schedule. It retrieves data, analyses it, and sends a report to Slack. To make sure the output was consistent, I created a schema — an approved format the agent checks itself against before sending anything. If something does not match, it corrects itself. It loops until the output passes. If something prevents it from completing the process — such as a failed LLM call — it does not send a degraded output. It sends an alert to Slack instead.
I also added positive examples. Approved outputs from previous runs that the agent can reference when producing the next one.
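To make the mechanics concrete, here is a minimal sketch of that loop, assuming the agent returns JSON. Every name in it is a stand-in I chose for illustration: the schema fields, the retry budget, and the llm, send_report, and send_alert callables are placeholders, not the actual implementation.

```python
import json

MAX_ATTEMPTS = 3  # illustrative retry budget, not the real one
REQUIRED_KEYS = {"summary", "findings", "next_steps"}  # stand-in schema fields

def validate(output: str) -> list[str]:
    """Return schema violations; an empty list means the output passes."""
    try:
        report = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(report, dict):
        return ["output is not a JSON object"]
    missing = REQUIRED_KEYS - report.keys()
    return [f"missing field: {key}" for key in sorted(missing)]

def build_prompt(data, examples, violations=None):
    """Fresh data plus approved examples, and any violations the agent must fix."""
    prompt = (f"Analyse this data and respond as JSON with keys "
              f"{sorted(REQUIRED_KEYS)}:\n{data}\n\n"
              f"Approved outputs from previous runs:\n{examples}")
    if violations:
        prompt += "\n\nYour previous draft broke the schema:\n" + "\n".join(violations)
    return prompt

def run_session(llm, data, examples, send_report, send_alert):
    """One scheduled run: analyse, self-correct against the schema, then send."""
    prompt = build_prompt(data, examples)
    for _ in range(MAX_ATTEMPTS):
        try:
            draft = llm(prompt)
        except Exception as exc:
            send_alert(f"LLM call failed: {exc}")  # never send degraded output
            return
        violations = validate(draft)
        if not violations:
            send_report(draft)  # schema passed, safe to post to Slack
            return
        prompt = build_prompt(data, examples, violations)  # correct and retry
    send_alert("Output failed schema validation after repeated attempts")
```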
This felt like a solid system. And for a while, I thought it was.
THE THING THAT KEPT BOTHERING ME
Every session starts from zero.
The schema is there. The examples are there. But the agent does not know what it struggled with yesterday. It does not know which rule it keeps violating. It does not know what it has already figured out.
And that changes everything.
The self-correction loop works within a single session. Between sessions, nothing accumulates. So the inconsistency I was seeing was not a configuration problem. It was not a prompting problem.
The problem was not technical. It was structural.
SELF-CORRECTION VS SELF-IMPROVEMENT
This is where I realised something important.
Self-correction means the agent catches its own errors before sending output. It happens inside one run, against a fixed schema. The session ends, and whatever the agent learned — disappears.
Self-improvement means the agent builds something across runs. Each session leaves a trace that the next session can use. Errors become rules. Rules become context. Context shapes the next output before generation even starts.
The first is a quality filter. The second is something closer to learning.
And this distinction is not just about AI agents. It is the difference between systems that repeat and systems that evolve. Between people who fix mistakes and people who stop making the same ones. Most organisations have self-correction. Very few have genuine self-improvement. The mechanism looks similar from the outside. The architecture underneath is completely different.
What I had was a good quality filter. What I was missing was the accumulation layer underneath it.
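Here is the shape of the layer I was missing, as a sketch rather than a finished design. The premise: a violation that survives a session gets written down as a rule, and every later session loads those rules into context before generation starts. The file name and format are placeholders I chose for illustration.

```python
import json
from pathlib import Path

RULES_PATH = Path("learned_rules.json")  # placeholder location for the accumulation layer

def load_rules() -> list[str]:
    """Rules accumulated by earlier sessions."""
    if RULES_PATH.exists():
        return json.loads(RULES_PATH.read_text())
    return []

def record_rule(rule: str) -> None:
    """Errors become rules: persist a violation so the next session starts smarter."""
    rules = load_rules()
    if rule not in rules:  # keep the list deduplicated
        rules.append(rule)
        RULES_PATH.write_text(json.dumps(rules, indent=2))

def with_accumulated_context(base_prompt: str) -> str:
    """Rules become context: inject them before generation even starts."""
    rules = load_rules()
    if not rules:
        return base_prompt
    header = "Rules learned in previous sessions:\n" + "\n".join(f"- {r}" for r in rules)
    return header + "\n\n" + base_prompt
```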
DOES CLAUDE CODE ALREADY HAVE PERSISTENT MEMORY?
This is a fair question — and one I had to work through myself.
Claude Code has a file called CLAUDE.md. It loads automatically at the start of every session. When you tell the agent to remember something for future runs, it can write it there. And next time, it will be there. That is real persistence. It is not an illusion.
So when Claude Code confirms it will remember something — it is not lying.
The problem is what "there" actually means in practice.
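Mechanically, "there" is a flat file. This is my simplified sketch of what a remember request reduces to, not Claude Code's internals:

```python
from pathlib import Path

CLAUDE_MD = Path("CLAUDE.md")  # loaded automatically at the start of every session

def remember(note: str) -> None:
    """Roughly what 'remember this for future runs' amounts to: appending a line."""
    with CLAUDE_MD.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

remember("Reports must always include a next-steps section")
```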
