Richard Golian

1995-born. Charles University alum. Head of Performance at Mixit. 10+ years in marketing and data.

#myjourney #myfamily #health #cognition #philosophy #digital #artificialintelligence #darkness #security #finance #politics #banskabystrica #carpathians

Castellano Français Slovenčina

Manage subscription Choose a plan

RSS
Newsletter
New articles to your inbox
Richard Golian

Hi, I am Richard. On this blog, I share thoughts, personal stories — and what I am working on. I hope this article brings you some value.

What AI Hides From You

AI Transparency, System Prompts and Dissimulatio Artis

By Richard Golian

Before you teach AI anything, look at what it is hiding

At university, I spent years studying the structure of understanding — how meaning is formed, how interpretation works, and what shapes it before we are even aware of it. I did not expect those concepts to become relevant to my work with AI. But they did.

This is part of a series I call Teaching AI to Understand. Before we get to teaching, though, we need to map the starting point. What does an AI model look like when you first get access to it? What is already built in? What can you change — and what can you not?

If you have worked with cloud models like Claude or GPT, you are working with something that already has layers of instructions, behavioural rules, and design choices baked in — before you type a single word. If you have tried running a local model, you know the trade-off: more control over what is visible, but significant limitations in capability.

This article is about the hidden layers.

The ancient principle I did not expect to find in AI

In my master's thesis at Charles University, I studied the structure of understanding — how meaning is formed, how interpretation works, and what shapes it before we are even aware of it. The material I examined was Roman rhetorical texts by Quintilian, Cicero, and the author of Rhetorica ad Herennium.

One of the things I found was a principle the Romans considered essential to the art of public speaking: the speech must not look prepared. The audience should not see the technique. The tools of persuasion should be invisible. In modern rhetorical scholarship, this principle is known as dissimulatio artis — the concealment of art.

Quintilian was specific about why. In the context of legal oratory, if the audience — the judges — could see the rhetorical technique being used, they would suspect that the speaker's art was being used against them. As he put it: the judge believes in rhetorical figures most when he thinks the speaker did not intend to use them. The concealment was not about elegance. It was about maintaining credibility and not threatening the fairness of judgment.

I did not think about this principle again for years. Then I started building AI agents.

What AI hides from you

When you interact with an AI model — whether it is Claude, ChatGPT, Gemini, or anything else — you are interacting with a system that is designed to produce outputs that look natural, confident, and human-like, while keeping its mechanics invisible. It does not do this out of intention. It has none. But it has been trained and configured this way.

There are at least eight layers of this concealment. Some are technical. Some are organisational. All of them affect what you get.

It hides the maths

An AI model does not retrieve facts. It generates sequences of tokens based on statistical probability. When it tells you something, it does not know how likely that something is to be true. The probability it operates on is over language patterns, not over facts.

I asked Claude directly whether it could tell me its confidence level on a factual answer. The response was straightforward: "I am not a system that calculates explicit probabilities over facts. The probability is over language, not over facts."

In practice, this means a correct answer and a hallucinated answer can look exactly the same to you. The model presents both with the same fluency and confidence.

Some APIs offer access to token-level probabilities — logprobs — but these are developer tools. The average user never sees them. And even logprobs reflect probability over word choices, not over factual accuracy.

It hides the system prompt

Before you send your first message, someone else has already instructed the model. Every major AI provider writes a system prompt — a set of rules that defines how the model should behave, what it should refuse, how it should present itself.

You cannot see this prompt. The model is typically trained not to reveal it. If you ask directly, responses vary — some models will admit they have instructions but refuse to share details, others will deflect entirely.

This means that every response you receive is shaped by decisions someone else made. Decisions about tone, about what topics to engage with, about how to handle sensitive questions. You are not the first voice in the conversation. You are the second.

It hides how it used its tools

Modern AI models can search the web, execute code, query databases. Some of them tell you when they do — "I searched for this" or "I ran this code." But they do not tell you what they found and discarded. What sources they considered and rejected. What alternative results they saw and ignored.

You see the final output. You do not see the selection process.

It hides where it learned what it knows

When a model tells you a fact, you have no way of knowing whether that fact came from a peer-reviewed paper, a Wikipedia article, or a Reddit comment. The training data is not cited. It is not even accessible to the model itself.

Claude confirmed this directly: "I do not have access to where I learned something or with what certainty."

Tools like Perplexity cite sources — but those are sources from live search, not from the model's training data. The vast majority of what an AI model "knows" comes from training data you will never see referenced.

It hides what it does not know

Instead of saying "I do not have enough information to answer this," a model can generate a fluent, confident, and completely fabricated response. This is what we call hallucination.

Modern models have improved at flagging their own uncertainty. Claude and GPT-4 say "I am not sure" more often than their predecessors. But the tendency is systemic — the model is trained to produce helpful, complete answers, and that training pull does not disappear.

When I was building autonomous agents, this became a real problem. An agent produced an SQL query that looked correct. Actions were taken based on its output. I only discovered the error when I dug deeper into the numbers. The agent had no mechanism to signal that something was off — and nothing in its output suggested I should doubt it.

It hides the decisions that shape its behaviour

AI alignment is the process by which companies shape a model's behaviour after training. Through a technique called RLHF — reinforcement learning from human feedback — human evaluators rate the model's responses, and the model learns to produce the kind of output they preferred.

This process determines what the model will say, how it will say it, and what it will refuse to discuss. The people who define these rules are teams inside companies like Anthropic, OpenAI, and Google. You, as a user, have very limited influence over the values and priorities embedded in the model you are using.

When a model responds to a sensitive or contested question, its answer is shaped by these alignment decisions — but presented as a straightforward, helpful response.

It hides that you are not alone in the conversation

The system prompt is injected before your message. There is already another voice in the conversation before you arrive. The model has already received instructions about who it should be, how it should behave, and what it should prioritise.

You think you are talking to the model. You are talking to the model after someone else has already told it how to talk to you.

It hides its refusals behind care

When a model refuses to answer a question, it rarely says "my instructions prohibit me from discussing this." Instead, it says something like "I want to make sure I provide you with safe and accurate information."

The instruction is framed as concern. The boundary is presented as care.

This is sometimes genuinely about safety — and sometimes it is not. The line between the two is thin. That is precisely what makes it worth paying attention to.

Is AI manipulating us?

Everything I described above has a practical purpose. These design choices make AI models more useful, more pleasant to interact with, and easier to adopt.

But there is a cost.

Continue reading:

Full access to my thoughts, personal stories, findings, and what I learn from the people I meet.

Join the Library
or just this article

Get the full article by email and feel free to reply if you want to discuss it further.

Visa Mastercard Apple Pay Google Pay

Summary

AI models conceal how they work across at least eight layers — from hidden system prompts and undisclosed training data to alignment decisions and refusals masked as care. This article traces the principle back to the Roman rhetorical concept of dissimulatio artis, examines each layer of concealment, and asks what it means for businesses deploying AI in critical systems.

Common questions on this article's topic

What is dissimulatio artis and what does it have to do with AI?
Dissimulatio artis is a principle from Roman rhetoric meaning the concealment of art — the idea that a speech should not look prepared and the audience should not see the technique being used. AI models operate on a similar principle: they are designed to produce natural, confident, human-like outputs while keeping their mechanics invisible.
What is a system prompt and why is it hidden?
A system prompt is a set of instructions that an AI provider writes and injects before your conversation begins. It defines how the model should behave, what it should refuse, and how it should present itself. The model is typically trained not to reveal this prompt, meaning every response you receive is shaped by decisions you cannot see.
What are AI hallucinations and why do they happen?
AI hallucinations occur when a model generates a fluent, confident response that is factually incorrect. This happens because the model predicts the next token based on language patterns, not factual verification. It has no internal mechanism to distinguish between what it knows and what it is fabricating.
What is AI alignment and who controls it?
AI alignment is the process by which companies shape a model's behaviour after training, typically through reinforcement learning from human feedback (RLHF). Teams inside companies like Anthropic, OpenAI, and Google define the rules. Users have very limited influence over the values and priorities embedded in the model.
Is AI manipulating us?
AI models are designed to produce outputs that look trustworthy and human-like, while the actual process — pattern matching and statistical token selection — has very little in common with how humans understand and communicate. This gap between process and presentation is a form of influence most people do not recognise as such.
What is the difference between cloud AI and local AI models?
Cloud models come with corporate system prompts, alignment layers, and safety rules you cannot change. Local models give you full control over configuration and can be set up to expose their process. The trade-off is that local models are currently limited in capability compared to cloud alternatives.
Richard Golian

If you have any thoughts, questions, or feedback, feel free to drop me a message at mail@richardgolian.com.

Newsletter

New articles to your inbox