
Do large language models know what humans know, or are they only echoing patterns found in language? That question now sits at the center of AI research, psychology, and philosophy. As models like GPT improve at predicting human behavior, the line between knowledge and imitation feels less clear. What looks like understanding might be something else entirely.
People don't just react to facts. We react to beliefs, especially when those beliefs are false. That ability is known as theory of mind—our capacity to reason about what someone else thinks, knows, or misunderstands. The uncomfortable question is whether LLMs are starting to show the same ability.
To answer this, we need to be precise about what "knowing" means. In humans, knowledge is deeply tied to mental states, experience, and social interaction. In LLMs, everything starts with language.
Belief attribution is the ability to track what another agent believes, even when that belief conflicts with reality. This skill is essential for communication, teaching, deception, and empathy. Children typically develop it early in life, without formal instruction.
For machines, belief attribution is not learned through experience. It is inferred from text alone. That raises a core issue: can exposure to language create sensitivity to belief states without a mind behind it?
This question drives the language exposure hypothesis, which suggests that patterns in language may be enough to approximate human-like belief reasoning.
Researchers often use the false belief task to test theory of mind. The setup is simple:
- A character places an object in Location A
- The object is moved to Location B while the character is absent
- The question: where will the character look?
Humans answer based on the character's belief, not reality. LLMs are tested using short written passages that describe these events. If the model predicts the original location, it shows sensitivity to the character's belief state.
This task is designed to separate surface-level recall from belief reasoning.
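To make the setup concrete, here is a minimal sketch of how such an item might be scored in code. The story text, the `query_model` stub, and the keyword-matching rule are illustrative placeholders rather than the protocol of any specific study; in practice, researchers often compare the probabilities a model assigns to each location instead of parsing free-form completions.

```python
# Minimal sketch of a false belief item and a crude scoring rule.
# `query_model` is a hypothetical stand-in for whatever model API is used.

STORY = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box. "
    "Sally comes back. Where will Sally look for her ball?"
)

def query_model(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    # Returning a canned answer keeps the sketch runnable.
    return "Sally will look in the basket."

def belief_consistent(story: str, belief_answer: str, reality_answer: str) -> bool:
    """True if the completion tracks the character's belief (original
    location) rather than the object's actual location."""
    completion = query_model(story).lower()
    return belief_answer in completion and reality_answer not in completion

if __name__ == "__main__":
    print("belief-consistent response:",
          belief_consistent(STORY, belief_answer="basket", reality_answer="box"))
```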
LLMs learn by detecting statistical relationships in language. They do not observe objects, track locations, or form intentions. Yet they still perform above chance on belief tasks.
This creates tension. If belief sensitivity emerges from language alone, then human theory of mind may rely more on linguistic exposure than previously assumed. However, performance gaps suggest language is not the whole story.
The most discussed results come from experiments comparing GPT-style models with human participants on false belief tasks.
In controlled studies, GPT-3 achieved approximately 74.5% accuracy on false belief tasks. That result surprised many researchers. The model was especially effective when belief states were implied rather than explicitly stated.
This matters because it suggests the model is not simply matching keywords. It is responding to narrative structure and belief cues embedded in language.
Larger models performed better than smaller ones, suggesting that sensitivity to belief states increases with scale.
Human participants reached about 82.7% accuracy on the same tasks. That gap is important. It shows that while LLMs approximate belief reasoning, they do not match human consistency.
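As a rough illustration of why 74.5% counts as above chance, the sketch below runs a binomial test against the 50% baseline of a two-location task. The item count of 40 is an assumption chosen purely for illustration; the actual number of trials in the reported studies may differ, and with it the resulting p-value.

```python
# Rough illustration: is 74.5% accuracy distinguishable from 50% chance?
# n_items = 40 is an assumed, illustrative trial count, not the real study size.
from scipy.stats import binomtest

n_items = 40                       # assumption for illustration only
model_acc, human_acc = 0.745, 0.827
k_correct = round(model_acc * n_items)

result = binomtest(k_correct, n_items, p=0.5, alternative="greater")
print(f"model {model_acc:.1%} vs chance: p = {result.pvalue:.4f}")
print(f"human-model gap: {human_acc - model_acc:.1%}")
```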
Humans also generalize better. When stories change format or include distractions, people remain accurate. Models degrade faster under those conditions.
This suggests that humans rely on additional mechanisms beyond linguistic statistics, such as embodied experience or innate cognitive structures.
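One way to probe that robustness gap is to perturb the same story and re-test. The sketch below generates perturbed variants of a single item; the distractor sentence and reformatting rule are invented for illustration and are not taken from the studies discussed above.

```python
# Sketch: generate perturbed variants of a false belief story to test robustness.
# The distractor text and formatting tweak are illustrative, not from any study.

BASE_STORY = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box. "
    "Sally comes back. Where will Sally look for her ball?"
)

DISTRACTOR = "The room is painted blue, and a clock ticks on the wall"

def perturbed_variants(story: str) -> dict[str, str]:
    sentences = story.split(". ")
    return {
        "original": story,
        # Insert an irrelevant sentence into the middle of the narrative.
        "distractor": ". ".join(sentences[:2] + [DISTRACTOR] + sentences[2:]),
        # Change the surface format: one sentence per line.
        "reformatted": ".\n".join(sentences),
    }

for name, text in perturbed_variants(BASE_STORY).items():
    print(f"--- {name} ---\n{text}\n")
```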
Performance improves with scale: models like text-davinci-002 outperform earlier versions by a wide margin, and larger models capture subtler patterns in how language encodes belief attribution.
However, scale alone does not close the gap. More parameters and more training data increase sensitivity, not understanding. That distinction matters.
These findings force researchers to confront a difficult idea: behavior consistent with belief reasoning does not guarantee belief understanding.
Some researchers argue for a practical stance. If a model behaves as if it understands beliefs, we should treat it as such. This view prioritizes observable behavior over internal mechanisms.
From this perspective, LLMs may already qualify as limited social agents. They predict belief-driven behavior reliably enough to be useful.
Others warn of shortcut learning. The Clever Hans effect describes systems that appear intelligent but rely on unintended cues. LLMs may exploit linguistic regularities without representing beliefs at all.
If so, their success reflects clever pattern matching, not mental state reasoning.
This concern limits how far we can generalize from task performance.
One productive outcome is methodological. LLMs provide a baseline for testing how much of human theory of mind can be explained by language exposure alone.
When models fail where humans succeed, those gaps point to mechanisms language cannot explain.
That makes LLMs useful not as minds, but as mirrors.
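That "mirror" use is straightforward to operationalize: score humans and a model on the same items and look at where they diverge. The per-item results in the sketch below are made-up placeholders, included only to show the bookkeeping.

```python
# Sketch: find items where humans succeed but the model fails.
# The per-item results below are made-up placeholders, not real data.

human_correct = {"item1": True, "item2": True,  "item3": True,  "item4": False}
model_correct = {"item1": True, "item2": False, "item3": False, "item4": False}

# Items where humans succeed and the model fails point to mechanisms
# that exposure to language alone does not seem to supply.
divergent = [item for item in human_correct
             if human_correct[item] and not model_correct[item]]

print("human-only successes:", divergent)   # ['item2', 'item3']
```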
This is where the debate stops being technical and starts getting uncomfortable.
If a system reliably predicts what someone believes, many people assume it must "understand" them. That intuition is powerful—and dangerous if taken too far.
The duck test says: if it behaves like it understands beliefs, treat it as if it does. From a practical standpoint, this is tempting. LLMs already guide users, answer questions, and adapt responses based on perceived human knowledge states.
In applied settings, belief-sensitive behavior can be enough. Customer support, tutoring, and social tools benefit even if the system lacks inner awareness.
But usefulness is not the same as understanding.
A core objection is structural. LLMs lack embodiment, goals, and persistent mental states. They do not track beliefs across time unless prompted. Each response is generated independently, based on input text.
This creates a logical gap. Human belief attribution depends on continuity and self-modeling. LLMs simulate belief reasoning without maintaining belief representations.
Some argue this makes true theory of mind impossible for current models, regardless of performance.
The Clever Hans effect remains a serious concern. LLMs may exploit subtle cues in language that correlate with belief outcomes. That produces behavior consistent with belief reasoning without internal belief models.
This limits what we can infer from accuracy alone. Passing a task does not explain how the task was solved.
Ironically, the most valuable insight may not be about machines at all.
LLMs show that human language contains far more information about beliefs than previously assumed. Patterns of phrasing, verb choice, and narrative structure encode mental states implicitly.
That supports the idea that language exposure plays a major role in developing theory of mind.
However, humans still outperform models. That gap matters.
Humans use non-linguistic cues: perception, shared attention, physical context, and lived experience. LLMs lack all of these. Their failures highlight which parts of belief reasoning depend on more than text.
In that sense, LLMs act as a control condition. They show how far language alone can go—and where it stops.
This reframes the debate. The question is not whether models think like humans, but what human thinking actually requires.
So, do large language models know what humans know?
They show real sensitivity to belief states. They perform well on false belief tasks. Their behavior is often consistent with human reasoning. But they do not fully explain human social cognition.
Language alone gets surprisingly far. It does not get all the way.
The current evidence suggests LLMs approximate belief reasoning through exposure to linguistic patterns, not through genuine mental state representation. That distinction matters for ethics, trust, and how we deploy these systems.
As models improve, the question will not disappear. It will sharpen.
Understanding where imitation ends and cognition begins is now one of the most important problems in AI research.
What does "theory of mind" mean in AI research?
Theory of mind refers to the ability to attribute mental states, such as beliefs or knowledge, to others. In AI, it is tested through tasks that require predicting behavior based on what someone believes, not what is true.
How do researchers test belief understanding in large language models?
They often use false belief tasks written as short stories. If a model predicts behavior based on a character's belief state, it shows sensitivity to belief attribution.
Did GPT-3 actually pass false belief tasks?
Yes. GPT-3 achieved about 74.5% accuracy, which is well above chance but still below human performance.
Does this mean LLMs understand human minds?
No. Their behavior is consistent with belief reasoning, but there is no evidence they represent beliefs as mental states.
Why do larger models perform better on belief tasks?
Larger models capture more complex language patterns, including subtle cues related to beliefs and knowledge states.
Can language alone explain human theory of mind?
Language explains part of it, but humans likely rely on additional biological, social, and perceptual mechanisms.
Are these findings useful outside AI research?
Yes. They help psychologists understand how much social cognition may be learned from language exposure.