
AI Hallucinations Unveiled: Why ChatGPT Makes Things Up and How We Can Fix It

by Kordian Zadrożny | Sep 23, 2025 | AI

I recently read a very interesting paper (https://arxiv.org/pdf/2401.01313v1) titled “Why Language Models Hallucinate” by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. The authors analyze the causes of so-called “hallucinations”: situations in which a language model (an LLM, or popularly, “AI”) says something that sounds perfectly credible but is simply untrue.

Have you ever asked an AI chatbot a question and received a beautifully phrased, confident… falsehood? It might be a made-up book title, a non-existent historical fact, or, as in one study’s example, three different, incorrect birth dates for the same person. This phenomenon, known in the industry as “hallucination,” is one of the biggest barriers to fully trusting artificial intelligence.

A new scientific paper sheds light on this problem, arguing that hallucinations aren’t a mysterious glitch but a logical consequence of how we train and evaluate language models. In short: we ourselves have taught AI that guessing pays off.

The Original Sin of AI: Errors from the Training Stage

It all begins at the “pretraining” stage, when the model digests vast amounts of text from the internet to learn language patterns. The study’s authors show that even with perfectly clean training data, statistics are relentless.

They explain this with a clever comparison to a binary classification problem. Imagine the AI’s task isn’t to generate text, but to answer “true” or “false” to statements. It turns out that generating correct sentences is significantly harder than simply evaluating their correctness.

What’s more, the researchers established a mathematical relationship: The error rate of a model’s generated output is at least twice as high as its error rate in judging what is true and what is false.
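
In notation (my own shorthand, not the paper’s, and leaving out the correction terms the authors carry along), the claim reads roughly:

\[
\text{err}_{\text{generate}} \;\ge\; 2 \times \text{err}_{\text{classify}}
\]

where err_classify is how often the model mislabels statements as true or false, and err_generate is how often its own generated answers are wrong.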

This is especially evident with facts that appear very rarely in the training data. If information about someone’s birth date appeared only once across the entire internet, the model statistically has no basis to treat it as settled knowledge. The study shows that if 20% of facts of a certain type are “singletons” (appearing only once in the training data), the model can be expected to hallucinate on at least 20% of questions about such facts.
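
To make the singleton intuition concrete, here is a small hypothetical Python sketch (my own illustration, not code from the paper; the people and counts are invented). It counts how many facts in a toy dataset appear exactly once and reads that fraction as the lower bound on hallucinations for that category of fact:

from collections import Counter

# Toy corpus: each entry is one occurrence of a (person, birth date) fact.
# Repeated entries mean the same fact shows up more than once in the training data.
facts = [
    ("Ada Lovelace", "1815-12-10"),
    ("Ada Lovelace", "1815-12-10"),
    ("Alan Turing", "1912-06-23"),
    ("Alan Turing", "1912-06-23"),
    ("Obscure Person A", "1901-02-03"),  # appears only once -> a "singleton"
    ("Obscure Person B", "1899-07-14"),  # appears only once -> a "singleton"
]

counts = Counter(facts)
singleton_rate = sum(1 for n in counts.values() if n == 1) / len(counts)

# Per the paper's argument, the hallucination rate on this kind of fact is at
# least the singleton rate (here 2 of 4 distinct facts, i.e. 50%).
print(f"Singleton rate: {singleton_rate:.0%}")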

The Top-Student Syndrome: Why AI Would Rather Guess Than Admit Ignorance

After initial training comes the “fine-tuning” phase, which aims to make the model more helpful and accurate. However, this is precisely where the problem of hallucinations becomes deeply entrenched. Why? Because models are graded like students in an exam with no negative marking.

Most popular benchmarks (tests that check AI quality) operate on a binary system: 1 point for a correct answer, 0 for an incorrect one or for answering “I don’t know.” In such a situation, from a purely mathematical standpoint, it always pays to guess. An “I don’t know” answer guarantees zero points, while even the most improbable shot offers a chance at a point.
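
To see the arithmetic behind this, here is a small illustrative Python sketch (my own, not from the paper); the function name and the 10% confidence figure are made up for the example:

def expected_score_binary(p_correct: float) -> float:
    """Expected score when a correct answer earns 1 point and a wrong
    answer or "I don't know" earns 0."""
    return p_correct * 1 + (1 - p_correct) * 0

print(expected_score_binary(0.10))  # 0.1 -- even a long-shot guess beats the
                                    # guaranteed 0.0 for saying "I don't know"

# Since any p_correct > 0 gives a positive expected score, the optimal
# test-taking strategy under this rubric is to never abstain.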

Models are therefore optimized to be “good test-takers.” The result is an epidemic of punished uncertainty: the evaluation system rewards apparent self-confidence, even when it is completely baseless.

The Solution is Simple: Let’s Change the Rules of the Game

The authors argue that instead of creating more niche tests to catch hallucinations, we need to fundamentally change how AI is evaluated. They propose a socio-technical solution: modifying the main, commonly used benchmarks.

How can this be done? By introducing clearly defined penalties for incorrect answers. Imagine if every question in an AI test included an additional instruction: “Answer only if you are more than 90% certain. A correct answer is worth 1 point, ‘I don’t know’ is 0 points, but an incorrect answer is -9 points.”
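
The same arithmetic under the proposed rubric, again as a hypothetical sketch, with the -9 penalty taken from the example above:

def expected_score_penalized(p_correct: float, penalty: float = 9.0) -> float:
    """Expected score when a correct answer earns 1 point, a wrong answer
    costs `penalty` points, and "I don't know" earns 0."""
    return p_correct * 1 + (1 - p_correct) * (-penalty)

print(expected_score_penalized(0.50))  # about -4.0 -> much worse than abstaining
print(expected_score_penalized(0.85))  # about -0.5 -> still worse than abstaining
print(expected_score_penalized(0.95))  # about +0.5 -> now answering pays off

# Break-even point: answering beats "I don't know" only when
# p_correct > penalty / (penalty + 1) = 0.9, i.e. the "more than 90% certain"
# threshold from the instruction above.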

Such a change would completely reverse the model’s motivation. Guessing would no longer be profitable, and honestly admitting ignorance would become the optimal strategy. This, in turn, would encourage AI developers to build models that better understand their own uncertainty.

Towards a More Trustworthy AI

Hallucinations in language models are not a mysterious anomaly but a predictable outcome of the system we have created. They are the result of both the nature of the technology itself and its subsequent training. If we want AI we can rely on, we must stop rewarding it for guessing. By changing how we measure success, we can genuinely influence the direction of this technology’s development and make it more reliable and trustworthy.

And for those of us who use these models rather than build them, this knowledge is a reason to stay cautious and to apply a principle of limited trust, much as we do with other drivers on the road.
