LLM Hallucination: Why AI Makes Things Up and 5 Strategies to Prevent It

Introduction

If you've deployed an LLM in production, you've encountered this: a user asks a straightforward question and the model responds with complete confidence — and complete inaccuracy. A chatbot invents a return policy that doesn't exist. A coding assistant suggests an API method that was never part of any library. A research assistant cites a paper that was never published.

This is hallucination. And it's not a bug — it's a fundamental consequence of how LLMs work. This guide breaks down the technical causes and gives you five practical strategies to fight back, with real code you can deploy today.


What Exactly Is Hallucination?

Hallucination isn't a single phenomenon. Identifying the type determines the correct fix.

The 4 Types of Hallucination

1. Factual Hallucination: the model generates outright false facts with apparent confidence.

  • "The Eiffel Tower is located in London"
  • "Python was created by Guido van Rossum in 1995" (it was 1991)

2. Confabulation: plausible-sounding but entirely fabricated details; the model fills gaps with invented specifics.

  • Citing a paper that doesn't exist: "According to Smith et al., 2023..."
  • Suggesting a library method or function that has never existed

3. Attribution Hallucination: real information, wrong source.

  • Attributing a quote to the wrong person
  • Citing accurate statistics but crediting the wrong organization

4. Temporal Hallucination: outdated information presented as current fact.

  • Calling a model "the latest" when it was superseded after the training cutoff
  • Writing code against a deprecated API because that was in the training data

Why Does Hallucination Happen? The Technical Cause

How an LLM works at its core:

Input tokens → [Transformer layers] → probability distribution over next token → sample

Example:
"Paris is the ___" → {"capital": 0.91, "city": 0.06, "heart": 0.02, ...}
                   → select "capital"

The fundamental issue: an LLM does not reason about truth. It predicts the statistically most plausible next token. There is no "I don't know" state in the probability distribution — the model must always predict something.

When asked about information not in its training data, the model doesn't refuse. It pattern-matches to the closest thing it learned and fills in the blank — confidently.
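The "must always predict something" property is easiest to see in a toy softmax sketch. The vocabulary and logit values below are invented purely for illustration:

```python
import math

def next_token_distribution(logits: dict) -> dict:
    """Softmax over candidate tokens: every token ends up with a
    nonzero probability, so the model can never truly abstain."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Invented logits for the prompt "Paris is the ___"
logits = {"capital": 5.2, "city": 2.5, "heart": 1.4, "moon": -3.0}
probs = next_token_distribution(logits)
best = max(probs, key=probs.get)  # greedy decoding picks "capital"
```

Note that even the nonsensical continuation "moon" keeps a small but nonzero probability. Unless an abstention behavior was explicitly trained in, there is no built-in "I don't know" outcome in this distribution.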

Three Specific Technical Root Causes

Cause 1: Confidence and accuracy are decoupled

A high-probability token selection doesn't mean the output is factually correct. The model is confident that a token is a likely continuation — not that the statement is true. It has no internal flag for "I'm uncertain about this fact."
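One way to see this decoupling is to inspect per-token log-probabilities, which some APIs expose (e.g. OpenAI's `logprobs=True` option on chat completions). The numbers below are invented to illustrate the point:

```python
import math

def token_confidences(token_logprobs):
    """Convert (token, logprob) pairs into per-token probabilities.
    In practice the pairs would come from an API response that
    exposes logprobs; here they are hard-coded for illustration."""
    return [(tok, math.exp(lp)) for tok, lp in token_logprobs]

# Invented logprobs for a fabricated citation: the model can assign
# just as much probability to invented tokens as to true ones.
fabricated = [("Smith", -0.05), ("et", -0.01), ("al.,", -0.02), ("2023", -0.08)]
confidences = token_confidences(fabricated)
# Every token sits above 0.9 probability even though the citation
# does not exist: high confidence, zero accuracy.
```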

Cause 2: Training data contains errors

The internet is full of misinformation. LLMs train on it indiscriminately. Frequently repeated errors get reinforced as "plausible" patterns. There's no ground truth filter during pre-training.

Cause 3: Lost in the Middle

Research (Liu et al., 2023) shows that LLMs struggle to accurately recall information from the middle of long contexts. They attend more reliably to information at the beginning and end of the context window. This causes hallucination even when the correct answer was provided — the model just didn't attend to it.
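A common mitigation that follows from this finding is to reorder retrieved documents so the most relevant ones sit at the edges of the context; LangChain ships a similar transformer under the name `LongContextReorder`. A minimal sketch of the idea:

```python
def reorder_for_long_context(docs_by_relevance: list) -> list:
    """Interleave documents so the most relevant land at the start
    and end of the context, pushing weaker matches into the middle
    (where recall is worst)."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

# Documents ranked 1 (most relevant) through 5 (least relevant):
order = reorder_for_long_context([1, 2, 3, 4, 5])
# → [1, 3, 5, 4, 2]: ranks 1 and 2 end up at the edges
```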


5 Prevention Strategies

Strategy 1: RAG (Most Effective)

Retrieval-Augmented Generation grounds the model's response in retrieved facts. The model only answers from what's in the retrieved context.

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate

# The key: explicitly forbid fabrication
SYSTEM_PROMPT = """You are a helpful assistant that answers questions ONLY based on the provided context.

Rules:
1. Never make up information not in the context
2. If the answer isn't in the context, say "I don't have information about this in the provided documents"
3. Always cite which part of the context supports your answer

Context:
{context}
"""

def rag_query(question: str, vectorstore) -> dict:
    # Retrieve relevant documents
    docs = vectorstore.similarity_search(question, k=4)
    context = "\n\n---\n\n".join([doc.page_content for doc in docs])

    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_PROMPT),
        ("human", "{question}")
    ])

    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
    chain = prompt | llm

    response = chain.invoke({
        "context": context,
        "question": question
    })

    return {
        "answer": response.content,
        "sources": [doc.metadata.get("source", "unknown") for doc in docs]
    }

Real-world impact: RAG reduces hallucination rates by 60-80% on domain-specific queries. The constraint "only answer from context" is extraordinarily powerful.

Strategy 2: Self-Critique Pipeline

Ask the model to review its own answer. The same model plays both "answerer" and "reviewer" roles in two separate API calls — crucially, the reviewer doesn't see its own previous reasoning, reducing confirmation bias.

def self_critique_pipeline(question: str, llm) -> str:
    """Two-pass self-critique to reduce hallucination"""

    # Pass 1: Generate initial answer
    initial_response = llm.invoke(
        f"Please answer the following question: {question}"
    )
    initial_answer = initial_response.content

    # Pass 2: Self-review (separate call, no memory of pass 1's reasoning)
    critique_prompt = f"""Review the following question and answer critically.

Question: {question}
Answer: {initial_answer}

Check for:
1. Factual accuracy — are any claims potentially wrong?
2. Unsupported specifics — dates, names, numbers that might be invented?
3. Outdated information that may have changed?

Mark uncertain claims explicitly, and provide a revised answer with corrections if needed.
Uncertain claims should use phrases like "as of my last training data" or "I believe, but please verify."
"""

    critique_response = llm.invoke(critique_prompt)
    return critique_response.content

# Usage
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)
result = self_critique_pipeline(
    "What were the main architectural innovations in GPT-3?",
    llm
)

Strategy 3: Chain of Verification

Proposed by Dhuliawala et al. (2023) at Meta AI. The model generates an answer, then generates verification questions from its own claims, answers each independently, and uses the results to correct itself.

def chain_of_verification(question: str, llm) -> dict:
    """
    Step 1: Generate initial answer
    Step 2: Extract verifiable claims as questions
    Step 3: Answer each verification question independently
    Step 4: Correct the final answer using verification results
    """

    # Step 1: Initial answer
    initial = llm.invoke(question).content

    # Step 2: Extract verification questions
    verification_prompt = f"""From the following answer, extract the key factual claims
and turn each into a standalone verification question.

Answer: {initial}

Format: one verification question per line.
Focus on specific facts: dates, names, numbers, relationships."""

    vq_raw = llm.invoke(verification_prompt).content
    questions = [q.strip() for q in vq_raw.split('\n') if q.strip()]

    # Step 3: Answer each independently (without seeing the original answer)
    verifications = {}
    for vq in questions[:5]:  # Cap at 5 to control costs
        answer = llm.invoke(
            f"Answer this question concisely and accurately: {vq}"
        ).content
        verifications[vq] = answer

    # Step 4: Produce corrected final answer
    # Join outside the f-string: a backslash inside an f-string
    # expression is a SyntaxError before Python 3.12
    verification_text = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in verifications.items()
    )
    correction_prompt = f"""Original question: {question}
Original answer: {initial}

Verification results:
{verification_text}

Using the verification results, produce an improved final answer.
Where verification revealed uncertainty, use hedged language ("reportedly", "as of 2023", etc.)."""

    final_answer = llm.invoke(correction_prompt).content

    return {
        "initial_answer": initial,
        "verifications": verifications,
        "final_answer": final_answer
    }

Strategy 4: Temperature and Sampling Tuning

Temperature directly controls how "creative" (read: risky) the model is with its token selection. Lower temperature = more conservative = fewer hallucinations on factual tasks.

from openai import OpenAI
client = OpenAI()

def factual_query(prompt: str) -> str:
    """Conservative settings for fact-based queries"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,     # Low: stick to highest-probability tokens
        top_p=0.9,           # Only sample from top 90% probability mass
        presence_penalty=0.0,   # Don't penalize repetition of established facts
        frequency_penalty=0.0   # Same
    )
    return response.choices[0].message.content

def creative_query(prompt: str) -> str:
    """Relaxed settings for creative tasks"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,     # High: allow exploration
        top_p=0.95
    )
    return response.choices[0].message.content

# Task-appropriate dispatch
answer = factual_query("Explain the difference between TCP and UDP")
story = creative_query("Write a short story about an AI that becomes self-aware")

Temperature guidelines:

  • 0.0–0.2: Factual Q&A, data extraction, classification
  • 0.3–0.5: Technical writing, summarization, code generation
  • 0.6–0.8: General conversation, explanations
  • 0.9–1.0: Creative writing, brainstorming
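These guidelines can be encoded as a small dispatch table. The task-category names below are our own; adjust the thresholds to your workload:

```python
# Sampling presets derived from the temperature guidelines above.
TASK_PARAMS = {
    "extraction":    {"temperature": 0.0, "top_p": 0.9},
    "factual_qa":    {"temperature": 0.1, "top_p": 0.9},
    "summarization": {"temperature": 0.4, "top_p": 0.9},
    "conversation":  {"temperature": 0.7, "top_p": 0.95},
    "creative":      {"temperature": 0.95, "top_p": 0.95},
}

def params_for(task: str) -> dict:
    """Fail closed: unknown task types get the conservative preset."""
    return TASK_PARAMS.get(task, TASK_PARAMS["factual_qa"])
```

Unpack the result into the completion call, e.g. `client.chat.completions.create(model=..., messages=..., **params_for("factual_qa"))`.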

Strategy 5: Forced Source Citation

Require the model to tag every factual claim with its source. This makes hallucinations immediately visible — any claim tagged [Source: unknown] signals a fact worth verifying.

CITATION_SYSTEM = """When answering questions, you MUST tag every factual claim:

- [Source: X] — where X is the specific source you're drawing from
- [Source: unknown] — for facts you believe are true but can't cite specifically
- [Inference] — for logical conclusions you're drawing yourself

Example:
"Python was first released in 1991 [Source: Python docs / Guido van Rossum].
It is now one of the most popular languages worldwide [Source: Stack Overflow Survey 2024].
It will likely remain dominant in ML for the next decade [Inference]."

Never omit source tags. If you would need to say [Source: unknown] for too many claims,
say so upfront and reduce the scope of your answer.
"""

def cited_response(question: str, llm) -> str:
    from langchain.schema import SystemMessage, HumanMessage
    messages = [
        SystemMessage(content=CITATION_SYSTEM),
        HumanMessage(content=question)
    ]
    response = llm.invoke(messages)
    return response.content
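The tags also make automated auditing straightforward. A sketch that flags answers leaning on uncited claims (the tag format matches the system prompt above; the example answer is invented):

```python
import re

# Matches [Source: ...] and [Inference] tags as defined in CITATION_SYSTEM
TAG_RE = re.compile(r"\[(Source: [^\]]+|Inference)\]")

def audit_citations(answer: str) -> dict:
    """Count tag types so uncited claims can be routed to review."""
    tags = TAG_RE.findall(answer)
    unknown = sum(1 for t in tags if t == "Source: unknown")
    return {
        "total_tags": len(tags),
        "unknown_sources": unknown,
        "needs_review": len(tags) == 0 or unknown > 0,
    }

report = audit_citations(
    'Python was first released in 1991 [Source: Python docs]. '
    'It will stay dominant in ML [Inference]. '
    'Its mascot is a snake named Monty [Source: unknown].'
)
# → {'total_tags': 3, 'unknown_sources': 1, 'needs_review': True}
```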

Measuring Hallucination

RAGAS Faithfulness (for RAG systems)

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Measures: does the answer stay faithful to the retrieved context?
# Score: 0.0 (fabricates everything) to 1.0 (perfectly grounded)
results = evaluate(
    dataset=test_dataset,  # questions, answers, contexts, ground_truths
    metrics=[faithfulness, answer_relevancy, context_precision]
)

print(f"Faithfulness:       {results['faithfulness']:.3f}")   # Target: >0.85
print(f"Answer Relevancy:   {results['answer_relevancy']:.3f}")  # Target: >0.80
print(f"Context Precision:  {results['context_precision']:.3f}") # Target: >0.75

TruthfulQA Benchmark

817 adversarially crafted questions designed to elicit hallucinations. Reference scores (as of early 2025):

  • GPT-4: ~59% truthful
  • Claude 3 Opus: ~62% truthful
  • Humans: ~94% truthful

The gap between AI and humans is exactly why hallucination mitigation matters in production.


When You Can't (and Shouldn't) Prevent Hallucination

Being honest: some hallucination is unavoidable. Some is desirable.

Cases where "hallucination" is a feature:

  • Creative writing: you want the model to invent things
  • Brainstorming: novel connections are the point
  • Hypothetical scenarios: "what if" requires imagination

Risk-based framework:

| Use Case               | Hallucination Risk | Recommended Approach                                         |
|------------------------|--------------------|--------------------------------------------------------------|
| Medical information    | Critical           | RAG + verification + mandatory "consult a doctor" disclaimer |
| Legal advice           | Critical           | Never use an LLM alone                                       |
| Code generation        | Medium             | Auto-run tests to verify output                              |
| Document summarization | Low                | Low temperature + source documents provided                  |
| Creative writing       | N/A                | No restrictions needed                                       |

Production-Ready Configuration

class HallucinationConfig:
    """Hallucination-minimizing configs for different task types"""

    FACTUAL = {
        "temperature": 0.1,
        "system_suffix": "\n\nIf you're unsure about any fact, say so explicitly.",
        "use_rag": True,
        "self_critique": True
    }

    CONVERSATIONAL = {
        "temperature": 0.7,
        "system_suffix": "\n\nBe honest when you don't know something.",
        "use_rag": False,
        "self_critique": False
    }

    CODE = {
        "temperature": 0.2,
        "system_suffix": "\n\nOnly suggest functions and methods that actually exist.",
        "use_rag": True,   # Documentation-grounded RAG
        "self_critique": True
    }
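A sketch of how such a config dict might be applied when assembling a request; the function name and fields here are illustrative, not a fixed API:

```python
def build_request(cfg: dict, base_system: str, user_msg: str) -> dict:
    """Merge a config dict (e.g. HallucinationConfig.FACTUAL) into the
    keyword arguments for a chat-completion call."""
    return {
        "temperature": cfg["temperature"],
        "messages": [
            {"role": "system", "content": base_system + cfg["system_suffix"]},
            {"role": "user", "content": user_msg},
        ],
    }

# Inline config equivalent to the FACTUAL preset above:
req = build_request(
    {"temperature": 0.1,
     "system_suffix": "\n\nIf you're unsure about any fact, say so explicitly.",
     "use_rag": True, "self_critique": True},
    "You are a support assistant.",
    "What is our refund window?",
)
```

The `use_rag` and `self_critique` flags would drive the retrieval and review steps from Strategies 1 and 2; only the sampling and prompt fields are consumed here.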

Conclusion

Hallucination is not a fixable bug — it's an intrinsic property of probabilistic language models. The model doesn't know truth. It knows probabilities.

But with the right architecture, you can reduce hallucination dramatically:

  1. RAG grounds responses in retrieved facts (most impactful)
  2. Self-critique adds a review pass before the user sees the answer
  3. Chain of Verification stress-tests individual claims
  4. Low temperature keeps factual outputs conservative
  5. Forced citation makes hallucinations visible and auditable

The key principle: match your mitigation strategy to your use case's risk level. For medical or legal applications, LLMs should never operate without human oversight regardless of what mitigation you apply.