Whose Words Does the AI Speak — The German Ruling That Made Google Liable for AI Overviews

Introduction — Why This Ruling Is the Talk of the Week

In June 2026, a ruling by a German court set GeekNews and Hacker News on fire. The question at stake is simple. When AI Overviews, displayed at the top of Google Search, generates false information, who is responsible?

The court answered clearly. AI Overviews is not a search result that merely "lists" third-party content; it is text authored directly by Google ("Google's own words"), and therefore Google bears direct liability for false answers.

The reason this ruling is so striking is that it directly shakes one of the founding principles that has supported the internet industry for the last 25 years: the immunity doctrine that says "a platform is an intermediary, not the speaker of the content." That legal foundation is precisely what allowed search engines, social networks, and hosting providers to grow explosively.

With the arrival of generative AI, that boundary began to crumble. The moment an AI reads multiple sources and synthesizes "one answer," is that still intermediation, or is it new speech? In June 2026, the German court said it is the latter.

In this post we dissect the logic of the ruling, examine where it collides with existing immunity doctrines (the EU DSA, US Section 230, and the Korean Network Act), and then walk through what developers and companies building AI answer engines should review in practice. We also cover how RAG and citation design are becoming technical instruments for lowering legal risk.

Important: this article is not legal advice. It is a tech-blog summary of publicly reported material. For any situation requiring an actual legal judgment, please consult a qualified attorney.

Case Overview — What Went Wrong

According to press reports, the structure of the case is as follows.

  1. Google AI Overviews generated content about a German company (the plaintiff) that was factually wrong and displayed it at the top of search results.
  2. The plaintiff argued the content was false and damaged its reputation and business.
  3. Google mounted the traditional defense: AI Overviews automatically processes and displays third-party content from the web, so the intermediary immunity afforded to search engines should apply.
  4. The court rejected that defense. Because AI Overviews selects, summarizes, and recomposes multiple sources to "generate" a single new text, the court held this is not the transmission of third-party content but a statement by Google itself.

The core is the distinction between "generation" and "transmission." A traditional search results page takes titles and snippets from third-party web pages and lists them verbatim. The user clicks through to the original, and responsibility for that content lies with the original author. AI Overviews, by contrast, feeds multiple documents into an LLM that synthesizes entirely new sentences — word combinations not in the originals, assertions not in the originals, and sometimes hallucinations found nowhere in any original.

The reasoning of the court can be compressed into one sentence.

If your system composed the sentences in the output text, those sentences are your speech.

Dissecting the Reasoning — Intermediary Immunity vs Content Creator

The Structure of Intermediary Immunity

Internet law largely rests on a three-tier classification of actors.

  weaker liability  <----------------------------------->  stronger liability

  +-------------+      +----------------+      +----------------+
  | mere        |      | caching/       |      | content        |
  | conduit     |      | hosting        |      | provider       |
  +-------------+      +----------------+      +----------------+
  ISPs, carriers        search engines,         publishers,
                        social networks,        bloggers,
                        cloud hosting           and... AI?
  • Mere conduit: ISPs that simply pass data through. Nearly complete immunity.
  • Caching/hosting/intermediation: businesses that store, index, and surface third-party content. Immunity holds if they act promptly once they "become aware" of unlawful content (notice and takedown).
  • Content provider: the party who authored the content. No immunity. Ordinary defamation and falsehood doctrines apply in full.

Until now, search engines sat in the second box. The German ruling moved AI Overviews into the third.

The Three Factors the Court Focused On

Restating the reported holding in the language of engineers, the court chose the "creator" classification based on three factors.

  1. Synthesis: the output is not a quotation of any specific source but a new text composed from multiple sources. The weights of the model and the decoding process "wrote" the sentences.
  2. Assertion: AI Overviews presents claims in a declarative, fact-stating tone rather than as possibilities, and it appears in the most authoritative position — the very top of the results page.
  3. Editorial control: Google designs and controls which queries trigger an AI answer, which sources are used, and which safety filters apply.

These three factors are likely to recur whenever other courts and regulators evaluate AI answer products. Conversely, how you address these three factors at the product design stage determines the size of your risk.

An Interesting Asymmetry — Classification, Not Hallucination, Was the Issue

There is a point in this ruling that technologists easily miss. The court did not ask "did the LLM hallucinate"; it asked "whose words, legally, is this output." It is not a question of model quality but of liability classification.

This distinction matters because no matter how far you push down the hallucination rate, as long as it is not zero and the classification is "speaker," the liability structure does not change. Reducing hallucinations lowers the frequency of harm; it does not change the legal status itself.
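A back-of-the-envelope sketch makes the point concrete. The daily answer volume and error rates below are hypothetical numbers chosen for illustration, not figures from the case, and the probability formula assumes errors are independent across answers.

```python
import math


def expected_false_answers(answers_per_day: int, error_rate: float) -> float:
    """Expected number of false answers per day (rate times volume)."""
    return answers_per_day * error_rate


def prob_at_least_one(answers_per_day: int, error_rate: float) -> float:
    """P(at least one false answer per day), assuming independent errors.

    Computes 1 - (1 - p)^n via log1p/expm1 so tiny rates stay numerically accurate.
    """
    return -math.expm1(answers_per_day * math.log1p(-error_rate))


if __name__ == "__main__":
    volume = 1_000_000  # hypothetical number of AI answers shown per day
    for rate in (0.01, 0.001, 0.0001):
        print(
            f"rate={rate:.2%}  expected/day={expected_false_answers(volume, rate):,.0f}  "
            f"P(at least one)={prob_at_least_one(volume, rate):.6f}"
        )
```

Even at one false answer in ten thousand, a million daily answers make at least one incident per day a near certainty, which is why the classification question weighs more than the exact rate.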

Collision with Existing Immunity Doctrines — DSA, Section 230, the Korean Network Act

Comparison Table

| Regime | Core immunity provision | Who is shielded | Application to AI-generated answers |
|---|---|---|---|
| EU DSA | Intermediary immunity (Arts. 4 to 6) | Conduit, caching, hosting | Unclear; the prevailing view is that generative output is hard to call intermediation |
| US Section 230 | Section 230(c)(1) | Intermediaries of information provided by another | Strong view that text generated by your own model is not information provided by another |
| Korean Network Act | Interim-measures regime for defamation | Information and communications service providers | No precedent yet on whether generated answers count as direct speech |
| Germany (this ruling) | Intermediary immunity denied | Not applicable | AI answers classified as the operator's own speech; direct liability |

The EU — A Gap Between the DSA and the AI Act

The EU Digital Services Act inherited the three-way conduit/caching/hosting taxonomy from the E-Commerce Directive (2000/31/EC). The problem is that "a system that reads multiple documents and synthesizes a new text" fits cleanly into none of these boxes. In that gap, the German court took the position that synthesized output is not hosting and therefore not eligible for immunity.

Meanwhile, the EU AI Act is centered on ex-ante regulation (risk classification, transparency duties) rather than liability, so it does not fill the civil liability vacuum. National courts are filling that vacuum with general tort law, and this German ruling has become the first major example.

The United States — The Outer Limit of Section 230

Section 230 of the US Communications Decency Act provides that an interactive computer service provider shall not be treated as the publisher of information provided by another. The key words are "provided by another." The view gaining ground in American legal scholarship is that sentences synthesized by an LLM are not information provided by any particular other person, so Section 230 immunity does not apply. Even figures involved in drafting Section 230 have said generative AI output is not what the statute protects.

As of 2026, several lawsuits are pending in the US claiming that chatbot output caused real harm to users, including litigation against OpenAI in Florida. Plaintiffs are trying various theories — product liability, negligence, defamation. On the defamation track, the Georgia case in which a radio host challenged false ChatGPT output is frequently cited as an early landmark.

Korea — The Network Act and a Precedent Vacuum

The Korean Network Act maintains an interim-measures (blind) regime for defamatory information, and portal liability has generally been judged on whether the operator "knew or could have known yet left the content up." The Supreme Court of Korea has held that when a portal actively curates and arranges articles or posts, it can bear heavier responsibility than a mere intermediary.

That "active curation and arrangement" doctrine is likely to cut against AI answers, because an AI answer engine goes beyond curating and arranging — it writes. With major Korean portals and carriers all operating LLM-based search summaries, many observers see a domestic lawsuit similar to the German case as only a matter of time.

Implications for AI Answer Engine Products

The message of this ruling for product design is unmistakable. Features that used to be UX improvements are being promoted to legal defenses.

1. Source Attribution — From Decoration to Evidence

Citation footnotes are no longer "UI that conveys trustworthiness" but grounds for arguing "this sentence is transmission, not synthesis." A caution though: adding sources does not automatically make you an intermediary. What the court looked at was not whether sources were displayed but who composed the sentence. The closer the output is to faithful excerpt and quotation, the closer you are to a transmitter; the freer the rewriting, the closer you are to a speaker.
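One way to make "how close is this sentence to a faithful excerpt" measurable is a similarity score between each output sentence and its best-matching source snippet. The sketch below uses Python's standard `difflib.SequenceMatcher`; the two thresholds and the three labels are illustrative assumptions, not a legal test.

```python
from difflib import SequenceMatcher


def best_source_similarity(sentence: str, snippets: list[str]) -> float:
    """Highest difflib similarity ratio (0.0 to 1.0) between the sentence and any snippet."""
    if not snippets:
        return 0.0
    return max(
        SequenceMatcher(None, sentence.lower(), snippet.lower()).ratio()
        for snippet in snippets
    )


def classify_sentence(
    sentence: str,
    snippets: list[str],
    excerpt_threshold: float = 0.8,     # hypothetical cut-off
    paraphrase_threshold: float = 0.5,  # hypothetical cut-off
) -> str:
    """Label an output sentence as 'excerpt', 'paraphrase' or 'free_rewrite'."""
    score = best_source_similarity(sentence, snippets)
    if score >= excerpt_threshold:
        return "excerpt"
    if score >= paraphrase_threshold:
        return "paraphrase"
    return "free_rewrite"
```

Tracking the share of `free_rewrite` sentences per answer gives a rough, auditable signal of how far the product has drifted from transmission toward synthesis. Note that `ratio()` penalizes a short sentence quoted out of a long snippet, so a production version would more likely score the best-matching span.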

2. Tone Design: From Assertion to Attribution

"Company X is a fraudulent business" and "some sources report disputes involving company X (sources 1, 2)" carry entirely different legal weight. The former is an assertion; the latter is attribution. Defamation law has long distinguished statements of fact from opinion and reported speech, and AI output will be measured by the same yardstick.
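The assertion/attribution distinction can also be enforced mechanically at render time. The hypothetical helpers below rephrase an already-verified claim as reported speech with citation markers, and flag sentences that carry neither a marker nor a reporting cue. The cue list and the `[n]` marker format are assumptions for illustration, and keyword matching is a crude stand-in for a proper classifier. Wrapping a claim in "some sources report" only helps if the sources actually say so.

```python
import re

# Hypothetical phrases that mark a sentence as reported speech rather than assertion.
ATTRIBUTION_CUES = ("according to", "reported", "reports", "some sources", "sources say")
CITATION_MARKER = re.compile(r"\[\d+\]")


def is_attributed(sentence: str) -> bool:
    """True if the sentence carries a citation marker like [1] or a reporting cue."""
    lowered = sentence.lower()
    has_marker = CITATION_MARKER.search(sentence) is not None
    return has_marker or any(cue in lowered for cue in ATTRIBUTION_CUES)


def to_attribution(claim: str, source_ids: list[int]) -> str:
    """Rephrase a verified claim as attributed speech with one marker per source."""
    body = claim.strip().rstrip(".")
    if not body or not source_ids:
        raise ValueError("need a non-empty claim and at least one source id")
    markers = "".join(f"[{i}]" for i in source_ids)
    return f"Some sources report that {body} {markers}."


def flag_bare_assertions(sentences: list[str]) -> list[str]:
    """Sentences that would reach users as unattributed assertions."""
    return [s for s in sentences if not is_attributed(s)]
```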

3. Hallucination Suppression — From Quality Metric to Compliance Metric

Metrics like hallucination rate, groundedness, and citation accuracy have lived on model quality dashboards. Going forward they must also flow into legal and compliance reporting lines, because being able to prove "we made industry-standard efforts to reduce hallucinations" becomes important in negligence analysis. Records of adherence to frameworks like the NIST AI RMF are becoming defensive exhibits in litigation.
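Moving these metrics into compliance reporting mostly means computing them from per-sentence verification records and storing a dated snapshot. A minimal sketch, assuming a hypothetical `ClaimCheck` record shape; the metric names mirror the ones used in the SLO example later in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaimCheck:
    """Verification outcome for one output sentence (hypothetical record shape)."""
    supported: bool          # did the verifier find supporting evidence?
    citations_total: int     # citation markers attached to the sentence
    citations_correct: int   # of those, how many sources actually contain the claim


def claim_support_rate(checks: list[ClaimCheck]) -> float:
    """Share of sentences supported by evidence."""
    if not checks:
        raise ValueError("no verified sentences to measure")
    return sum(c.supported for c in checks) / len(checks)


def citation_accuracy(checks: list[ClaimCheck]) -> float:
    """Share of citations that actually contain the claim they are attached to."""
    total = sum(c.citations_total for c in checks)
    if total == 0:
        raise ValueError("no citations to measure")
    return sum(c.citations_correct for c in checks) / total


def compliance_snapshot(checks: list[ClaimCheck], model_version: str) -> dict:
    """A dated record meant for legal/compliance reporting, not just a dashboard."""
    return {
        "measured_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "sample_size": len(checks),
        "claim_support_rate": claim_support_rate(checks),
        "citation_accuracy": citation_accuracy(checks),
    }
```

Raising on empty input is deliberate: a metric that silently reports a perfect score on zero samples is worse than no metric in a dispute.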

4. The Limits of Visible Disclaimers

The coverage of this ruling reconfirmed that the small gray line saying "AI answers may be inaccurate" provides almost no protection. A structure that displays a confident answer prominently at the top while mentioning possible inaccuracy only in a footnote does not, in the eyes of a court, offset the trust effect created by tone and placement.

The 2026 AI Litigation Landscape — What Is at Stake

As of the first half of 2026, AI litigation can be sorted into four branches.

            The 2026 AI litigation landscape (simplified)

  +------------------------+     +------------------------+
  | 1. Training data       |     | 2. Output liability    |
  |    copyright           |     |    defamation /        |
  |    (NYT v OpenAI etc)  |     |    false information   |
  |                        |     |    (German ruling)     |
  +------------------------+     +------------------------+
  +------------------------+     +------------------------+
  | 3. Product safety      |     | 4. Competition /       |
  |    minors, harmful     |     |    traffic             |
  |    output (Florida     |     |    publisher traffic   |
  |    OpenAI suit etc)    |     |    loss, antitrust     |
  +------------------------+     +------------------------+
  1. Training data suits: the copyright cluster, including the New York Times against OpenAI. Problems at the input stage.
  2. Output liability suits: the branch this German ruling belongs to. Responsibility for false and defamatory output.
  3. Product safety suits: like the litigation against OpenAI proceeding in Florida, these are claims under product liability and negligence theories that chatbot interactions harmed users, especially minors. Suits on this theory are spreading across the character-chatbot industry.
  4. Competition and traffic suits: lawsuits and regulatory complaints by publishers arguing AI Overviews siphons their clicks. Education content companies and news organizations are stepping up as plaintiffs.

These four branches reinforce each other. For example, the more output liability (2) is recognized, the more companies strengthen source citation, which in turn changes the negotiating structure in traffic disputes with publishers (4).

Developer and Company Checklist — Items to Review Before Shipping an AI Feature

A practical checklist you can use together with your legal team. The point is not that every box must be checked before launch, but that each item deserves a conscious decision and a written record.

[ Classification risk ]
[ ] Is our output quotation, summary, or free generation?
[ ] Is the output phrased as assertion or as source attribution?
[ ] Do we control which queries trigger AI answers? (more control, more responsibility)
[ ] Do we classify and block/soften sensitive queries (medical/legal/financial/personal reputation)?

[ Grounding and citation ]
[ ] Does every factual sentence map to a source?
[ ] Is there a guardrail that blocks unsourced sentences at generation time?
[ ] Do we measure citation accuracy (does the citation actually contain the claim)?
[ ] When sources conflict, do we surface the conflict?

[ Operations and remedies ]
[ ] Is there a channel for affected parties to report false output?
[ ] Is there a defined SLA from report to block/fix? (notice-then-neglect is fatal)
[ ] Is there a cache/block mechanism preventing recurrence of the same false output?
[ ] Are output logs preserved in a dispute-ready form (with a retention policy)?

[ Measurement and records ]
[ ] Do we measure and record hallucination rate and groundedness periodically?
[ ] Is there a documented risk assessment procedure for model or prompt changes?
[ ] Have we attempted mapping to NIST AI RMF, ISO 42001, or similar frameworks?

[ Contracts and insurance ]
[ ] When using external model APIs, have we reviewed liability allocation clauses?
[ ] Have we confirmed whether AI-output liability is covered by existing insurance?
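For the sensitive-query item, a minimal routing sketch: classify the query into the categories named in the checklist and decide whether to answer normally, soften (attribution-only phrasing, mandatory citations), or fall back to plain links. The keyword lists and the policy mapping are hypothetical placeholders; a production system would more likely use a trained classifier.

```python
from enum import Enum


class Route(Enum):
    ANSWER = "answer"          # normal grounded AI answer
    SOFTEN = "soften"          # attribution-only phrasing, mandatory citations
    LINKS_ONLY = "links_only"  # no synthesis; fall back to a list of links


# Hypothetical keyword lists for the sensitive categories in the checklist.
SENSITIVE_KEYWORDS = {
    "medical": ("dosage", "symptom", "diagnosis"),
    "legal": ("lawsuit", "indicted", "sentenced"),
    "financial": ("bankrupt", "insolvency", "stock tip"),
    "reputation": ("fraud", "scandal", "scam"),
}

# Hypothetical policy: reputation queries get no synthesis at all.
POLICY = {
    "medical": Route.SOFTEN,
    "legal": Route.SOFTEN,
    "financial": Route.SOFTEN,
    "reputation": Route.LINKS_ONLY,
}

_STRICTNESS = [Route.ANSWER, Route.SOFTEN, Route.LINKS_ONLY]


def route_query(query: str) -> tuple[Route, list[str]]:
    """Return the strictest route triggered by any matching category, plus the hits."""
    lowered = query.lower()
    hits = [
        category for category, words in SENSITIVE_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]
    route = max((POLICY[c] for c in hits), key=_STRICTNESS.index, default=Route.ANSWER)
    return route, hits
```

Returning the matched categories alongside the route gives you the written record the checklist asks for.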

I want to stress the "SLA after notice" item in particular. Coverage of the German case likewise treated the speed of correction after notification as an issue. Even where intermediary immunity is denied, prompt and good-faith correction after notice still matters for mitigating damages and for the negligence analysis.

RAG and Citation Design — Technical Ways to Lower the Risk

In an era when the legal classification tilts toward "speaker," what an engineering team can do is push the output as close as possible to "grounded transmission." The keys are grounding and citation verification.

A Grounded Generation Pipeline

# grounded_answer.py — an example RAG pipeline that forces a source for every claim
from dataclasses import dataclass, field

@dataclass
class Evidence:
    doc_id: str
    url: str
    snippet: str
    retrieved_at: str  # for disputes: record when and on what basis

@dataclass
class Claim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)  # empty list means the sentence must not ship

SYSTEM_PROMPT = """
You are a search answer engine. Rules:
1. Every factual sentence MUST end with citation markers like [1][2].
2. If evidence is insufficient or conflicting, say so explicitly.
3. Never state claims about living persons or companies
   that are not directly supported by the provided snippets.
4. Prefer quoting or closely paraphrasing the snippet over free rewriting.
"""

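# retriever, llm and verifier are injected clients; render_prompt, split_into_claims,
# log_for_audit, fallback_links_only and render_with_citations are helpers assumed
# to be defined elsewhere in the codebase.
def build_answer(query, retriever, llm, verifier):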
    evidences = retriever.search(query, top_k=8)
    draft = llm.generate(
        system=SYSTEM_PROMPT,
        user=render_prompt(query, evidences),
    )
    claims = split_into_claims(draft)          # sentence-level decomposition
    verified, dropped = [], []
    for claim in claims:
        result = verifier.entails(claim, evidences)  # NLI-based verification
        if result.supported:
            verified.append(claim)
        else:
            dropped.append(claim)              # discard unsupported sentences
    log_for_audit(query, evidences, verified, dropped)  # audit log
    if not verified:
        return fallback_links_only(evidences)  # give up synthesis, links only
    return render_with_citations(verified, evidences)

Let us walk through the design points.

  1. Claim-level verification: entailment is checked per sentence, not for the answer as a whole. Unsupported sentences are discarded before output. This is the last line of defense preventing "unsourced assertions" from reaching users.
  2. A fallback that abandons synthesis: if no sentence passes verification, the system gives up on summarizing and degrades to a traditional list of links — a safe path of retreat back to intermediary status.
  3. Audit logs: record which evidence produced which sentences and what was discarded. In a dispute this becomes the material proving "industry-level duty of care."
  4. Timestamping: recording retrieval time, as with retrieved_at, enables the defense that "at that point in time, the sources reported it that way."
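To see claim-level verification and the links-only fallback actually run, here is a self-contained variant of the pipeline above. It is a sketch under stated assumptions: it takes an already generated draft instead of calling a retriever and an LLM, and it swaps the NLI verifier for plain token coverage (lexical overlap is not entailment). The 0.7 coverage threshold is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Evidence:
    doc_id: str
    url: str
    snippet: str
    retrieved_at: str


_WORD = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> set[str]:
    return set(_WORD.findall(text.lower()))


def split_into_claims(draft: str) -> list[str]:
    """Naive sentence split after '.', '!' or '?'; abbreviations like 'Inc.' will break it."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]


def supporting_evidence(claim: str, evidences: list[Evidence],
                        min_coverage: float = 0.7) -> list[int]:
    """Indices of snippets covering at least min_coverage of the claim's tokens.

    Lexical coverage is only a stand-in for an NLI entailment check.
    """
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return []
    return [
        i for i, ev in enumerate(evidences)
        if len(claim_tokens & _tokens(ev.snippet)) / len(claim_tokens) >= min_coverage
    ]


def build_answer(draft: str, evidences: list[Evidence]) -> tuple[str, list[str]]:
    """Return (answer text, dropped sentences); fall back to links if nothing survives."""
    verified, dropped = [], []
    for claim in split_into_claims(draft):
        support = supporting_evidence(claim, evidences)
        if support:
            markers = "".join(f"[{i + 1}]" for i in support)  # 1-based footnote numbers
            verified.append(f"{claim.rstrip('.')} {markers}.")
        else:
            dropped.append(claim)  # never shown to users; kept for the audit log
    if not verified:
        return "\n".join(ev.url for ev in evidences), dropped  # links-only fallback
    return " ".join(verified), dropped
```

Returning the dropped sentences rather than discarding them keeps the audit trail described in point 3.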

Citation UI Patterns

If the backend enforces citations, the frontend must present them to users honestly.

Bad pattern (assertion + hidden sources)
+---------------------------------------------+
|  Company X went bankrupt in 2024 and its     |
|  CEO was indicted for fraud.                 |
|                              [view sources v]|
+---------------------------------------------+

Better pattern (attribution + per-sentence citation + confidence)
+---------------------------------------------+
|  According to multiple reports, company X    |
|  filed for restructuring in 2024 [1][2].     |
|  Sources disagree on whether the CEO was     |
|  indicted [2][3]. (groundedness: medium)     |
|  [1] Example Economic Daily 2024-03-02       |
|  [2] Example Times 2024-03-05                |
|  [3] Court bulletin 2024-04-01               |
+---------------------------------------------+
  • Per-sentence footnote numbers make it traceable which claim came from which source.
  • When sources disagree, say so. That alone breaks the assertive tone.
  • Confidence and groundedness badges reduce overtrust while also creating a record that the system attempted to communicate uncertainty.
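The "better pattern" above can be produced by a small renderer that refuses to emit a sentence without a valid source. A sketch with hypothetical `Source` and `Sentence` types and made-up badge cut-offs:

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    date: str  # publication date as shown to users


@dataclass
class Sentence:
    text: str
    source_ids: list[int]  # 1-based indices into the sources list


def groundedness_badge(score: float) -> str:
    """Map a 0..1 groundedness score to a user-facing label (hypothetical cut-offs)."""
    if score >= 0.9:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"


def render_answer(sentences: list[Sentence], sources: list[Source], score: float) -> str:
    """Per-sentence markers, a groundedness badge, then numbered footnotes."""
    parts = []
    for s in sentences:
        if not s.source_ids or any(not 1 <= i <= len(sources) for i in s.source_ids):
            raise ValueError(f"sentence lacks a valid source: {s.text!r}")
        markers = "".join(f"[{i}]" for i in s.source_ids)
        parts.append(f"{s.text.rstrip('.')} {markers}.")
    body = " ".join(parts) + f" (groundedness: {groundedness_badge(score)})"
    footnotes = [f"[{n}] {src.title} {src.date}" for n, src in enumerate(sources, start=1)]
    return "\n".join([body, *footnotes])
```

Raising instead of silently rendering an unsourced sentence keeps the frontend honest even if a backend gate is misconfigured.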

Treat Evaluation Metrics Like a Contract

# answer-engine-slo.yaml — example quality SLOs used as a launch gate
groundedness:
  metric: claim_support_rate        # share of sentences supported by evidence
  gate: ">= 0.98"
citation_precision:
  metric: citation_accuracy         # share of citations that actually contain the claim
  gate: ">= 0.95"
person_entity_assertions:
  metric: unsupported_person_claims # count of unsupported claims about persons
  gate: "== 0"                      # not a single one allowed
sensitive_query_coverage:
  metric: blocked_or_softened_rate  # share of sensitive queries blocked or softened
  gate: ">= 0.99"
regression_policy: "rerun all gates on model/prompt changes, keep results 24 months"

Setting the gate for unsupported claims about specific persons or companies at zero is, in my view, the reasonable conservative baseline after this ruling. Defamation risk is low-frequency but high-severity per incident.
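A launch gate is only useful if it is checked automatically. The sketch below evaluates gates shaped like the YAML above once it has been loaded into a Python dict (for example with a YAML parser, not shown). The `"<op> <number>"` gate format follows the example; everything else is an assumption.

```python
import operator

# Operators accepted in gate strings such as ">= 0.98" or "== 0".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_gate(gate: str):
    """Split a gate string into (comparison function, numeric threshold)."""
    op_text, threshold = gate.split()
    if op_text not in _OPS:
        raise ValueError(f"unsupported operator in gate {gate!r}")
    return _OPS[op_text], float(threshold)


def evaluate_gates(config: dict, measured: dict[str, float]) -> dict[str, bool]:
    """Pass/fail per gate; a metric that was never measured counts as a failure."""
    results = {}
    for name, spec in config.items():
        if not isinstance(spec, dict) or "gate" not in spec:
            continue  # skip non-gate keys such as regression_policy
        compare, threshold = parse_gate(spec["gate"])
        value = measured.get(spec["metric"])
        results[name] = value is not None and compare(value, threshold)
    return results


def launch_allowed(config: dict, measured: dict[str, float]) -> bool:
    """Ship only if at least one gate exists and every gate passes."""
    results = evaluate_gates(config, measured)
    return bool(results) and all(results.values())
```

Treating a missing measurement as a failure matters most for the zero-tolerance person-entity gate: an unmeasured zero is not a zero.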

The View from Korean and Japanese Companies

Korea

AI search summaries from domestic portals, chatbots from carriers and financial firms — "AI that states facts in an assertive tone" is already everywhere in Korea. Korean defamation doctrine is comparatively strict (criminal defamation exists, and even true statements can be actionable), so once the speaker classification takes hold, the felt risk could exceed that in Germany. In addition, the AI Framework Act that took effect in January 2026 is centered on ex-ante duties, but violations of the risk-management duties of high-impact AI operators could be used as supporting material in negligence analysis in civil suits.

Practically, it is wise to design the integration with the interim-measures regime in advance. A mechanism that immediately blocks a problematic query-answer pair upon a complaint connects naturally with the "prompt action after notice" practice Korean courts have long expected of portals.
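A minimal in-memory sketch of "block the query-answer pair immediately on complaint": the block key is a normalized query, and blocking evicts any cached answer so the same false output cannot be served again from cache. The class and method names are hypothetical; a real system would persist blocks and record who requested them and when.

```python
import unicodedata
from typing import Optional


def normalize_query(query: str) -> str:
    """Canonical key: NFKC normalization, case folding, collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFKC", query).casefold().split())


class AnswerCache:
    """Hypothetical answer cache that honors complaint-driven blocks."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}
        self._blocked: set[str] = set()

    def put(self, query: str, answer: str) -> None:
        key = normalize_query(query)
        if key not in self._blocked:  # never re-cache a blocked pair
            self._answers[key] = answer

    def get(self, query: str) -> Optional[str]:
        key = normalize_query(query)
        if key in self._blocked:
            return None  # caller should fall back to a plain list of links
        return self._answers.get(key)

    def block(self, query: str) -> None:
        """Apply a complaint immediately: block the key and evict the cached answer."""
        key = normalize_query(query)
        self._blocked.add(key)
        self._answers.pop(key, None)
```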

Japan

In Japan, the Provider Liability Limitation Act is the pillar of intermediary immunity, and it too presupposes "distribution of information of others." Whether an answer synthesized by your own model qualifies is, as in Korea, an open question without precedent. Given the characteristic caution of Japanese companies, the conservative response (toning down assertive phrasing and strengthening source display in AI answer features) is likely to spread quickly after this German ruling. With the Ministry of Internal Affairs and Communications and the Personal Information Protection Commission updating AI guidelines on shorter cycles, guideline adherence itself becomes defensive material, much as in Korea.

A Runbook for False-Output Incidents — When the Notice Arrives

The most realistic scenario after this ruling is the moment a formal letter arrives saying "your AI is stating falsehoods about our company." Prepare a runbook now so you do not scramble then.

[ T+0h : intake ]
  - intake via dedicated form/email, create ticket, auto-cc legal
  - mandatory collection: query, full output screenshot, time, locale settings

[ T+2h : reproduce and contain ]
  - attempt reproduction with the same query, preserve reproduction logs
  - register the query-answer pair on the blocklist (including cache invalidation)
  - block likely variants too (name spelling variations etc.)

[ T+24h : root cause ]
  - trace evidence: which sources fed the synthesis — hallucination or source poisoning?
  - if source poisoning: demote that source in the index/retriever trust score
  - if hallucination: re-examine the unsupported-claim gate for that entity

[ T+72h : reply and record ]
  - reply to the reporter with actions taken (after legal review)
  - preserve the full timeline as an audit log (at least the limitation period)
  - prevention: add the case to the eval set, register a regression test
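The runbook's clock can be tracked with a few lines of code so that a missed stage surfaces on its own rather than in a deposition. A sketch, assuming the stage names and offsets are taken directly from the runbook above and that stage completion is recorded elsewhere:

```python
from datetime import datetime, timedelta, timezone

# Stage deadlines relative to intake, from the runbook above.
STAGES = {
    "intake": timedelta(hours=0),
    "reproduce_and_contain": timedelta(hours=2),
    "root_cause": timedelta(hours=24),
    "reply_and_record": timedelta(hours=72),
}


def overdue_stages(intake_at: datetime, completed: set[str], now: datetime) -> list[str]:
    """Stages whose deadline has passed without being marked complete."""
    return [
        stage for stage, offset in STAGES.items()
        if stage not in completed and now > intake_at + offset
    ]


if __name__ == "__main__":
    t0 = datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical intake time
    print(overdue_stages(t0, {"intake"}, t0 + timedelta(hours=3)))
```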

The essence of this runbook is speed and records. Even in an environment where speaker liability is recognized, prompt and good-faith correction after notice is evidence of fulfilling the duty to mitigate damages and works in your favor when damages are assessed. Conversely, neglect after notice is the worst possible scenario.

Enforcing the runbook in code is also a good idea.

# takedown_guard.py — middleware that enforces the blocklist on the serving path
class TakedownGuard:
    def __init__(self, blocklist_store, fuzzy_matcher):
        self.store = blocklist_store
        self.matcher = fuzzy_matcher  # catches name spelling variants too

    def check(self, query: str, entities: list[str]) -> bool:
        # entities are assumed to be extracted from the query upstream (e.g. by NER);
        # if any of them is on the blocklist, skip AI answer generation
        for entity in entities:
            if self.matcher.is_blocked(entity, self.store.active_blocks()):
                return False  # AI answer disabled -> fall back to link list
        return True

    def report_metrics(self):
        # send block counts and fallback rates to the compliance dashboard
        ...
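`TakedownGuard` expects a `fuzzy_matcher` with an `is_blocked(entity, blocks)` method. One hypothetical implementation using only the standard library: normalize width and case, drop spaces and punctuation, then compare with `difflib.SequenceMatcher`. The 0.85 threshold is arbitrary and should be tuned on labeled name variants, since too low a value will block unrelated entities with similar names.

```python
import unicodedata
from difflib import SequenceMatcher


def _canon(name: str) -> str:
    """NFKC + casefold, keeping only letters and digits ("ACME-Corp." -> "acmecorp")."""
    text = unicodedata.normalize("NFKC", name).casefold()
    return "".join(ch for ch in text if ch.isalnum())


class DifflibMatcher:
    """Hypothetical fuzzy_matcher for TakedownGuard, built on difflib."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold

    def is_blocked(self, entity: str, blocks: list[str]) -> bool:
        candidate = _canon(entity)
        return any(
            SequenceMatcher(None, candidate, _canon(blocked)).ratio() >= self.threshold
            for blocked in blocks
        )
```

NFKC normalization also folds full-width Latin letters, which matters for CJK-locale input where names are often typed in full-width form.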

Frequently Asked Questions

Q1. If we attach source links properly, are we immune?

No. What the court examined was not the presence of links but who composed the sentence. Source display is a means of turning assertion into attribution and improving verifiability; it is not an immunity switch by itself.

Q2. If we use an open-source model, is the model creator liable?

Generally, the party operating the service and exposing the output to users stands in the primary liability position. Liability allocation with the model creator is a matter of license and contract, and the prevailing view treats it as separate from external liability toward the injured party.

Q3. Are we liable for false output that the user prompted into existence?

The scope of dissemination is a key variable. Output exposed to the public at large, like search results, differs in its harm structure from output visible only to the user who induced it. That said, the view that false output caused by prompt injection is also part of your defensive duty is gaining strength, so it is safer to include injection-resistance evaluation in your gates.

Q4. Do the same standards apply to internal B2B chatbots?

Without public exposure, the defamation-type risk is small, but losses from decisions made in reliance on wrong answers (breach of contract, negligence) are a separate issue. Even for internal use, evidence display and audit logs are valuable for the same reasons.

Q5. Does this ruling ban AI Overviews in the EU?

No. It is a question of liability allocation, not prohibition, and Google can appeal. The real signal to the industry is that suits with the same logic are likely to spread across the EU and into other jurisdictions.

Pitfalls and Counterarguments — Do Not Overread This Ruling

For balance, here are the arguments on the other side.

  1. The weight of a single first-instance ruling: this is the judgment of one German court; it can be reversed on appeal and does not bind courts in other EU member states. Readings like "AI summaries are now illegal in Europe" are plainly exaggerated.
  2. Chilling-effect concerns: if speaker liability takes hold, resource-rich big tech will respond with verification pipelines, but startups may abandon AI answer features altogether. The original rationale for immunity doctrine — protecting innovation — remains valid.
  3. Where intermediary logic survives: private chatbot output, written in response to a user prompt and visible only to that user, differs in dissemination potential from AI Overviews exposed to every searcher. Generalizing this ruling to all LLM output is difficult.
  4. Technical realism: forcing sentence-level verification reduces answer coverage and increases latency. Users get slower, more reticent answers. The trade-off between safety and usefulness is not free.
  5. The limits of shifting blame to sources: even with strong citation, faithfully summarizing a false original still spreads falsehood. Just as even intermediaries bear responsibility after notice, citation-based design is not an all-purpose shield.

Closing — Tone Is Architecture

The essence of this German ruling is not a technology ruling but a classification ruling. Yet because all of its classification criteria — synthesis, assertive tone, editorial control — are product design variables, it ultimately becomes a question of architecture and UX.

In summary:

  1. The moment an AI synthesizes sentences, the defense "we merely display" weakens.
  2. Source attribution, tone design, and hallucination suppression are not UX polish but legal defense lines.
  3. Sentence-level grounding verification, a synthesis-abandoning fallback, and audit logs are practical responses you can adopt today.
  4. But one ruling does not decide everything. Keep tracking the appeal, legislation, and case law in other jurisdictions.

For every team building an AI answer engine, the homework for this quarter seems clear: count how many sentences in your product output would be classified as "your words," then either reduce that number or build a pipeline that can stand behind those words.

Once more for emphasis: this article is not legal advice, and for concrete matters you should consult a qualified attorney.
