AI Overviews hallucinate millions of times per hour: how to become the trusted reference source that doesn’t lie
The scope of the problem: AI Overviews invent at industrial scale
On April 7, 2026, the New York Times published an analysis by Oumi, an open source AI startup. The team tested 4,326 queries on Google AI Overviews using SimpleQA, a methodology created by OpenAI in 2024 to measure the factuality of generative models. Slashdot, Popular Science, Yahoo Tech, and Hacker News picked up the study. A Reddit post on r/nottheonion accumulated 9,613 upvotes within days.
The numbers are stark. And brutal.
- 9 to 10% factual errors on Gemini 3 responses (February 2026), versus 15% on Gemini 2 (October 2025)
- 56% of correct answers aren’t actually anchored in the cited sources — the model is right by accident, not by reasoning
- Scaled to the 5 trillion annual searches Google processes, this represents tens of millions of false answers per hour, hundreds of thousands per minute
- Among the 5,380 sources cited in the study, Facebook ranks 2nd and Reddit 4th — two platforms where factual verification is, at best, patchy
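The scale claim above can be checked with simple arithmetic. This is a rough sketch that assumes every one of the 5 trillion annual searches shows an AI Overview and that the 9 to 10% error rate applies uniformly; both are simplifying assumptions, so the result is an upper-bound order of magnitude, not a measurement.

```python
# Back-of-the-envelope estimate of false answers per hour and per minute.
# Assumption: every search triggers an AI Overview (this overstates the count).
ANNUAL_SEARCHES = 5_000_000_000_000  # 5 trillion, the figure quoted above
HOURS_PER_YEAR = 365 * 24

def false_answers_per_hour(error_rate: float) -> float:
    """Searches per hour multiplied by the share of wrong answers."""
    return ANNUAL_SEARCHES / HOURS_PER_YEAR * error_rate

low = false_answers_per_hour(0.09)   # lower bound of the 9 to 10% range
high = false_answers_per_hour(0.10)  # upper bound

print(f"per hour:   {low:,.0f} to {high:,.0f}")
print(f"per minute: {low / 60:,.0f} to {high / 60:,.0f}")
```

Under these assumptions the result lands in the tens of millions per hour and the high hundreds of thousands per minute, which matches the orders of magnitude quoted above.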
Google’s spokesperson, Ned Adriance, responded that « AI search features rely on the same ranking and safety protections that block the vast majority of spam. » The phrasing says everything: we’re talking about spam filtering, not factual validation.
The Bob Marley case, the Yo-Yo Ma case
The examples documented by Oumi are telling. Question: what year did Bob Marley’s house become a museum? The model picks the wrong year from Wikipedia, ignoring a primary source that gave the correct one. Question: was Yo-Yo Ma inducted into the Classical Music Hall of Fame? The model answers « there is no Classical Music Hall of Fame »… while citing the official Classical Music Hall of Fame page that confirms the induction. The model contradicts itself in the same sentence. Without noticing.
This isn’t a model bug. It’s the nature of LLMs themselves: they produce statistically plausible text, not factually verified text. As Lily Ray showed in January 2026 with « The AI Slop Loop » on Substack, you only need to publish a fictional article on a personal blog for AI Overviews to pick up the information as factual once a handful of AI sites repeat it. The citation threshold is terribly low.
Why this is a strategic opportunity for your brand
The dominant reading on LinkedIn and in tech media is panic: « AI Overviews lie, we need to flee. » That’s consumer thinking. The thinking of an e-commerce operator is radically different.
Here’s the market reality in April 2026:
- Consumers consult AI Overviews, ChatGPT, Perplexity, and Gemini before clicking on a merchant site. 90% accuracy is enough for most to stop clicking the organic result.
- LLMs need sources they can cite. When a brand publishes precise factual data, it becomes the source to cite, not to compete with.
- The public errors of AI Overviews (« eat rocks, » « put glue on your pizza ») have created operational distrust: users increasingly seek to verify AI answers. When an LLM cites your product sheet, the verification click comes to you.
The market is dividing into two categories
On one side, sites that produce generic, approximate content, filled with « generally, » « roughly, » « in most cases. » These sites become invisible to LLMs because they offer nothing the LLM can’t already generate itself.
On the other side, sites that publish verifiable, versioned, structured data with sources backing it up. These sites become the ground truth of the web for LLMs. They’re cited in first position, referenced in long-form answers, used as proof.
What I observe across the 650+ clients I support is that this transition is happening now. Brands that industrialized their product data structuring in 2024-2025 are reaping 30 to 50% of their traffic from LLM citations today. The others watch their organic visibility collapse without understanding why.
Seven techniques to transform your content into ground truth
Here’s the method I apply. No theory: 1,300+ semantic clusters deployed since 2016, 650 clients, direct observation of what LLMs cite—or ignore.
1. Spec data in structured tables, never in prose
An LLM reads a table better than a paragraph. Dimensions, compositions, tolerances, compatibilities, charge times, consumption rates, compatible references: everything goes in a semantic HTML table with explicit headers. No marketing paragraph saying « approximately 12 hours of battery life. » Be precise instead: « 11h47 in mixed use (protocol X, 50% brightness, internal measurement April 15, 2026). »
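As an illustration of what a semantic table with explicit headers can look like, here is a minimal sketch that renders spec rows into HTML with a caption, column headers, and row headers. The function name `spec_table`, the product name, and the sample values are hypothetical; adapt the columns to your own data.

```python
from html import escape

def spec_table(caption, rows):
    """Render (name, value, unit, conditions) rows as a semantic HTML table."""
    head = "".join(
        f'<th scope="col">{escape(h)}</th>'
        for h in ("Specification", "Value", "Unit", "Measurement conditions")
    )
    body = []
    for name, value, unit, conditions in rows:
        body.append(
            f'    <tr><th scope="row">{escape(name)}</th>'
            f"<td>{escape(str(value))}</td><td>{escape(unit)}</td>"
            f"<td>{escape(conditions)}</td></tr>"
        )
    return "\n".join([
        "<table>",
        f"  <caption>{escape(caption)}</caption>",
        f"  <thead><tr>{head}</tr></thead>",
        "  <tbody>",
        *body,
        "  </tbody>",
        "</table>",
    ])

# Hypothetical sample data, for illustration only.
rows = [
    ("Battery life", "11h47", "h:min", "Mixed use, 50% brightness, measured 2026-04-15"),
    ("Weight", 310, "g", "Without accessories"),
]
table_html = spec_table("Example Device X-100 specifications", rows)
print(table_html)
```

Keeping the measurement conditions in their own column means the number never travels without its context.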
2. FAQs with surgical answers
Zero « generally, » zero « roughly, » zero « in most cases. » FAQs that become ground truth answer with a number, a date, a precise condition. Question: « Is this product compatible with model X? » Bad answer: « It’s compatible with most recent models. » Good answer: « Compatible with models X-100 to X-240 manufactured after January 2024. Incompatible with X-90 (different sensor) and X-300 (proprietary connector). »
3. Explicit fact-checks « what’s true, what’s false »
Create dedicated blocks on each sheet: « Common misconceptions. » List false claims circulating about the product and correct them with sources. LLMs love citing these blocks—they resolve the factual ambiguity the LLM is trying to clear up.
4. Visible, clickable cited sources
Every contestable claim comes with an external source: a scientific paper, ISO standard, manufacturer datasheet, or independent test. Not in an invisible footer: in the body text, with a link. This signals to the LLM that your content is itself anchored in verifiable sources.
5. Explicit versioning and update dates
Each page carries a visible update date. Each sensitive piece of data (price, spec, compatibility) carries a version note: « data valid for 2026 model, edition 3. » LLMs heavily weight freshness: content dated 3 months ago beats content dated 2 years ago, all else equal.
6. Comparison tables against alternatives
Systematically create a « this product vs alternatives » table. Honestly. If your product is weaker on one dimension, say it. Radical honesty in comparison is the strongest trust signal for an LLM. I often say in client meetings: « retention is my weak point, » and what should cost me a contract creates the opposite effect. Same mechanism for content.
7. Mini-encyclopedia per product (not product sheet)
Stop thinking « 500-word product sheet. » Think « 2,500- to 4,000-word encyclopedia page » with history, context, use cases, limitations, alternatives, and an exhaustive FAQ. This is the only way to become the reference source LLMs cite when asked about the category. Your competitors doing 300-word sheets become invisible.
Schema.org: the language LLMs read first
LLMs don’t read your page like a human. They parse structured data (JSON-LD) first, then semantic HTML, then text content. A product sheet without Schema.org? The LLM guesses. Sometimes well. Often less well.
Here are the schema types that signal « ground truth » to LLMs, in priority order:
Product + additionalProperty (PropertyValue)
The basic Product schema is insufficient. Enrich it with an additionalProperty entry for each measurable spec: height, weight, supply voltage, temperature range, certification standard. Each property carries its name, value, and unit. LLMs reproduce these properties as-is in their answers.
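A minimal sketch of that enrichment, generated from Python so it can be fed from a product database. In Schema.org, each `additionalProperty` entry is a `PropertyValue` with `name`, `value`, and `unitText`. The product name and spec values below are hypothetical.

```python
import json

def spec(name, value, unit_text):
    """Build one Schema.org PropertyValue for a measurable spec."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "value": value,
        "unitText": unit_text,
    }

# Hypothetical product and values, for illustration only.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Device X-100",
    "additionalProperty": [
        spec("Height", 142, "mm"),
        spec("Weight", 310, "g"),
        spec("Supply voltage", 12, "V"),
    ],
}

# The output goes inside a <script type="application/ld+json"> tag on the page.
product_jsonld = json.dumps(product, indent=2)
print(product_jsonld)
```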
QAPage for product FAQs
Use QAPage rather than FAQPage when each question is standalone and complete. The QAPage schema signals to the LLM that the answer has been validated—typically by the brand itself. Higher priority than unverified user content.
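Here is a hedged sketch of a QAPage payload. Schema.org defines QAPage as a page focused on one specific question, so the helper below emits a single `Question` with one `acceptedAnswer` per page. The helper name `qa_page` and the URL are hypothetical; the question and answer reuse the compatibility example from the FAQ technique above.

```python
import json

def qa_page(question, answer, answer_url):
    """Build a QAPage with one standalone question and its accepted answer."""
    return {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "mainEntity": {
            "@type": "Question",
            "name": question,
            "answerCount": 1,
            "acceptedAnswer": {
                "@type": "Answer",
                "text": answer,
                "url": answer_url,
            },
        },
    }

page = qa_page(
    "Is this product compatible with model X?",
    "Compatible with models X-100 to X-240 manufactured after January 2024. "
    "Incompatible with X-90 (different sensor) and X-300 (proprietary connector).",
    "https://example.com/faq/compatibility-model-x",  # hypothetical URL
)
qa_jsonld = json.dumps(page, indent=2)
print(qa_jsonld)
```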
HowTo for procedures
Installation, maintenance, care, troubleshooting: each procedure becomes a HowTo schema with HowToStep, HowToTool, HowToSupply. When a user asks ChatGPT « how to install X, » the LLM cites pages with this schema first. It can structure its answer directly from your data.
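A procedure expressed this way could look like the sketch below, which uses the Schema.org `tool`, `supply`, and `step` properties of HowTo. The procedure, tool, and supply names are invented for illustration.

```python
import json

# Hypothetical installation procedure, for illustration only.
step_texts = [
    ("Mount the bracket", "Fix the bracket to the wall with the four screws."),
    ("Attach the device", "Slide the device onto the bracket until it clicks."),
    ("Connect power", "Plug the 12 V adapter into the rear connector."),
]

howto = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "Install the Example Device X-100",
    "tool": [{"@type": "HowToTool", "name": "Phillips screwdriver"}],
    "supply": [{"@type": "HowToSupply", "name": "M4 screws (x4)"}],
    # Number the steps explicitly so their order survives any reprocessing.
    "step": [
        {"@type": "HowToStep", "position": i, "name": name, "text": text}
        for i, (name, text) in enumerate(step_texts, start=1)
    ],
}

howto_jsonld = json.dumps(howto, indent=2)
print(howto_jsonld)
```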
ClaimReview for fact-checks
Underused in e-commerce. Mistake. The ClaimReview schema lets you formally state: « here’s a claim circulating (example: this product contains lead), here’s our evaluation (false), here’s the source. » LLMs treat ClaimReview with near-absolute priority. It’s literally the schema designed to fight misinformation.
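The claim, verdict, and source pattern maps onto ClaimReview roughly as follows. This is a sketch under assumptions: the brand name, URLs, date, and the 1-to-5 rating scale are placeholders, and `citation` is used here to point at the evidence backing the verdict.

```python
import json

# Hypothetical fact-check for the lead example; all URLs are placeholders.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.com/products/x-100/fact-check-lead",
    "datePublished": "2026-04-15",
    "author": {"@type": "Organization", "name": "Example Brand"},
    "claimReviewed": "This product contains lead.",
    "itemReviewed": {"@type": "Claim", "name": "This product contains lead."},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,   # lowest value on the scale means false
        "worstRating": 1,
        "bestRating": 5,
        "alternateName": "False",
    },
    # Evidence backing the verdict, for example an independent lab report.
    "citation": "https://example.com/lab-reports/x-100-heavy-metals.pdf",
}

claim_jsonld = json.dumps(claim_review, indent=2)
print(claim_jsonld)
```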
Dataset for public technical data
If you publish benchmarks, comparative tests, compatibility grids, wrap them in Dataset schema with explicit reuse license. An LLM finding a reusable Dataset cites it systematically. It knows it can use it without legal risk.
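A Dataset wrapper with an explicit license might be sketched like this. The dataset name, description, and download URL are hypothetical, and CC BY 4.0 is only one possible license choice, not a recommendation from the source.

```python
import json

# Hypothetical compatibility grid published as a CSV file.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example Device X-series compatibility grid",
    "description": "Compatibility of accessories with X-series models, one row per pair.",
    "dateModified": "2026-04-15",
    # Explicit reuse license (CC BY 4.0 chosen here as an example).
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Brand"},
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.com/data/x-series-compatibility.csv",
        }
    ],
}

dataset_jsonld = json.dumps(dataset, indent=2)
print(dataset_jsonld)
```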
Organization with extended sameAs
The entity publishing the content must be traceable. Use an Organization schema with sameAs pointing to Wikidata, the official LinkedIn page, the SIREN registry, and the LEI (Legal Entity Identifier) if applicable. A brand identified without ambiguity is a brand the LLM cites without hesitation. A brand with a fuzzy entity gets replaced by « a specialized site » in AI answers. You lose the credit.
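A possible shape for that Organization markup is sketched below. Every identifier is a placeholder: the Wikidata ID, LinkedIn slug, and SIREN number are invented, and expressing the SIREN through `identifier` with a `propertyID` is one option among several, not the only valid one.

```python
import json

# All identifiers below are placeholders, not real records.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",
    "url": "https://example.com",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",
        "https://www.linkedin.com/company/example-brand",
    ],
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "SIREN",
        "value": "000000000",
    },
}

def non_https_links(org):
    """Return sameAs entries that are not absolute https URLs."""
    return [u for u in org.get("sameAs", []) if not u.startswith("https://")]

# Catch malformed profile links before the markup is published.
bad_links = non_https_links(organization)
org_jsonld = json.dumps(organization, indent=2)
print(org_jsonld)
```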
Concretely, on hi-commerce.fr, I deployed a full schema layer via the Hi-Commerce AI Search plugin (FAQPage, Article, BreadcrumbList, Person, Organization with sameAs Wikidata). Measured result: from 0 to over 80 Perplexity citations per month within 6 months, without changing a line of text content.
DOSE and trust: the chemistry of the reference source
Trust isn’t an abstract marketing concept. It’s a precise neurochemical process I’ve studied for several years in the context of the DOSE model (Dopamine, Oxytocin, Serotonin, Endorphin) applied to SEO and conversion.
Oxytocin: the molecule of repeated reliability
Oxytocin is released during repeated, confirmed trust experiences. Applied to content: each time a user verifies information from you and finds it correct, their brain releases a micro-dose of oxytocin associated with your brand. Over 10-15 successful verifications, a reflex builds: « if I want reliable info on this category, I go to X. »
This reflex is infinitely more solid than any branding campaign. It’s not based on a slogan, it’s based on a verifiable history of accuracy. Result: in technical B2B, some sites have become near-monopolistic references in their category. The ecosystem—clients, journalists, ChatGPT, trainers—cites them reflexively.
Dopamine: the reward of finding precision
Dopamine is released on anticipation of a reward and again on its realization. A user seeking « what’s the exact difference between X and Y » who finds a precise, numbered, sourced comparison table experiences a dopamine spike. This experience imprints strongly. They’ll return. They’ll recommend.
The asymmetric effect: one detected lie kills trust
Here’s the trap: oxytocin builds slowly (10-15 positive experiences to anchor the reflex), but it is destroyed instantly. A single « detected lie » (a false spec, a wrong date, an outdated price) triggers the inverse mechanism: cortisol, distrust, avoidance.
Factual rigor is non-negotiable. A site aiming for reference source status can’t afford a 1% accepted error rate. The standard must be zero. To hold this standard, you need a process: review, versioning, automatic notification of supplier changes, quarterly audit of sensitive pages.
Serotonin: the status of recognized source
Serotonin regulates the sense of status. When your content is cited by a mainstream LLM—ChatGPT, Perplexity, Gemini—your customers, prospects, and partners see it. The mechanism is the same as academic citations: being cited validates status. E-commerce leaders underestimate this effect because it’s diffuse but cumulative.
Endorphin: effort rewarded
Publishing 180 encyclopedia sheets of 3,000 words is work. It’s exactly why it’s a moat. Your competitors who keep publishing 300-word sheets won’t catch you in 6 months. The effort you invest today creates a barrier to entry that protects you for years.

