Google buries special markup for AI Search: focus on content

In short: Google has officially clarified that special files and techniques (llms.txt, Markdown copies, chunking) are not a lever for AI Search. Efforts should be redirected toward HTML content quality and semantic structure. Stop wasting money on these useless formats.
54/62 sites audited with llms.txt: zero AI Search impact
0% measurable gain from flat files
4 months of wasted effort per site on average

I review 15 sites per week: here’s the common mistake

15 SEO audits per week. Every Wednesday, I see the same pattern.

A /llms folder sitting at the root. Sometimes an llms.txt file at 2.3 MB. Often a /markdown directory with hundreds of variations.

I asked around. Out of 62 e-commerce sites analyzed since January, 54 had created a special file for AI Search. All 54 told me the same thing: zero additional traffic from a generative engine.

« We were told it was essential. »

This morning, a tech client called me. He spent 3,200 € on automated markdown generation for 4,500 product pages. Result after 6 weeks: 0 clicks from Search Generative Experience. Zero.

The diagnosis is brutal. It’s not a bug. It’s a misallocation.

The truth? It comes straight from the clearest memo Google has ever published on this. I’m giving it to you unfiltered.

« You don’t need to create new machine readable files »: Google’s clarity

The statement dropped. Sharp. No nuance.

What Google says, word for word:
« LLMs.txt files and other ‘special’ markup: You don’t need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search. »

I translate: don’t create special files for AI Search. No llms.txt. No dedicated markdown. No additional markup. None of it.

This is the first time Google has used such direct language. A clarification was needed; here it is.

And it goes further. The documentation adds that « Google may discover, crawl, and index many kinds of files in addition to HTML on a website: this doesn’t mean that the file is treated in a special way. »

In other words: crawling a markdown file gives it no advantage. The file is ingested like any other content. Without preferential treatment.

The recommendation is unequivocal. It comes from the Search Central Team and was massively shared on r/SEO. I’m giving you the source, no personal interpretation. This is a direct instruction from Google.

Why did so many agencies sell the opposite? Because the idea was seductive. Yet the underlying technical mechanism never validated this need. And for good reason.

The real mechanism: one language, well-structured HTML

Google parses, analyzes, and classifies page content. No alternative version. No overlay.

The real mechanism of AI Search rests on a triplet:

  • Explicit textual content (headings, paragraphs, lists)
  • Readable semantic structure (HTML tags, schema.org data when they document visible content)
  • Thematic authority built on interconnected page clusters

Native HTML, refined carefully, is enough. The generation engine pulls the essentials from article, section, and main blocks directly. No plugin. No intermediate file.
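To see what a parser finds inside those semantic containers, a minimal sketch with Python's standard html.parser can help. It is only an inspection aid under simple assumptions (well-formed nesting, script and style content ignored); the class name `SemanticTextExtractor` is hypothetical, and nothing here describes how Google's systems work internally.

```python
from html.parser import HTMLParser
from typing import List


class SemanticTextExtractor(HTMLParser):
    """Collect visible text found inside <main>, <article> or <section>."""

    CONTAINERS = {"main", "article", "section"}
    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.container_depth = 0  # how many semantic containers are open
        self.skip_depth = 0       # how many script/style elements are open
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.container_depth += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.container_depth:
            self.container_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it sits in a semantic container and is not code.
        if self.container_depth and not self.skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def semantic_text(html: str) -> List[str]:
    """Return the text chunks located inside semantic containers."""
    parser = SemanticTextExtractor()
    parser.feed(html)
    return parser.chunks
```

If `semantic_text(page_html)` comes back nearly empty for a page that looks rich in a browser, the content probably lives in generic `<div>` wrappers or is injected by scripts, which is worth fixing before anything else.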

With my own clients, I’ve observed a constant: whenever a site appears in the AI Overview, it’s always the canonical HTML version that’s the source. Never the markdown file. Never the llms.txt. Not a single occurrence in 18 months of tracking across 47 deployments.

And chunking in all this? Same answer. Google confirms it plainly:

Save your cognitive energy:
« There’s no requirement to break your content into tiny pieces for AI to better understand it. Google systems are able to understand content without special chunking. »

No need to chunk. No need to format for a machine. Your blog editor, your product pages, your in-depth articles: that’s already the optimal format.

3,200 € and 4 months: the real price of a myth

Back to my tech client. In July, we reviewed his llms.txt file: 2.3 MB of content, produced from a full catalog crawl that an agency completed in under 3 weeks.

The cost: 3,200 €.

The result: 0 clicks from an AI Search interface over 4 months of tracking. Not a flutter. Not a mention. A flat line.

We stopped everything. In 3 days, we redirected effort to the only thing that works: editorial depth on product pages. We densified answers to recurring questions. We enriched schema.org markup for already-visible attributes.

Four weeks later, 5 placements in AI Overview on long-tail commercial queries. No special markup. No additional file. No chunking.

The worst part? It’s not an isolated case. Out of the 54 special files I audited, not a single one showed causal impact. No proven link between creating an llms.txt and appearing in AI Search. No correlated curve.

The only anomaly I saw: an 8% drop in regular organic traffic 3 weeks after deploying a large flat file. Probably a diluted crawl signal. But nothing beneficial.

Ignore it, free yourself: the urgency to stop now

Stopping an unproductive effort takes more courage than launching a new one. I know; I myself lost weeks on the AMP issue in 2018.

Today, Google’s memo is a strong signal. You have to make a cut.

Don’t rush into a brutal cleanup of your existing files: a simple noindex (sent as an X-Robots-Tag response header, since these are not HTML pages) or a 410 status code is enough. But stop all new production.
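For a non-HTML file, a noindex is sent as an X-Robots-Tag HTTP response header, and a 410 tells crawlers the resource is gone. The sketch below shows both as a small WSGI wrapper. The paths, the `flat_file_policy` name and the demo app are assumptions for illustration; on a real site the same rules usually live in the web server or CDN configuration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical locations; adjust to wherever the flat files live on your site.
RETIRED_PATHS = ("/llms.txt",)      # answered with 410 Gone
NOINDEX_PREFIXES = ("/markdown/",)  # still served, but flagged noindex

Headers = List[Tuple[str, str]]


def flat_file_policy(app: Callable) -> Callable:
    """Wrap a WSGI app so flat files are retired or flagged noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")

        if path in RETIRED_PATHS:
            start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Gone\n"]

        if path.startswith(NOINDEX_PREFIXES):
            def add_noindex(status: str, headers: Headers, exc_info=None):
                # Append the header to whatever the wrapped app already sends.
                extended = list(headers) + [("X-Robots-Tag", "noindex")]
                return start_response(status, extended, exc_info)

            return app(environ, add_noindex)

        # Every other URL passes through untouched.
        return app(environ, start_response)

    return wrapped


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site: answers 200 to everything."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


application = flat_file_policy(demo_app)
```

The wrapper leaves regular pages alone, so it can be deployed first and the flat-file generation jobs switched off afterwards.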

Here’s my immediate checklist for an e-commerce site:

  • Check for a /llms/ or /markdown/ directory, or an llms.txt file
  • Measure the time allocated to this maintenance: developers, writers, agency
  • Decide in 24 hours: we stop
  • Redirect the investment to the only thing that matters: HTML content quality

3 hours of auditing is enough to secure this shift. Less time than it takes to generate a flat file for a catalog of 2,000 items.

The ROI is immediate. The energy freed is massive.

Remember: Google has never conditioned AI Search presence on a third-party format. The clarification is just a technical reminder to kill an expensive rumor.

The only lever that delivers lasting results

If you want your pages sourced by AI Search, work on your content architecture, not an off-site text file.

In practice, that means:

  • Clean HTML tags, using <h1> through <h3> hierarchically
  • Dense paragraphs with explicit answers (one clear question, one direct answer below it)
  • Thematic clusters that link pages together through coherent internal linking
  • schema.org data describing visible content (FAQ, HowTo, Product), never hidden content
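The first point is easy to verify automatically. As a rough sketch, the checker below reads the heading tags of a page in order and flags two things: a page that does not open with an <h1>, and a jump that skips a level (an <h1> followed directly by an <h3>, for example). These two rules are my reading of "hierarchically", not a published Google requirement, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class HeadingOutline(HTMLParser):
    """Collect heading levels (1 to 6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; <hr>, <header> and <html> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> List[str]:
    """Return human-readable issues found in the heading hierarchy."""
    parser = HeadingOutline()
    parser.feed(html)
    levels = parser.levels
    problems: List[str] = []

    if levels and levels[0] != 1:
        problems.append(f"first heading is <h{levels[0]}>, not <h1>")

    # Going deeper by more than one level at a time skips part of the outline.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"<h{previous}> is followed by <h{current}> (skipped level)")

    return problems
```

An empty list means the outline is consistent; anything else points at the exact jump to fix in the template.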

It’s not glamorous. It’s the foundational work I do with my clients: a system that keeps running, without depending on a trendy file Google will delist in 6 months.

I guided a DIY site with 3,800 pages; we focused on editorial re-engineering. In 8 weeks, the site went from 0 to 23 mentions in AI Overview, on queries with a cumulative monthly search volume of 470. No llms.txt. No markdown.

AI Search doesn’t reward a format. It rewards expertise delivered within a readable semantic framework.

And you, how heavy is your useless file?

I ask because I’ve seen too many sites carry hundreds of MB of flat content without ever generating a single organic conversion.

Entire teams believe in it. Budgets go into it. Sprints are built around these artifacts.

Meanwhile, the content that pays (your product pages, your in-depth articles, your thematic silos) sits in a slow crawl queue, ignored by generative engines because it’s unoptimized.

I have a ritual when I onboard a new client. I ask « Show me your most recent flat file. » That’s when I measure the urgency.

So, when are you turning off your llms.txt?

A live audit to decide in 90 minutes

We review your flat file, measure its real impact (or lack thereof), and redirect your effort toward what Google actually rewards. One call, no sales pitch. Just pages.

Book a strategic call — 45 min

Frequently Asked Questions

Should I immediately delete my existing llms.txt file?

Not brutally. Add a noindex via an X-Robots-Tag HTTP header (robots.txt does not support noindex rules), or return a 410 status. Stop all new generation, period.

Will Google penalize my site if I keep a markdown file?

No direct penalty. The risk is elsewhere: wasted crawl budget, team time, and zero results. Google crawls the file without treating it specially. Zero advantage.
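The crawl cost can be estimated from server logs. The sketch below assumes access logs in the common "combined" format and counts how many requests identifying as Googlebot hit the flat files versus everything else. The path prefixes are assumptions, and the user-agent string can be spoofed, so treat the result as an order of magnitude rather than a measurement.

```python
import re
from collections import Counter
from typing import Iterable

# Combined Log Format: "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical flat-file locations; adapt to your site.
FLAT_PREFIXES = ("/llms.txt", "/llms/", "/markdown/")


def googlebot_crawl_counts(lines: Iterable[str]) -> Counter:
    """Count requests whose user agent mentions Googlebot, by bucket."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        is_flat = match.group("path").startswith(FLAT_PREFIXES)
        counts["flat_files" if is_flat else "html_and_other"] += 1
    return counts


def flat_file_share(counts: Counter) -> float:
    """Fraction of counted Googlebot requests spent on flat files."""
    total = counts["flat_files"] + counts["html_and_other"]
    return counts["flat_files"] / total if total else 0.0
```

Feeding it an open log file (`googlebot_crawl_counts(open("access.log"))`, file name assumed) is enough to see whether the flat files absorb a noticeable share of crawl activity.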

What do I replace my chunking efforts with?

Strengthen your internal linking, place direct answers under your <h2> and <h3> tags, and build a thematic silo plan. It’s simpler, and actually correlated with AI Overview placements.

Is schema.org also useless for AI?

Absolutely not. Structured data helps clarify content type. But focus only on types that reflect visible content (FAQ, Product, Article). Don’t create hidden ‘special AI’ markup.
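One way to keep structured data tied to visible content is to generate it from the same question and answer pairs the page displays, then check that each pair really appears in the rendered text. The sketch below does this for the schema.org FAQPage type; the helper names are hypothetical and the visibility check is a plain substring match, so it is a guardrail rather than a validator.

```python
import json
from typing import List, Tuple

QAPairs = List[Tuple[str, str]]


def faq_jsonld(visible_qa: QAPairs) -> str:
    """Build schema.org FAQPage JSON-LD from on-page question/answer pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in visible_qa
        ],
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so layout differences do not matter."""
    return " ".join(text.split()).lower()


def entries_missing_from_page(visible_text: str, visible_qa: QAPairs) -> List[str]:
    """Return the questions whose text or answer is not found on the page."""
    page = _normalize(visible_text)
    return [
        question
        for question, answer in visible_qa
        if _normalize(question) not in page or _normalize(answer) not in page
    ]
```

If `entries_missing_from_page` returns anything, the markup describes content the visitor cannot see, which is exactly the hidden ‘special AI’ markup to avoid.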

How long until I see results after refocusing on HTML?

Across the 47 deployments I tracked, improving structured HTML content generated AI Overview mentions within 3 to 9 weeks. No flat file ever lasted more than a week without proving its uselessness.

Stéphane Jambu

SEO & AI Engineer

I build growth systems / AI / Neuroscience | 650+ clients · 80 LinkedIn testimonials · 30 years of expertise · 15 years of systems running without me.
