Linguistic errors in AI search: the Catalan case

In short: On a three-language Barcelona e-commerce site, 26% of the Catalan pages were being treated as Spanish because of language identification errors by search engines and their AI layers. By fixing hreflang, language markup, and semantic silos, we grew Catalan organic clicks by 47% in 4 months. Multilingual regions expose a bug in AI models.
3 languages on a single Barcelona e-commerce site
26% of Catalan pages indexed as Spanish
47% of organic clicks recovered in 4 months

A Tuesday morning, Barcelona, 3 languages, Catalan traffic at a standstill

A client calls me on a Tuesday morning. He runs a fashion e-commerce site in Barcelona. 800 SKUs, 3 languages: Catalan, Spanish, English. Annual revenue: €340,000. His Catalan traffic is capped at 120 sessions per day. A local competitor is beating him by 40%.

He’d invested €6,000 in link building. Zero results.

I check his Google Search Console. 26% of his Catalan pages show an hreflang tag pointing to "es". Google treats them as Spanish. When someone searches in Catalan, the page that ranks is in Spanish. The Catalan user bounces. Bounce rate: 78%.

The content is solid. Keywords too. The error comes from language detection by the search engine.

The site is visible, but reaching the wrong audience in the wrong language.

Catalan mistreated by AI: what Search Engine Land revealed

On May 21, 2026, Search Engine Land published an analysis: "Multilingual regions reveal the future of AI search". The findings are clear. AI search engines regularly confuse Catalan with Spanish. This bug affects all minority languages in multilingual regions.

When someone asks a question in Catalan, the generative AI—the one answering at the top of SERPs in overviews—pulls from Spanish corpora. Catalan content, even if optimized, becomes invisible. Citations, excerpts, and synthetic answers use Spanish competitor sources. Result: your pages lose long-tail traffic, featured snippets, and AI-driven clicks.

The problem is serious. According to Search Engine Land, misidentified language causes faulty indexing, broken rendering, and misaligned rankings. Search engines see the word "gran", think "grande" in Spanish, and forget that "gran" is also a Catalan word, meaning "big".

This bias affects other regional languages too: Basque, Galician, Breton, Flemish, Walloon.

Why language confusion kills your AI SEO

The mechanism is straightforward. AI answer engines don’t reason by language. They search for the most authoritative source for an intent. If your semantic structure is unclear, the engine defaults to the region’s dominant language.

In Catalonia, Spanish dominates training data. Even if a query is in Catalan, internal language detection may favor Spanish content. Your Catalan pages become fuzzy duplicates to the algorithm.

Consequence: your Catalan pages aren’t cited in the AI Overview. They generate no clicks. They sink to pages 2 or 3, behind Spanish pages.

Across an 800-product catalog, that’s 26% of pages losing all visibility on their natural market. For my client, that meant 4,200 lost sessions per month on the Catalan version alone. Some €17,000 in potential revenue.

Here is the exact workflow I followed to fix the language errors and rebuild the semantic architecture. Each step contributed to restoring Catalan traffic.

The 5 steps of the technical restructuring

From halting content production to the first results

What I implemented on my client’s site

I stopped content production. We restructured.

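The technical foundation. First, hreflang and language markup: every Catalan page had to declare itself as Catalan instead of pointing to "es".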
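As a rough illustration of what a consistent technical foundation can look like, here is a minimal sketch that generates the same set of hreflang annotations for every language version of a page, plus the matching `lang` attribute. The domain, the `/ca/`, `/es/`, `/en/` path layout, and the choice of Catalan as the `x-default` fallback are assumptions made for the example, not details of the client's site.

```python
from html import escape

# Hypothetical URL layout: one path prefix per language on an example domain.
LANG_URLS = {
    "ca": "https://www.example.com/ca/",
    "es": "https://www.example.com/es/",
    "en": "https://www.example.com/en/",
}


def hreflang_links(lang_urls, default_lang="ca"):
    """Return the <link rel="alternate"> tags every language version should carry.

    Each version lists all versions, itself included, so the annotations
    are reciprocal. x-default points at the chosen fallback version
    (Catalan here, which is an assumption for this example).
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(lang_urls.items())
    ]
    tags.append(
        f'<link rel="alternate" hreflang="x-default" '
        f'href="{escape(lang_urls[default_lang])}" />'
    )
    return tags


def html_open_tag(lang):
    """The <html> opening tag declaring the page language."""
    return f'<html lang="{escape(lang)}">'


if __name__ == "__main__":
    print(html_open_tag("ca"))
    for tag in hreflang_links(LANG_URLS):
        print(tag)
```

Generating the block from a single mapping, rather than editing tags by hand in each template, is one way to keep a Catalan URL from ever being listed under "es".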

The semantic silo. I built a language-by-language architecture: each language has its own hub pages, its own contextual internal links, its own entity tags (Organization, WebSite, Product). The "Catalan fashion" entity became an identified node for Google.

I enriched structured data with language in JSON-LD, and deployed satellite pages in pure Catalan (local blog, sizing guide). Goal: strengthen the language signal at semantic level, not just technical.
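To make the idea of language in JSON-LD concrete, here is a hedged sketch of one possible shape for that markup. It uses the schema.org `inLanguage` property on a `WebPage` and nests the `Product` as `mainEntity`; the URL and product name are hypothetical, and the exact markup deployed on the client's site is not shown in this article.

```python
import json


def webpage_jsonld(url, name, lang, product_name):
    """Build a schema.org WebPage description that states its language.

    inLanguage takes a BCP 47 code ("ca" for Catalan). It is a property of
    WebPage (a CreativeWork), so the Product is nested as mainEntity.
    """
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": name,
        "inLanguage": lang,
        "mainEntity": {"@type": "Product", "name": product_name},
    }


def as_script_tag(data):
    """Serialize for embedding in the page head; keep accents readable."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    page = webpage_jsonld(
        url="https://www.example.com/ca/vestits/vestit-de-lli",  # hypothetical URL
        name="Vestit de lli",  # hypothetical Catalan product page
        lang="ca",
        product_name="Vestit de lli",
    )
    print(as_script_tag(page))
```

The Spanish and English versions of the same product would carry their own block with `"es"` or `"en"`, so the structured data agrees with the hreflang annotations instead of contradicting them.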

Here are the concrete figures obtained after 4 months of implementation. The chart compares the key KPIs before and after the language errors were corrected.

Chart: Impact of the language fix on Catalan traffic. Before vs. after restructuring: +47% organic clicks. Series shown: AI traffic, classic traffic.

Results: +47% Catalan organic clicks

Three weeks after deployment, Google reindexed. Catalan pages reappeared in relevant SERPs. AI overviews began citing excerpts from our client instead of Spanish competitors.

Four months in, Catalan organic sessions jumped from 120 per day to 176 per day. +47%. Bounce rate dropped to 54%, down from 78%. Revenue from Catalan traffic climbed from €8,300 to €12,200 per month.
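For readers who want to check the arithmetic, the short sketch below recomputes the changes from the figures quoted above. Only the numbers come from the article; the helper name `pct_change` is introduced here for illustration.

```python
def pct_change(before, after):
    """Relative change in percent."""
    return (after - before) / before * 100


sessions = pct_change(120, 176)      # daily Catalan organic sessions
revenue = pct_change(8_300, 12_200)  # monthly revenue from Catalan traffic, EUR
bounce_points = 54 - 78              # bounce rate change, in percentage points

print(f"sessions: {sessions:+.1f}%")
print(f"revenue: {revenue:+.1f}%")
print(f"bounce rate: {bounce_points:+d} points")
```

Sessions and revenue both round to +47%, and the bounce rate falls by 24 percentage points.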

The local competitor hoarding top positions dropped 3 spots on major Catalan queries. No direct attack. Simply because Catalan pages were now recognized as Catalan.

And the cost? Under €4,000—including a partial internal link refresh and a semantic audit. Nothing like the €6,000 in wasted link building the year before.

Multilingual e-commerce: 3 checks to do right now

If you sell in Europe in 2 languages or more, this is your issue. Multilingual regions—Catalonia, Basque Country, Brittany, Belgium, Switzerland—process tens of thousands of misunderstood queries daily.

I often spot three blind spots in my clients’ setups:

1. hreflang misconfigured.
A check in Search Console is enough. A Catalan page with hreflang "es" is a Spanish page to Google. Verify language by language, page by page.

2. Content partially translated or similar across two languages.
If your Catalan version reuses 90% of Spanish content with minor tweaks, Google may miss the language difference. Content must be distinct. Translation alone isn’t enough—adapt it semantically.

3. No semantic silo per language.
Each language deserves its own thematic architecture. Otherwise, your pages cannibalize each other or get lumped under one dominant language by the engine. Internal linking must respect the target language.
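The second check can be roughly automated. The sketch below compares the Catalan and Spanish text of a page word by word using Python's standard `difflib`. The 0.9 threshold echoes the "90%" figure mentioned above but is only an assumed starting point, and the sample sentences are invented for the example.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercased word tokens; the regex word class matches accented letters."""
    return re.findall(r"\w+", text.lower())


def similarity(text_a, text_b):
    """Share of matching word sequences between two texts, from 0.0 to 1.0."""
    matcher = SequenceMatcher(None, words(text_a), words(text_b), autojunk=False)
    return matcher.ratio()


def too_similar(text_ca, text_es, threshold=0.9):
    """Flag a Catalan page whose text is mostly the Spanish text."""
    return similarity(text_ca, text_es) >= threshold


if __name__ == "__main__":
    # Invented sample copy, for illustration only.
    spanish = "Vestido de lino para el verano, corte recto y tejido ligero."
    catalan = "Vestit de lli per a l'estiu, tall recte i teixit lleuger."
    print(too_similar(spanish, spanish))  # a "Catalan" page that is really Spanish
    print(too_similar(catalan, spanish))  # a genuinely Catalan page
```

A word-level comparison like this only catches pages that were copied rather than translated. It says nothing about whether a real translation has been adapted semantically, which still needs a human reader.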

Also account for AI overviews. They amplify language errors. A poorly sourced Catalan overview can confuse your customers. Customer trust takes a hit.

Selling in 2 languages or more?

I check directly whether your Catalan, Basque, or Flemish pages truly answer the right queries. A technical and semantic audit in under 60 minutes, no obligation.

Frequently Asked Questions

Why does Google confuse Catalan and Spanish?

AI models train on massive corpora where Spanish dominates. Without clear signals—hreflang, distinct content, language markup—they confuse Catalan with Spanish, especially when pages look similar.

How do I verify my multilingual site isn’t penalized by language errors?

Go to Google Search Console, "Languages" section. Check if pages from one language are assigned to another. Cross-check with a crawl tool to spot hreflang inconsistencies.
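If you have no crawl tool at hand, the cross-check can be approximated with a few lines of Python. This is a minimal sketch using only the standard library: it reads the `<html lang>` attribute and the hreflang alternates from a page's HTML and reports disagreements with the language you expect. Fetching the pages is left out, the URLs are hypothetical, and the check assumes bare language codes such as "ca" rather than regional variants such as "ca-ES".

```python
from html.parser import HTMLParser


class LangSignals(HTMLParser):
    """Collect the <html lang> value and all hreflang alternates of a page."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.alternates = {}  # hreflang code -> href

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html":
            self.html_lang = attr.get("lang")
        elif tag == "link" and "alternate" in (attr.get("rel") or "").lower().split():
            if attr.get("hreflang"):
                self.alternates[attr["hreflang"].lower()] = attr.get("href")


def audit(page_url, html, expected_lang):
    """Return a list of problems for one page, empty if the signals agree."""
    parser = LangSignals()
    parser.feed(html)
    problems = []
    if (parser.html_lang or "").lower().split("-")[0] != expected_lang:
        problems.append(f"html lang is {parser.html_lang!r}, expected {expected_lang!r}")
    # The page should reference itself under its own language code.
    self_href = parser.alternates.get(expected_lang)
    if self_href != page_url:
        problems.append(f"hreflang {expected_lang!r} points to {self_href!r}")
    # The same URL listed under another language is the error described in this article.
    for code, href in parser.alternates.items():
        if href == page_url and code not in (expected_lang, "x-default"):
            problems.append(f"page is declared as {code!r}")
    return problems


if __name__ == "__main__":
    url = "https://www.example.com/ca/vestits/"  # hypothetical Catalan page
    html = ('<html lang="es"><head><link rel="alternate" hreflang="es" '
            f'href="{url}" /></head></html>')
    for problem in audit(url, html, "ca"):
        print(problem)
```

Run over every Catalan URL in the sitemap, a check like this gives the page-by-page view that a summary report does not.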

Does this problem affect other minority languages?

Yes. Basque, Galician, Breton, Flemish, Walloon, and others. Whenever a language cohabits with a dominant language in the same zone, AI easily confuses them.

Are hreflang tags enough to fix it?

Tags are the technical baseline. But they’re not enough. You need a semantic silo per language, unique content, and structured markup that places the entity in the right language.

What is a semantic silo per language and how do I apply it?

It’s a network of hub pages, articles, and product sheets—one for each language—linked by exclusive internal linking. Each language has its own thematic entry points with no cross-links to other languages. Google compartmentalizes versions cleanly.
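One way to test whether a silo actually holds is to list the contextual links that leave it. The sketch below assumes, purely for illustration, that each language lives under its own path prefix (`/ca/`, `/es/`, `/en/`) and that you already have the list of body links found on a page; the example URLs are hypothetical.

```python
from urllib.parse import urlparse

LANG_PREFIXES = ("ca", "es", "en")  # assumed URL layout: /ca/..., /es/..., /en/...


def url_lang(url):
    """Language of a URL from its first path segment, or None if unknown."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments and segments[0] in LANG_PREFIXES else None


def cross_language_links(page_url, link_urls):
    """Contextual links that leave the page's own language silo."""
    page_lang = url_lang(page_url)
    return [
        link for link in link_urls
        if url_lang(link) not in (None, page_lang)
    ]


if __name__ == "__main__":
    page = "https://www.example.com/ca/guia-de-talles/"
    links = [
        "https://www.example.com/ca/vestits/",
        "https://www.example.com/es/vestidos/",
        "/en/dresses/",
        "https://www.example.com/contacte",
    ]
    for leak in cross_language_links(page, links):
        print(leak)
```

Feed it only links from the body content. The hreflang alternates and the language switcher point to other languages by design and should not be counted as leaks.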

Stéphane Jambu

SEO & AI Engineer

I build growth systems / AI / Neuroscience | 650+ clients · 80 LinkedIn testimonials · 30 years of expertise · 15 years of systems running without me.
