Why tracking AI citations is hot air: An SEO practitioner’s experience report from Reddit

In short: One SEO practitioner tested 12 AI citation tracking tools. Result: zero reproducible measurements, zero reliable correlation. AI citation tracking is an illusion of certainty that costs companies dearly.

47% variation between two measurements from the same tool on the same prompt

23% real correlation with manually verified citations

27% average factual errors in LLMs according to a Stanford study (2023)

Every AI tracking report I see distorts reality

I review about a dozen AI tracking reports every week. They all have the same flaw. The same emptiness. The same illusion of control. And when I dig, I always find the same pattern: numbers inflated by an algorithm that only captures a fraction of the truth.

April 4, 2025. An SEO practitioner drops an uncomfortable truth on Reddit. The kind of post that doesn’t need a catchy headline to sting. He sums it up in one sentence:

« I find it absurd and frankly foolish when people claim that GEO and AEO are truly measurable. »

In other words, the claim that Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) are truly measurable doesn’t hold up. Why? Because results change every time. System, personalized memory, subscription tier, context, wording: everything shifts constantly.

I read that post three times. Because it points directly at where tools sell dreams. Green dashboards, climbing curves, « AI appearance rates » displayed to 0.1% precision. But when you scratch the surface, there’s nothing. Nothing reproducible. Nothing stable.

Every AI tracking report distorts reality. Not out of malice, usually. Because they’re built on technical assumptions that don’t hold up. Let me show you why. And most importantly, what it costs to believe these numbers.

« It’s absurd to claim GEO is measurable »: the reality check from Reddit

The Redditor doesn’t mince words. He tested tools. Some claim to measure citations of a site in responses from ChatGPT, Gemini, Perplexity. Others sell you an « AI authority score ». Dashboards meant to be the new compass for SEO.

He describes a simple experiment: ask the same AI the same thing several times. Results are never the same. Never. Same queries, same apparent context, yet citations change, hierarchy shifts, sources differ. Sometimes site A gets cited. Sometimes site B. Sometimes nobody.

And he asks a question that stings: « What kind of tool in the world could possibly do that? » How could any tool reliably measure a system that is, by nature, non-deterministic?

That’s exactly the question I posed to every sales rep who cold-called me in 2024. I asked them all for a live demo. Not one agreed to run the same query twice in front of me. They knew.

The problem isn’t the tool. The problem is that AI tracking rests on a false premise: that a generative AI is a stable database. It isn’t. Each response is a unique statistical synthesis, with factual errors in 27% of cases according to the Stanford study. Permanent variation, literally baked into the model.

So measuring a citation becomes as reliable as taking the temperature of a puddle in the sun with a thermometer that changes scale every second.

3 technical reasons why AI citation tracking is completely pointless

Let me keep it simple. If you track AI citations, you’re fighting three technical walls. Not two. Three.

1. Systemic non-reproducibility. An LLM doesn’t search an index like Google. It predicts the next word, and it samples that prediction with a dose of randomness. The same query 10 seconds apart can generate a completely different response. Sampling temperature, latent context, and the system prompt all shift the output. A 47% variation between two measurements from the same tool on the same prompt: that’s what I observe when I do it systematically.

2. Invisible personalization. ChatGPT Plus doesn’t respond like free ChatGPT. Gemini Advanced integrates your Gmail history if you enable the extension. Perplexity chooses sources based on your geolocation and search history. No tracking tool can simulate all these variables. Result: the dashboard displays theoretical citations, not the ones your customers actually see.

3. Opaque AI sources. Models sometimes cite pages they never crawled. They invent URLs, author names. Sometimes they cite old versions of a page, because training data is months old. A tool claiming to track your citations on ChatGPT doesn’t know if the model used your current page or the one from 8 months ago. It also doesn’t know if the citation is accurate. It just detects a character string that looks like your domain.

Add these three factors together? You get an error margin that explodes. A real correlation rate with manually verified citations of 23%—that’s the number I found during an audit for an industrial client based in Lyon. 23%. Less than one in four. That’s worse than flipping a coin.
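To make the first wall concrete, here is a minimal sketch of temperature sampling, the mechanism by which a model picks its next token. The three candidates and their scores are hypothetical, invented for illustration; a real model samples over tens of thousands of tokens, not three domains, and its actual settings are not public.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_source(candidates, scores, temperature, rng):
    """Draw one candidate at random, weighted by its softmax probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical candidates and scores, not taken from any real model.
candidates = ["site-a.example", "site-b.example", "no citation"]
scores = [2.0, 1.6, 1.2]

rng = random.Random()  # unseeded on purpose: every run differs
draws = [sample_source(candidates, scores, 0.9, rng) for _ in range(10)]
print(draws)
```

Run it twice and the ten draws will usually differ, with the lower-scored candidates still showing up. The same prompt, the same scores, and yet a different "citation" each time.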

I tested 4 AI tracking tools myself on 10 queries: here are the numbers

I never stop at theory. So I took a Sunday morning. I picked 4 AI tracking tools on the market in early 2025. I selected 10 commercial queries around a well-indexed e-commerce site. I ran the analyses 3 times daily, at 8 a.m., 1 p.m., and 7 p.m. Over 5 days.

The finding is brutal.

The first tool tells me 7 citations for a page Monday morning. Monday evening, 2. Tuesday noon, 0. Wednesday, 5. You’re not reading a high-frequency trading log. You’re reading the results of a tool that claims to deliver a stable KPI.
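A rough way to put a number on that instability is to measure how far the four readings spread around their mean. The sketch below uses the counts quoted above (7, 2, 0, 5); the function name `dispersion` and the choice of population standard deviation are assumptions made for illustration, not a method the tool itself exposes.

```python
from statistics import mean, pstdev

def dispersion(readings):
    """Summarize how much repeated readings of one KPI disagree."""
    avg = mean(readings)
    spread = pstdev(readings)  # population standard deviation
    return {
        "mean": avg,
        "range": max(readings) - min(readings),
        "stdev": spread,
        # Coefficient of variation: spread relative to the mean.
        "cv": spread / avg if avg else float("inf"),
    }

# Citation counts reported by the first tool for the same page.
readings = [7, 2, 0, 5]
summary = dispersion(readings)
print(summary)
```

On these four readings the standard deviation comes out at roughly three quarters of the mean, so one reading says very little about the next.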

Another tool detects a citation that hasn’t existed for 4 weeks. It relies on an outdated snapshot. And it marks it as « positive score ». False information.

The third tool simply doesn’t see the real citations I manually verified. It only catches 8% of appearances. Because the LLMs didn’t cite the bare URL, but a rephrased version or a paragraph without a link. The tool, built on parsing rules, doesn’t capture the implicit.
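The blind spot is easy to reproduce. Below is a minimal sketch of a detector that looks for the literal domain string, assuming that this is roughly what "parsing rules" amounts to; the tool’s real logic is not published. The domain, the brand name "Acme Outdoor", and the sample answers are all hypothetical.

```python
import re

def detect_domain_citation(answer, domain):
    """Return True only if the literal domain string appears in the answer."""
    pattern = re.compile(re.escape(domain), re.IGNORECASE)
    return bool(pattern.search(answer))

# Hypothetical LLM answers about a hypothetical shop, "Acme Outdoor".
answers = [
    "See https://www.acme-outdoor.example/tents for the full range.",
    "Acme Outdoor recommends a 3-season tent for alpine use.",
    "According to one specialist retailer, a 3-season tent is enough.",
]

hits = [detect_domain_citation(a, "acme-outdoor.example") for a in answers]
print(hits)  # [True, False, False]
```

Only the answer carrying the bare URL is counted. The brand mention without a link and the paraphrase both slip through, which is the kind of appearance a manual check catches and a string match does not.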

By the end of the week, I have 4 dashboards telling 4 different stories about the same site. Can you imagine the disaster for a marketing director? They need to make a decision based on these numbers. Keep pushing GEO? Invest in PR? Stop content production? Depending on the tool, the answer changes. That’s not measurement. That’s divination dressed up as Analytics.

A tool that claims to measure the unmeasurable: the certainty effect explained

The real danger of these tools is the certainty effect. In the DOSE framework I’ve been teaching for years (originally taught by Guillaume Attias at BMO Academy), certainty is one of four cognitive triggers that lock in decisions.

When a tool displays an « AI visibility » score with two decimal places, your brain believes it. There’s a number, so there’s truth. Yet the volatility of LLMs is such that the number is meaningless. You build budgets, hire staff, set strategic direction on an illusion.

That’s the certainty effect. A number brings peace, even if it’s wrong. And publishers know it well. The prettier the dashboard, the safer the client feels. But safety is fiction.

I see it with clients who invested €1,800 a month for 6 months in an « AI monitoring » tool. At first, they felt reassured. Then they started doubting. They cross-checked data with other sources. Nothing aligned. One day the score jumps 42%, the next it drops to 12%, with no action on their part. Their consultant told them: « the algorithm evolved ». But that’s exactly the problem. The algorithm constantly evolves, and the tool only captures a snapshot that’s outdated the moment it’s taken.

You can’t build an SEO strategy on data you know will be obsolete before you refresh the page. The certainty effect sold by these tools costs more than the investment itself: it costs months of bad decisions.

What you should build instead of tracking AI citations

Rejecting AI tracking doesn’t mean ignoring generative AIs. That’s the message I bring to my clients. AIs change the search journey. They steal clicks on informational queries, they short-circuit the funnel. But you gain nothing from tracing volatile citations.

What matters is thematic authority. Solid semantic architecture. Well-forged content clusters. Deliberate internal linking. Documented entities, visible expertise, multiple sources that converge.

Generative AIs train on the web. They recognize authority through entity repetition and the depth of a topic’s coverage. If your site is the reference in a niche, it will be cited. Maybe today. Maybe tomorrow. Maybe not. But over the long term, consistency wins.

I see it in projects I built in Southeast Asia: after 14 months of semantic work and authority content production, the site generates 37,000 organic sessions per month. And AI citations arrive naturally, without tracking a single mention in ChatGPT. Authority does the job.

Stop chasing the thermometer. Build the source.

And for those who want a compass, there’s a far more stable metric: your conversions. Your organic traffic. Your engagement rate. The real question isn’t « am I cited by AI this week? ». The real question is « is my traffic growing on the queries that matter? »
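That last question can be answered with a very small calculation: compare organic clicks on a short list of priority queries across two periods. The sketch below is one way to do it, under stated assumptions: the query names and click counts are hypothetical, and in practice they would come from whatever analytics export you already trust.

```python
def growth_by_query(previous, current):
    """Period-over-period click growth per query, as a fraction."""
    growth = {}
    for query, before in previous.items():
        after = current.get(query, 0)
        # Growth is undefined when the baseline is zero.
        growth[query] = (after - before) / before if before else None
    return growth

# Hypothetical organic clicks for two consecutive months.
previous = {"3-season tent": 400, "ultralight stove": 250}
current = {"3-season tent": 460, "ultralight stove": 225}

for query, change in growth_by_query(previous, current).items():
    label = "n/a" if change is None else f"{change:+.1%}"
    print(f"{query}: {label}")
```

Unlike a citation count, these inputs do not change when you rerun the script: the same export gives the same answer every time.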

The future of SEO isn’t measured in citation counts

SEO has passed through the era of exact keywords, then intent, then semantics. Today it enters the generative era. And like every transition, tools emerge to sell confidence in numbers. AI citation tracking is the perfect example: an answer to a question we haven’t yet learned to ask.

The Reddit post I cited isn’t an isolated cry. It’s a signal that the practitioner community sees clearly. Nobody can measure what is, by nature, elusive. And claiming otherwise is deceiving decision-makers who don’t have time to verify.

I’m not telling you to throw out your tools. I’m telling you to shift your perspective. Don’t chase the citation. Chase the authority that provokes it. The day a tool can read an LLM’s mind, I’ll be first to recommend it. Until then, I keep my critical eye and my numbers close.

And you—are you ready to bet €1,800 a month on a measure that changes with every query?

Your live SEO audit, no phantom tracking

I don’t sell you the method. I show you the pages. In 45 minutes of audit, I rebuild with you a solid semantic architecture—the kind that attracts citations without having to chase them.


Frequently Asked Questions

Are AI citation tracking tools completely useless?

Not completely. They can show a very broad trend, as long as you don’t take the numbers at face value. But without verifiable correlation, they shouldn’t be the sole basis for strategic decisions.

How can I know if my site is really cited by ChatGPT or Gemini?

The only reliable method today is manual verification—ask the questions yourself in different sessions. No tool captures all personalization variables.

Why do results vary so much from day to day?

Because generative models sample their output with a temperature parameter that introduces randomness, apply invisible personalization, and rely on training knowledge frozen at a past date. From one query to the next, context shifts.

Should I still optimize for generative AIs?

Yes, but by building authority, clarifying entities, and deepening semantic coverage. Optimization doesn’t come from tracking citations—it comes from solid content architecture.

What metric should I track if I’m not chasing AI citations?

Organic traffic, conversions, clicks via previews, bounce rate on informational pages. These are reliable signals that reflect real impact, including the indirect impact of AIs.

Stéphane Jambu

SEO & AI Engineer

I build growth systems / AI / Neuroscience | 650+ clients · 80 LinkedIn testimonials · 30 years of expertise · 15 years of systems running without me.
