Summary

Yes, you can track brand mentions in AI search: SparkToro's 600-volunteer study shows single AI answers vary wildly, but a fixed prompt set sampled repeatedly across ChatGPT, Gemini and Perplexity turns mention rate into a reliable metric you can set up in five steps.

Yes, it is possible to track brand mentions in AI search. The method is sampling: because ChatGPT, Gemini, Perplexity, Grok and Google AI Overviews generate probabilistic answers, you can't track them the way you track a Google ranking. Instead, you run a fixed set of buying questions repeatedly, across engines and phrasings, and measure how often your brand shows up. Over enough runs, the seemingly random answers converge into a stable visibility metric. Here's how that works, what the numbers can and cannot tell you, and a five-step setup you can finish in an afternoon.

Key takeaways

Brand mentions in AI search are trackable by sampling: run a fixed prompt set repeatedly and measure mention rate as a percentage, not a rank position.
A 600-volunteer study by SparkToro and Gumshoe.ai found that brand rankings inside AI answers vary almost randomly between runs, while whether a brand appears at all stays consistent enough to measure.
OpenAI offers no search-console equivalent, so impression counts don't exist. Mention rate, citation rate and Share of Model are the proxy metrics that fill the gap.
Manual spot-checking gives you a first snapshot, but personalization, memory and fan-out queries make single-run checks unreliable within weeks.
The stakes are real: ChatGPT passed 800 million weekly users in October 2025, and Adobe measured a 1,200% surge in generative-AI-sourced traffic to US retail sites.
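To make the proxy metrics concrete, here is a minimal Python sketch of how mention rate and Share of Model could be computed from a batch of saved answers. It rests on assumptions that are not from the study or any vendor: the brand names are invented, a "mention" is a simple whole-word, case-insensitive match, and Share of Model is taken to mean a brand's mentions divided by all tracked-brand mentions. Tools may define these differently.

```python
import re
from collections import Counter


def mentions(answer: str, brand: str) -> bool:
    """Return True if the brand appears as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(brand)}\b"
    return re.search(pattern, answer, re.IGNORECASE) is not None


def visibility_metrics(answers: list[str], brands: list[str]) -> dict[str, dict[str, float]]:
    """Compute per-brand mention rate and an assumed Share of Model.

    mention_rate   = answers mentioning the brand / total answers
    share_of_model = brand mentions / mentions of all tracked brands
                     (an assumed definition, used here for illustration)
    """
    counts = Counter({brand: 0 for brand in brands})
    for answer in answers:
        for brand in brands:
            if mentions(answer, brand):
                counts[brand] += 1

    total_answers = len(answers)
    total_mentions = sum(counts.values())
    return {
        brand: {
            "mention_rate": counts[brand] / total_answers if total_answers else 0.0,
            "share_of_model": counts[brand] / total_mentions if total_mentions else 0.0,
        }
        for brand in brands
    }


# Hypothetical answers collected from repeated runs of one prompt.
sample_answers = [
    "For pet hair, Acme and Borealis are both strong picks.",
    "Most reviewers point to Borealis.",
    "Acme leads on suction; Cirrus is the budget option.",
    "Acme is the usual recommendation.",
]
metrics = visibility_metrics(sample_answers, ["Acme", "Borealis", "Cirrus"])
for brand, values in metrics.items():
    print(f"{brand}: mention rate {values['mention_rate']:.0%}, "
          f"share of model {values['share_of_model']:.0%}")
```

In this toy sample, Acme appears in three of four answers, so its mention rate is 75%, and it accounts for three of the six tracked-brand mentions. Plain string matching will miss misspellings and product-line names, so a real setup would likely need an alias list per brand.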

Why you can measure something that changes every run

Ask ChatGPT "what's the best cordless vacuum for pet hair" twice and you'll get two different answers. Large language models sample from a probability distribution, so some randomness is built in by design. On top of that, engines now run fan-out retrieval: Google AI Mode and ChatGPT decompose your one question into several hidden sub-queries, each pulling different sources, so even the evidence behind the answer rotates between runs.

This is exactly why single checks mislead. When SparkToro and Gumshoe.ai had 600 volunteers ask twelve identical questions across ChatGPT, Claude and Google's AI results, they found that brand rankings inside answers were close to random, but whether a brand appeared at all was far more stable. Rand Fishkin's practical read: any tool selling you a precise "AI rank position" is overselling, while a visibility percentage built from repeated runs is legitimate measurement.
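One way to see why a single check misleads while repeated runs do not is to put an uncertainty range around the mention rate. The sketch below uses the Wilson score interval, a standard formula for proportions. It is an illustration chosen here, not the method used in the SparkToro study, and it assumes each run is an independent sample, which is only approximately true for real engines.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a mention rate.

    hits: number of runs in which the brand was mentioned
    runs: total number of runs sampled
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= hits <= runs:
        raise ValueError("hits must be between 0 and runs")

    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# One run that happened to mention the brand versus 30 mentions in 50 runs.
for hits, runs in [(1, 1), (30, 50)]:
    low, high = wilson_interval(hits, runs)
    print(f"{hits}/{runs} runs: plausible mention rate {low:.0%} to {high:.0%}")
```

Under these assumptions, one positive run is consistent with a true mention rate anywhere from about 21% to 100%, while 30 mentions in 50 runs narrows the range to roughly 46% to 72%. The point estimate only becomes useful once the sample is large enough to shrink that range.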