ChatGPT May Scrape Google, but the Results Don’t Match
We know that AI assistants like ChatGPT access search indices, like Google and Bing, to retrieve URLs for their responses. But how, exactly?
To find out, we’ve run a series of experiments looking at the relationship between the URLs cited by AI assistants and the results found in Google when searching for the same topics.
So far, we’ve tested long-tail prompts (very long, very specific queries just like those you’d enter into ChatGPT) and fan-out queries (mid-length prompts that relate to the original long-tail prompt). Today, we’re testing short-tail keywords: ultra-short, ultra-specific “head” terms.
Short-tail keywords offer the clearest illustration of how AI citations track with Google results.
Based on three separate studies, our conclusion is that ChatGPT (and similar systems) don’t just lift URLs directly from Google, Bing, or other indexes. Instead, they apply additional processing steps before citing sources.
Even when we examined fan-out queries—the actual search prompts these systems send to search engines—the overlap between AI and search engine citations was surprisingly low.
In other words, while ChatGPT may pull from Google’s search index, it still appears to apply its own selection layer that filters and reshuffles which links appear.
It’s therefore not enough to identify fan-out queries and rank well for them. Additional factors outside a publisher’s control influence which URLs get surfaced.
Previously, Xibeijia Guan analyzed citation overlap between AI and search results for informational long-tail and fan-out prompts, using Ahrefs Brand Radar.
This time, she has taken a sample of 3,311 classic SEO-style head terms, covering informational, commercial, transactional, and navigational intent.
| Example query | Informational | Commercial | Transactional | Navigational |
| --- | --- | --- | --- | --- |
| 1 | cincinnati bearcats basketball | best credit card rewards | pools for sale | onedrive sign in |
| 2 | protein in shrimp | soundbar for tv | shop girls dress | verizon customer support |
| 3 | what is cybersecurity | at home sauna | buy a domain | costco toilet paper |
Each keyword has been run through ChatGPT, Perplexity, and Google’s top 100 SERPs to analyze citation overlap between AI and search.
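As a rough illustration of what “citation overlap” means here, the sketch below computes the share of AI-cited URLs that also appear in the top N search results for a single keyword. It is not the study's actual pipeline: the function names (`normalize`, `overlap_rate`), the normalization rules, the choice of denominator, and the example.com URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal.

    Assumed rules: ignore scheme, a leading 'www.', query string,
    fragment, and any trailing slash.
    """
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def overlap_rate(ai_citations: list[str], serp_urls: list[str], top_n: int = 10) -> float:
    """Share of AI-cited URLs that also appear in the top N search results."""
    cited = {normalize(u) for u in ai_citations}
    ranked = {normalize(u) for u in serp_urls[:top_n]}
    if not cited:
        return 0.0
    return len(cited & ranked) / len(cited)


# Hypothetical data for one keyword (not taken from the study).
ai_citations = [
    "https://example.com/a",
    "https://www.example.org/b/",
    "https://example.net/c",
]
serp_urls = [
    "http://example.org/b",
    "https://example.com/x",
    "https://example.net/c?ref=1",
]

# 2 of the 3 cited URLs also appear in the search results.
print(f"{overlap_rate(ai_citations, serp_urls):.2%}")
```

Averaging this rate over every keyword in a sample would give a single overlap figure. Changing `top_n` from 10 to 100 shows how sensitive the number is to how deep into the results you look.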
OpenAI and Perplexity have reportedly been scraping Google results via a third-party provider.
It’s possible we’d see more overlap if our study focused only on ‘real-time’ queries (e.g., news, sports, finance), since those are reportedly the kinds ChatGPT scrapes Google for.
ahrefs.com/writing-tools/, while ChatGPT finds a better “fit” on ahrefs.com/blog/ and cites another.
If true, this reinforces the value of creating cluster content—optimizing multiple pages for different topic intents, to have the best chance of being found.
Another possibility is that both lean on the same pool of authoritative domains, but differ on which specific pages they pick from those domains.
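One way to probe the “same domains, different pages” idea is to measure overlap at two levels: exact page and host. The sketch below is a minimal illustration under stated assumptions: the helper names (`host`, `page`, `compare_levels`) and the example.com URLs are hypothetical, and it compares hostnames rather than registrable domains, so subdomains count as separate sites.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Lowercased hostname without a leading 'www.'."""
    return urlsplit(url).netloc.lower().removeprefix("www.")


def page(url: str) -> str:
    """Hostname plus path, ignoring query, fragment, and trailing slash."""
    return host(url) + urlsplit(url).path.rstrip("/")


def compare_levels(ai_citations: list[str], serp_urls: list[str]) -> dict[str, float]:
    """Share of AI citations matched in search results, by page and by host."""
    ai_pages = {page(u) for u in ai_citations}
    serp_pages = {page(u) for u in serp_urls}
    ai_hosts = {host(u) for u in ai_citations}
    serp_hosts = {host(u) for u in serp_urls}
    return {
        "page_overlap": len(ai_pages & serp_pages) / len(ai_pages) if ai_pages else 0.0,
        "domain_overlap": len(ai_hosts & serp_hosts) / len(ai_hosts) if ai_hosts else 0.0,
    }


# Hypothetical data: the AI assistant and the search engine both use
# example.com, but each picks a different page on it.
ai_citations = ["https://example.com/blog/post", "https://example.org/guide"]
serp_urls = ["https://www.example.com/tools/", "https://example.net/guide"]

print(compare_levels(ai_citations, serp_urls))
```

If domain-level overlap turns out to be much higher than page-level overlap across a sample, that would be consistent with this explanation, though it would not prove it.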
Assess your cluster content in AI and search
You can check the SEO performance of your cluster content in the Related Terms report in Ahrefs Keywords Explorer.
This will show you if and where you rank across an entire cluster of related keywords.
Just add a Parent Topic filter, and a Target filter containing your domain.
Once you’ve done that, head to Ahrefs Brand Radar to check on the AI performance of your cluster content.
Run individual URLs through the Cited Pages report in Ahrefs Brand Radar to see if your cluster content is being cited by AI assistants like ChatGPT, Perplexity, Gemini, and Copilot.
Work out if any content is missing from either surface, then optimize until you’ve filled those gaps and enriched the overall cluster.
You can use topic gap recommendations in Ahrefs’ AI Content Helper to help with this.
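If you export the two URL lists by hand (pages that rank for the cluster, and pages cited by AI assistants), the gap check described above boils down to a set comparison. This is only a sketch: `surface_gaps` and the sample URLs are hypothetical, and nothing here calls an Ahrefs API.

```python
def surface_gaps(ranking_urls: list[str], cited_urls: list[str]) -> dict[str, list[str]]:
    """Split cluster URLs by where they are visible: search, AI, or both."""
    ranking, cited = set(ranking_urls), set(cited_urls)
    return {
        "search_only": sorted(ranking - cited),  # ranks, but not cited by AI
        "ai_only": sorted(cited - ranking),      # cited by AI, but not ranking
        "both": sorted(ranking & cited),
    }


# Hypothetical exports for one content cluster.
ranking_urls = ["https://example.com/guide", "https://example.com/tools"]
cited_urls = ["https://example.com/guide", "https://example.com/blog/faq"]

for surface, urls in surface_gaps(ranking_urls, cited_urls).items():
    print(surface, urls)
```

Pages listed under `search_only` or `ai_only` are the candidates to review first.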
studied by SQ and Xibeijia) show the least overlap. They match only 6.82% of Google’s top 10 results.
We’re not comparing apples with apples here. These percentages come from different studies with differently sized datasets.
But each study produces similar findings: the pages that ChatGPT cites don’t overlap significantly with the pages that Google ranks. And it’s largely the opposite for Perplexity.
SQ seems the most probable one to me:
“ChatGPT likely uses a hybrid approach where they retrieve search results from various sources, e.g. Google SERPs, Bing SERPs, their own index, and third-party search APIs, and then combine all the URLs and apply their own re-ranking algorithm.”
Whatever the case, search and AI are shaping discovery side by side, and the best strategy is to build content that gives you a chance to appear on both surfaces.