The AI Bots That ~140 Million Websites Block the Most
AI bots power some of the most advanced technologies we use today, from search engines to AI assistants. However, their increasing presence has led to a growing number of websites blocking them.
There’s a cost to bots crawling your website, and there’s a social contract between search engines and website owners: search engines add value by sending referral traffic to websites. This is what keeps most websites from blocking search engines like Google, even as Google seems intent on keeping more of that traffic for itself.
When we looked at the traffic makeup of ~35K websites in Ahrefs Analytics, we found that AI sends just 0.1% of total referral traffic, far behind search.
I think many site owners want to let these bots learn about their brand, their business, and their products and offerings. But while many people are betting that these systems are the future, they currently run the risk of not adding enough value for website owners.
The first LLM to add more value by showing impressions and clicks to website owners will likely have a big advantage. Companies will report on the metrics from that LLM, which will likely increase adoption and keep more websites from blocking its bot.
The bots are using resources, using the data to train their AIs, and creating potential privacy issues. As a result, many websites are choosing to block AI bots.
We looked at ~140 million websites and our data shows that block rates for AI bots have increased significantly over the past year. I want to give a huge thanks to our data scientist Xibeijia Guan for pulling this data.
Request data is from Cloudflare Radar.
There is a moderate positive correlation between the request rate and the block rate for these bots: bots that make more requests tend to be blocked more often. The nerdy numbers are a Pearson correlation coefficient of 0.512 and a p-value of 0.0149, which is statistically significant at the 5% level.
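For readers who want to see what goes into those numbers, here is a minimal sketch of the Pearson calculation in plain Python. The input values are hypothetical placeholders, not the request or block data used in this study, and the sample size of 22 in the last step is an assumption (it is simply the number of bots in the table, which the article does not confirm as the sample size).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic used to test whether a correlation r over n pairs differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Hypothetical placeholder values, NOT the article's data
request_share = [1.0, 2.0, 3.0, 4.0, 5.0]
block_rate = [5.5, 5.7, 5.6, 5.9, 5.8]
r = pearson_r(request_share, block_rate)

# Assumption: n = 22 (the number of bots in the table)
t = t_statistic(0.512, 22)
```

Turning the t statistic into a p-value requires the Student's t distribution with n - 2 degrees of freedom. A statistics library such as SciPy (`scipy.stats.pearsonr`) returns the coefficient and the p-value together.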
Here is the total number of websites blocking each AI bot:
| Bot Name | Websites Blocking | Percentage (%) | Bot Operator |
| --- | --- | --- | --- |
| GPTBot | 8245987 | 5.89 | OpenAI |
| CCBot | 8188656 | 5.85 | Common Crawl |
| Amazonbot | 8082636 | 5.78 | Amazon |
| Bytespider | 8024980 | 5.74 | ByteDance |
| ClaudeBot | 8023055 | 5.74 | Anthropic |
| Google-Extended | 7989344 | 5.71 | Google |
| anthropic-ai | 7963740 | 5.69 | Anthropic |
| FacebookBot | 7931812 | 5.67 | Meta |
| omgili | 7911471 | 5.66 | Webz.io |
| Claude-Web | 7909953 | 5.65 | Anthropic |
| cohere-ai | 7894417 | 5.64 | Cohere |
| ChatGPT-User | 7890973 | 5.64 | OpenAI |
| Applebot-Extended | 7888105 | 5.64 | Apple |
| Meta-ExternalAgent | 7886636 | 5.64 | Meta |
| Diffbot | 7855329 | 5.62 | Diffbot |
| PerplexityBot | 7844977 | 5.61 | Perplexity |
| Timpibot | 7818696 | 5.59 | Timpi |
| Applebot | 7768055 | 5.55 | Apple |
| OAI-SearchBot | 7753426 | 5.54 | OpenAI |
| Webzio-Extended | 7745014 | 5.54 | Webz.io |
| Meta-ExternalFetcher | 7744251 | 5.54 | Meta |
| Kangaroo Bot | 7739707 | 5.53 | Kangaroo LLM |
It gets a little more complicated. For the above, we looked at the main robots.txt file for each website, but every subdomain can have its own robots.txt file with its own set of instructions. If we look at all ~461M robots.txt files, the block rate for GPTBot goes up to 7.3%.
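To make the idea of a "block" concrete, here is a rough sketch of checking which bots a single robots.txt file disallows, using Python's standard `urllib.robotparser`. The sample file, the `blocked_bots` helper, and the `example.com` URL are all hypothetical, and this is not necessarily how the data for this study was processed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def blocked_bots(robots_txt, bots, url="https://example.com/"):
    """Return the bots from `bots` that may not fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# GPTBot and CCBot are named explicitly; ClaudeBot falls back to the "*" rules
result = blocked_bots(SAMPLE_ROBOTS_TXT, ["GPTBot", "CCBot", "ClaudeBot"])

# A generic block ("User-agent: *" with "Disallow: /") catches every bot
generic = blocked_bots("User-agent: *\nDisallow: /\n", ["ClaudeBot"])
```

Because each subdomain serves its own file, a check like this would have to run once per hostname (for example `blog.example.com/robots.txt` separately from `example.com/robots.txt`) to cover a whole site.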
AI bot blocks over time
More top-trafficked sites began blocking AI bots in 2024, but the trend is decreasing towards the end of the year. It looks like the decrease mostly comes from generic blocks. The trend for AI bots themselves is increasing and I’ll show you that in a minute.
Do certain types of sites block AI bots more?
Here’s how it breaks down for each individual bot in different categories of websites. I was actually expecting news to be more blocked than other categories because there were a lot of stories about news sites blocking these bots, but arts & entertainment (45% blocked) and law & government (42% blocked) sites blocked them more.
The decision to block AI bots varies by industry. There can be a number of unique reasons for this. These are somewhat speculative:
Arts and Entertainment: ethical aversions, reluctance to become training data.
Books and Literature: copyright.
Law and Government: legal worries, compliance.
News and Media: prevent their articles from being used to train AI models that could compete with their journalism and take away from their revenue.
Shopping: prevent price scraping or inventory monitoring by competitors.
Sports: similar to news and media on the revenue fears.