Cloudflare vs. Perplexity: The Battle Over AI Crawling and Robots.txt Explained

Cloudflare has delisted Perplexity AI as a verified bot and is blocking its undeclared crawlers across its infrastructure, citing “stealth crawling” practices and deceptive bot behavior.

According to Cloudflare, the AI assistant service repeatedly violated its Verified Bots policy by ignoring robots.txt directives, rotating IP addresses and spoofing user agents to disguise its crawlers as legitimate human traffic.

That is a serious accusation. But is it entirely accurate? Or is this a larger battle about how the web treats modern AI assistants?

What Exactly Happened?

Cloudflare, one of the most widely used web infrastructure and security providers on the internet, maintains a Verified Bots Program that allows trusted bots to access websites without interference.

To stay verified, these bots must play by the rules: primarily, obeying the robots.txt protocol, identifying themselves properly via IP addresses and user agents, and avoiding deceptive crawling tactics.

But Perplexity, according to Cloudflare, was not playing fair.

In a blog post, Cloudflare revealed it had been receiving complaints from website owners about suspicious bot activity coming from Perplexity.

Following an investigation, Cloudflare found the AI assistant was not only bypassing site restrictions but also actively disguising itself to sneak in undetected.

The result? Perplexity was delisted as a verified bot and blocked across Cloudflare’s vast network of protected websites.

Here is the exact statement Cloudflare gave:

“Based on Perplexity’s observed behavior, which is incompatible with [webmaster] preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”

But what exactly qualifies as stealth crawling?

 

The Alleged Violations

Cloudflare accused Perplexity of engaging in several tactics designed to evade detection:

1. Rotating IP Addresses & ASN Switching

Perplexity’s official crawlers are expected to use a known range of IP addresses from a specific ASN (Autonomous System Number).

Instead, Cloudflare alleges the service used a variety of undeclared IPs from unrelated ASNs, which makes the traffic difficult to trace or block reliably.
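To make the idea of a “known range” concrete, here is a minimal Python sketch of how a site could test whether a request comes from a crawler's declared addresses. The ranges below are reserved documentation addresses used as placeholders, and the function name is hypothetical; none of this reflects Perplexity's published ranges or Cloudflare's actual checks.

```python
import ipaddress

# Hypothetical declared ranges. These are reserved documentation networks,
# used here as placeholders rather than any real crawler's published list.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_declared_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the declared ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in network for network in DECLARED_RANGES)


print(is_declared_ip("192.0.2.17"))   # address inside a declared range
print(is_declared_ip("203.0.113.5"))  # address outside every declared range
```

A check like this only works while the crawler stays inside the ranges it has announced. Traffic arriving from unrelated networks simply falls outside the list, which is why undeclared IPs are hard to attribute.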

2. User-Agent Spoofing

Even more concerning, according to Cloudflare, was that Perplexity’s bots started disguising themselves as ordinary human browsers.

One example of the user agent string used:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36

This string mimics a user browsing with Chrome on a Mac, which Cloudflare says is a deliberate attempt to avoid bot detection filters.
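To see why a browser-style user agent slips past simple filters, consider a minimal Python sketch of a header check a site might run. The tokens `ExampleBot` and `ExampleFetcher`, the sample bot string, and the function name are all hypothetical; they are not Perplexity's identifiers or Cloudflare's detection logic.

```python
# Hypothetical crawler tokens a site operator might look for.
DECLARED_BOT_TOKENS = ("ExampleBot", "ExampleFetcher")


def looks_like_declared_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a declared bot token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_BOT_TOKENS)


# The browser-style string quoted in the article.
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
# A hypothetical self-identifying crawler string.
bot_ua = "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)"

print(looks_like_declared_bot(bot_ua))      # self-identified crawler is caught
print(looks_like_declared_bot(browser_ua))  # browser-style string passes through
```

Because the header is supplied by the client, a check like this catches only bots that choose to identify themselves. That is the gap heuristics based on network and behavioral signals are meant to cover.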

3. Ignoring robots.txt

Perhaps the biggest violation of trust: ignoring robots.txt.

This is the industry-standard file used by websites to instruct bots what they can and can’t crawl.

Perplexity, Cloudflare claims, bypassed these instructions entirely.
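For readers who have not worked with robots.txt directly, here is a minimal Python sketch of how a well-behaved client consults it before fetching a page, using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleBot` and `OtherBot` names, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: block ExampleBot everywhere, and keep every other
# bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant client asks before each fetch and skips disallowed URLs.
print(parser.can_fetch("ExampleBot", "https://example.com/articles/1"))
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))
```

Nothing in this check is enforced by the server. The file is a request, and it only works when the client chooses to honor it, which is why ignoring it is treated as a breach of trust rather than a technical exploit.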

Why It Matters: The AI Web Crawl Debate

Here is where things get more nuanced.

Perplexity AI is not denying the traffic occurred.

But they are challenging the entire premise of the accusation. Their argument? “We are not crawling like Google. We are acting on behalf of users.”

Let’s unpack that.

Perplexity says its system only fetches web content in real-time, when a user asks a question.

Unlike traditional crawlers (like Googlebot or Bingbot), which preemptively index billions of pages and store that content in massive databases, Perplexity claims it only pulls content once, just long enough to summarize it and display an answer.

So, is that a crawler? Or is that more like a digital assistant, doing what you told it to do?

Perplexity compares its approach to Google’s user-triggered fetchers: for example, when Google reads a webpage aloud on Android or when it verifies your site with Search Console.

According to Perplexity, these fetches also bypass robots.txt and do not store or reuse content.

Perplexity’s Response: “This is a Misunderstanding”

In a lengthy technical rebuttal titled “Agents or Bots? Making Sense of AI on the Open Web,” Perplexity laid out its side of the story:

“Modern AI assistants work fundamentally differently from traditional web crawling.

When you ask Perplexity a question, the AI does not already have that information sitting in a database.

Instead, it fetches it in real-time and uses it immediately to answer your question.”

Perplexity argues that:

  • Their agents only fetch content in response to user prompts.
  • The content is not stored, indexed or used to train models.
  • Their system behaves like a browser or RSS reader and not like a crawler.

They also pointed fingers back at Cloudflare, claiming:

  • Cloudflare misattributed traffic from third-party services (like BrowserBase) to Perplexity.
  • Their blocking decision was based on fundamentally flawed technical analysis.
  • They were possibly used as a scapegoat to generate PR for Cloudflare.

In their words:

“When you misattribute millions of requests, publish inaccurate diagrams, and misunderstand how AI assistants work, you’ve forfeited any claim to expertise in this space.”

Who Owns Access to the Open Web?

This debate is not just about Perplexity vs. Cloudflare. It is about the future of information access.

If AI tools are blocked from fetching real-time content, even when requested by users, does that mean only the biggest players like Google and Microsoft can crawl the web?

Will smaller, independent AI startups be denied access entirely?

If every AI agent is treated as a rogue bot, what happens to innovation in the open web?

This is the core of Perplexity’s argument: if user-initiated agents are misclassified as bots, we’re closing the door to a more dynamic and personalized web experience.

As one Perplexity engineer put it, “Imagine your email client being blocked because it fetched a newsletter.”

Real-World Examples to Understand The Situation

Say you are using Perplexity to look up restaurant reviews in your area.

You ask, “What are people saying about that new bistro in Brooklyn?”

Perplexity fetches relevant content from recent blog posts, Google reviews and local forums. Within seconds, it summarizes the tone: “Mixed reviews: praised for ambiance, criticized for wait times.”

That info was fetched on demand and not stored. That means no indexing and no training, just helping you, the user.

Cloudflare would label that behavior as rogue bot activity unless your tool is on its verified list.

Perplexity says that is the problem: today’s web infrastructure is not built to tell the difference between real-time AI assistants and malicious scrapers.

So, Who is Right?

That depends on where you stand.

  • Cloudflare argues: rules are rules. If you bypass robots.txt, spoof your identity, and crawl from unlisted IPs, you cannot be trusted. Their job is to protect website owners, and in their eyes, Perplexity broke that trust.
  • Perplexity argues: this is a misunderstanding. Their agents are tools acting on behalf of users, not autonomous bots. Blocking them limits innovation, access and the democratization of real-time information.

A Web at a Crossroads

This incident raises deeper questions about the future of the web:

  • Should AI agents follow the same rules as search engine bots?
  • Is the robots.txt file enough to govern the modern AI-driven internet?
  • Who gets to decide what constitutes legitimate access?

What we are seeing here is not just a tech dispute but a value conflict between open access and gatekeeping, between traditional infrastructure and next-gen intelligence.

One thing is clear: the lines between bots, browsers, assistants and agents are blurring fast. And if infrastructure providers cannot keep up, it may not be the bots that suffer. It could be the users.

 

Dileep Thekkethil

Dileep Thekkethil is the Director of Marketing at Stan Ventures, where he applies over 15 years of SEO and digital marketing expertise to drive growth and authority. A former journalist with six years of experience, he combines strategic storytelling with technical know-how to help brands navigate the shift toward AI-driven search and generative engines. Dileep is a strong advocate for Google’s EEAT standards, regularly sharing real-world use cases and scenarios to demystify complex marketing trends. He is an avid gardener of tropical fruits, a motor enthusiast, and a dedicated caretaker of his pair of cockatiels.
