**Cloudflare has rolled out a new Content Signals Policy, giving publishers a way to spell out how their work can be used after it’s been accessed. It’s a small addition to robots.txt, but one with potentially huge implications on a web increasingly powered by AI and dominated by bots.**

![Cloudflare Wants to Put Website Owners Back in Charge of Their Content](https://www.stanventures.com/news/wp-content/uploads/2025/09/Web-Browser-with-Robots.txt-Details.avif)

On September 24, 2025, Cloudflare introduced its [Content Signals Policy](https://blog.cloudflare.com/content-signals-policy/), a new layer of communication for the web that builds on the decades-old robots.txt file.

[Robots.txt](https://www.stanventures.com/blog/robots-txt-guide/), for all its usefulness, has always been about which parts of a site can or cannot be crawled. What it never offered was guidance on what happens after the content is taken. 

The explosion of AI has exposed how little control publishers actually have once their content is scraped. Cloudflare’s new policy is meant to fill that gap, putting consent and expectations into clearer terms.

The new system introduces machine-readable “signals” that go beyond simply allowing or denying access. These signals let a website operator explicitly declare whether their content can be used for search indexing, as input for AI models, or for training those models.

In effect, Cloudflare is giving publishers a way to speak a language machines understand, while keeping humans informed as well.

## Why This Moment Feels Different

Scraping has been part of the internet since search engines first began indexing websites, and most publishers accepted the trade: let the bots crawl, and in return get visitors, credit, maybe even a steady stream of business.

That exchange is fraying in the age of AI. 

Content that once brought people back to the source is now swallowed into training sets, powering tools that answer questions without ever sending a click. A recipe, a tutorial, a poem: all can be repackaged by systems that edge out their creators.

Cloudflare warns that this isn’t a minor annoyance. 

Within a few years, automated traffic is expected to outstrip human traffic, and by the early 2030s, bot traffic alone could exceed the total volume of today’s internet. For small sites, the cost of feeding that machine can feel unsustainable.

Publishers describe it as being cornered: close off your work and lose the audience, or leave it open and see it siphoned away. 

Cloudflare’s new policy is pitched as a release valve, a way to mark out limits without shutting the door completely.

## What the New Signals Mean

The Content Signals Policy introduces three categories that publishers can use to declare their preferences:

- **Search** – Can your content be indexed and shown in search results?
- **AI-input** – Can your content be used as live input to AI models that retrieve and repackage material in real time?
- **AI-train** – Can your content be used to train or fine-tune AI models?

The Content Signals Policy has two layers. 

First, Cloudflare provides a block of # comments inside robots.txt. These comments don’t affect bots (crawlers ignore them), but they give people a plain-English explanation of what each signal means.

![Cloudflare provides a block of # comments inside robots.txt.](https://www.stanventures.com/news/wp-content/uploads/2025/09/Screenshot-2025-09-25-160556.avif)

Then comes the machine-readable part: the actual Content-Signal lines that bots are expected to follow.

![Content-Signal lines that bots are expected to follow](https://www.stanventures.com/news/wp-content/uploads/2025/09/Screenshot-2025-09-25-160407.avif)

In this example, the signals tell bots: “Index me for search, but don’t use me to train your AI.”
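To make the machine-readable layer concrete, here is a minimal Python sketch of how a crawler might read these lines. It assumes the `Content-Signal: search=yes, ai-train=no` syntax (comma-separated `name=yes|no` pairs) used in Cloudflare’s examples. The function name `parse_content_signals` is hypothetical, not part of any Cloudflare tool, and for brevity it ignores which `User-Agent` group a line belongs to.

```python
# Signal names described in Cloudflare's announcement.
KNOWN_SIGNALS = {"search", "ai-input", "ai-train"}


def parse_content_signals(robots_txt: str) -> dict[str, bool]:
    """Collect Content-Signal preferences from robots.txt text.

    Assumes lines like "Content-Signal: search=yes, ai-train=no".
    Signals that are never mentioned are absent from the result,
    meaning "no preference expressed".
    """
    signals: dict[str, bool] = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            if "=" not in pair:
                continue
            name, setting = (part.strip().lower() for part in pair.split("=", 1))
            if name in KNOWN_SIGNALS and setting in ("yes", "no"):
                signals[name] = setting == "yes"
    return signals


example = """
# Human-readable comments are skipped by the parser
User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
"""
print(parse_content_signals(example))  # {'search': True, 'ai-train': False}
```

Note that `ai-input` does not appear in the result at all: a crawler following this sketch would treat it as “no stated preference” rather than as a yes or a no.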

There’s no lock on the door here. Anyone intent on taking content can still do so. But the policy gives publishers something they haven’t had before: a simple way to declare their intentions in a format that’s easy for people to read and harder for companies to shrug off if challenged.

Not everyone is convinced the signals will carry weight. Search industry analyst Glenn Gabe called the move “big news, but pretty rough right now,” pointing out that Google hasn’t committed to following the instructions. He also raised questions about how the robots.txt directives would be structured.

> Big news, but sounds pretty rough right now (and Google hasn’t said they would follow the instructions). Also, what are the robots.txt directives for this?? Be careful -> Cloudflare Sets Up a Fight Over Google’s AI Overviews Access
> “Cloudflare, which says it powers 20% of the… [pic.twitter.com/zSjZwjLbF1](https://t.co/zSjZwjLbF1)
> — Glenn Gabe (@glenngabe) [September 24, 2025](https://twitter.com/glenngabe/status/1970891634104479877?ref_src=twsrc%5Etfw)

Lily Ray was even more skeptical:

> It’s a nice idea in theory… but… what? [https://t.co/AhyaUkXTm2](https://t.co/AhyaUkXTm2)
> — Lily Ray 😏 (@lilyraynyc) [September 24, 2025](https://twitter.com/lilyraynyc/status/1970963844022596085?ref_src=twsrc%5Etfw)

## How Cloudflare Is Rolling It Out

Cloudflare has folded the new signals into its managed robots.txt feature, already used on more than 3.8 million domains. 

By default, those sites will signal that search engines may index their pages but that AI training is not permitted. The “AI-input” signal is left unset, leaving the choice to site owners.
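That default amounts to setting two signals and leaving the third unset. The Python sketch below writes such a line with a hypothetical helper, `build_content_signal`. It assumes an unset signal is simply omitted rather than written with an empty value. Cloudflare’s managed file may format the line slightly differently (for example, spacing after commas).

```python
from typing import Optional


def build_content_signal(search: Optional[bool] = None,
                         ai_input: Optional[bool] = None,
                         ai_train: Optional[bool] = None) -> str:
    """Return a Content-Signal line; signals left as None are omitted.

    Omitting a signal mirrors leaving it blank: no preference is stated.
    """
    prefs = {"search": search, "ai-input": ai_input, "ai-train": ai_train}
    pairs = [f"{name}={'yes' if allowed else 'no'}"
             for name, allowed in prefs.items() if allowed is not None]
    if not pairs:
        raise ValueError("at least one signal must be set")
    return "Content-Signal: " + ", ".join(pairs)


# The default described above: search allowed, AI training not allowed,
# AI input left unset for the site owner to decide.
print(build_content_signal(search=True, ai_train=False))
# Content-Signal: search=yes, ai-train=no
```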

For free-plan customers who don’t have a robots.txt file, Cloudflare will now serve an explanatory note whenever one is requested. It doesn’t set any rules but lays out what the signals mean, encouraging owners to decide for themselves.

Paid users can set their own preferences in the Cloudflare dashboard or by copying text from ContentSignals.org.

To push adoption beyond its own network, Cloudflare has released the policy under a CC0 license. That means anyone can use it, even without Cloudflare services. 

The hope is that search engines, AI companies, and regulators will recognize it as a common standard.

## A Return to Old Internet Values

Cloudflare’s framing harks back to the blog era, when the simple act of linking carried real weight. 

Recognition was its own reward, and licenses like Creative Commons codified the principle that sharing was fine if credit stayed attached. That bargain feels broken now. 

Creators see their work pulled into datasets, their names lost, and their output repackaged by tools that may even rival them. 

Cloudflare is pitching its new signals as a way to bring choice back into the mix: not an all-or-nothing decision, but the ability to set boundaries.

## The Catch: Signals Need Respect to Matter

Of course, signals only work if they’re honored. Robots.txt has always relied on voluntary compliance, and the same will be true here. Companies committed to ethical scraping may follow the signals, but there’s no guarantee that everyone will.

That’s why Cloudflare stresses that content signals are preferences, not enforcement mechanisms. If a publisher wants stronger protection, they’ll need to combine signals with other tools, such as bot management or firewalls.
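Python’s standard-library robots.txt parser illustrates the point. `urllib.robotparser` only acts on fields such as `User-agent`, `Allow`, `Disallow`, `Crawl-delay`, `Request-rate`, and `Sitemap`, and it silently skips anything else, including `Content-Signal`. Whether a page may be fetched is therefore decided entirely by the access rules. The bot name and URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Disallow comes before Allow because this parser applies the first
# matching rule in file order.
ROBOTS_TXT = """\
User-Agent: *
Content-Signal: search=yes, ai-train=no
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; the Content-Signal line is ignored.
parser.parse(ROBOTS_TXT.splitlines())

# Access is still governed only by Allow/Disallow rules: a bot that
# wants to train on /article is not stopped by the "ai-train=no" signal.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))         # True
print(parser.can_fetch("ExampleAIBot", "https://example.com/private/report"))  # False
```

Honoring the signal is left to the crawler’s own logic after it has fetched the page, which is exactly why Cloudflare describes it as a preference rather than a lock.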

Still, the existence of a standard matters. Even if some ignore it, a clear framework gives lawmakers, industry groups, and courts a reference point. It also creates a baseline expectation: if a crawler disregards an explicit “no,” the violation is harder to defend.

## Why It Could Matter Beyond Cloudflare

The new policy carries weight beyond its few lines in robots.txt. It raises the question of who controls how content circulates on an internet increasingly shaped by AI.

Creators may see it as a chance to stay accessible without giving up everything. AI developers are being asked to acknowledge the sources they depend on. 

Lawmakers could view it as a ready-made framework to test in copyright and data disputes. 

Cloudflare is upfront that this is just a first move, and whether it takes hold depends on how many players decide the signals are worth respecting.

## What Publishers Can Do Right Now

If you run a website and want to make use of the new policy, here’s a simple approach:

1. **Check your robots.txt file.** If you don’t have one, create it.
2. **Decide where you stand.** Are you comfortable with search indexing? Do you want AI models touching your content at all?
3. **Generate your signals.** Tools like ContentSignals.org make it easy.
4. **Use layers of defense.** Signals are guidance, not enforcement. Pair them with other protections if needed.
5. **Watch the conversation.** Standards evolve quickly. What feels optional today could become the norm tomorrow.
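Steps 1 and 3 can be combined into a small script. The sketch below is hypothetical, not a Cloudflare or ContentSignals.org tool. It reads a local robots.txt (creating one if none exists) and leaves it alone if a `Content-Signal` line is already present. Otherwise, it inserts one under the catch-all `User-Agent: *` group, using the `search=yes, ai-train=no` example as a placeholder preference.

```python
from pathlib import Path

# Placeholder preference; replace it with the choice you made in step 2.
SIGNAL_LINE = "Content-Signal: search=yes, ai-train=no"


def add_content_signal(robots_txt: str, signal_line: str = SIGNAL_LINE) -> str:
    """Return robots.txt text that carries a Content-Signal line.

    Simplifications: an existing Content-Signal line anywhere means the text
    is returned unchanged, and the new line goes directly below the first
    'User-Agent: *' line, assuming that line ends its list of user agents.
    """
    lines = robots_txt.splitlines()
    if any(line.strip().lower().startswith("content-signal:") for line in lines):
        return robots_txt
    for i, line in enumerate(lines):
        field, _, value = line.partition(":")
        agent = value.split("#", 1)[0].strip()  # ignore trailing comments
        if field.strip().lower() == "user-agent" and agent == "*":
            lines.insert(i + 1, signal_line)
            return "\n".join(lines) + "\n"
    # No catch-all group yet: append one that keeps everything crawlable.
    if lines:
        lines.append("")  # blank line between groups
    lines += ["User-Agent: *", signal_line, "Allow: /"]
    return "\n".join(lines) + "\n"


def update_robots_file(path: Path) -> None:
    """Create robots.txt if it is missing, then ensure it declares signals."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(add_content_signal(current), encoding="utf-8")
```

For example, calling `update_robots_file(Path("robots.txt"))` in the directory your site serves from would create or update the file in place. As step 4 notes, the result only states preferences; it blocks no one.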

## Key Takeaways

- Cloudflare has launched a Content Signals Policy, extending robots.txt to include usage preferences.
- The system introduces three signals (search, AI-input, and AI-train) that can each be set to yes or no.
- The policy aims to give website owners more control at a time when bot traffic is exploding.
- Compliance is voluntary, so signals work best when paired with other protections.

Cloudflare has made the system free for anyone to adopt, hoping it becomes an internet-wide standard.