9 Robots.txt Rules Googlebot Ignores Even If You Add Them to Your File

A practical breakdown of every directive Googlebot quietly skips, and why your rankings are safe even if these lines sit in your file.

If you have ever opened your robots.txt file and wondered whether Googlebot actually obeys every line you wrote, you are not alone. Google supports only four fields in robots.txt: user-agent, allow, disallow, and sitemap. Everything else is silently ignored (Google Search Central documentation).

In April 2026, Google expanded its public list of unsupported robots.txt directives by adding six more frequently used tags (content-signal, content-usage, domain, request-rate, revisit-after, and visit-time) alongside older entries such as crawl-delay, noindex, nofollow, noarchive, host, and clean-param. The good news? If you have any of these in your file, Googlebot has been ignoring them all along, and your site has not been penalised for it.

Here are the nine robots.txt rules Googlebot ignores that you genuinely do not need to lose sleep over, plus what to use instead when the underlying goal still matters.

1. crawl-delay

What it claims to do: Tells crawlers to wait a specified number of seconds between requests to your server.

Why you don’t need to worry: Googlebot has never officially supported crawl-delay. Google formally clarified this in its documentation in late 2024, and the behaviour had been consistent for years before that. Some other crawlers, such as Bingbot, still honour it (Yandex stopped doing so in 2018), so leaving it in for those crawlers is fine.

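What to do instead: If Googlebot is hammering your server, use Google Search Console’s Crawl Stats report to see how much it is actually crawling. The report is read-only, and Search Console’s old crawl rate limiter was retired in early 2024, so for an urgent slowdown Google’s documentation suggests temporarily returning 500, 503, or 429 status codes to Googlebot. Better yet, optimise your hosting: a server that buckles under Googlebot is a server that will buckle under a traffic spike.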
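To see why crawl-delay can still matter for clients other than Googlebot, here is a minimal sketch using Python’s standard urllib.robotparser module, which does read the field. The sample file and the ExampleBot name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def read_crawl_delay(robots_txt: str, user_agent: str = "*"):
    """Return the crawl-delay a client using this parser would see, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(read_crawl_delay(ROBOTS_TXT, "ExampleBot"))
```

Any script or crawler built on a parser like this one will pick up the value, which is a reasonable argument for keeping the line even though Googlebot skips it.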

2. noindex

What it claims to do: Asks Google not to index a page directly from the robots.txt file.

Why you don’t need to worry: Google officially retired support for noindex in robots.txt on September 1, 2019. Even before that, it was never documented as a supported directive. Google’s own analysis at the time found that in 99.999% of files using it, the rule was contradicted by other directives, meaning it was hurting more sites than it helped.

What to do instead: Use the meta robots tag in the page’s HTML head:

<meta name="robots" content="noindex">

Or send an X-Robots-Tag HTTP header for non-HTML files like PDFs. Critically, the page must be crawlable for Google to see the noindex rule, so do not block it in robots.txt at the same time.
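As a rough sketch of the header approach, the function below decides which extra response headers to attach based on the file extension. The extension list and the function name are assumptions for illustration; in practice this logic usually lives in your web server or CDN configuration rather than in application code.

```python
from pathlib import PurePosixPath

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf", ".docx"}


def extra_headers_for(path: str) -> dict:
    """Return extra response headers for a request path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in NOINDEX_EXTENSIONS:
        # Header equivalent of <meta name="robots" content="noindex">.
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    print(extra_headers_for("/files/report.pdf"))
    print(extra_headers_for("/index.html"))
```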

3. nofollow (in robots.txt)

What it claims to do: Instructs crawlers not to follow links from pages on your site.

Why you don’t need to worry: nofollow as a robots.txt directive was never part of the Robots Exclusion Protocol, and Google retired its unofficial support alongside noindex in 2019. It only ever made sense at the link level or page level, not in robots.txt.

What to do instead: Use the meta robots tag with content="nofollow" on a page-by-page basis, or add rel="nofollow", rel="sponsored", or rel="ugc" to individual links.

4. host

What it claims to do: A Yandex-specific directive that signals the preferred mirror or canonical hostname for a site.

Why you don’t need to worry: Googlebot has never used the host directive. It is part of Yandex’s extension to the protocol, not the open standard. Even Yandex deprecated it in 2018 in favour of 301 redirects.

What to do instead: Use 301 redirects from non-preferred hostnames (like the www or non-www version) to your canonical domain, and set a clear rel="canonical" tag in your HTML.
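The redirect rule can be sketched in a few lines. This is only an illustration of the logic, assuming www.example.com is the preferred hostname; real sites normally configure the same thing in the web server, load balancer, or CDN.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the preferred hostname


def canonical_redirect(url: str):
    """Return (301, target_url) for a non-preferred hostname, else None."""
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:
        return None  # already on the canonical host, serve normally
    # Keep scheme, path and query; swap only the hostname.
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    return 301, target


if __name__ == "__main__":
    print(canonical_redirect("https://example.com/page?x=1"))
    print(canonical_redirect("https://www.example.com/page"))
```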

5. request-rate

What it claims to do: Sets a limit such as “1/10s” to indicate one request every ten seconds, sometimes paired with a time window.

Why you don’t need to worry: Google added request-rate to its public unsupported list in April 2026. It has never been part of Google’s implementation, and Googlebot ignores it entirely. It originated in older crawler conventions but never gained mainstream adoption.

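What to do instead: Same as crawl-delay. Search Console no longer offers a crawl rate setting, so rely on server capacity first, on temporary 500, 503, or 429 responses if Googlebot itself is the problem, and on server-side rate limiting or a CDN for any rogue bots that don’t respect any of this.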
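For the server-side rate limiting mentioned here, a minimal sliding-window limiter might look like the sketch below. The class name and the idea of keying clients by a string (an IP address or user-agent, for instance) are assumptions for illustration; production setups usually lean on the web server, a reverse proxy, or a CDN instead.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> recent request times

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True


if __name__ == "__main__":
    # Roughly what "1/10s" asks for: one request every ten seconds.
    limiter = SlidingWindowLimiter(max_requests=1, window_seconds=10.0)
    print(limiter.allow("203.0.113.7"), limiter.allow("203.0.113.7"))
```

Unlike a request-rate line, this is enforced by your own server, so it applies to every client whether or not it reads robots.txt.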

6. visit-time

What it claims to do: Specifies a UTC time window during which crawlers are “allowed” to visit your site, for example only between 03:00 and 06:00 UTC.

Why you don’t need to worry: Googlebot pays zero attention to visit-time. It crawls when it decides to crawl, based on its own scheduling logic. Google added this directive to its unsupported list in the April 2026 update.

What to do instead: If you have legitimate concerns about server load during business hours, scale your hosting or use a CDN. Trying to schedule Googlebot is a losing game.

7. revisit-after

What it claims to do: Suggests how often a crawler should revisit a page, for example “revisit-after: 7 days”.

Why you don’t need to worry: This was popular advice in mid-2000s SEO articles, but Google has never honoured it. Crawl frequency is determined by Google’s own systems based on signals like page importance, update frequency, and historical change patterns.

What to do instead: Submit an updated XML sitemap with accurate <lastmod> values, use the URL Inspection tool in Search Console to request re-indexing of important updates, and ensure freshly updated pages are surfaced through internal linking.
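A sitemap with <lastmod> values is simple to generate. The sketch below assumes you can list each URL with its last modification date; the function name and sample URLs are made up, and most CMSs and SEO plugins already produce this file for you.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Build sitemap XML from an iterable of (url, last_modified_date)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> should reflect a real content change, in ISO 8601 form.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/", date(2026, 4, 20)),
        ("https://example.com/blog/robots-txt", date(2026, 4, 22)),
    ]))
```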

8. noarchive (in robots.txt)

What it claims to do: Asks Google not to show a cached version of a page in search results.

Why you don’t need to worry: noarchive is supported, but only as a meta tag or X-Robots-Tag header, not in robots.txt. Google’s cached link feature has also been retired, making this directive even less impactful than before.

What to do instead: If you still need it for compliance or legal reasons, use the meta tag:

<meta name="robots" content="noarchive">

9. clean-param

What it claims to do: A Yandex-specific directive that tells crawlers to ignore certain URL parameters when indexing, helping consolidate duplicate content.

Why you don’t need to worry: Googlebot completely ignores clean-param. It is exclusive to Yandex. Google retired its own URL parameter handling tool in 2022 and now relies entirely on canonical signals to handle parameter-based duplication.

What to do instead: Use rel="canonical" tags pointing to your preferred URL version, configure parameter handling in your CMS or framework, and apply 301 redirects where appropriate.
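One way to get the effect clean-param aims for is to compute the canonical URL yourself by dropping parameters that never change the page content. The parameter list and function names below are assumptions for illustration; which parameters are safe to ignore depends entirely on your site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters that never change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with ignorable query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag for the page's HTML head."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag(
        "https://example.com/shoes?utm_source=news&color=red"
    ))
```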

Bonus: Misspellings of “Disallow”

Google has confirmed that it plans to expand the set of typos it tolerates for the disallow directive. So if your robots.txt contains a stray “dissallow” or “disalow”, Googlebot may still parse the intent correctly in the future. Don’t rely on this; fix the typo. Still, it is a useful safety net.

What This Means for Site Owners

If your robots.txt contains any of the directives above, you can take a deep breath: Googlebot has been ignoring them all along, which means they have not been silently breaking your SEO. The actual risks come from a different direction:

  • Unsupported directives clutter your file and make audits harder, especially during migrations or traffic incidents.
  • Stakeholders may see them and assume a privacy, indexing, or AI-use policy is enforced when it isn’t.
  • Other crawlers (Bing, Yandex, GPTBot) may still honour some of these directives, so deleting them blindly can affect non-Google traffic.

A 5-Minute robots.txt Audit

  • Pull your current robots.txt file from yourdomain.com/robots.txt.
  • Search for any of the nine directives listed above.
  • Decide: keep (for another crawler), delete (no system uses it), or move the goal elsewhere (meta tag, header, sitemap, redirect).
  • Check the file in Google Search Console’s robots.txt report, and spot-check representative URLs with the URL Inspection tool.
  • Add a comment line documenting why any non-Google directive remains, so future teammates don’t panic.
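The search step of this audit is easy to script. The sketch below takes the text of a robots.txt file and flags every line whose field is not one of the four Google supports. The function name and sample file are made up for illustration, and fetching the file (for example with urllib.request) is left out to keep the example short.

```python
# The nine directives covered in this article.
UNSUPPORTED = {
    "crawl-delay", "noindex", "nofollow", "host", "request-rate",
    "visit-time", "revisit-after", "noarchive", "clean-param",
}
# The four fields Google documents as supported.
SUPPORTED = {"user-agent", "allow", "disallow", "sitemap"}


def audit_robots_txt(text: str):
    """Return (line_number, field, status) for each non-supported field."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field in SUPPORTED:
            continue
        if field in UNSUPPORTED:
            status = "ignored by Googlebot"
        else:
            status = "unrecognised (possible typo)"
        findings.append((number, field, status))
    return findings


if __name__ == "__main__":
    sample = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
Noindex: /drafts/
Dissallow: /tmp/
Sitemap: https://example.com/sitemap.xml
"""
    for number, field, status in audit_robots_txt(sample):
        print(f"line {number}: {field} -> {status}")
```

The output is only a starting point for the keep, delete, or move decision; the script cannot tell you whether another crawler still relies on a given line.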
Dileep Thekkethil

Dileep Thekkethil is the Director of Marketing at Stan Ventures, where he applies over 15 years of SEO and digital marketing expertise to drive growth and authority. A former journalist with six years of experience, he combines strategic storytelling with technical know-how to help brands navigate the shift toward AI-driven search and generative engines. Dileep is a strong advocate for Google’s EEAT standards, regularly sharing real-world use cases and scenarios to demystify complex marketing trends. He is an avid gardener of tropical fruits, a motor enthusiast, and a dedicated caretaker of his pair of cockatiels.
