Google Crawlers and User Agent Strings – 2023 List
By: Dileep Thekkethil | Updated On: May 22, 2023
Quick Summary
Google uses various crawlers, or bots, identified by unique user agent strings to index web pages and surface content across its products, features, and services. Some crawlers are specific to images, news, videos, desktop, smartphone, the Google Store, and more. Rules for these bots can be specified in a website’s robots.txt file, robots meta tags, and the X-Robots-Tag HTTP rules.
Key Points Discussed
- Google uses crawlers/bots to scan the web and index pages, with the most common being Googlebot.
- Other Google crawlers include Googlebot Image, Googlebot News, Googlebot Video, Googlebot Desktop, Googlebot Smartphone, Google Favicon, Google StoreBot, and more.
- Each Google crawler has a specific user agent string, which can be identified in your website’s log files.
- User agent strings can be spoofed, so it’s important to verify if a visitor is indeed a Google crawler.
- Google crawlers respect the guidelines set out in your website’s robots.txt file, robots meta tags, and the X-Robots-Tag HTTP rules.
- Different types of crawlers are used for specific purposes – Common Crawlers, Special-case Crawlers, and User-triggered Fetchers.
- A list of directives can be used inside robots.txt and robots meta tags to allow or disallow Google crawlers from accessing certain parts of your website.
Introduction
You may be wondering how Google manages to find trillions of web pages. The truth is that Google depends heavily on the different spiders, or bots, it has built to scan the whole web and discover pages.
How, you may ask? The crawlers (yet another name for the spiders) follow the trails of links that connect one page to another.
The most common Google crawler that visits your website is Googlebot; however, Google uses multiple other crawlers for different products and features.
This blog post will explore the typical Google crawlers you may encounter in your referrer logs. Knowing these bots will help you specify rules for them in robots.txt, robots meta tags, and the X-Robots-Tag HTTP rules.
As mentioned, Google utilizes multiple crawlers to understand the type of page so that the content can surface in the other products, features, and services the search engine giant offers.
By identifying the user agent token, you can specify rules in the User-agent line of your website’s robots.txt file. Some Google crawlers use more than one token, but you only need to match one of a crawler’s tokens for a rule to apply.
Here is an exhaustive list of Google crawlers and their corresponding full user agent strings you might come across in your site’s log files.
Caution: The user agent string can be spoofed. Learn how to verify if a visitor is a Google crawler.
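As a quick illustration, here is a minimal Python sketch of that verification, following the reverse-and-forward DNS approach Google documents: reverse-resolve the visiting IP, confirm the hostname belongs to googlebot.com or google.com, then resolve the hostname forward and check that it maps back to the same IP. The sample IP at the end is only a placeholder; use addresses taken from your own log files.
import socket

def is_google_crawler(ip_address):
    # Reverse DNS lookup: find the hostname registered for this IP.
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward DNS lookup: the hostname must resolve back to the original IP.
    try:
        forward_ips = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return False
    return ip_address in forward_ips

# Placeholder IP for illustration only; check IPs found in your own access logs.
print(is_google_crawler("66.249.66.1"))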
List of Google Crawlers and associated Full user agent strings
Common Crawlers
Google’s common crawlers play a crucial role in indexing web content and ensuring that the search engine provides relevant, up-to-date results. These crawlers are designed to respect the guidelines laid out in the robots.txt file, which tells search engines which parts of a website should not be crawled or indexed.
1. Googlebot Image
This crawler is specifically designed to index images on the web.
Full user agent string: Googlebot-Image/1.0
2. Googlebot News
This crawler indexes news content for Google News.
Full user agent string: The Googlebot-News user agent uses the various Googlebot user agent strings.
3. Googlebot Video
This crawler is specifically designed to index video content on the web.
Full user agent string: Googlebot-Video/1.0
4. Googlebot Desktop
This crawler indexes desktop web pages for Google Search.
Full user agent strings:
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
- Googlebot/2.1 (+http://www.google.com/bot.html)
5. Googlebot Smartphone
This crawler indexes smartphone web pages for Google Search.
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
6. Google Favicon
This crawler fetches favicons (website icons) associated with web pages.
Full user agent string:
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon
7. Google StoreBot
This crawler crawls shopping-related pages, such as product details, cart, and checkout pages, for Google’s shopping features.
Full user agent strings:
- Desktop agent: Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
- Mobile agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36
8. GoogleOther
A generic crawler used by various Google product teams for fetching publicly accessible content from sites for internal research and development.
Full user agent string: GoogleOther
9. Google-InspectionTool
Google-InspectionTool is the newest addition to Google’s list of crawlers. Google has made it official by adding this information to the Google crawler help document.
Google says, "Google-InspectionTool is the crawler used by Search testing tools such as the Rich Result Test and URL inspection in Search Console. Apart from the user agent and user agent token, it mimics Googlebot."
The user agent token for this latest crawler is either Googlebot or Google-InspectionTool.
It also comes with two full user agent strings, one for mobile and one for desktop.
- Mobile
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
- Desktop
Mozilla/5.0 (compatible; Google-InspectionTool/1.0)
John Mueller, Senior Search Analyst / Search Relations team lead at Google, confirmed, "The update is now complete."
If you check the bot and crawl activity in your log files, you may spot Google-InspectionTool, particularly if you use the Rich Result Test or URL inspection in Google Search Console.
If you run into issues with these tools, chances are you are blocking the Google-InspectionTool user agent’s access to your website. Make sure you allow it access.
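For example, a minimal robots.txt sketch, assuming you block most crawlers by default but want the Search Console testing tools to keep reaching your pages:
User-agent: Google-InspectionTool
Allow: /

User-agent: *
Disallow: /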
Special-case Crawlers
Special-case crawlers, as the name suggests, are designed for specific purposes and may not adhere to the general robots.txt rules like common crawlers do. These crawlers are typically used when there is an agreement or understanding between the website owner and the product that utilizes the crawler.
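For example, Google’s ad quality crawlers ignore the global wildcard (*) group in robots.txt, so if you need to keep AdsBot away from certain pages you must address its user agent token explicitly (the path below is just a placeholder):
User-agent: AdsBot-Google
Disallow: /internal-landing-pages/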
1. APIs-Google
This user agent is used by Google APIs to deliver push notification messages.
Full user agent string: APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)
2. AdsBot Mobile Web Android
This crawler checks the ad quality on Android mobile web pages.
Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
3. AdsBot Mobile Web
This crawler checks the ad quality on iPhone mobile web pages.
Full user agent string: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
4. AdsBot
This crawler checks the ad quality on desktop web pages.
Full user agent string: AdsBot-Google (+http://www.google.com/adsbot.html)
5. AdSense
This crawler is responsible for analyzing web pages to determine the relevance of content for displaying targeted ads.
Full user agent string: Mediapartners-Google
6. Mobile AdSense
This crawler is responsible for analyzing mobile web pages to determine the relevance of content for displaying targeted ads on mobile devices.
Full user agent string: (Various mobile device types) (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)
User-triggered Fetchers
User-triggered fetchers are designed to perform specific functions upon user request, and are not a part of the regular crawling process. Since these fetchers are initiated by users, they generally do not follow the robots.txt rules, as the user’s intent is to access specific information or perform a certain action.
1. Google Site Verifier
This crawler is used to verify website ownership within Google Search Console.
Full user agent string: Mozilla/5.0 (compatible; Google-Site-Verification/1.0)
2. Feedfetcher
This crawler fetches RSS and Atom feeds for Google services such as Google News and Google Podcasts.
Full user agent string: FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)
3. Google Read Aloud
This crawler fetches web pages to generate audio versions of the content, which can be played back using text-to-speech technology.
Full user agent strings:
- Desktop agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)
- Mobile agent: Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML)
4. Google Publisher Center
This crawler fetches and processes feeds supplied by publishers through the Google Publisher Center for Google News landing pages.
Full user agent string: GoogleProducer; (+http://goo.gl/7y4SX)
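If you want to see which of these crawlers actually visit your site, a minimal Python sketch like the one below can tally their user agent tokens in an access log. The log file path and the assumption that the user agent string appears in each log line are placeholders; adjust them for your server’s log format.
from collections import Counter

# Tokens ordered from most specific to least, so "Googlebot-Image" is not
# miscounted as plain "Googlebot".
GOOGLE_TOKENS = (
    "Googlebot-Image", "Googlebot-Video", "Storebot-Google",
    "Google-InspectionTool", "AdsBot-Google-Mobile", "AdsBot-Google",
    "Mediapartners-Google", "APIs-Google", "GoogleOther",
    "Google-Read-Aloud", "Googlebot",
)

hits = Counter()
with open("access.log", encoding="utf-8") as log_file:  # placeholder path
    for line in log_file:
        token = next((t for t in GOOGLE_TOKENS if t in line), None)
        if token:
            hits[token] += 1

for token, count in hits.most_common():
    print(f"{token}: {count}")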
Where and How to Use User Agents on Your Website
User Agents in Robots.txt
You can allow or block specific Google crawlers from accessing your content by adding them as a User-agent inside the robots.txt file. For example:
User-agent: Googlebot
Disallow: /private/
User-agent: Googlebot-Image
Allow: /public/images/
Disallow: /private/images/
User-agent: *
Disallow: /private/
Directives inside robots.txt that Google respects
- User-agent: The User-agent directive specifies the web crawler or user agent that the following rules apply to. It can target a specific crawler (e.g., User-agent: Googlebot) or use the wildcard * to target all crawlers (e.g., User-agent: *).
- Disallow: The Disallow directive tells the web crawler not to access or crawl the specified URLs or URL patterns. For example, Disallow: /private/ would prevent the crawler from accessing any content within the /private/ directory.
- Allow: The Allow directive is used to grant permission for a web crawler to access specific URLs or URL patterns, even if they are within a disallowed directory. This directive is not part of the original robots.txt standard, but it is widely supported by major search engines like Google. For example, Allow: /private/public-content/ would permit a crawler to access the /private/public-content/ directory, even if the /private/ directory is disallowed.
- Sitemap: The Sitemap directive is used to specify the location of your XML sitemap, which helps search engines discover your site’s content more efficiently. For example, Sitemap: https://www.example.com/sitemap.xml.
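Putting these directives together, a combined robots.txt might look like the following (the domain and paths are placeholders):
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-content/

User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml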
User Agents in Robots Meta Tags
Some pages use multiple robots meta tags to specify rules for different crawlers. When several tags apply to the same crawler, Google uses the sum of the negative rules; for instance, Googlebot will obey both a noindex and a nofollow rule if they appear on the same page. For example:
<!-- Example: Allowing Googlebot to index, but not follow links -->
<meta name="googlebot" content="index, nofollow">
<!-- Example: Disallowing all crawlers from indexing and following links -->
<meta name="robots" content="noindex, nofollow">
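For instance, if a single page carried both of the tags below, Googlebot would combine the negative rules and treat the page as noindex, nofollow:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">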
Directives Inside Robots Meta Tags
- Meta name: The meta name attribute is set to "robots" to specify that this meta tag provides instructions for web crawlers. For example: <meta name="robots" content="…">.
- Content: The content attribute contains the instructions for the web crawlers in the form of directives. These directives are separated by commas when there are multiple instructions. Some common directives include:
- index or noindex: Tells the crawler whether the page should be indexed or not. For example: <meta name="robots" content="noindex"> prevents the page from being indexed.
- follow or nofollow: Instructs the crawler whether or not to follow the links on the page. For example: <meta name="robots" content="nofollow"> tells the crawler not to follow any links on the page. However, Google treats nofollow as a hint rather than a directive, and depending on the circumstances it may or may not take it into account.
- archive or noarchive: Determines if the search engine should display a cached version of the page. For example: <meta name="robots" content="noarchive"> prevents the search engine from displaying a cached version.
- Specific user agent: Instead of targeting all crawlers using the "robots" meta name, you can target specific user agents by replacing "robots" with the name of the user agent. For example: <meta name="googlebot" content="…"> targets only the Googlebot crawler.
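User Agents in X-Robots-Tag HTTP Rules
The same directives can also be delivered as X-Robots-Tag HTTP response headers, which is useful for non-HTML files such as PDFs and images where a meta tag cannot be placed. As a rough sketch (the exact headers you send depend on your server configuration), a response can carry several X-Robots-Tag headers, and a header can optionally name a user agent token before its directives:
HTTP/1.1 200 OK
X-Robots-Tag: noarchive
X-Robots-Tag: googlebot: noindex, nofollow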