Sitemaps – A Complete Blueprint for SEO Professionals
By: Dileep Thekkethil
August 30, 2022
You have built a fantastic website. Congratulations on this feat.
So, how is it faring on Google search?
If your answer is “not that great,” you have already joined the slew of millions of website owners who share the same woe.
Many website owners are disillusioned to find their “top-notch website” failing to hit the sweet spot on Google.
If you’re among them, the time and money you have invested need not go to waste!
One common reason websites fail to show up on Google is that the site doesn’t have a sitemap file.
You may be throwing tantrums at Google for not displaying your website.
But the poor chap is as naïve as any other person who doesn’t know about your website.
You have to shout it loud to ensure that the world’s best search engine hears you.
(Ironic, but officially, Google insists on doing this.)
The medium to do this big shout-out is a sitemap file that you submit to the search console.
What is a Sitemap in SEO?
Your website is either an information hub or the best place for your target audience to find the solution they are looking for.
What makes it great is the different types of content you offer the users.
Text, images, videos, PDFs, we all know these are the building blocks of a good site.
Think of it this way. You have all these types of content scattered across a page. To any user who visits that page, it would look haphazard.
The same happens when Google robots come to crawl your website.
The bots initially are unable to understand the different types of content on your site, and they take time to understand each of these.
However, if you make this process more natural to the bots, there are better chances to crawl more items within your site and help you rank better.
What you, as a webmaster, have to do is organize all these different types of content into a list that is understandable to the Google bot or any other search engine bot.
With an error-free sitemap, the Google crawlers can effectively find and decipher the content within your site.
A properly set up sitemap file helps Google identify important pages on your website and find the relation between each entity.
For example: If you have recently updated a blog post with more information, the sitemap file tells search engine bots when you last updated the content and, optionally, how frequently the page is refreshed.
Can Sitemaps Help Search Engines Crawl Pages Faster?
Sitemaps on your website give signals to search engine bots to crawl your site in an intelligent manner.
When you update a piece of content on your website, a dynamically generated sitemap automatically updates the data stored for that particular page.
The next time Google crawler visits your website, the sitemap puts across the updated page for immediate recrawl. Thus, the new content gets updated on Google’s index.
Google may even rank your content higher after the recrawling if you have made a substantial improvement with the quality of the content.
An experiment from 2009 is still relevant here, as Google is now putting more emphasis on sitemaps than ever before.
Casey Henry’s Site Map Experiment
Like many of you reading this blog, Casey was bewildered about the need for a sitemap. He wasn’t too convinced with the information and the best practices provided by Google.
So, he decided to check it by himself.
To start with, he contacted a client and asked them to insert a tracking code on their WordPress website, which used the Google XML Sitemaps Generator plugin.
The client was asked to continue posting articles as before. Over time, 12 posts were added to the website, out of which Henry added six to the sitemap and left the other six out.
After the experiment, Henry was awestruck to find that it took just about 14 minutes for the URLs submitted through the sitemaps to get indexed in Google.
The other six URLs that were not added to the sitemap took 1,375 minutes, or roughly 23 hours, to get crawled and indexed.
The data also showed that Google’s crawlers were faster than Yahoo’s when it came to indexing new pages without a sitemap.
Considering that this data is from 2009 and Google bots have become more sophisticated since, the crawl time without a sitemap has likely come down considerably.
When the sitemap was added, Google bots were able to crawl the webpages within 14 minutes. That said, there are a few other factors that may have impacted the crawl rate.
Once Google identifies a website actively publishing content, that too on a daily basis, the bots will come visiting the site more often to index new pages.
In addition to this, there are news websites that get the articles indexed on Google in real-time.
What contributes to faster indexing of news and magazine websites is the RSS feed added to the Google News dashboard and the Google News sitemap.
Now that you know the importance of the sitemaps for faster indexing, let’s see how sitemaps help in better crawl budget optimization.
Can Sitemaps Help in Crawl Budget Optimization?
Yes. Keeping your sitemap neat and clean will ensure that search engine bots will spend more time crawling important pages on your website.
However, most webmasters don’t practice this! The result is that Google bots spend time crawling pages that are either canonicalized to another URL or labeled noindex.
The gravity of such haphazardly created sitemaps is seldom talked about. These kinds of cluttered sitemaps can result in search engine bots spending more time on irrelevant pages.
As a result, more important pages on the site are left out by the crawlers, resulting in critical pages taking more time to appear on Google SERP.
To avoid such loss of crawl budget, ensure that only pages worth showing in SERP are added to the sitemap, and remove noindex, canonicalized, and 404 error pages.
When the bots find a string of pages with errors, they may skip the rest to save crawl time. Keeping the sitemap consistent with the exclusions in your robots.txt file must be on your priority list.
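The clean-up described above can be sketched in a few lines of Python. This is only an illustration: the Page record and its field names are assumptions, and in practice the data would come from a crawl export of your own site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; the field names are assumptions."""
    url: str
    status: int                      # HTTP status code returned by the page
    noindex: bool = False            # True if the page carries a noindex directive
    canonical: Optional[str] = None  # canonical URL declared by the page, if any


def sitemap_worthy(page: Page) -> bool:
    """Keep only pages that return 200, are indexable, and are self-canonical."""
    if page.status != 200:
        return False
    if page.noindex:
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False
    return True


pages = [
    Page("https://www.example.com/", 200),
    Page("https://www.example.com/old-offer", 404),
    Page("https://www.example.com/thank-you", 200, noindex=True),
    Page("https://www.example.com/shoes?color=red", 200,
         canonical="https://www.example.com/shoes"),
    Page("https://www.example.com/shoes", 200,
         canonical="https://www.example.com/shoes"),
]

# Only the home page and the self-canonical /shoes page survive the filter
clean_urls = [p.url for p in pages if sitemap_worthy(p)]
print(clean_urls)
```

Only the URLs left in clean_urls would then go into the sitemap file.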
What Are the Different Sitemap Formats Accepted by Google?
Google has confirmed that it accepts various formats of sitemap files. However, Google restricts a single sitemap to 50 MB (uncompressed) and 50,000 URLs.
But you don’t have to worry unless you’re an enterprise website owner. For most websites, these limits are more than enough.
If you go beyond either limit, the sitemap has to be split into smaller files, which can then be listed in a sitemap index file or submitted to the search console separately, as Google supports multiple sitemaps.
Sitemap Formats Supported by Google
- XML Sitemaps
XML or Extensible Markup Language-based sitemaps are the most common ones that you find on the web.
If you’re running a WordPress website and using plugins to generate sitemaps, this is the format that you can generate.
Example of a basic XML sitemap structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/foo.html</loc>
    <lastmod>2018-06-04</lastmod>
  </url>
</urlset>
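If you would rather generate this structure than write it by hand, a minimal Python sketch using only the standard library could look like the following. The build_sitemap function name and the example URLs are assumptions made for illustration; plugins and online tools do the same job without any code.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of (url, lastmod) pairs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # lastmod is optional in the sitemap protocol
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://www.example.com/foo.html", "2018-06-04"),
    ("https://www.example.com/bar.html", None),
])
print(xml_bytes.decode("utf-8"))
```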
- Sitemap of Sitemaps
This is most appropriate for news and shopping websites that have millions of URLs already added.
The XML format for the sitemap index file is similar to that of the normal sitemap. However, the elements used within the XML will change.
The parent tag surrounding the whole file will be “sitemapindex,” whereas the parent tag of each child sitemap listed will be “sitemap.” The “loc” element must point to the URL of the child sitemap.
Here is an example:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml.gz</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.xml.gz</loc>
  </sitemap>
</sitemapindex>
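For a very large site, splitting the URL list and building the index can be automated. The Python sketch below assumes a limit of 50,000 URLs per child sitemap; the chunk and build_index helpers and the child file names are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(child_sitemap_urls):
    """Return sitemap index XML (bytes) listing each child sitemap."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for loc in child_sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120,000 placeholder URLs need three child sitemaps
all_urls = [f"https://www.example.com/p/{i}" for i in range(120_000)]
parts = list(chunk(all_urls))
children = [f"https://www.example.com/sitemap{n}.xml"
            for n in range(1, len(parts) + 1)]
index_xml = build_index(children)
print(len(parts), [len(p) for p in parts])
```

Each slice in parts would be written out as its own sitemap file, and only the index file needs to be submitted.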
- RSS, mRSS, and Atom 1.0 Sitemaps
This is the best format for news and magazine websites as they are already using the RSS feeds for multiple purposes.
All you have to do is to add the RSS feed URL to the sitemap section in the search console. Google will be happy to index the URLs within the RSS feed.
This is the same mechanism that works for Google news publishers. The publishers are asked to submit the RSS feeds in the Google News Publisher Central Tool.
As and when the feeds are updated, the Google News Crawler indexes the new pages added. That too, almost instantly.
- Plain Text Sitemap
So now you have a fair idea on how to create the formats mentioned above.
However, there are many people out there who don’t understand the technicalities.
Google, as we know, caters to all strata and can’t limit the features.
This is why Google also accepts a plain text file listing the URLs that need to be indexed as a sitemap.
All you have to do is create a plain text file with one URL per line. Upload it to your website, submit its URL in Search Console, and you are done.
It’s as easy as that.
Here are a few things to keep in mind while creating a text sitemap file.
- Use UTF-8 encoding while saving the file.
- Ensure that there is nothing other than the URLs in the file.
- There are no constraints with the file name. However, ensure that the extension is .txt
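The three rules above are easy to follow from a script as well. The Python sketch below is one possible approach: the text_sitemap helper is a hypothetical name, and the file is written to the system's temporary folder purely for demonstration.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def text_sitemap(urls):
    """Return the body of a plain text sitemap: one absolute URL per line."""
    lines = []
    for url in urls:
        url = url.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute URL: {url!r}")
        lines.append(url)
    return "\n".join(lines) + "\n"


body = text_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about.html",
])

# Save with UTF-8 encoding and a .txt extension (the file name itself is free)
out = Path(tempfile.gettempdir()) / "sitemap.txt"
out.write_text(body, encoding="utf-8")
print(body)
```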
Different Types of Sitemap Extensions
During the Web 1.0 era, the content implied only text.
However, over the last ten years, the web has transformed in ways that even Sir Tim Berners-Lee, the inventor of the World Wide Web, may never have imagined.
Your website is now a collection of text, images, and videos, and it is important to organize each of these by establishing the context.
A sitemap is currently the best option to organize this varied multimedia content so that search engines can easily find and index it.
Additionally, some pages require the immediate attention of Google and its crawlers.
News websites fall in this list as the content published is for immediate dissemination, and any delay may result in the news becoming stale.
- Video Sitemap
Videos are trending on the web and many websites are using videos to keep their target audience engaged with the content.
Helping search engines discover those videos is exactly the purpose served by video sitemaps.
Google crawlers may not be able to understand the video content on websites, which is something the search engine giant has admitted.
However, with a video sitemap, which lists the pages that host videos along with links to the relevant video files on your website, there is less chance of the crawlers missing your videos.
Example of Video Sitemap
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.example.com/videos/some_video_landing_page.html</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>Alkis shows you how to get perfectly done steaks every time</video:description>
      <video:content_loc>https://streamserver.example.com/video123.mp4</video:content_loc>
      <video:player_loc>https://www.example.com/videoplayer.php?video=123</video:player_loc>
      <video:duration>600</video:duration>
      <video:expiration_date>2021-11-05T19:20:30+08:00</video:expiration_date>
      <video:rating>4.2</video:rating>
      <video:view_count>12345</video:view_count>
      <video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
      <video:family_friendly>yes</video:family_friendly>
      <video:restriction relationship="allow">IE GB US CA</video:restriction>
      <video:price currency="EUR">1.99</video:price>
      <video:requires_subscription>yes</video:requires_subscription>
      <video:uploader info="https://www.example.com/users/grillymcgrillerson">GrillyMcGrillerson</video:uploader>
      <video:live>no</video:live>
    </video:video>
  </url>
</urlset>
- Image Sitemap
Like videos, images are now integral to webpages, and users are so accustomed to seeing images that any page that lacks them is looked down upon.
The image sitemap, similar to the video sitemap, gives search engines the information they need to discover images and the pages they appear on.
The images on the sitemap are later used for showing up on Google Image search and in some of the SERP features as well.
Example of Image Sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/sample.html</loc>
    <image:image>
      <image:loc>https://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>
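Extension sitemaps differ from the basic format mainly in the extra namespace, and that is the part that tends to go wrong when the file is generated by a script. The Python sketch below shows one way to produce the image example above with the standard library. The build_image_sitemap function is a hypothetical helper, and the namespace URIs are the ones commonly documented for the sitemap protocol and Google's image extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Map the namespaces to the prefixes used in the example above
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps each page URL to the list of image URLs on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_image_sitemap({
    "https://example.com/sample.html": [
        "https://example.com/image.jpg",
        "https://example.com/photo.jpg",
    ],
})
print(xml_text)
```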
- News Sitemap
If you’re a news publisher, the News Sitemap extension is something that Google recommends you to maintain.
The News Sitemap becomes all the more important if you’re a publisher who has received the Google News approval.
Unlike the other sitemaps that we have discussed so far, the News Sitemap is not constant. It should list only articles published in the last two days.
Any URL within the News Sitemap that is more than two days old has to be removed. Some free tools and plugins can help publishers do this automatically, and we will introduce them in the next section of this blog.
Even after your URLs are removed from the News Sitemap, they will “remain in the index for the regular 30-day period,” says Google.
The Google News crawler crawls these sitemaps frequently and visits Google News publishers often.
Example of Google News Sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.org/business/article55.html</loc>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2008-12-23</news:publication_date>
      <news:title>Companies A, B in Merger Talks</news:title>
    </news:news>
  </url>
</urlset>
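The two-day rule for News Sitemaps is simple to enforce before the file is generated. Here is a minimal Python sketch, assuming each article is available as a (URL, publication time) pair; the recent_articles helper and the sample articles are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # News Sitemaps should only list recent articles


def recent_articles(articles, now=None):
    """Keep (url, published) pairs published within the last two days."""
    now = now or datetime.now(timezone.utc)
    return [(url, published) for url, published in articles
            if now - published <= MAX_AGE]


# A fixed "now" keeps the example reproducible
now = datetime(2022, 8, 30, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://www.example.org/business/article55.html",
     now - timedelta(hours=3)),
    ("https://www.example.org/business/article54.html",
     now - timedelta(days=1, hours=20)),
    ("https://www.example.org/business/article12.html",
     now - timedelta(days=5)),
]
kept = [url for url, _ in recent_articles(articles, now=now)]
print(kept)
```

The five-day-old article is dropped, while the two recent ones stay in the list.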
How to Build a Sitemap?
Until now, we were talking about the technicalities of a sitemap. Now it’s time to dive into building one for your website.
A large share of websites currently use WordPress as their CMS. Thanks to the popularity of the DIY CMS, more people have come online with their businesses.
Any website that uses a CMS can build the sitemap within a few minutes. But have you built one for yours?
If you haven’t, here is a three-step process to build a sitemap that is search engine crawler-friendly.
Decide on the Pages to Be Added to the Sitemap
Your website may be new or quite old. In both cases, you may have a few junk pages on the site that are either canonicalized or redirected.
When you add these to the sitemap, the search engine crawlers may try to index them only to find them worthless.
This leads to the search engine spending more time crawling webpages that are non-indexable. If you want to avoid such wastage of crawler resources, ensure that you add pages that are important for you.
You can identify these pages by running your website through Screaming Frog Tool or by using plugins like Yoast.
Using Screaming Frog SEO Spider Sitemap
Step 1: Add your website URL
Step 2: Click on Sitemap and then XML sitemap
Step 3: Click on Next and download the sitemap
Screaming Frog, unlike other online tools, is a desktop software that is popular among the SEO community for intense research on websites.
Don’t worry if you’re not an avid user. Screaming Frog has a free version that lets you crawl websites with up to 500 URLs and generate an error-free sitemap.
This tool allows you to find pages that were recently modified and also segregate them based on the crawl depth, type, and hreflang of the pages.
As soon as you select the options that best suit you, all you have to do is download the file, upload it to your web server, and submit it in Google Search Console.
Creating Yoast SEO sitemap
The popular WordPress SEO plugin Yoast, which has 5+ million active installations, is the go-to sitemap generator tool if WordPress CMS powers your website.
So, how easy is it to build a sitemap using Yoast? The answer is very.
Yes, you heard it right. The moment you install the Yoast plugin for your website, a sitemap is automatically generated.
All you have to do is go to the Yoast settings, and inside the general settings, you can find the different features offered by the Yoast plugin.
Within the Features tab, you can find the XML Sitemap option turned on by default.
However, you may not get the exact URL of the sitemap until you click on More Info.
On clicking the More Info tab, you will find the Sitemap URL to be added to the Search Console.
By default, Yoast respects the indexing settings of each post and page while generating the XML sitemap. This ensures the crawl budget is optimally used.
How to Create Sitemap in SEO-Friendly Way Using Online Tools
Imagine you’re maintaining a static website with a few pages. You don’t require additional code or embeds to create a sitemap.
There are tons of free tools out there that can help you build a static XML sitemap, which you can upload to your web server and later submit to the search console.
Here are a few such free tools that we have tried in the past to build a Sitemap file:
XML Sitemaps: Using this online sitemap generator, you can create an error-free sitemap of up to 500 pages. However, if you have more than 500 URLs, you may have to use the paid version of this tool.
Additionally, the tool also has an Installable version, which is a PHP file that can build sitemaps for sites on the go. But this feature also comes at a price.
XML Sitemap Generator: This is a free tool that is available both as a plugin and as a free online sitemap generator.
The online sitemap generation of this tool works almost like the plugin.
However, the website details, including the Modified Date, Change Frequency and Priority can be set using the online tool.
My Sitemap Generator: This, yet again, is a free online tool that creates sitemaps on the go. This online sitemap generator tool can create sitemaps for a website with fewer than 500 URLs for free.
However, if the website has more than 500 URLs, you can use the paid Static Pro or Dynamic versions to build your sitemap.
The Static Pro version can crawl up to 1 million URLs or import up to 3 million URLs. It also allows you to enable Google Sitemap extensions (Video, Image, News) and supports sitemap creation for multilingual websites.
Creating SEO-Friendly Sitemap For Shopify website
If you’re maintaining a Shopify website, questions like how to create sitemaps and which pages to include in the sitemap shouldn’t bother you.
Thanks to the amazing team at Shopify, the sitemap files come by default, and they are updated as you add or delete a product.
If you’re yet to find the sitemap of your Shopify website, just type www.yourdomain/sitemap.xml into your browser’s address bar.
Shopify automatically creates separate sitemaps for products, collections, blogs, and other webpages.
All you have to do is to add the file to the search console.
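Because the main file on such a store is an index that points to several child sitemaps, it can be handy to list those children when auditing the site. The Python sketch below parses an index file that has already been downloaded. The sample XML and its child file names are hypothetical, and the real files on your store will look different.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical index file; real child sitemap names will differ
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap_collections_1.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap_blogs_1.xml</loc></sitemap>
</sitemapindex>"""


def child_sitemaps(index_xml):
    """Return the URLs of the child sitemaps listed in a sitemap index."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


for url in child_sitemaps(SAMPLE_INDEX):
    print(url)
```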
How to Submit an SEO Sitemap on Google Search Console?
Now that you have created a sitemap file for your website, you need to make sure that Google can access it without taking the trouble of having to crawl the rest of the pages to find it.
Google Search Console is the one place where the website owner can submit the sitemap and ensure Google’s crawlers see it by default.
Steps to Submit Sitemap to Search Console
Step 1: Sign in to Google Search Console
Step 2: Select the Sitemaps option inside the Index tab
Step 3: Enter the URL of your sitemap and press submit
Step 4: Check the Status
Step 5: Fix the Errors and Resubmit
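Some of the errors in Step 5 can be caught before you submit anything. The Python sketch below runs a few basic local checks on a sitemap file. It is not a reproduction of Search Console's own validation; the check_sitemap helper is hypothetical, and the limits used are the 50 MB and 50,000 URL caps.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, uncompressed


def check_sitemap(xml_bytes):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    locs = root.findall("sm:url/sm:loc", NS)
    if not locs:
        problems.append("no <url><loc> entries found")
    if len(locs) > MAX_URLS:
        problems.append("more than 50,000 URLs")
    for loc in locs:
        parts = urlsplit((loc.text or "").strip())
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {loc.text!r}")
    return problems


good = (b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b'<url><loc>https://www.example.com/</loc></url></urlset>')
print(check_sitemap(good))         # no problems found
print(check_sitemap(b"<urlset>"))  # reports malformed XML
```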
What is SEO Sitemap Priority?
Those of you who have built sitemaps using the tools mentioned above may have come across a sitemap attribute “Priority.”
The <priority> tag is used to provide Google information about the importance of pages that appear within the sitemap. The pages’ priority can be determined based on different factors, including the change frequency and business interests.
The priority attribute supports values from 0.0 to 1.0, with 0.0 marking the lowest-priority pages and 1.0 the top-priority pages.
It’s generally believed that URLs with high priority scores will get crawled in quick succession. However, there is still debate going on about whether Google crawlers consider this attribute when indexing the pages.
The official Google Webmasters blog has categorically stated that it does not consider the <priority> score while crawling and indexing the sitemap file.
“Google supports several sitemap formats described here. Google expects the standard sitemap protocol in all formats. Google does not currently consume the <priority> attribute in sitemaps,” says Google on the official page about sitemaps.
According to Google, it considers the updated date within the sitemap while crawling as it’s an indication that there is an update to the existing cache stored in the Google server.
Google’s John Mueller confirmed this in a webmaster hangout session, saying that adding the updated date of the content to the XML sitemap helps Google crawl important and recently added pages even faster.
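Since the updated date carries this much weight, it is worth writing it in a format crawlers can parse. The sitemap protocol expects the lastmod value in W3C Datetime format, and the Python sketch below shows one way to produce it. The lastmod_value helper is hypothetical, and treating timestamps without a time zone as UTC is an assumption you may need to change.

```python
from datetime import datetime, timezone


def lastmod_value(dt):
    """Format a datetime as a W3C Datetime string for <lastmod>."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.isoformat(timespec="seconds")


print(lastmod_value(datetime(2022, 8, 30, 9, 15, tzinfo=timezone.utc)))
# prints 2022-08-30T09:15:00+00:00
```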
Once you have submitted the sitemap in Google Search Console, you will get a message on whether the sitemap can be read by the Google crawler. If the search console finds an error in your sitemap, it will display an error message.
After submitting a sitemap in Google Search Console, you can find the following details:
Type of Sitemap Submitted: As we said earlier, there are different types of sitemaps that you can create. Soon after you submit the sitemap on Google, it will display the type of sitemap uploaded.
If Google is unable to process the type of sitemap that you have uploaded, it will show an error message, “Unknown.” This means Google cannot process the sitemap and requires an immediate fix.
If the file that you submitted in the search console is error-free, you can see a success message.
Last read: Googlebot is busy crawling and indexing millions of pages on the internet. The crawler may not visit your sitemap file frequently.
However, this depends on many factors, including the pattern in which you update the content regularly. The data displayed in the last read section indicates the last time Google crawler visited your sitemap.
By now, we have covered all the important aspects of creating and submitting a sitemap on Google for better search visibility. If you have any questions about generating a sitemap for your website, just drop a comment below.