What is Duplicate Content?
Any content that appears in more than one place on the internet is considered duplicate content.
So if you find the same content on two or more websites, consider it duplicate.
The same content can also appear on multiple pages within a single site.
This also counts as duplicate content, because Google may not know which of those pages to rank on SERPs.
How to Find Duplicate Content on a Website
There are several tools through which you can detect duplicate content.
We’ve listed a few below:
Siteliner is a freemium SEO tool that lets you analyze duplicate content on your site.
Content duplication can take place unknowingly and hamper your site’s ranking.
A common source of content duplication is showing an entire blog post on your blog’s homepage instead of an excerpt.
This puts the same post on at least two different pages of the site.
Likewise, if you show the full post on a category or tag page, Google may not be able to tell the actual blog post apart from that category/tag page.
The free version allows you to check 250 pages every 30 days.
The premium version removes this limit.
On the overview page of the tool, you’ll find the internal duplicate content percentage on the top left.
Copyscape is a popular tool for detecting duplicate content.
The free version shows what percentage of your content matches content already published elsewhere.
In other words, it checks the originality of a piece of content.
Registered users can do up to 50 searches per day.
Plagiarism Checker by SmallSEOTools helps you detect duplicate parts within a piece of content.
Grammarly checks your grammar and spelling besides detecting plagiarized content.
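None of these tools publishes its exact matching algorithm, but a common way to estimate a "duplicate percentage" is shingling: split each text into overlapping runs of k words and measure how many of one page’s runs also appear in the other. The sketch below assumes plain-text input and an arbitrary k of 5; it illustrates the idea and is not how Siteliner or Copyscape actually score pages.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def duplicate_percentage(page: str, other: str, k: int = 5) -> float:
    """Percentage of page's shingles that also appear in other (0.0 if page is too short)."""
    page_shingles = shingles(page, k)
    if not page_shingles:
        return 0.0
    shared = page_shingles & shingles(other, k)
    return 100.0 * len(shared) / len(page_shingles)
```

For example, `duplicate_percentage("one two three four five", "one two three six seven", k=3)` returns about 33.3, since only one of the three 3-word shingles is shared. Smaller k values flag more loose matches; larger values only catch long copied passages.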
Which Types of Content Count as Duplicate Content?
There are different types of duplicate content, and not all of them are deliberate. Some duplication is the result of how a website is built technically.
Boilerplate Content
Boilerplate content is content that repeats across many pages of a website. For example, most websites share three main elements across their pages: the header, the footer, and the sidebar or navigation bar. In addition, some websites show recent posts on their homepages. When Googlebot crawls such a website, it may find these new blog posts in more than one place, so they become duplicate content.
Copied Content/Scraped Content
Copied content is content taken from a site without the owner’s permission. Content scraping is extracting content from a website automatically, using software. There’s still much confusion about content scraping, since Google does something similar when it shows content from other sites as featured snippets. However, since the Panda update, any type of scraping activity is liable to be penalized.
Content Curation
Content curation means gathering information from the web and writing a new piece of content using the stats and facts you found. Google doesn’t consider this spam as long as you rewrite the content in your own words or credit the original source.
Content Syndication
Content syndication is the practice of pushing content to third-party sites as snippets, links, or full pieces. Syndicated content is published on multiple sites, so several copies of the same post exist on the web. Sites like HuffingtonPost and Medium allow content syndication.
Does Duplicate Content Affect SEO?
For search engines like Google and Bing, duplicate content makes it unclear which version is the original and which version to rank for a search query. It also makes it unclear whether link metrics such as trust, authority, and link equity should go to one page or be split among the versions.
When a site contains duplicate content, its owner can suffer poor rankings and traffic losses. This happens mainly because search engines rarely show multiple versions of the same content: they pick one, which dilutes the visibility of each duplicate.
Duplicate content also affects link equity, because other sites that want to link to the content have to choose one of its versions. As a result, inbound links are split among the versions instead of pointing to one. Since inbound links are a ranking factor, this can reduce the visibility of the content on every site where it appears, and none of the versions may rank well in the SERPs.
What Causes Duplicate Content?
Duplicate content can arise for many reasons, most of them technical. Let’s look at the common causes below:
Misunderstanding the Concept of URLs
The CMS database that powers a website probably holds only a single copy of an article, but the website’s software may let that article be retrieved through more than one URL. For the CMS, the article is identified by a unique ID in the database, but for search engines, the URL is the identifier. Hence, when the same content is available at several URLs, the issue of duplicate content arises.
Session IDs
Session IDs are used to track your visitors on the site and let them store items in their wishlist or shopping cart. To do that, you need to give each user an individual session. A session is a brief history of the activities a visitor performs on your site. The most common way to store session IDs is in cookies. However, most search engines don’t store cookies, so some systems fall back to putting session IDs in the URL. This means every internal link on the website gets the session ID appended to its URL. Because that session ID is unique to a particular session, each one creates a new URL, resulting in duplicate content.
URL Parameters Used for Tracking & Sorting
Another technical cause of duplicate content is URL parameters that do not change the content of a page. For example, http://www.example.com/keyword-x/ and http://www.example.com/keyword-x/?source=rss are different URLs to a search engine, even though they show the same page. The second URL may make it easier for you to track where your visitors came from, but for search engines, it’s a case of duplicate content.
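When auditing a crawl or a list of indexed URLs, it helps to strip parameters that only track visitors (such as `source` above, session IDs, or `utm_*` campaign tags) so that URLs showing the same page compare equal. This is a minimal sketch using Python’s standard `urllib.parse`; the parameter names in `NON_CONTENT_PARAMS` are hypothetical examples and would have to match what your own site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that never change page content on this site.
NON_CONTENT_PARAMS = {"source", "sessionid", "sid"}


def strip_tracking_params(url: str) -> str:
    """Drop tracking/session query parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS and not key.lower().startswith("utm_")
    ]
    # urlunsplit omits the "?" entirely when the rebuilt query is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```

With this, `strip_tracking_params("http://www.example.com/keyword-x/?source=rss")` returns `http://www.example.com/keyword-x/`, while parameters that do change the content (for example `cat`) are kept.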
Scrapers & Content Syndication
Sometimes, websites use content from another site without mentioning the source. In that case, search engines become unsure which version to treat as the original and show in the search results. This type of content scraping can affect both sites: the one that scrapes the content and the one it was scraped from.
Order of Parameters
A CMS doesn’t always use clean URLs; it may build them from category and ID parameters, such as /?id=1&cat=2. Many systems return the same page for /?cat=2&id=1 as for /?id=1&cat=2, but for search engines, these are two entirely different URLs. If your site serves the same content at several URLs, specify a canonical URL for them rather than blocking them from being crawled.
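One way to keep parameter order from multiplying URLs is to pick a single fixed order (alphabetical here, an arbitrary choice) and point every variant at it with a `rel="canonical"` link tag. The sketch below assumes that sorted order is the version you want indexed; the function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query_params(url: str) -> str:
    """Return url with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment))


def canonical_link_tag(url: str) -> str:
    """Build a rel="canonical" tag pointing at the sorted form of url (HTML-escaped)."""
    return f'<link rel="canonical" href="{escape(sort_query_params(url))}">'
```

Both `/?id=1&cat=2` and `/?cat=2&id=1` then map to `/?cat=2&id=1`, and `canonical_link_tag("http://www.example.com/?id=1&cat=2")` produces `<link rel="canonical" href="http://www.example.com/?cat=2&amp;id=1">` for the page’s `<head>`.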
Comment Pagination
Some CMSs, like WordPress, offer an option to paginate comments. This can duplicate the article’s content across its main URL and its comment pages.
WWW vs. Non-WWW
This is one of the most common causes of duplicate content on a website. When your content is accessible on both the www and non-www versions of your domain, search engines may treat the two versions as duplicates. The same problem arises with HTTP and HTTPS versions.
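The usual fix is to choose one preferred scheme and host and send every other variant there. The sketch below shows only the mapping logic, assuming a hypothetical site whose preferred form is `https://www.example.com`; on a live site the result would be used as the target of a 301 redirect.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: this hypothetical site prefers https://www.example.com/...
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"


def preferred_url(url: str) -> str:
    """Map http/https and www/non-www variants of a URL on this site to one preferred form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is lowercased and excludes any port
    if host not in (PREFERRED_HOST, BARE_HOST):
        return url  # URL belongs to another site; leave it unchanged
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment))
```

So `http://example.com/page/`, `https://example.com/page/`, and `http://www.example.com/page/` all map to `https://www.example.com/page/`.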
Is There a Penalty for Duplicate Content on a Website?
Duplicate content differs from copied content in intent. Copying content is done consciously, while duplicate content may arise from technical faults, as described above. Google’s John Mueller has stated that the search engine doesn’t penalize a site for duplicate content, but that having millions of such pages on your site invites risk.
Google rewards websites with high-quality original content. If you try to manipulate existing content by republishing it on your site, altering a few sentences, or adding a few new keywords, it still won’t add any value for users. The safest approach for a website owner who wants to improve SEO rankings is to avoid copying content from other sites and to avoid repeating content across your own website.
How Much Duplicate Content is Acceptable?
According to Matt Cutts, 25% to 30% of the web consists of duplicate content. He also said that Google doesn’t treat duplicate content as spam and doesn’t penalize a site for it unless it is intended to manipulate the search results. The main risk is that even if your site published the content first, other websites that copied it may show up in the results for related search queries instead.
If someone is using a copy of your content, you can file a removal request under the Digital Millennium Copyright Act (DMCA). Google tries to identify the original source of the content and show it in the search results, but blocking access to duplicate pages can keep it from crawling all the versions and picking the best one.
Does Duplicate Content Within a Single Page Affect SEO?
Duplicate content within the same page doesn’t affect SEO unless it hampers the user experience.
If users bounce from your site because of duplicate content, or don’t navigate to other pages, it might be an issue.
It is best to keep an eye on metrics like the average time on site, bounce rate, and exit rate.
These help you judge whether duplicate content within a single page is hurting the user experience.
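Analytics tools compute these metrics for you, but knowing the formulas helps when comparing a suspect page with the site average. The sketch below uses the classic (Universal Analytics-style) definitions: bounce rate is single-page sessions divided by sessions that started on the page, and exit rate is exits divided by pageviews. GA4 defines bounce rate differently (as the share of sessions that were not engaged), and the `PageStats` fields are hypothetical export columns.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page counts exported from an analytics tool."""

    entrances: int             # sessions that started on this page
    single_page_sessions: int  # sessions that started here and viewed no other page
    pageviews: int             # all views of this page
    exits: int                 # views of this page that were the last in their session

    def bounce_rate(self) -> float:
        """Percentage of sessions starting on this page that left without a second pageview."""
        return 100.0 * self.single_page_sessions / self.entrances if self.entrances else 0.0

    def exit_rate(self) -> float:
        """Percentage of this page's views that ended the session."""
        return 100.0 * self.exits / self.pageviews if self.pageviews else 0.0
```

For example, `PageStats(entrances=200, single_page_sessions=120, pageviews=500, exits=150)` gives a bounce rate of 60.0 and an exit rate of 30.0. A page whose numbers are well above the site average is worth reviewing for repetitive content.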
Can Duplicate Content Outrank the Original?
Yes. In rare cases, a duplicate can outrank the original if it is published on a webpage or website with higher authority.
Given below are some ways to fix duplicate content.
How to Deal With Duplicate Content: Google-Recommended Solutions
Here are some practical ways to tackle content duplication on the web:
Use 301 Redirects
If your site has been restructured, use 301 redirects (for example, in your .htaccess file) to send users, Googlebot, and other crawlers from the old URLs to the new ones. This signals to search engines which URL to prioritize over the others.
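.htaccess applies to Apache servers. If your site is served by a Python application instead, the same idea can be expressed in application code: answer requests for moved paths with a `301 Moved Permanently` status and a `Location` header. The sketch below is a minimal WSGI app built only on the WSGI calling convention; the paths in `REDIRECTS` are hypothetical.

```python
# Hypothetical map of old paths (before a restructure) to their new locations.
REDIRECTS = {
    "/old-blog/duplicate-content/": "/blog/duplicate-content/",
    "/products.php": "/products/",
}


def app(environ, start_response):
    """Minimal WSGI app that answers moved URLs with a permanent (301) redirect."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    # Any other path is served normally (a placeholder body here).
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Page: {path}".encode("utf-8")]
```

For local testing, the standard library’s `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` can serve this app. A 301 (rather than a temporary 302) is what tells search engines the move is permanent.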
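Be Consistent & Use Top-Level Domains
Keep your internal linking as consistent as possible: always link to the same form of a URL, for example http://www.example.com/page/ rather than sometimes http://www.example.com/page or http://www.example.com/page/index.htm. To help Google serve the most appropriate version of country-specific content, use country-code top-level domains where possible, for example http://www.example.de rather than http://www.example.com/de/.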
Syndicate Carefully
If you syndicate your content on other sites, Google will always show the version it thinks is most appropriate for users, which may not be the version you prefer. Make sure each site that syndicates your content includes a link back to the original article. You can also ask syndication partners to add a noindex meta tag so that search engines like Google don’t index their copies.
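Once a partner publishes your article, you can check their copy for both signals mentioned above. The sketch below uses Python’s standard `html.parser` to look for a link back to your original URL and a `noindex` directive in a robots meta tag. It assumes you already have the partner page’s HTML as a string, and it only matches the original URL exactly (relative links or trailing-slash variants are not recognized); `check_syndicated_copy` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class SyndicationSignals(HTMLParser):
    """Records whether a page links back to original_url and carries a noindex directive."""

    def __init__(self, original_url: str):
        super().__init__()
        self.original_url = original_url
        self.links_back = False
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "a" and attributes.get("href") == self.original_url:
            self.links_back = True
        elif tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def check_syndicated_copy(html: str, original_url: str) -> dict:
    parser = SyndicationSignals(original_url)
    parser.feed(html)
    parser.close()
    return {"links_back": parser.links_back, "noindex": parser.noindex}
```

A partner page containing `<meta name="robots" content="noindex, follow">` and `<a href="https://www.example.com/blog/post/">` would report both signals as present for that original URL.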
Avoid Publishing Stubs
Users don’t like landing on blank pages with no content. Such pages waste their time and hurt the user experience, which Google considers very important. Hence, don’t publish pages on your website that have no content. If you do publish such pages, keep them out of the index with the noindex meta tag.
Understand Your CMS
Get familiar with your Content Management System and understand how content is published on your site. Blogs and forums often show the same content in more than one format. For example, a new blog post may appear on the homepage of a website and also on a category page.
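Showing a short excerpt on the homepage and category pages, with the full post only at its own URL, avoids this kind of duplication. Most CMSs have a setting for this; the sketch below only illustrates the idea with a word-based cutoff. The 55-word default is an assumption chosen to match WordPress’s default automatic excerpt length, and `make_excerpt` is a hypothetical helper.

```python
def make_excerpt(post_text: str, max_words: int = 55, more: str = " [...]") -> str:
    """Return the first max_words words of a post, adding a marker if it was shortened."""
    words = post_text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + more
```

On a listing page, each excerpt would then be followed by a link to the full post, so the complete text exists at only one URL.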
Minimize Content Similarity
If you have several similar pages, consider making each one unique by adding valuable content, or merge them into one page wherever possible.
How to Fix Issues With Duplicate Content on Product & Category Pages
Category pages are the top-level pages that list all the products that come under them on a website.
Users can click on a particular product link from the category page to visit the product page.
The problem arises when a merchant uses identical descriptions on the product and category pages.
When someone searches for a phrase from that shared text, your category and product pages compete against each other.
This might lead Google to send more traffic to the category page instead of the product page where you actually want your customers to land.
According to John Mueller, it is always a good idea to use unique descriptions on both category and product pages to help Google differentiate between the two.
Your category page can have a general description of a product while the product page is where you’ll provide the complete detail.
Duplicate content is widespread on the web, so keep an eye on your own website for duplicate content issues. If another site copies your content, you can take legal action under copyright law, for example with a DMCA removal request. Fixing duplicate content issues can make a noticeable difference to your website’s ranking and performance. Rather than take the risk, focus on developing quality content for your website.