The Real Story Behind Duplicate Content and Penalizations

Duplicate content is a cardinal sin that will drive search engines to de-index your entire website. This is what many came to believe following a flurry of algorithm updates from May of 2015 to March of 2017. The truth is more complicated. Although content quality is a major factor in Google’s algorithm, many claims surrounding duplicate content penalizations are unwarranted and blatantly untrue. For that reason, I’d like to provide some best practices regarding duplicate content and how you can better optimize your content.

What Types of Duplicates Can Be Penalized?

Some duplicate content is inevitable, and almost every website has instances of it. According to Matt Cutts, the former head of Google’s webspam team, approximately 25–30% of content on the web is duplicated from another web property.

We constantly see eCommerce retail websites using manufacturers’ product descriptions, branded boilerplate content, and legal information. And in many situations, that’s required by the manufacturer—so it’s often unavoidable. Google knows that, and they’ve explicitly stated they will not penalize a website for duplicate content UNLESS the content is being used in a manipulative, intrusive, or misleading way.

What does Google mean by manipulative or misleading? For example, some eCommerce companies create a number of microsites with entirely duplicated content in order to rank in multiple positions for the same search query. Another misleading strategy is when a blogger aggregates content from other websites instead of creating their own. Google could remove sites like these from its index as a penalty, but most instances will go unnoticed.

That said, there is always a caveat. While Google won’t necessarily penalize your website for non-egregious uses of duplicate content, they do encourage and reward unique content. Google groups the URLs that have similar content into a cluster. From that cluster, Google’s SERPs display the URLs it considers most authoritative. (“Domain authority” is a third-party metric rather than a Google score, but it is a reasonable proxy for this.) The websites with lower authority won’t be removed from Google’s index, but it will be challenging for them to rank higher than the sites that created unique content.

How to Address Duplicate Content

There are countless reasons you might accidentally create duplicate content without manipulative intentions. For that reason, here is an outline of the most frequent scenarios with duplicates and how to resolve them.

Problem #1 – Separate URLs exist for different variations of the same product.

We often see this problem arise on large eCommerce or retail websites. For example, a used handbag retailer might create multiple pages for the same product if they have the item in different conditions (mint, used, poor, etc.). Or, you might see a website that lists the same product page within different categories, so the same content lives at multiple URLs in different subfolders.

Solution – Canonical Tags

If it makes sense from a design/UX standpoint to create multiple URLs for one product or one page, it’s paramount to tell Google this. Otherwise, you might not see any of the URLs rank well or even get indexed. In this scenario, it makes sense to utilize a canonical tag. A canonical tag tells Google there are multiple pages that feature the same content, but you only want one of the URLs to be indexed.

A canonical tag looks like this, and should be placed in the <head> section of the page’s HTML:

<link rel="canonical" href="http://usedhandbags.com/leather-bag/">

Let’s apply this tag to the scenario above. The tag would be inserted into the <head> of all the duplicate pages, and it would tell Google that the href URL (http://usedhandbags.com/leather-bag/) is the one you want indexed. Google treats the tag as a strong hint rather than a command, but in most cases the other URLs will be left out of Google’s search results, and you’ll at least be able to rank well for one URL, as opposed to none. If you encounter this situation, it’s also important to ensure the non-indexed URLs are easily accessible from the URL that is indexed.
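If you want to confirm that every variant page actually carries the tag, a short script can read it back out of the HTML. The following is a minimal sketch in Python using only the standard library; the `find_canonical` helper and the sample page are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before comparing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical variant page for the same leather bag, listed by condition.
variant_page = """
<html><head>
<title>Leather Bag (Mint Condition)</title>
<link rel="canonical" href="http://usedhandbags.com/leather-bag/">
</head><body>...</body></html>
"""

print(find_canonical(variant_page))  # http://usedhandbags.com/leather-bag/
```

Running a check like this across all the variant URLs should return the same href every time; a `None` or a mismatched URL points to a page that still needs the tag.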

Problem #2 – There are multiple URLs for one page.

This issue is one of the most common causes of duplicate content. After installing an SSL certificate or modifying permalinks, you might realize that there are a lot of new duplicates for one URL. I’ve created an example below:

http://www.example.com

https://www.example.com

https://example.com

http://example.com

Solution – 301 Redirects

In this scenario, we see four duplicates of a website’s homepage. While canonical tags could prevent these URLs from being considered “duplicate”, it’d be more beneficial to use 301 redirects. Here’s a general rule of thumb: If you have the choice of doing a 301 redirect or setting a canonical, you should always do a redirect unless there is a technical reason not to do so. A 301 redirect sends both visitors and crawlers to the preferred URL and passes the backlinks from one URL to another. A canonical is only a hint: the duplicate URLs stay live, and Google may ignore the tag if its other signals disagree. The only situations where a 301 wouldn’t be applicable would be temporary redirects, in which case you would want to use a 302 or 307. Or, as discussed in the previous scenario, a canonical tag can be used in a situation where a URL needs to remain on a website but you don’t want it to be indexed by Google.
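To make the redirect rule concrete, here is a minimal sketch in Python of the logic a server or middleware might apply to the four homepage variants above. It assumes that https://www.example.com is the version you want to keep (the scenario itself doesn’t pick one), and `redirect_target` is a hypothetical helper, not part of any particular web server.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: which scheme and host you treat as preferred.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
HOST_VARIANTS = {"example.com", "www.example.com"}


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in HOST_VARIANTS:
        return None  # Not one of our hosts, so leave it alone.
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # Already the preferred version.
    path = parts.path or "/"
    # Keep the path and query string so deep links land on the matching page.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


for variant in (
    "http://www.example.com",
    "https://www.example.com",
    "https://example.com",
    "http://example.com",
):
    print(variant, "->", redirect_target(variant))
```

Three of the four variants resolve to https://www.example.com/ and the preferred one returns `None`, so each duplicate needs only a single 301 hop to reach the URL you want indexed.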

Problem #3 – Pagination is causing duplicate content.

For those who aren’t familiar with pagination, it’s defined as “the process of separating print or digital content into discrete pages.” This is a scenario we often see throughout large eCommerce websites, specifically within category pages. If a category has too many items to fit into one page, product managers will often create a series of paginated URLs all with similar content and URL structure.

As an example, take a look at Lowes.com. Here we see a series of paginated URLs that were created in order to feature all of their available bathroom faucets. If every faucet were included on one URL, the page would be far too large. Each one of these URLs has exactly the same meta title, description, and body content. As a result, Google could easily identify them as duplicates and choose to filter them out from search results.

Solution – Use rel="prev"/"next" tags

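If you want to avoid drops in your organic rankings, you’ll need to tell search engines why these pages are so similar, and also how they relate to one another. This can be done with rel="prev"/"next" tags, which signal that this is a series of URLs, not duplicates. Keep in mind that Google announced in 2019 that it no longer uses these tags as an indexing signal, so treat them as a helpful hint rather than a guarantee.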

Let’s review an example of how to properly implement these tags. In the hypothetical URL, www.example.com/shoes/page2, we’re on page 2 of a series of paginated URLs. In order for Google to understand this is not a duplicate of the first page in the series, we’ll need to add the rel="prev"/"next" tags to the <head> of the page. The tags would look like this:

<link rel="prev" href="https://www.example.com/shoes/page1"/>
     *This tag tells Google which page comes before the current URL
<link rel="next" href="https://www.example.com/shoes/page3"/>
     *This tag tells Google which page comes after the current URL

After a search engine crawls this site, it can understand that these URLs form one series and will typically send searchers to the first page. Without the tags, the pages are more likely to be treated as duplicates and either filtered out of the results or ranked at a lower position.
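On a large category, nobody writes these tags by hand for every page. The following is a minimal sketch in Python of how a template might generate them; the `pagination_links` helper and the `/pageN` URL pattern are assumptions taken from the hypothetical example above, so adapt them to your own URL structure.

```python
from typing import List


def pagination_links(base_url: str, page: int, total_pages: int) -> List[str]:
    """Build the rel="prev"/"next" tags for one page in a paginated series."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")
    tags = []
    if page > 1:
        # Every page except the first points back to its predecessor.
        tags.append(f'<link rel="prev" href="{base_url}/page{page - 1}"/>')
    if page < total_pages:
        # Every page except the last points forward to its successor.
        tags.append(f'<link rel="next" href="{base_url}/page{page + 1}"/>')
    return tags


# Page 2 of a hypothetical three-page shoe category.
for tag in pagination_links("https://www.example.com/shoes", 2, 3):
    print(tag)
```

The first page in a series gets only a "next" tag and the last page gets only a "prev" tag, which is how a crawler can tell where the series starts and ends.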

Problem #4 – You’ve copied content from another site.

Whether intentional or accidental, you might realize there’s content on your site that is copied directly from a manufacturer, competitor, informational website, or any other web property. This could be a lengthy quote, branded copy or even legal documentation. We’re only human, and sometimes a copy-and-paste is easier than writing your own unique content. Still, it’s poor practice and it will never help your organic presence to feature duplicated content.

Solution – Write your own content.

While it likely won’t cause your site to be removed from Google’s index, marketers and web managers must always view duplicated content as a last resort. If you really want to bump up your position within Google’s SERPs, take the time to write rich, digestible, and unique content.

For any questions regarding duplicate content or canonical tags, feel free to reach out and contact us. If you’d like a higher-level view of web content strategy, continue learning with this post from our recent Thirst for Knowledge on Content Strategy for Large Organizations.