So what's the deal with duplicate content?

Dealing with duplicate content is one of the most irksome aspects of being a writer, and as an online writer you also have to deal with the ramifications of copied content. Whether it’s been done in a scheming ‘black hat’ way, or whoever copied it did so completely innocently, duplicate site content can be a big issue.

However, there’s still a large grey area over precisely what duplicate content can do to your website. It’s as if duplicate content is Google’s own personal bogeyman, scaring website owners and SEOs into ensuring each and every word written on a website is wholly original, lest they be blasted with a ‘penalty’.
But isn’t this how it ought to be? Shouldn’t we all strive to create content on the internet that’s original, thought-provoking and interesting? Of course. The real question, however, is whether or not having a small amount of copied content on your website (for entirely honest reasons) will affect your rankings, and how search engines rate duplicate content on a scale of ‘badness’.
If you’ve ever been at all confused about why you can’t copy content, or you want to know how much you can get away with, now’s your chance to find out.

What is duplicate content?

Let’s start out with the basics. Many people are still unaware of what duplicate content actually is, despite the fact that search engines have been warning about it for years. From a writer’s perspective, duplicate content is essentially plagiarism – it’s taking someone else’s written work and using it for your own gain.
That may sound entirely melodramatic, but in most cases, content is copied when one website wants to utilise well-written content from another site, rather than writing anything original. This may not be done maliciously (usually, it’s because people haven’t realised that it’s frowned upon), but it can still have an impact.
This is where we start getting into the debate of whether or not duplicate content ought to be ‘allowed’, so long as it hasn’t been done to manipulate search engine results.
The difference between black hat and white hat
There are two approaches to SEO that you need to understand if you’re going to be successful; otherwise, you may pick up bad habits that can do you harm in the long run.
Google, and all other search engines for that matter, promote natural SEO techniques which do not manipulate results. This means that, rather than using tactics that essentially force search engines to rank your website higher than competitors, you do everything by the book and so are seen as a more authoritative website by search engines. Thanks to this, you’re then promoted above your competitors, with long-lasting results.
Content makes up a very large part of these techniques, and there are black hat and white hat techniques that you can use to make your content more noticeable.
Black hat:
Duplicating content from existing websites that rank very highly, in the hope that search engines will see the copying website as similarly authoritative. Also, writing content specifically for search engines, using tactics such as keyword stuffing.
White hat:
Ensuring all of the content on your website is completely original and has been written with the reader in mind. White hat SEOs ensure their content is as informative and natural as possible.
What does Google think of duplicate content?
Duplicate content was first properly highlighted by Google with the Google Panda algorithm update of February 2011. Many online companies were completely decimated by the update, which lowered the search engine rankings of websites deemed to be low quality. Duplicate content is a common feature of low-quality websites, so sites with copied content were hit immediately.
Google has said that it only takes a small amount of duplicate content to affect a site’s ranking, although it isn’t just duplicate writing that can drag a site down.
Other factors can include:

‘Shallow’ content that contains no real information
Content that is badly written, and so cannot be useful

So, just a few pages of copied content on your website could be affecting its rankings. Sounds easy to fix, right?
Here’s where it can get a little bit confusing.
Google’s Matt Cutts has also said that around 25% to 30% of all content on the internet is duplicated, but that this isn’t a bad thing because not all duplicate content is treated as spam. If it were, a quarter of the entire indexed internet would be blocked from search engine results. Instead, Google’s algorithm is designed to tell the difference between genuine spam (where someone has duplicated content with malicious intent) and accidental or legitimate duplication.
According to Cutts, a little bit of duplicate content isn’t going to hurt you, as long as you aren’t, for example, repeating the same boilerplate text across your entire website.
See? Google can be quite contradictory when it wants to be!
What this really means, though, is that it’s less about how much of your website’s content has been copied from other sources and more about your intent.

Will duplicate content affect my website then?
No matter the reason for your site’s duplicate content, there’s a very good chance that your site will be affected in the long term. However, the idea that Google penalises websites for using duplicate content isn’t totally accurate.
Google handles duplicate content by grouping pages with similar content together, and then only showing one of these pages in search results. This ensures that a search engine results page (SERP) isn’t swamped with websites all displaying the same content and, again, goes to the heart of Google’s wish to be useful to online users.
This doesn’t mean that your website has been penalised directly; the filtering is simply a symptom of duplicate content. You will, however, probably find that your search impressions (the number of times your website is seen on SERPs for a query) decrease, and that your traffic (the number of people who visit your site) dips as a result. Google still lets users remove this filter and see all of the omitted pages, but how often do people really do this?
Google does, however, sometimes penalise websites that duplicate content in a spammy manner. It’s within its rights to do this, and it also allows users to report websites that use these spammy techniques.
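Google hasn’t published exactly how it groups similar pages, so the following is only a toy illustration of the idea of showing one page per group. This sketch assumes that only pages whose text is identical after lower-casing and collapsing whitespace count as duplicates, and it uses a made-up tie-break (shortest URL wins); the example URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the page text after lower-casing and collapsing runs of whitespace."""
    normalised = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_duplicates(pages: dict[str, str]) -> list[str]:
    """Group pages with identical normalised text and keep one URL per group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Hypothetical tie-break: keep the shortest URL (then alphabetical) in each group.
    return sorted(min(urls, key=lambda u: (len(u), u)) for urls in groups.values())


pages = {
    "https://example.com/widgets": "Blue widgets, hand-made in Leeds.",
    "https://example.com/shop/widgets": "Blue  widgets, HAND-MADE in Leeds.",
    "https://example.com/gadgets": "Red gadgets for every occasion.",
}
print(filter_duplicates(pages))
# ['https://example.com/gadgets', 'https://example.com/widgets']
```

Only two of the three pages survive: the two widget pages collapse into one result, which is the same effect described above, where your duplicate page simply isn’t shown rather than being penalised.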
What’s the difference between internal and external duplicate content?
More and more people now understand that copying content from other websites is bad, but many still haven’t made the connection to internal duplication. If you’re reusing the same content across different pages of your own website, it’s just as iffy in terms of how Google views it.
Internal duplicate content can happen for a number of reasons, including:

Using the same boilerplate text on all of your site’s pages
Your site structure could end up creating duplicate pages by accident. For instance, if you have an ecommerce website whose category section lets users filter products by category, a product that appears in several categories could end up with a separate page for each one
Serving the same page both with and without a trailing slash in the URL (www.site.com/page as well as www.site.com/page/). Google can treat these as separate pages, which makes the content duplicated
Copying content from the homepage across ‘About Us’ or ‘Why Use Us’ pages

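To spot trailing-slash variants like the ones in the list above, a site audit could normalise each crawled URL and flag any that collapse to the same address. This is a minimal sketch using Python’s standard urllib.parse; the crawled URLs are made up, and treating the slash-less form as the one to keep is an assumption for the example, not a Google rule.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case the scheme and host, drop any fragment, strip a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"  # keep the root path as "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_variants(urls: list[str]) -> dict[str, list[str]]:
    """Return each normalised URL that more than one crawled URL maps to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


crawled = [
    "http://www.site.com/page",
    "http://www.site.com/page/",
    "http://www.site.com/about",
]
print(find_variants(crawled))
# {'http://www.site.com/page': ['http://www.site.com/page', 'http://www.site.com/page/']}
```

Any group this reports is a candidate for a redirect or a canonical tag pointing at a single preferred URL.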
As you can see, it’s easy to do, but it could still leave you open to a reduction in impressions and traffic, leading to fewer conversions.

How can I tell if I have duplicate content on my site?
A lot of website owners are genuinely surprised when they discover their site features duplicate content, because others may have copied their high-ranking content in order to get ahead. Others honestly didn’t realise that copying wasn’t allowed, and so have always done it.
If you want to find out whether or not you have any copied content on your site, you’ve got a number of options.
Firstly, you could use a tool such as Copyscape, which searches the internet for copies of your site’s content. It can be a huge help in the hunt for copied content.
The other option is to manually search for content, which can take a fair amount of time but which ought to be part of your regular spot-checks. As well as uncovering content problems, it’ll also give you an intimate knowledge of your website and will allow you to uncover any other underlying issues in the process, such as un-linked pages or bad page titles and meta descriptions.
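Copyscape doesn’t publish exactly how it matches text, so the following is only a rough stand-in for a manual check: it breaks two passages into overlapping three-word ‘shingles’ and measures how much they overlap using Jaccard similarity (shared shingles divided by all distinct shingles). The shingle size of 3 and the example sentences are assumptions for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ('shingles') in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Fraction of shingles the two texts share, from 0.0 (none) to 1.0 (all)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Our hand-made widgets are built to last a lifetime."
suspect = "Our hand-made widgets are built to last for years."
print(jaccard(original, suspect))  # 0.6
```

A score near 1.0 suggests a near-verbatim copy; where you draw the line for ‘too similar’ is a judgement call.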
How can I remove duplicate content?
Google is very clear on what you should do if you have duplicate content – rectify the problem. From completely rewriting content to using various techniques to hide the content from Google’s prying eyes, there’s always something you can do.
Options include the following:

Rel=“canonical” – Adding a <link rel="canonical"> element to the head of a page, pointing at the preferred URL, lets you set one version of duplicated content as the ‘preferred’ page to be displayed in search engines, while the other versions are generally omitted. Use it carefully, though: it’s an incredibly powerful fix, and it can take months for a page to be re-indexed in Google if you later remove the tag from the code
Meta robots tag – If you want to stop a page from being indexed by Google altogether, you can add the code <meta name="robots" content="noindex"> to the head of duplicate pages
Paginated content – A common cause of duplicate content is pagination, where products are split over numerous pages to make them easier to browse. This can create seemingly spammy URLs, and if each page uses the same boilerplate text, the pages will be seen as duplicates. One way to fix this is to keep the later pages as dynamic product listings, with the written content only on the first page

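Both the canonical and meta robots fixes described above come down to a single HTML element in a page’s head. The helpers below are a hypothetical sketch that builds those elements as strings, escaping the URL with Python’s html.escape; the function names and example URL are made up, and how you actually add the tags depends on your CMS or templates.

```python
from html import escape


def canonical_tag(preferred_url: str) -> str:
    """Build a rel="canonical" link pointing at the preferred version of a page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


def noindex_tag() -> str:
    """Build a meta robots tag asking search engines not to index the page."""
    return '<meta name="robots" content="noindex">'


def tag_for_duplicate(preferred_url: str, keep_out_of_index: bool = False) -> str:
    """Pick one fix per duplicate page: noindex it, or point it at the preferred URL."""
    return noindex_tag() if keep_out_of_index else canonical_tag(preferred_url)


print(tag_for_duplicate("https://www.site.com/page"))
# <link rel="canonical" href="https://www.site.com/page">
print(tag_for_duplicate("https://www.site.com/page", keep_out_of_index=True))
# <meta name="robots" content="noindex">
```

In this sketch each duplicate page gets one tag or the other: a canonical tag keeps the page accessible while pointing search engines at the preferred version, whereas noindex asks for the page to be left out of the index entirely.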
While you don’t necessarily need to completely rewrite duplicated content, it can be a good idea to refresh it on a regular basis. This is worth doing whether or not it’s been copied, as it helps to keep your website ‘fresh’. Creating additional content, editing older content and refreshing content with new points of view can help your site to become more useful and interesting to visitors.
Duplicate content has been portrayed as a big issue for SEOs ever since the Google Panda update took place. However, in many instances, a little bit of duplication won’t ruin your site. Copied content is a natural aspect of the internet, and is something that will never go away.
That said, staying vigilant, making sure you aren’t lazy with your content creation, and always following Google’s best practices to stay within the lines will pay dividends in the end. If you’re copying content for black hat reasons, though, you can be sure Google will eventually take action, whether by omitting your pages from search results or by taking things further.
What are your views on duplicate content? Do you think it matters, or is it something you never worry about? What do you think the best fix for it is, as well? Let us know by getting involved on Twitter at @theukseo.