We have not discussed duplicate content recently, yet mixed reports persist as to whether duplicate content is still a significant issue. The fact is that if you reuse content that exists elsewhere and do so with malicious intent toward search, you can be "penalized" for it. Google says you may not be penalized outright, but a "filter" might be applied. Penalty or filter, let's be honest: Google controls your online destiny, at least in terms of Search.
When you are a company that is trying to organize the world’s information (read: Google), relevancy and duplication are big concerns. With all due respect to Google’s advertising platform, Google is in the business of Search: providing the world’s information to those in search of it. Google’s mission: to organize the world’s information and make it universally accessible and useful. If you have been in the Search industry for some time, you will no doubt have heard the proverbial “content is king” statement. A more accurate statement might read “unique content is king”.
Some of you might remember Google’s Supplemental index, where Google placed older pages or “duplicate pages” in limbo, in a separate index from the main index. The supplemental index still exists; Google simply no longer labels supplemental results for us. Is this where duplicate content ends up? Perhaps, but in recent years there have been conflicting reports about Google and duplicate content penalties. More recently, Google has stated that it does not penalize sites for duplicate content. So what’s the deal with duplicate content?
Myth #1: Google penalizes sites for having duplicate content
Technically, Google does not penalize a site for having duplicate content. According to Greg Grothaus of Google’s Search Quality Team, “this is not the case. That’s not to say that duplicate content can’t have a negative impact on your rankings, but Google itself is not penalizing you for it.” Having said that, if you are blatantly scraping content from another site or resource, Google will deal with it. The algorithm is smart enough to weigh the age and longevity of the content (based on link popularity and similar signals) when judging which copy is authoritative. Chances are that if duplicate content exists, Google will pick one authoritative version, and that is the version that will be placed in the index and found in the search results for a given keyword query.
Myth #2: There is a certain percentage of duplicate content that is acceptable
Think of it this way: Google is like a library trying to organize books (i.e. web pages) by topic (i.e. relevancy). A duplicate entry does not provide a richer experience, so rather than serving up multiple copies of a book, Google serves the single copy that is most relevant to the user. With Google, duplicate content is simply evaluated on a per-query basis. Experience suggests that if you have multiple web properties with the exact same content, only one of them will show up in the search results for a given query. That is not to say that if you modify your query, another of your web properties may not show up instead. The fact is that duplicate content creates “noise” and pollution on the Web, and there is no magic percentage of content that can safely be duplicated between your web properties. If you want users to find your site, make your content:

- Unique
- Authoritative
- Useful
By addressing these three issues, you will provide your audience with the information they are looking for. In addition, other sites will start linking to your content, which can only help from an SEO perspective.
Yes, it is true that Google has guidelines that it would like webmasters and site owners to follow; you can view those guidelines here: http://www.google.com/support/webmasters/bin/answer.py?answer=35769. The fact is that duplicate content can be caused inadvertently, and some site owners may not even be aware of it. According to Google, duplicate content can be defined as:
… substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.
Myth #3: Google will not remove a site from its Index for duplicate content practices
Here’s where some of the confusion often arises. In apparent contradiction to myth #1, Google reserves the right to remove a site from the Index if its review indicates that you engaged in deceptive duplicate content practices. According to Google’s Webmaster Guidelines on duplicate content:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
However, if our review indicated that you engaged in deceptive practices and your site has been removed from our search results, review your site carefully.
Google also suggests that there are times when duplicate content is not deceptive in nature:
- Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
- Store items shown or linked via multiple distinct URLs
- Printer-only versions of web pages
A common example used when describing duplicate content involves a site’s homepage. A while back, Greg Grothaus of Google used the following to illustrate how URLs that differ only superficially are all treated as distinct (and thus as duplicates) in the eyes of Google.
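That original illustration is not reproduced here, but the idea can be sketched with hypothetical variations of a single homepage (example.com is a placeholder domain, not from the original slide). To a crawler, each of these is a distinct URL, even though they typically all serve the same page:

```
http://example.com
http://example.com/
http://www.example.com
http://www.example.com/
http://www.example.com/index.html
https://www.example.com/
http://www.example.com/?sessionid=12345
```

Unless you tell the engines which version you prefer, each variant can be crawled and indexed separately, splitting link popularity across what is effectively one page.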
Dealing with Duplicate Content
A common issue we come across with clients is a large store featuring content generated from a shared database, or duplicate content generated by a CMS. There are a number of ways to deal with duplicate content:
- Create Unique Content – authoritative, informative and useful
- Utilize a separate database for specific web properties – avoid populating the same content from the same database on different sites
- Canonicalization – informing Google and other engines of your preferred URL using canonical tags
- Use robots.txt to block content from the Index – guide the engines to the content that you want treated as the authority
- Use 301 (permanent) Redirection
- Consistent Interlinking
- Set a preferred domain in Google Webmaster Tools
- Utilize a search engine friendly CMS
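As a rough sketch of how three of these tactics look in practice (example.com, the /print/ path, and the exact rewrite rules below are illustrative assumptions, not prescriptions for any particular site), a canonical tag goes in the `<head>` of each duplicate page and names the preferred URL:

```html
<!-- On every variant of the page, point the engines at the one preferred URL
     (example.com and the path are placeholders) -->
<link rel="canonical" href="http://www.example.com/products/widget" />
```

A robots.txt file at the site root can keep crawlers away from a duplicate section entirely, such as printer-only versions of pages:

```
# robots.txt sketch: keep all crawlers out of a duplicate printer-friendly directory
User-agent: *
Disallow: /print/
```

And a 301 (permanent) redirect, shown here as an Apache .htaccess sketch using mod_rewrite, can consolidate the non-www hostname onto the www version so only one copy of each URL exists:

```
# .htaccess sketch: permanently redirect example.com to www.example.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Which combination makes sense depends on your platform; canonical tags and 301s pass authority to the preferred URL, while robots.txt simply blocks crawling.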
Google’s Webmaster Guidelines on Duplicate Content: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66359
Google on Duplicate Content and Multiple Site Issues: http://googlewebmastercentral.blogspot.com/2009/09/duplicate-content-and-multiple-site.html
Labels: duplicate content