Avoiding duplicate content with your site or blog
One of the most important rules in SEO (search engine optimization) is avoiding duplicate content. Google has some information on their site about how they handle duplicate content. Unfortunately, Googlebot is rarely smart enough to know which copy of the content is the original. Google wants to avoid rewarding sites that copy or republish someone else's work simply to get content for their pages.
You also want Google to find pages on your site that have substance, not pages that are just a copy of content from one of your other pages.
So how do you avoid it on your site? The first step is to identify potential pages that have duplicate content. It's probably happening without you even being aware of it.
Type this into Google (with no space after the colon): site:www.yoursite.com
I'm using Blogger, and by default here are some pages that get indexed that should not be:
Now that we've identified the offending pages, we can create or modify our robots.txt file, at the root of our site.
Here is what I could add to my robots.txt to block those pages:
Disallow: /*?
Disallow: /*_archive.html$
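Before relying on rules like these, it can help to check which URLs they would actually catch. Here is a minimal Python sketch, assuming Google-style matching where * means "any run of characters" and a trailing $ anchors the end of the URL. The helper names (pattern_to_regex, is_blocked) and the sample URLs are made up for illustration, and this is not how Googlebot itself is implemented:

```python
import re
from urllib.parse import urlsplit

# The two rules from the robots.txt lines above.
DISALLOW_PATTERNS = ["/*?", "/*_archive.html$"]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style wildcards: '*' matches any characters and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url, patterns=DISALLOW_PATTERNS):
    """Return True if the URL's path (plus query string) matches any pattern."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical URLs, only here to exercise the rules.
    for url in [
        "http://www.yoursite.com/2009/05/some-post.html",
        "http://www.yoursite.com/2009_05_01_archive.html",
        "http://www.yoursite.com/search?q=seo",
    ]:
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Running the URLs from a site: search through a check like this is a quick way to confirm the rules block the archive and parameter pages without catching real posts.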
There is one big problem. If you're using a service like Blogger (like this blog), you can't edit your robots file. There has been talk of adding support, but we have to deal with what is available.
The best I've been able to come up with is adding this to the head (look for <head>) of my template code:
<b:if cond='data:blog.pageType == "archive"'>
  <meta name="robots" content="noindex, nofollow" />
</b:if>
This adds a noindex and nofollow meta tag to the generated archive pages. I have not yet figured out how to remove pages that contain parameters (?param=value). If anyone has a way to do it, please let me know! I've actually been considering removing the archive widget to solve it.
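To confirm the tag really shows up on the generated archive pages (and not on regular posts), you can save a page's HTML and scan it for the robots meta tag. This is a rough Python sketch using only the standard library. The class and function names are my own, and the sample HTML is a stand-in for whatever your template actually outputs:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are also routed through here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Stand-in for the HTML of an archive page after the template change.
    sample = (
        "<html><head>"
        '<meta name="robots" content="noindex, nofollow" />'
        "</head><body></body></html>"
    )
    print(robots_directives(sample))
```

An archive page should report both noindex and nofollow, while a normal post page should come back with an empty set.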