Monday, June 09, 2008
Since duplicate content is a hot topic among webmasters, we thought it might be a good time to
address common questions we get asked regularly at conferences and on the
Google Webmaster Help Group.
Before diving in, I'd like to briefly touch on a concern webmasters often voice: in most cases, a
webmaster has no influence over third parties that scrape and redistribute content without the
webmaster's consent. We realize that this is not the fault of the affected webmaster, which in
turn means that identical content showing up on several sites is not, in itself, regarded as a
violation of our
webmaster guidelines.
It simply triggers further processing to determine the original source of the content. Google is
quite good at this: in most cases the original content is correctly identified, resulting in no
negative effects for the site that originated it.
Generally, we can differentiate between two major scenarios for issues related to duplicate
content:
- Within-your-domain duplicate content, that is, identical content which (often unintentionally)
appears in more than one place on your site
- Cross-domain duplicate content, that is, identical content of your site which appears (again,
often unintentionally) on different external sites
With the first scenario, you can take matters into your own hands to avoid Google indexing
duplicate content on your site. Check out Adam Lasnik's post
Deftly dealing with duplicate content
and Vanessa Fox's
Duplicate content summit at SMX Advanced,
both of which give you some great tips on how to resolve duplicate content issues within your
site. Here's one additional tip to help prevent content on your site from being crawled as
duplicate: include the preferred version of your URLs in your Sitemap file. When we encounter
different pages with the same content, this may increase the likelihood that we serve the version
you prefer.
Some additional information on duplicate content can also be found in our comprehensive
Help Center article
discussing this topic.
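As a minimal sketch of that tip, a Sitemap file listing only the preferred version of each URL
could look like the example below. The domain and paths are hypothetical; the point is that only
the canonical, preferred URLs appear, not their parameterized or printer-friendly duplicates.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only the preferred (canonical) version of each URL -->
  <url>
    <loc>http://www.example.com/widgets/blue-widget</loc>
  </url>
  <url>
    <loc>http://www.example.com/widgets/red-widget</loc>
  </url>
  <!-- Omit duplicates such as /widgets/blue-widget?sessionid=123 or /print/blue-widget -->
</urlset>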
In the second scenario, you might have the case of someone scraping your content to put it on a
different site, often to try to monetize it. It's also common for parts of sites accessed through
web proxies to end up indexed under the proxy's URLs. When we encounter such duplicate content on
different sites, we look at various signals to determine which site is the original one, which
usually works very well. This also means that you shouldn't be overly concerned about negative
effects on your site's presence on Google if you notice someone scraping your content.
In cases when you are syndicating your content but also want to make sure your site is identified
as the original source, it's useful to ask your syndication partners to include a link back to
your original content. You can find some additional tips on dealing with syndicated content in a
recent post by Vanessa Fox,
Ranking as the original source for content you syndicate.
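As a small sketch of what such an attribution could look like, a syndication partner might add a
short note with a plain link back to the original article. The URL and site name below are
hypothetical:

<p>
  This article originally appeared on
  <a href="http://www.example.com/blog/original-article">Example.com</a>.
</p>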
Some webmasters have asked what could cause scraped content to rank higher than the original
source. That should be a rare case, but if you do find yourself in this situation:
- Check whether your content is still accessible to our crawlers. You might unintentionally have
blocked access to parts of your content in your robots.txt file (see the sketch after this list).
- Check your Sitemap file to see whether you made changes for the particular content which has
been scraped.
- Check whether your site is in line with our webmaster guidelines.
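To illustrate the first point, an overly broad rule in robots.txt can block more than intended.
In the hypothetical sketch below, the commented-out rule meant to hide an archive folder would
also block /archive-2008/ and every article underneath it, because Disallow matches by URL
prefix; the narrower rule restricts blocking to the intended directory only.

User-agent: *
# Too broad: this would block every path starting with "/archive",
# including /archive-2008/my-article (unintended)
# Disallow: /archive
# Narrower: this blocks only URLs under the /archive/ directory
Disallow: /archive/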
To conclude, I'd like to point out that in the majority of cases, having duplicate content does
not have negative effects on your site's presence in the Google index. It simply gets filtered
out. If you check out some of the tips mentioned in the resources above, you'll learn how to gain
greater control over what exactly we're crawling and indexing and which versions are more likely
to appear in the index. Only when there are signals pointing to deliberate and malicious intent
might occurrences of duplicate content be considered a violation of the webmaster guidelines.
If you would like to further discuss this topic, you can visit our
Webmaster Help Group.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eGoogle can typically identify and prioritize original content, even when duplicated on other sites, so webmasters generally shouldn't worry about negative impacts.\u003c/p\u003e\n"],["\u003cp\u003eDuplicate content issues can occur within a single website or across multiple websites, and Google offers resources to address both scenarios.\u003c/p\u003e\n"],["\u003cp\u003eWebmasters can utilize tools like robots.txt, Sitemaps, and syndication guidelines to manage duplicate content and ensure their preferred versions are indexed.\u003c/p\u003e\n"],["\u003cp\u003eWhile rare, if scraped content outranks the original, webmasters should verify crawler access, Sitemap entries, and adherence to webmaster guidelines.\u003c/p\u003e\n"],["\u003cp\u003eIn most cases, duplicate content is filtered rather than penalized, and negative consequences primarily arise from deliberate, malicious duplication attempts.\u003c/p\u003e\n"]]],["Google addresses duplicate content issues, differentiating between internal and external occurrences. For internal duplicates, webmasters should use Sitemaps and follow provided tips to control indexing. For external duplicates, Google identifies the original source, mitigating negative impacts on the originating site. When syndicating content, webmasters should request backlinks from partners. Scraped content ranking higher is rare and can be due to crawling issues or site guideline violations. Generally, duplicate content is filtered without negative effects, unless malicious intent is apparent.\n"],null,["Monday, June 09, 2008\n\n\nSince duplicate content is a hot topic among webmasters, we thought it might be a good time to\naddress common questions we get asked regularly at conferences and on the\n[Google Webmaster Help Group](https://support.google.com/webmasters/go/community).\n\n\nBefore diving in, I'd like to briefly touch on a concern webmasters often voice: in most cases a\nwebmaster has no influence on third parties that scrape and redistribute content without the\nwebmaster's consent. 
We realize that this is not the fault of the affected webmaster, which in\nturn means that identical content showing up on several sites in itself is not inherently regarded\nas a violation of our\n[webmaster guidelines](/search/docs/essentials).\nThis simply leads to further processes with the intent of determining the original source of the\ncontent---something Google is quite good at, as in most cases the original content can be\ncorrectly identified, resulting in no negative effects for the site that originated the content.\n\n\nGenerally, we can differentiate between two major scenarios for issues related to duplicate\ncontent:\n\n- Within-your-domain-duplicate-content, that is, identical content which (often unintentionally) appears in more than one place on your site\n- Cross-domain-duplicate-content, that is, identical content of your site which appears (again, often unintentionally) on different external sites\n\n\nWith the first scenario, you can take matters into your own hands to avoid Google indexing\nduplicate content on your site. Check out Adam Lasnik's post\n[Deftly dealing with duplicate content](/search/blog/2006/12/deftly-dealing-with-duplicate-content)\nand Vanessa Fox's\n[Duplicate content summit at SMX Advanced](/search/blog/2007/06/duplicate-content-summit-at-smx),\nboth of which give you some great tips on how to resolve duplicate content issues within your\nsite. Here's one additional tip to help avoid content on your site being crawled as duplicate:\ninclude the preferred version of your URLs in your Sitemap file. When encountering different pages\nwith the same content, this may help raise the likelihood of us serving the version you prefer.\nSome additional information on duplicate content can also be found in our comprehensive\n[Help Center article](/search/docs/advanced/guidelines/duplicate-content)\ndiscussing this topic.\n\n\nIn the second scenario, you might have the case of someone scraping your content to put it on a\ndifferent site, often to try to monetize it. It's also common for many web proxies to index parts\nof sites which have been accessed through the proxy. When encountering such duplicate content on\ndifferent sites, we look at various signals to determine which site is the original one, which\nusually works very well. This also means that you shouldn't be very concerned about seeing\nnegative effects on your site's presence on Google if you notice someone scraping your content.\n\n\nIn cases when you are syndicating your content but also want to make sure your site is identified\nas the original source, it's useful to ask your syndication partners to include a link back to\nyour original content. You can find some additional tips on dealing with syndicated content in a\nrecent post by Vanessa Fox,\n[Ranking as the original source for content you syndicate](https://www.vanessafoxnude.com/2008/05/14/ranking-as-the-original-source-for-content-you-syndicate/).\n\n\nSome webmasters have asked what could cause scraped content to rank higher than the original\nsource. That should be a rare case, but if you do find yourself in this situation:\n\n- Check if your content is still accessible to our crawlers. 
You might unintentionally have blocked access to parts of your content in your robots.txt file.\n- You can look in your Sitemap file to see if you made changes for the particular content which has been scraped.\n- Check if your site is in line with our webmaster guidelines.\n\n\nTo conclude, I'd like to point out that in the majority of cases, having duplicate content does\nnot have negative effects on your site's presence in the Google index. It simply gets filtered\nout. If you check out some of the tips mentioned in the resources above, you'll basically learn\nhow to have greater control about what exactly we're crawling and indexing and which versions are\nmore likely to appear in the index. Only when there are signals pointing to deliberate and\nmalicious intent, occurrences of duplicate content might be considered a violation of the\nwebmaster guidelines.\n\n\nIf you would like to further discuss this topic, you can visit our\n[Webmaster Help Group](https://support.google.com/webmasters/go/community).\n\nWritten by Sven Naumann, Search Quality Team"]]