Google, duplicate content caused by URL parameters, and you
Wednesday, September 12, 2007
How can URL parameters, like session IDs or tracking IDs, cause duplicate content?
When user and/or tracking information is stored through URL parameters, duplicate content can
arise because the same page is accessible through numerous URLs. It's what Adam Lasnik referred
to in
"[Deftly Dealing with Duplicate Content](/search/blog/2006/12/deftly-dealing-with-duplicate-content)"
as "store items shown (and—worse yet—linked) via multiple distinct URLs." In the
example below, URL parameters create three URLs that access the same product page.
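For instance, all three of these hypothetical URLs could serve the same product page (the domain
and ID values are made up for illustration, reusing the `product.php` page from the log sample
later in this post):

```
https://www.example.com/product.php?item=swedish-fish
https://www.example.com/product.php?item=swedish-fish&trackingid=1234567890
https://www.example.com/product.php?item=swedish-fish&sessionid=5341
```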
Why should you care?
When search engines crawl identical content through varied URLs, there may be several negative
effects:
1. Having multiple URLs can dilute link popularity. For example, rather than 50 links all pointing to your intended display URL, the 50 links may be divided three ways among the three distinct URLs above.
2. Search results may display user-unfriendly URLs (long URLs with tracking IDs, session IDs), which:
   - Decreases the chances of a user selecting the listing
   - Offsets branding efforts
How we help users and webmasters with duplicate content
We've designed algorithms to help prevent duplicate content from negatively affecting webmasters
and the user experience.
1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
2. We select what we think is the "best" URL to represent the cluster in search results.
3. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.
Consolidating properties from duplicates into one representative URL often provides users with
more accurate search results.
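To make the clustering idea concrete, here is a toy PHP sketch. It is not Google's actual
algorithm; the parameter names and the `canonicalKey` helper are invented for illustration. It
groups URL variants by a normalized key formed by stripping assumed tracking parameters:

```
<?php
// Toy illustration only: not Google's algorithm. Groups URL variants
// into clusters by stripping known tracking parameters.

function canonicalKey(string $url): string {
    $parts = parse_url($url);
    parse_str($parts['query'] ?? '', $params);
    // Drop parameters that don't change the page content (assumed names).
    unset($params['sessionid'], $params['trackingid'], $params['affiliateid']);
    ksort($params);
    return ($parts['path'] ?? '/') . '?' . http_build_query($params);
}

$urls = [
    '/product.php?item=swedish-fish&sessionid=5341',
    '/product.php?item=swedish-fish&trackingid=1234567890',
    '/product.php?item=swedish-fish',
];

$clusters = [];
foreach ($urls as $url) {
    $clusters[canonicalKey($url)][] = $url;
}

// All three URLs land in one cluster; properties such as inbound links
// would then be consolidated onto one representative URL.
print_r($clusters);
```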
If you find you have duplicate content as mentioned above, can you help search engines understand
your site?
First, no worries: many sites on the web use URL parameters, and for valid reasons. But yes, you
can help reduce potential problems for search engines by:
- Removing unnecessary URL parameters—keep the URL as clean as possible.
- Submitting a [Sitemap](/search/docs/crawling-indexing/sitemaps/overview) with the canonical (that is, representative) version of each URL (see the sample Sitemap after this list). While we can't guarantee that our algorithms will display the Sitemap's URL in search results, it's helpful to indicate the canonical preference.
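For instance, a minimal Sitemap file listing only the canonical URL for the hypothetical product
page above might look like this:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only the canonical version of each URL, without session or tracking IDs. -->
  <url>
    <loc>https://www.example.com/product.php?item=swedish-fish</loc>
  </url>
</urlset>
```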
How can you design your site to reduce duplicate content?
Because of the way Google handles duplicate content, webmasters need not be overly concerned with
the loss of link popularity or loss of PageRank due to duplication. However, to reduce duplicate
content more broadly, we suggest:
1. When tracking visitor information, use `301` redirects to redirect URLs with parameters such as `affiliateID`, `trackingID`, etc. to the canonical version.
2. Use a cookie to set the `affiliateID` and `trackingID` values (a sketch of this approach follows).
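As a rough illustration, here is a minimal PHP sketch of that approach, assuming a handler like
the `product.php` seen in the log sample below; the parameter name and cookie lifetime are
placeholders:

```
<?php
// Minimal sketch (hypothetical): move the tracking parameter out of the
// URL and into a cookie, then 301-redirect to the canonical URL.

if (isset($_GET['affiliateid'])) {
    // Remember the affiliate ID in a cookie (30 days, site-wide path).
    setcookie('affiliateid', $_GET['affiliateid'], time() + 30 * 86400, '/');

    // Rebuild the URL without the tracking parameter.
    $params = $_GET;
    unset($params['affiliateid']);
    $query = http_build_query($params);
    header('Location: /product.php' . ($query ? '?' . $query : ''), true, 301);
    exit;
}

// Render the product page as usual. The content itself does not depend
// on the cookie, so the page stays accessible with cookies disabled.
```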
If you follow this guideline, your webserver logs could appear as:

```
127.0.0.1 - - [19/Jun/2007:14:40:45 -0700] "GET /product.php?category=gummy-candy&item=swedish-fish&affiliateid=ABCD HTTP/1.1" 301 -
127.0.0.1 - - [19/Jun/2007:14:40:45 -0700] "GET /product.php?item=swedish-fish HTTP/1.1" 200 74
```

And the session file storing the raw cookie information may look like:

```
category|s:11:"gummy-candy";affiliateid|s:4:"ABCD";
```

Please be aware that if your site uses cookies, your content (such as product pages) should remain
accessible with cookies disabled.
How can we better assist you in the future?
We recently published ideas from
[SMX Advanced](/search/blog/2007/06/duplicate-content-summit-at-smx)
on how search engines can help webmasters with duplicate content. If you have an opinion on the
topic, please join our conversation in the
[Webmaster Help Group](https://groups.google.com/group/Google_Webmaster_Help-Requests/topics)
(we've already started the thread).
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eDuplicate content can arise from URL parameters that track user or session information, leading to the same content being accessible via multiple URLs.\u003c/p\u003e\n"],["\u003cp\u003eSearch engines may consolidate duplicate URLs into a cluster and select a representative URL for search results to enhance user experience.\u003c/p\u003e\n"],["\u003cp\u003eWebmasters can reduce duplicate content issues by minimizing URL parameters, using sitemaps to indicate canonical URLs, and utilizing 301 redirects or cookies for tracking purposes.\u003c/p\u003e\n"],["\u003cp\u003eGoogle's algorithms aim to mitigate the negative impacts of duplicate content on link popularity and search result display.\u003c/p\u003e\n"],["\u003cp\u003eWhile duplicate content is a common issue, webmasters are encouraged to optimize their sites to minimize potential problems and improve search engine understanding.\u003c/p\u003e\n"]]],["URL parameters, such as session or tracking IDs, can create duplicate content by making the same page accessible through multiple URLs. Search engines group these duplicate URLs into a cluster and select a representative URL, consolidating link popularity to it. To reduce duplicate content, users can remove unnecessary parameters, submit a Sitemap with canonical URLs, and use 301 redirects for tracking parameters. Cookies can also be employed for storing affiliate or tracking IDs while ensuring content is accessible with cookies disabled.\n"],null,["Wednesday, September 12, 2007\n\nHow can URL parameters, like session IDs or tracking IDs, cause duplicate content?\n\n\nWhen user and/or tracking information is stored through URL parameters, duplicate content can\narise because the same page is accessible through numerous URLs. It's what Adam Lasnik referred\nto in\n\"[Deftly Dealing with Duplicate Content](/search/blog/2006/12/deftly-dealing-with-duplicate-content)\"\nas \"store items shown (and---worse yet---linked) via multiple distinct URLs.\" In the\nexample below, URL parameters create three URLs which access the same product page.\n\nWhy should you care?\n\n\nWhen search engines crawl identical content through varied URLs, there may be several negative\neffects:\n\n1. Having multiple URLs can dilute link popularity. For example, in the diagram above, rather than 50 links to your intended display URL, the 50 links may be divided three ways among the three distinct URLs.\n2. Search results may display user-unfriendly URLs (long URLs with tracking IDs, session IDs)\n - Decreases chances of user selecting the listing\n - Offsets branding efforts\n\nHow we help users and webmasters with duplicate content\n\n\nWe've designed algorithms to help prevent duplicate content from negatively affecting webmasters\nand the user experience.\n\n1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.\n2. We select what we think is the \"best\" URL to represent the cluster in search results.\n3. 
We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL.\n\n\nConsolidating properties from duplicates into one representative URL often provides users with\nmore accurate search results.\n\nIf you find you have duplicate content as mentioned above, can you help search engines understand\nyour site?\n\n\nFirst, no worries, there are many sites on the web that utilize URL parameters and for valid\nreasons. But yes, you can help reduce potential problems for search engines by:\n\n- Removing unnecessary URL parameters---keep the URL as clean as possible.\n- Submitting a [Sitemap](/search/docs/crawling-indexing/sitemaps/overview) with the canonical (that is, representative) version of each URL. While we can't guarantee that our algorithms will display the Sitemap's URL in search results, it's helpful to indicate the canonical preference.\n\nHow can you design your site to reduce duplicate content?\n\n\nBecause of the way Google handles duplicate content, webmasters need not be overly concerned with\nthe loss of link popularity or loss of PageRank due to duplication. However, to reduce duplicate\ncontent more broadly, we suggest:\n\n1. When tracking visitor information, use `301` redirects to redirect URLs with parameters such as `affiliateID`, `trackingID`, etc. to the canonical version.\n2. Use a cookie to set the `affiliateID` and `trackingID` values.\n\nIf you follow this guideline, your webserver logs could appear as: \n\n```\n127.0.0.1 - - [19/Jun/2007:14:40:45 -0700] \"GET /product.php?category=gummy-candy&item=swedish-fish&affiliateid=ABCD HTTP/1.1\" 301 -\n127.0.0.1 - - [19/Jun/2007:14:40:45 -0700] \"GET /product.php?item=swedish-fish HTTP/1.1\" 200 74\n```\n\nAnd the session file storing the raw cookie information may look like: \n\n```\ncategory|s:11:\"gummy-candy\";affiliateid|s:4:\"ABCD\";\n```\n\n\nPlease be aware that if your site uses cookies, your content (such as product pages) should remain\naccessible with cookies disabled.\n\nHow can we better assist you in the future?\n\n\nWe recently published ideas from\n[SMX Advanced](/search/blog/2007/06/duplicate-content-summit-at-smx)\non how search engines can help webmasters with duplicate content. If you have an opinion on the\ntopic, please join our conversation in the\n[Webmaster Help Group](https://groups.google.com/group/Google_Webmaster_Help-Requests/topics)\n(we've already started the thread).\n| **Update:** for more information, please see our [Help Center article on canonicalization](/search/docs/crawling-indexing/consolidate-duplicate-urls).\n\nWritten by [Maile Ohye](/search/blog/authors/maile-ohye)"]]