I have never worked with an e-commerce platform that was completely free of duplicate content. Some platforms are better at containing the spread. But a change of settings or code can accidentally produce duplicates, i.e., different pages with different URLs for the same content.
When bots discover and index these duplicates, the effects are wasted crawl budget, slower discovery of new content, and split link authority.
Depending on the platform and implementation, any of the following seven problems can produce duplicate content.
Keyword URLs mask the unfriendly versions that an e-commerce platform would otherwise create. The system still generates the unfriendly URL for its own purposes and then maps your keyword URL to it. For example:
- Original, Unfriendly URL: /shop/en/US845US845/69i57j0l72750j1j1.htm
- Keyword URL: /house/widget.htm
The good news is that unfriendly URLs have no consumer value. To fix, 301 redirect the unfriendly URLs to their friendly variants.
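As a rough illustration of that redirect, here is a minimal sketch that assumes the platform exposes a lookup from unfriendly to keyword URLs. The `FRIENDLY_URLS` table and the `resolve` function are hypothetical names, not part of any particular platform.

```python
# Hypothetical lookup table: platform-generated URL -> keyword URL.
FRIENDLY_URLS = {
    "/shop/en/US845US845/69i57j0l72750j1j1.htm": "/house/widget.htm",
}


def resolve(path):
    """Return (status, location) for a requested path.

    Unfriendly URLs answer with a 301 pointing at the keyword URL;
    every other path is served as-is.
    """
    target = FRIENDLY_URLS.get(path)
    if target is not None:
        return 301, target
    return 200, path
```

In practice the same rule would more likely live in the web server or platform redirect settings than in application code.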
Products that are in several categories or sub-categories may have different URLs. For example, a widget in the "House" and "Apartment" categories may have identical pages with separate URLs that reflect the category, as in:
Shoppers need to see each product page variation as they navigate, so we can't redirect one to the other. There are two ways to fix this.
First, create unique product pages for each category. This is suitable for products that have several use cases. Second, for single-use products, designate one category's product URL as primary and apply a canonical tag to each secondary URL. (Canonical tags are snippets of HTML that specify a different URL as the primary or "canonical" version for that page.)
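The canonical-tag option could be sketched as follows. The domain `www.example.com`, the product paths, and the `PRIMARY_URL` table are assumptions made for illustration only.

```python
from html import escape

# Hypothetical table: secondary category URL -> primary category URL.
PRIMARY_URL = {
    "/apartment/widget.htm": "/house/widget.htm",
}


def canonical_tag(path, origin="https://www.example.com"):
    """Build the canonical <link> element for a product page.

    Secondary URLs point at their primary URL; a primary URL
    points at itself.
    """
    primary = PRIMARY_URL.get(path, path)
    href = escape(origin + primary, quote=True)
    return f'<link rel="canonical" href="{href}">'
```

The resulting element belongs in the `<head>` of every variant of the page, including the primary one.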
If your faceted navigation is indexable, visitors may be able to take different click paths to arrive at the same content. Displaying the click path in the URL provides rich keyword signals for both visitors and search engines, but it also produces duplicate content, such as:
To fix, insert a canonical tag. As long as the facets are represented in the URL, it doesn't matter which click-path version you designate as canonical.
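One way to choose the canonical URL consistently is to put the facets in a fixed order, whatever order the visitor clicked them in. The sketch below assumes facets appear as `name-value` path segments after a category segment; that URL layout, the facet names in `FACET_ORDER`, and the function name are all hypothetical.

```python
# Hypothetical fixed ordering of facet names.
FACET_ORDER = ["color", "size"]


def canonical_facet_path(path):
    """Reorder facet segments so every click path maps to one URL.

    Assumes the first segment is the category and the remaining
    segments are facets written as "name-value".
    """
    category, *facets = [segment for segment in path.split("/") if segment]

    def rank(segment):
        name = segment.split("-", 1)[0]
        # Unknown facets sort after the known ones, in their original order.
        return FACET_ORDER.index(name) if name in FACET_ORDER else len(FACET_ORDER)

    return "/" + "/".join([category, *sorted(facets, key=rank)]) + "/"
```

With this approach, `/widgets/size-large/color-red/` and `/widgets/color-red/size-large/` would both declare `/widgets/color-red/size-large/` as canonical.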
Uppercase and lowercase letters are different characters, so URLs that differ only in case can host separate pages, which search engines can index separately. For example:
Some of these would only be indexed if someone accidentally links to them, for example from typos in URLs.
Visitors do not need to see the different case variants. Therefore, you can apply canonical tags or 301 redirects to a single URL, usually the lowercase version.
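A minimal sketch of the redirect variant, assuming the lowercase path is the one you keep. The function name and example path are hypothetical.

```python
def case_redirect(path):
    """Return (status, location), redirecting mixed-case paths to lowercase."""
    lowered = path.lower()
    if lowered != path:
        return 301, lowered
    return 200, path
```

The sketch only touches the path. Query string values can be case sensitive, so lowercasing them as well would need checking against the platform first.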
Protocols and subdomains
The protocol (http vs. https) and subdomain (with and without www) URL variations also introduce duplicate content. For example, the following URLs can generate four pages of identical content that search engines can index:
To fix, 301 redirect to the canonical version.
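The four protocol and subdomain combinations can be collapsed with one rule. This sketch assumes `https://www.example.com` is the canonical origin; the domain, the constants, and the function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical origin and the hostnames that should collapse into it.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}


def origin_redirect(url):
    """Return the 301 target for a non-canonical origin, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in KNOWN_HOSTS:
        return None  # not one of our hostnames; leave untouched
    if (parts.scheme, host) == (CANONICAL_SCHEME, CANONICAL_HOST):
        return None  # already canonical
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
```

Sending every variant straight to the final URL in a single hop avoids chains such as http to https to www.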
The sorting feature that allows visitors to organize products by price, popularity, rating or other criteria is crucial for usability. But it generates many pages with low search value. For example, only the first one below should be indexed.
To fix, 301 redirect all versions to the canonical version. Alternatively, implement the sorting function with a form of AJAX that search engines cannot crawl, which removes the possibility of duplicate content.
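Whichever fix is used, the canonical version has to be derived from the sorted URL. A minimal sketch, assuming the sort order travels in a query parameter; the parameter names in `SORT_PARAMS` and the example domain are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the query parameters that only change sort order.
SORT_PARAMS = {"sort", "order"}


def strip_sort(url):
    """Return the URL with sort-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Parameters that change the content itself, such as pagination, are deliberately left in place.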
Previous versions of a website
Modifying your site in a way that changes how products are categorized or labeled can change the URLs for those pages. Many platforms simply abandon the old URLs and move on to the new ones. For example, say your website had a "Home and Garden" category with the URL /Home/. Splitting it into two categories would produce two new URLs: /Home/ and /garden/.
Later, if you renamed the "Home" category to "House", the URL can be changed accordingly to /House/. The result can be two orphaned URLs:
To fix, 301 redirect the two orphaned pages or, alternatively, return a 404 error. However, a 301 is better because it will eventually prompt the search engines to replace the orphaned URL in their index and, more importantly, attribute the orphaned URL's accumulated link authority to the page it redirects to.
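When a site has been reorganized more than once, old URLs can end up redirecting to other old URLs. The sketch below follows such chains so that every orphaned URL redirects straight to its current page. The `/Home/` to `/House/` entry mirrors the rename described above, while `/old-category/` is a hypothetical stand-in for an earlier-generation URL.

```python
# Redirect history: each orphaned URL -> the URL that replaced it.
REDIRECTS = {
    "/old-category/": "/Home/",  # hypothetical earlier-generation URL
    "/Home/": "/House/",         # the rename from "Home" to "House"
}


def final_target(path, redirects):
    """Follow the redirect chain from path to the URL that is still live."""
    seen = set()
    while path in redirects:
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
        path = redirects[path]
    return path


def flatten(redirects):
    """Map every orphaned URL directly to its final destination."""
    return {source: final_target(source, redirects) for source in redirects}
```

Flattening the map means bots and visitors reach the live page in one hop rather than several.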