The Most Common Causes of Duplicate Content on News Media Sites

Although search engines keep improving their ability to deal with duplicate content, it continues to be one of the main SEO issues facing news and content sites. Even when the engines do a reasonable job of filtering duplicates out of their results, sites are essentially shooting themselves in the foot by splitting internal and inbound links to a particular piece of content across multiple URLs. It is therefore critical for publishers to diagnose and eliminate (or at least mitigate) the duplicate content issues on their sites.

Here are the most common causes of duplicate content on news media sites:

  • Tracking codes. Appending tracking codes to URLs (e.g. ?xid=rss or ?cid=top-stories) results in the same piece of content existing on multiple URLs, in some cases quite a large number of them. The canonical URL tag (rel="canonical") is a good way to mitigate this issue, because it tells the search engines which URL should be treated as the primary one.
  • Publishing the same content in multiple sections. An article can be linked from as many sections and locations on the site as desired, but it should only exist on one unique, permanent URL.
  • Repurposing content in new packages. Media sites often pull existing content into new features/packages, typically to create attractive options for advertisers. For example, a selection of movie reviews (that also exist in the film section) will be duplicated on a different template in a “What’s Hot This Summer” feature. Since the pages are not exactly the same, the canonical URL tag is not the ideal solution, and publishers typically resist consolidating through permanent 301 redirects because they want the content to also exist in its original location. From an SEO perspective, the best approach is to avoid this practice altogether.
  • Syndication. Syndicating content is a common practice and an important revenue stream for publishers. But when the search engines encounter the same article on multiple sites, it is likely that one version will be given prominence, and it may not always be the original. My post on syndication best practices covers ways to reduce the risk of being outranked for your own content.
  • CMS issues. Although content management systems have become more SEO friendly over the years, most still cause a number of SEO problems, including duplicate content issues. The most common culprit is printer-friendly pages. Another example is photo galleries, where the first slide may appear on a different URL when you go back to it via the “previous” button. Conduct a comprehensive SEO site audit to identify CMS and site architecture issues (as well as editorial and marketing issues).
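
To make the tracking-code case concrete, here is a minimal Python sketch of how a site template might derive the canonical URL for a page. It assumes the only tracking parameters in use are xid and cid (taken from the examples above). The function names and the example.com URLs are hypothetical, and a real site would need its own parameter list.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these are the site's tracking parameters (from the examples above).
TRACKING_PARAMS = {"xid", "cid"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    # Keep every query parameter that is not a known tracking code.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/film/review-123?xid=rss"))
```

Every tracked variant of the article then points the search engines at the same clean URL, so links are not split across copies. Parameters that change what the page shows (a page number, for instance) are left alone.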

A few other notes on dealing with duplicate content:
