The Most Common Causes of Duplicate Content on News Media Sites

October 8, 2009 by Adam Sherk

Although search engines keep improving their ability to deal with duplicate content, it remains one of the main SEO issues facing news and content sites. Even when the engines do a reasonable job of filtering out duplicates in their results, sites are essentially shooting themselves in the foot by splitting internal and inbound links to a given piece of content across multiple URLs. It is therefore critical for publishers to diagnose and eliminate (or at least mitigate) the duplicate content issues on their sites.

Here are the most common causes of duplicate content on news media sites:

  • Tracking codes. Appending tracking codes to URLs (e.g. ?xid=rss or ?cid=top-stories) results in the same piece of content existing on multiple URLs – in some cases quite a large number of URLs. The canonical URL tag is a good way to mitigate this issue (see the example after this list).
  • Publishing the same content in multiple sections. An article can be linked from as many sections and locations on the site as desired, but it should only exist on one unique, permanent URL.
  • Repurposing content in new packages. Media sites often pull existing content into new features/packages, typically to create attractive options for advertisers. For example, a selection of movie reviews (that also exist in the film section) might be duplicated on a different template in a “What’s Hot This Summer” feature. Since the pages are not exactly the same, the canonical URL tag is not the ideal solution, and publishers typically resist consolidating through permanent 301 redirects because they want the content to also exist in its original location. From an SEO perspective, the best approach is to avoid this practice altogether.
  • Syndication. Syndicating content is a common practice and an important revenue stream for publishers. But when the search engines encounter the same article on multiple sites, it is likely that only one version will be given prominence, and it may not always be the original. My post on syndication best practices covers ways to reduce the risk of being outranked for your own content.
  • CMS issues. Although content management systems have become more SEO friendly over the years, most still cause a number of SEO problems, including duplicate content issues. The most common is printer-friendly pages. Another example: in photo galleries, the first slide may appear on a different URL when you return to it via the “previous” button. Conduct a comprehensive SEO site audit to identify CMS and site architecture issues (as well as editorial and marketing issues).
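
To make the tracking-code fix above concrete, here is a minimal sketch of the canonical URL tag (the URLs are hypothetical). The tag goes in the head of every URL variant of an article and points the search engines to the single permanent URL, so links and ranking signals are consolidated there:

    <!-- Article reached via a tracking-code URL such as
         http://www.example.com/news/big-story.html?xid=rss -->
    <head>
      <!-- Consolidates all URL variants onto the one permanent article URL -->
      <link rel="canonical" href="http://www.example.com/news/big-story.html" />
    </head>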

A few other notes on dealing with duplicate content:

  • This week Google specifically recommended against using robots.txt to block duplicates, which is a change from previous recommendations.
  • In the duplicate content session at SMX East, Google also announced that the canonical URL tag will work across domains by the end of the year (currently it only works with URLs on the same domain); see the sketch after this list. Just as interesting, Yahoo and Bing admitted that they are still not supporting the current version of the tag but hope to by the end of the year.
  • In September Google added a function to Webmaster Tools called Parameter Handling that allows sites to specify certain URL parameters that can be ignored during crawling. There is a good writeup on the duplicate content implications of this on Search Engine Land: Google Lets You Tell Them Which URL Parameters To Ignore
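
If the cross-domain support mentioned above rolls out as announced, the same tag could in principle be used on a syndication partner’s copy of an article to point back to the original publisher’s URL. A rough sketch, again with hypothetical URLs:

    <!-- On the partner site's syndicated copy, e.g.
         http://partner.example.net/syndicated/big-story.html -->
    <head>
      <!-- Would point search engines back to the original publisher's URL -->
      <link rel="canonical" href="http://www.example.com/news/big-story.html" />
    </head>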

Related posts:

  1. Will Publishers Add Cross-Domain Rel=Canonical to Syndication Deals?
  2. Syndication Best Practices: Reduce the Risk of Being Outranked for Your Own Content
  3. Should Browsers Display Only Canonical URLs to Users?
  4. Publishers: Solve Tracking Code, Duplicate Content Issues with Rel=Canonical
  5. Which News Sites Have the Most Keywords in Common with eHow?
