Google News Not Always So Great at Favoring Original Sources

News publishers are all too familiar with the fact that Google News doesn’t always do a great job of favoring original sources in its story clusters. It is not unusual to see a syndicated, aggregated or simply re-covered version of an article displayed more prominently than the original.

Today I noticed an example that is worth pointing out because you’d think Google News would be able to handle it properly.

On the Google News home page, the lead article in a story cluster about the Facebook music platform came from Tom’s Guide.

What caught my attention was that the article summary mentioned the original source, CNBC, twice, yet the CNBC article itself did not appear in the cluster:

[Screenshot: Google News story cluster for Facebook music]

In fact, in addition to multiple CNBC citations, the Tom’s Guide article contains two links to the original CNBC article (although one is nofollow):

[Screenshot: Tom’s Guide Facebook music article]

You’d think that would be a strong enough signal for Google News to favor the CNBC original over Tom’s Guide.

But the CNBC article wasn’t in the cluster, and when I clicked through to the Full Coverage page it was nowhere to be found there either:

[Screenshot: Google News Full Coverage page for Facebook music]

I clicked on each of the “All X related articles” links on the Full Coverage page but did not see the CNBC article anywhere.

Even a search on “cnbc facebook music” brought up the Tom’s Guide article but not the CNBC original:

[Screenshot: Google News results for “cnbc facebook music”]

The only way I could get the CNBC article to come up was with a “site:cnbc.com facebook music” search, confirming that it was in fact indexed in Google News:

[Screenshot: CNBC article indexed in Google News]

In my post on Google News Optimization Tips I list the article ranking factors within a story cluster. Among them, Google specifically references the fact that it attempts to give priority to original articles and that Citation Rank is used to help determine original sources.

However, that clearly wasn’t working in this case.
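Google hasn’t published how Citation Rank actually works, so the following is not it. It is a toy Python sketch under one loud assumption: that a source’s “citation” score is simply the number of articles in a cluster that link to or name it. The `Article` class and `most_cited_source` function are hypothetical names for illustration. The only input in the usage line is the fact from this case that Tom’s Guide cites CNBC, and even this crude rule points to CNBC as the original.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical model of one article in a story cluster.
    source: str  # publisher domain, e.g. "tomsguide.com"
    cited_sources: list[str] = field(default_factory=list)  # domains it links to or names


def most_cited_source(cluster: list[Article]) -> str | None:
    """Toy heuristic (an assumption, NOT Google's Citation Rank): return the
    source cited by the most articles in the cluster, ignoring self-citations.
    The cited source does not itself need to be in the cluster."""
    counts: Counter[str] = Counter()
    for article in cluster:
        # Count each citing article at most once per cited source.
        for cited in set(article.cited_sources):
            if cited != article.source:
                counts[cited] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Tom's Guide links to CNBC twice; CNBC itself is missing from the cluster.
    cluster = [Article("tomsguide.com", ["cnbc.com", "cnbc.com"])]
    print(most_cited_source(cluster))  # prints cnbc.com
```

Counting each citing article once per source keeps a single page that links to a source many times from outweighing several independent pages that each cite it once; that weighting is a design choice for the sketch, not something Google has described.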

Based on the story cluster ranking factors, about the only advantage the Tom’s Guide article had going for it was that it was roughly six hours more recent than the original (according to their Google News listings).

The CNBC article had more tweets, Facebook likes and Google +1s than the Tom’s Guide piece, so social signals don’t explain it either.

I’m not directly familiar with either site’s SEO or news search optimization practices. The CNBC title tag is not great, but both articles have decent headlines, which matter more for Google News. Perhaps Tom’s Guide is making better use of Google News sitemaps, which would call more attention to its content.

But even if that were the case, Tom’s Guide itself makes it abundantly clear that CNBC is the original source of the news, so you’d think Google News could account for that.

Doesn’t seem right, does it?

Comments

  1. Maybe if Tom’s Guide added some markup to help alert scrapers about the original source, it would be picked up. It seems to have none currently:

    Source: CNBC

    They could do something like:

    Microdata (well when using HTML5):

    RDFa (rNews) (can do now!)

  2. Ugh, seems HTML code snips are not shown as such, so please imagine super neato examples of Schema.org Microdata and IPTC rNews RDFa in the above post, as well as the source from Tom’s showing no markup on their source element :)

  3. Thanks for trying to share that Jayson. I sometimes use http://www.elliotswan.com/postable/ to convert HTML code into something that can be included in a post.

  4. James Crowley says:

    @jayson in theory they could but it’s not really in their interest, is it? And even if reputable sources do, you’ll still get the millions of scrapers that don’t.

    In my experience, the main overriding factor still appears to be recency. There’s at least one newspaper I know of in the UK that regularly republishes recent stories with new headlines in order to keep featuring in Google News throughout the day…

  5. It’s a good point James, there’s not a lot of incentive for non-original sources to identify themselves as such.

    Google tried to get this going with the syndication-source and original-source meta tags (http://googlenewsblog.blogspot.com/2010/11/credit-where-credit-is-due.html), but to date they don’t appear to have gained much adoption.

  6. Great post. I have to agree with Jayson on giving credit where credit is due. I think if you are going to re-post an article, automated or not, you should always cite the source.

  7. Thanks Sarah. I agree on giving credit where credit is due, which doesn’t happen as often as it should. Unfortunately though even when credit is given through crawler-accessible attribution links and/or inline citations, that’s not always enough to stop a non-original source from being favored in a story cluster.
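For reference, the Google proposal linked in comment 5 has publishers declare attribution in page metadata with `<meta name="syndication-source" content="…">` and `<meta name="original-source" content="…">`. Below is a minimal Python sketch, assuming only those two tag names, of how a crawler might read them with the standard library’s `html.parser`. The `SourceMetaParser` class, the `source_meta` function and the example URL are hypothetical, and the sketch says nothing about how, or whether, Google weighs these tags.

```python
from __future__ import annotations

from html.parser import HTMLParser


class SourceMetaParser(HTMLParser):
    """Hypothetical helper: collects syndication-source / original-source meta tags."""

    TAG_NAMES = ("syndication-source", "original-source")

    def __init__(self) -> None:
        super().__init__()
        # Every occurrence is kept, in document order, in case a page has several.
        self.found: dict[str, list[str]] = {name: [] for name in self.TAG_NAMES}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values are left as-is.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        content = attr_map.get("content")
        if name in self.found and content:
            self.found[name].append(content.strip())

    # Self-closing <meta ... /> also works: HTMLParser.handle_startendtag
    # calls handle_starttag by default.


def source_meta(html: str) -> dict[str, list[str]]:
    parser = SourceMetaParser()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    page = (
        "<head>"
        '<meta name="original-source" content="https://www.example.com/original-story.html">'
        "</head>"
    )
    print(source_meta(page))
```

Nothing here checks whether a declared URL is trustworthy. As James Crowley points out above, sites that aren’t the original have little incentive to add these tags in the first place.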
