Google News Not Always So Great at Favoring Original Sources

News publishers are all too familiar with the fact that Google News doesn’t always do such a great job of favoring original sources in its story clusters. It is not unusual to see a syndicated, aggregated or simply re-covered version of an article gain prominent visibility over the original.

Today I noticed an example that is worth pointing out because you’d think Google News would be able to handle it properly.

On the home page the lead article in a story cluster about the Facebook music platform came from Tom’s Guide.

What caught my attention was the fact that in the article summary the original source CNBC was mentioned twice, yet the CNBC article did not appear in the cluster:

Google News Facebook Music story cluster

And in fact, if you look at the Tom’s Guide article, in addition to multiple CNBC citations there are two links (although one is nofollow) to the original CNBC article:

Tom's Guide Facebook Music article

You’d think that would be a strong enough signal for Google News to favor the CNBC original over Tom’s Guide.
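
To make that signal concrete, here is a minimal Python sketch of how a crawler could pick up attribution links from a re-covered article. It is an illustration only and says nothing about how Google News actually works: the sample markup, the `cnbc.example` domain and the names `AttributionLinkParser` and `attribution_links` are all hypothetical. It simply counts links pointing at a given source domain and separates followed links from nofollow ones.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AttributionLinkParser(HTMLParser):
    """Collect links that point at a given source domain (hypothetical helper)."""

    def __init__(self, source_domain):
        super().__init__()
        self.source_domain = source_domain.lower()
        self.followed = []   # links a crawler may treat as citations
        self.nofollow = []   # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Accept the bare domain and any of its subdomains (e.g. www.).
        same_site = host == self.source_domain or host.endswith("." + self.source_domain)
        if not same_site:
            return
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def attribution_links(html, source_domain):
    """Return the followed and nofollow links to source_domain found in html."""
    parser = AttributionLinkParser(source_domain)
    parser.feed(html)
    parser.close()
    return {"followed": parser.followed, "nofollow": parser.nofollow}


# Made-up markup standing in for a re-covered article with two source links.
sample_html = """
<p>According to <a href="https://www.cnbc.example/id/123">CNBC</a>, ...</p>
<p>Source: <a rel="nofollow" href="https://www.cnbc.example/id/123">CNBC</a></p>
<p><a href="https://other.example/story">Unrelated</a></p>
"""

result = attribution_links(sample_html, "cnbc.example")
print(result)
```

On the sample markup this finds one followed link and one nofollow link to the source, which mirrors the situation in the Tom’s Guide article.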

But the CNBC article wasn’t present in the cluster and in clicking through to the Full Coverage page it was also nowhere to be found:

Google News Facebook Music Full Coverage

I clicked on each of the “All X related articles” links on the Full Coverage page but did not see the CNBC article anywhere.

Even a search on “cnbc facebook music” brought up the Tom’s Guide article but not the CNBC original:

Google News CNBC Facebook music results

The only way I could get the CNBC article to come up was with a “ facebook music” search, confirming that it was in fact indexed in Google News:

Google News CNBC article indexed

In my post on Google News Optimization Tips I list the article ranking factors within a story cluster, in which Google specifically references the fact that it attempts to give priority to original articles and that Citation Rank is utilized to help determine original sources.

However, that clearly was not working in this case.

Based on the story cluster ranking factors, about the only advantage the Tom’s Guide article had going for it was that it was about six hours more recent than the original (according to their Google News listings).

The CNBC article had more tweets, Facebook likes and Google +1’s than the Tom’s Guide piece, so social signals weren’t at play either.

I’m not directly familiar with either site’s SEO or news search optimization practices. The CNBC title tag is not great, but both articles have decent headlines, which is more important for Google News. Perhaps Tom’s Guide is making better use of Google News sitemaps, which call better attention to their content.

But even if that were the case, Tom’s Guide itself makes it abundantly clear that CNBC is the original source of the news, so you’d think Google News could account for that.

Doesn’t seem right, does it?


  1. says

    Maybe if Tom’s Guide added some markup to help alert scrapers about the original source, it would be picked up. It seems to have none currently:

    Source : CNBC

    They could do something like:

    Microdata (well when using HTML5):

    RDFa (rNews) (can do now!)

  2. says

    Ugh, seems HTML code snips are not shown as such, so please imagine super neato examples of Microdata and IPTC rNews RDFa in the above post, as well as the source from Tom’s showing no markup on their source element :)

  3. James Crowley says

    @jayson in theory they could but it’s not really in their interest, is it? And even if reputable sources do, you’ll still get the millions of scrapers that don’t.

    In my experience, the main overriding factor appears still to be recency. There’s at least one newspaper I know of in the UK that regularly republishes recent stories with new headlines in order to continue featuring in google news throughout the day…

  4. says

    Great Post. I will have to agree with Jayson on giving credit where credit is due. I think if you are going to re-post an article, automated or not, you should always cite the source.

  5. says

    Thanks Sarah. I agree on giving credit where credit is due, which doesn’t happen as often as it should. Unfortunately, though, even when credit is given through crawler-accessible attribution links and/or inline citations, that’s not always enough to stop a non-original source from being favored in a story cluster.
