In working with publishers, one of the things I’m most frequently asked to do is troubleshoot problems with Google News.
Anyone involved with a news or content site can attest to the importance of Google News and news search optimization. And I’d venture to say just as many people have experienced a range of technical and formatting issues that have impeded the indexation and performance of their news content.
To help with that, I’ve compiled a list of the Google News errors – and publisher mistakes – that I most often come across, along with a recommended solution for each.
Eligibility and Indexation
Let’s start with a few basic mistakes that do actually occur with some frequency.
- Not an approved source – site not showing up in Google News? First make sure you are an approved source, otherwise there is no point in creating a Google News sitemap or complying with the technical and content guidelines. A quick way to check is to do a [site:domain] search in Google News. If you are not an approved source you can apply using this form.
- Mixing domains – a related issue that sometimes crops up is when a news property is spread out over multiple domains. If the additional domains are not in the Google News database they won’t be considered part of the approved source, and thus won’t be indexed. If a Google News sitemap has been submitted for an unapproved domain this will trigger an “unknown news site” error in the Google Webmaster Tools profile.
- Google News sitemap – another mistake, or rather a missed opportunity, is not creating a Google News sitemap. Your articles will still get indexed via regular site crawling, but the news sitemap helps to expedite the process. It also enables you to segment your news content (i.e. articles and blog posts) from other forms of editorial content that are not eligible for indexation. With breaking news and trending topics speed counts; don’t put yourself at a competitive disadvantage. Along those lines also make sure that your Google News sitemap is updated immediately whenever new, eligible content is published.
- Unknown publication name – an error that I frequently come across is the “unknown publication name” error for the Google News sitemap, which is reported in the Sitemaps section of the site’s Google Webmaster Tools profile. This error is triggered when the publication name used in the <name> tags in the Google News sitemap is not an exact match to the publication name in Google News’ database. To determine the correct name to use do a [site:domain] search in Google News and see what publication name is displayed for the indexed articles.
- Improper URL structure – Google News requires that articles be published on unique, permanent URLs that contain at least three digits. The three-digit rule is not enterprise-friendly, but fortunately it is waived if you use a Google News sitemap. Just make sure that your Google News sitemap contains all eligible news articles and that it is frequently updated (which you should be doing anyway).
The other plus of a Google News sitemap is that it gives you the ability to provide additional information such as image-related tags, keywords and genre tags.
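To make the sitemap points above more concrete, here is a minimal Python sketch of how a Google News sitemap might be generated. The function name `build_news_sitemap` and the article dictionary keys (`url`, `title`, `published`) are hypothetical, chosen only for illustration. The namespaces and the `<news:...>` tags follow the published Google News sitemap format. The two-day cutoff reflects the Google News guideline on article age, and the optional image, keyword and genre tags are left out to keep the example short.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language="en",
                       now=None, max_age_days=2):
    """Return a news sitemap string containing only recent articles.

    `articles` is a list of dicts with hypothetical keys:
    "url", "title" and "published" (a timezone-aware datetime).
    `publication_name` should match the name shown in Google News exactly.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)

    # Use readable prefixes in the serialized XML.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        if article["published"] < cutoff:
            continue  # skip anything older than the cutoff

        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["url"]

        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            article["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]

    return ET.tostring(urlset, encoding="unicode")
```

In practice you would call something like this from your publishing workflow so that the sitemap is regenerated every time a new, eligible article goes live.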
Article Format / Content Errors
Beyond the fundamental issues covered above, the majority of Google News errors are related to the format or content of specific articles. Many of these errors are reported in the Crawl Errors section of a site’s Google Webmaster Tools profile.
There is a good rundown of all the news-specific crawl errors in the Google News publisher help section so I’m not going to cover them all here. But I will point out the ones that I most frequently come across:
- Non-literal headlines – this is an editorial mistake as opposed to a technical error, but it is one that comes up a lot so I want to emphasize it. Unlike regular Web search, Google News places more weight on the article headline than the HTML title tag. So the tactic of offsetting witty, print-style headlines with customized title tags works relatively well for Web search (although this is still sub-optimal) but it does not work for news search.
- Headline not H1 – make sure your on-page headline is in an H1 heading tag, and that it is the only H1 on the page. Doing so makes it easier for Google News to correctly extract the headline. On a related note, heading tag coding is changing with HTML5, but for SEO purposes there should still be only one H1 per template at this time (especially on articles).
- Title not found – this error is less common but occasionally an article template is coded in such a way that the editorial headline is not easily identifiable or even accessible to crawlers. Be sure to place your headlines in simple HTML text right at the top of the article body, in an H1 tag.
- Incorrect byline or date – every now and then Google News will pull in an incorrect author or date for a particular article. This is usually related to the template design or the way it is coded. Make sure both the author byline and the date are close to the headline and easily accessible to crawlers. Also avoid including additional names or dates in prominent locations that could lead to a misinterpretation.
- Article too short or article disproportionally short – these two crop up with some frequency especially on blogs, which are more likely to publish some fairly short pieces. The minimum requirement for Google News is 80 words, but I’ve seen articles just over that figure still trigger the error. For any news content that you want to be indexed, include a minimum of 100 words and preferably 250+.
- Article fragmented – this is another common error that typically occurs when the article body is broken up with things like lists, tables or sometimes embedded multimedia (or even embedded tweets). Another thing that can trigger the error is placing interstitial “speedbump” links too high up in the article body. These issues can usually be offset by including at least 80-100 continuous words prior to inserting such elements into the article. A blog post with a string of very short paragraphs (many just a single sentence) will also trigger this error on occasion.
- Article too long – this one is fairly rare but every once in a while an article gets flagged for being too long. Google News does not provide a maximum word count, but the cases I’ve seen were very long blog posts that covered several different topics at length. Beyond the indexation issue, in such cases it is better to create a separate article for each topic so that each one is more focused and better able to compete for related searches.
- No sentences found – I’ve seen this error recently on articles that do in fact have multiple sentences on the page. It could be a glitch, but it seems to be triggered by blog posts in which all of the editorial content is contained in a single, large block of text instead of being broken up into separate paragraphs. That type of formatting should be avoided regardless since it is more difficult for users to read.
- Page too large – this does not come up often, but on occasion something gets added to a template (usually in a sidebar or other module outside of the article body) that greatly increases the size of the page. The maximum permitted size is 256KB. You don’t want pages that large for a variety of reasons, so stay well clear of that figure.
- Date too old – this is one of the most frequently reported news crawl errors in Google Webmaster Tools. Anything that was encountered in a site crawl that is older than 30 days will trigger this error. Since content that old is no longer eligible for indexation these errors can be disregarded. Just make sure your Google News sitemap contains only recent articles. The Google News guideline is to only include articles from the past two days.
The news_keywords meta tag was launched to help offset the non-literal headline issue described above and give publishers more leeway. But as I covered in “Google News news_keywords Meta Tag: More Cons than Pros?”, you still need descriptive, well-balanced headlines for Web search, social media and even for news search. So treat the news_keywords tag as a supplement to your Google News optimization efforts, not as a replacement for sound, fundamental editorial SEO.
I should point out that many of the article too short, article disproportionally short and article fragmented errors (and sometimes the title not found errors) reported in Google Webmaster Tools end up being for non-article content like galleries that were encountered during a site crawl. Since such content is not eligible for Google News indexation anyway those errors can be disregarded. Just make sure your Google News sitemap includes only articles and blog posts.
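Several of the article-level checks above can be approximated before publishing. The following is a rough Python sketch, not a reproduction of how Google News actually extracts or evaluates articles. The names `ArticleScanner` and `check_article` are hypothetical, and counting only the words inside `<p>` tags is a simplifying assumption. The thresholds (80 words minimum, 100 words as a safer floor, 256KB maximum page size, exactly one H1) are the figures cited above.

```python
from html.parser import HTMLParser

MIN_WORDS = 80              # minimum cited above
RECOMMENDED_WORDS = 100     # safer floor suggested above
MAX_PAGE_BYTES = 256 * 1024


class ArticleScanner(HTMLParser):
    """Counts <h1> tags and collects the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.paragraphs = []
        self._buffer = None  # None means we are not inside a <p>

    def close_paragraph(self):
        if self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        elif tag == "p":
            self.close_paragraph()  # tolerate an unclosed <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.close_paragraph()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def check_article(html):
    """Return a list of warning strings for one rendered article page."""
    scanner = ArticleScanner()
    scanner.feed(html)
    scanner.close()
    scanner.close_paragraph()

    words = sum(len(p.split()) for p in scanner.paragraphs)
    page_bytes = len(html.encode("utf-8"))

    warnings = []
    if scanner.h1_count != 1:
        warnings.append(f"expected exactly one H1, found {scanner.h1_count}")
    if words < MIN_WORDS:
        warnings.append(f"article too short: {words} words in <p> tags")
    elif words < RECOMMENDED_WORDS:
        warnings.append(f"article borderline: {words} words in <p> tags")
    if page_bytes > MAX_PAGE_BYTES:
        warnings.append(f"page too large: {page_bytes} bytes")
    return warnings
```

An empty list means none of these particular checks fired; it does not guarantee that Google News will accept the article, since errors such as article fragmented depend on layout details this sketch does not model.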
Reaching Out to Google News
Simply put, Google News is a strange beast and despite your best efforts there are going to be odd situations and errors that occur from time to time. Many will be triggered by something you’ve done (or haven’t done), but sometimes there will be no readily apparent cause.
Those of you who work for well-established news brands probably have some degree of contact with Google News, as publisher relations is something that is important to them. So you can utilize those relationships when needed. For those who don’t, the Google News help forum is a decent option.
But even if you have a contact point, you don’t want to be hitting them up very often; save that for when you really need it.
Most of the time your best course of action when typical troubleshooting efforts have failed is to fill out one of these forms:
In my experience these forms do actually get attention, particularly if you are an established news site. It may take some time to hear back but you will typically get a response. And if it is in fact something on their end, they’ll address it as soon as they can.