Google Panda was designed to go after sites with a high volume of thin or low-quality content (among other things) in an effort to improve the overall strength of the search results. But the unfortunate reality is that with just about every Panda update, well-established brands with good content get caught in the filter.
We work with a lot of major publishers, and nearly all of them have had at least a few titles get hit by Panda in recent years. The types of sites vary, but most are trusted, authoritative brands with large audiences that have produced quality, popular content for years.
Here’s an example of a site that was hit by Panda 4.0 and recovered with Panda 4.1:
The good news is that each site has eventually been able to recover. But it is not always an easy process, and it often takes significant changes and considerable time. And when search traffic declines substantially for several months or longer, that has serious repercussions for the business.
So I thought I’d share some of the most common causes of Panda problems for reputable publishers. For in-depth Panda analysis I also recommend checking out Glenn Gabe, who frequently shares useful information on the subject.
Hopefully this will help others to avoid a similar fate. Like it or not, taking a preemptive approach against Panda and other forms of algorithmic filtering has become a fundamental part of SEO. And nearly every site has some degree of technical, design and editorial issues that are potential risk factors.
Here are 10 things that cause good sites to get caught up in Google Panda:
- Templates with limited text – many content sites have certain non-article templates that by design have relatively little editorial text. While not usually a problem by itself, when combined with other Panda risk factors this can be problematic. Instituting a minimum word count is helpful.
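A minimum word count can be enforced with a simple audit script. The sketch below is a minimal illustration using only the Python standard library; the 300-word threshold and the `is_thin` helper are hypothetical examples, not a Google-published figure, and real sites would tune the number per template.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def word_count(html):
    """Rough visible-word count for one page's HTML."""
    parser = TextExtractor()
    parser.feed(html)
    return len(" ".join(parser.parts).split())

MIN_WORDS = 300  # hypothetical threshold; tune per template

def is_thin(html, minimum=MIN_WORDS):
    return word_count(html) < minimum
```

Run against a crawl export of each template and flag pages that fall under the floor, then decide whether to add editorial text or keep those pages out of the index.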
- Slideshows – slideshows and photo galleries are frequently used by publishers both for the visual experience and to increase pageviews. But they can also be a thin content factory and a source of duplicate page elements. So it is essential to carefully manage their setup and design, in particular the amount of content per slide and whether or not each individual slide can be indexed. Many are moving to setups in which the full content is accessible on the main page and that is the only URL that can be indexed.
- Mobile – with mobile averaging 40% of total traffic and continuing to grow, some editorial teams are modifying their writing style to be shorter and more succinct to make the content more digestible on mobile devices. Ironically this has potential to create thin content issues, so there is a balance that needs to be maintained.
- Viewability – advertising viewability has become a top priority for publishers, which has the consequence of altering the content-to-ad ratio above the fold. There is some overlap here with the “top heavy” page layout algorithm but on templates with limited editorial text this becomes a risky combination. It is critical to maintain a substantial amount of editorial content above the fold. (Update: for more on this last point see the comment from Michael Cottam below).
- Other advertisement and promotion issues – overuse of interstitial, prestitial and other forms of interruptive ads and promos can also be problematic, in part because they can lead to poor engagement signals. If it seems like it could be annoying to users, it probably is.
- Duplicate content or page elements – having a significant number of pages with limited text that also have duplicate content or page elements (such as title tags and headings) is another common problem. Publishing the same piece on more than one URL (which often happens unintentionally due to CMS or migration issues), URL canonicalization issues and duplication caused by tracking codes can also contribute to Panda problems, particularly when there are other issues at play.
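Duplicate title tags are easy to surface from a crawl export. The sketch below is a minimal, illustrative helper (the function name and the `(url, title)` input shape are assumptions, not from any particular crawler's API):

```python
from collections import defaultdict

def find_duplicate_titles(pages):
    """Group URLs by title tag and return titles shared by more than one URL.

    `pages` is an iterable of (url, title) pairs, e.g. from a crawl export.
    Titles are normalized (trimmed, lowercased) before comparison.
    """
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title.strip().lower()].append(url)
    return {t: urls for t, urls in by_title.items() if len(urls) > 1}

# Example: two URLs sharing one title, after normalization
pages = [("/recipes/1", "Summer Recipes"),
         ("/recipes/1?utm=mail", "summer recipes "),
         ("/contact", "Contact Us")]
dupes = find_duplicate_titles(pages)
```

The same grouping idea applies to H1s or meta descriptions; in practice you would also want to catch the tracking-code duplicates (like the `?utm=` URL above) with canonical tags rather than just reporting them.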
- Syndicated content – publishing a large amount of syndicated content has been problematic for some brands, particularly when that content appears on quite a few other sites. It is important that non-original content remains a small percentage of the indexable pages on the site.
- Overlapping content – overlapping content can sometimes cause problems, particularly if one version is fairly thin. An example of this would be publishing a blog post, article and gallery all on the same subject, with each version repeating certain portions of the same text and sharing the same headlines and title tags.
- Internal search and tag pages – internal search can be a source of an excessive number of thin, overlapping and partially duplicate pages. If search pages are allowed to be indexed (and often it is best not to allow this), they need to be managed carefully. A similar issue can occur with tag pages. These are commonly used on content sites and by themselves they don’t tend to cause Panda problems. But when tag pages start to become a fairly high percentage of the total pages on the site, and many of them are overlapping or have very few links, this can be problematic.
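Checking what share of a site's indexable URLs are tag pages is a quick sanity check. This is a minimal sketch; the `/tag/` URL pattern and the 15% ceiling are illustrative assumptions, not published guidance, and should be adapted to the site's own structure and risk tolerance.

```python
def tag_page_share(urls, tag_prefix="/tag/"):
    """Return the fraction of indexable URLs that are tag pages.

    `tag_prefix` is a hypothetical URL pattern; adjust to the site's structure.
    """
    total = len(urls)
    if total == 0:
        return 0.0
    tagged = sum(1 for u in urls if u.startswith(tag_prefix))
    return tagged / total

MAX_TAG_SHARE = 0.15  # illustrative ceiling, not a Google-published number

def tag_pages_excessive(urls):
    return tag_page_share(urls) > MAX_TAG_SHARE
```

Feed this the indexable URL list from a crawl or sitemap; if the share is climbing, audit the tag pages for overlap and thin content before they dominate the index.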
- Incorrect or empty pages – it is not unusual for a site to have a number of incorrect or even empty pages that were never meant to exist, let alone be indexed. This can inadvertently add a lot of thin, low-value pages and potentially soft 404 errors to the site. Legacy issues and unintended problems related to redesigns and migrations are the most common sources. While typically not serious enough to trigger a Panda hit on their own, they can be a contributing factor.
The Path to Panda Recovery
Recovering from Panda typically requires a mix of template and technical changes combined with the elimination, consolidation and de-indexation of low-value pages through strategic use of 404s, permanent redirects, rel=canonical and meta robots “noindex” tags, among other things.
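The decision between those cleanup tools can be expressed as a simple rule set. The sketch below is only an illustration of one possible triage order, using hypothetical page attributes (`exists_by_mistake`, `duplicate_of`, `must_stay_live`, `thin`); real remediation decisions involve more context than this.

```python
def remediation_for(page):
    """Pick a cleanup action for a page described by a dict of flags.

    Mirrors the options named above: 404, 301 redirect,
    rel=canonical, or a meta robots noindex tag.
    """
    if page.get("exists_by_mistake"):
        return "404"  # never meant to exist: remove it outright
    if page.get("duplicate_of"):
        # canonical when the duplicate must stay live for users,
        # 301 when it can disappear and consolidate signals
        if page.get("must_stay_live"):
            return "rel=canonical -> " + page["duplicate_of"]
        return "301 -> " + page["duplicate_of"]
    if page.get("thin"):
        return "meta robots noindex"  # keep for users, drop from the index
    return "keep"
```

The ordering matters: removal and consolidation come before de-indexation, since a noindexed duplicate still wastes crawl budget that a redirect would reclaim.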
When we do Panda analysis for a site, I always emphasize a couple of things:
- The path to recovery cannot be definitively known. The most likely causes will be identified but it takes a process of trial and error to address enough of the offending issues to break a site out of the filter.
- Even after the necessary changes are made, the recovery will not be instantaneous. Once a site has cleaned up its profile, breaking free of Panda requires a rolling update by Google. For a while these happened roughly monthly, and they apparently occur more frequently now, which is encouraging. But in some cases it takes a larger Panda update for a site to break out, and that happens only periodically.
So if you have been hit by Panda, time and patience are also a necessary part of the process.