In the first article, we talked about how we recovered 65% of our website traffic after the Panda 4.0 update. In this article, we are going to focus on the concrete actions we have been taking since 2011 to recover from the Panda updates.
Every time we look for information about Panda, we realize that all the guides focus on one thing: "offering quality content". But we needed to know which concrete actions PortalProgramas needed to take to recover from the Panda updates. Furthermore, we found few documented cases of websites that have recovered from Panda.
For this reason, we decided to write this guide: we thought it was necessary to detail the actions we took, what has worked and why. Our experience can help a lot of people learn what they can do to improve their content and recover from Panda. This work is also a way for us to pay back the help we have found on the Internet thanks to the hundreds of people who work on guides, blog posts, articles and reviews on Panda.
Let's look at the actions we have taken since 2011 to recover from Panda:
1st Rewrite product reviews
One of the most costly problems was duplicate text in some of the program descriptions. It was impossible to know how much text had been copied and in which programs. We decided to play it safe: for any program we suspected of containing duplicated content, we re-analyzed and rewrote its description.
At this point, we also found several hundred descriptions with very short texts, under 100 words, and we decided to rewrite these too.
In total, there were over 5,000 program descriptions to rewrite. To solve this problem as quickly as possible, we expanded our workforce with four people dedicated solely to analyzing programs and rewriting descriptions. The reviews became more extensive, with at least 200 words, images and videos when possible, and better controls to ensure the quality of the content.
Some of the programs were poor quality, so we deleted them, showing a 404 error page in their place. In other cases, there were different versions of the same program (pages that were too similar); here, we kept the page with the most downloads, removed the others and put a 301 redirection from each removed page to the one we kept.
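The routing logic above can be sketched in a few lines. This is a hypothetical illustration, not our real system: the URLs and the `resolve` helper are made up for the example.

```python
# Map of removed duplicate URLs -> the surviving page with the most downloads.
REDIRECTS = {
    "/program-v2": "/program",
    "/program-old": "/program",
}

# Low-quality pages that were deleted outright.
DELETED = {"/poor-quality-program"}

def resolve(path):
    """Return (status_code, location) for an incoming request path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]   # permanent redirect to the kept page
    if path in DELETED:
        return 404, None              # page removed, serve an error page
    return 200, None                  # normal page, serve as usual
```

The key point is the 301 (permanent) status: it tells search engines to transfer the old URL's value to the page that remains, instead of treating both as duplicates.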
While the descriptions were being rewritten, we added the "noindex" tag to all of them so that Google would not index these pages and would rate us better. However, we did not see any improvement from using this tag.
2nd Delay the syndication of RSS content
It is easy for scrapers to get our content through RSS. In our case, we offer a large number of RSS feeds so that each user can find the one that suits them best. But this makes it much easier for scraper sites to copy all of our content and, worst of all, for Google to index them before us.
To solve this, there are guides to help you protect your RSS feeds, with tips like publishing only a summary of the articles or adding links to your content. It is important that every entry in your RSS feed always contains a link to your content: this way, the feeds always generate links to your original pages. Even so, there are scrapers that remove these links; later, we will look at what to do with those.
In our case, the RSS feed already had links, and we wanted to publish entire articles in our feeds to make it easier for our users to read the content. So, we decided to delay publishing our RSS feed content by 3 hours. This way, it is much more likely that Google will find and index the original content on our site before the copies published on the scraper sites.
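A minimal sketch of such a delay, assuming each feed entry is a dict with a timezone-aware `published` timestamp (the entry structure is an assumption for illustration, not our real schema): when building the public feed, simply skip anything newer than the delay window.

```python
from datetime import datetime, timedelta, timezone

# Hold back entries for 3 hours so Google can index the original page first.
DELAY = timedelta(hours=3)

def delayed_entries(entries, now=None):
    """Return only the entries old enough to appear in the public RSS feed."""
    now = now or datetime.now(timezone.utc)
    return [e for e in entries if now - e["published"] >= DELAY]
```

The article is published on the site immediately; only its appearance in the feed is deferred, so regular readers lose little while scrapers always run behind.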
3rd Ask other sites to remove content copied from us
We found various scrapers: sites that copied hundreds of pages from our portal. In some cases this was done through the RSS feeds; other times they were blatant copies of downloads from the portal, and some of these sites even ranked better than us. This was a problem, so we made a list of the most important ones and contacted each of them individually. The majority quickly deleted the content; for others, it was more difficult to find the creators of the portal (WHOIS privacy) and get the content removed, which led us to seek the help of lawyers in some cases. Still, we were able to get the content removed by most of the website owners we contacted.
At this point, we always recommend dealing with this situation in the friendliest way possible; many webmasters were completely unaware of the damage they were causing, and they quickly apologized and removed the content after we contacted them and explained the Panda update.
In these cases, you can also send Google a scraper report. There is also a tool to help remove content from its index. And, from the US, attaching a DMCA complaint makes it even more effective.
Bonus tip: do not waste a lot of time on this. We believe the focus should be on creating quality content. In the same amount of time that you spend trying to have content removed, you could publish 2 to 4 more pages of original content, which will bring you much more benefit. Focus on removing the content that can really cause a problem; use the rest of the time to create original content.
4th Do not overuse “noindex”
In an effort to remove "simple" pages with little content, we decided to include the "noindex" tag on the contact page, on pages with filters, and on the paginations of the search engine and of the categories.
Thousands of pages were deindexed, and this was a complete mistake: a decision too drastic that did not work in all cases. We fixed it and went on to apply other solutions to these pages; solutions that we still use today and that we consider the most correct ones:
- Pages like the contact page can be indexed; there is no problem with them.
- We labelled the paginations with the "rel=next" and "rel=prev" tags, which we will look at later.
- The pages with filters use the "canonical" tag to indicate which is the original page.
This way, we are correctly tagging each type of page, and we let Google decide what to do with them.
5th Use the canonical tag massively
A problem that surfaced with the Panda update is that one page can be accessed from infinite URLs, for example by adding parameters. Even when they show the same content, they are two different URLs, and search engines interpret them as two different pages. This causes these pages to be seen as duplicated content.
One way to remedy this problem is to use the parameter management tool in Google Webmaster Tools. It lets you specify which parameters to ignore: pagination parameters, ordering, affiliate codes, etc.
But for us, this was not the solution: there are infinite ways to generate duplicated URLs. The solution was to use the canonical tag. With canonical, we can specify the URL of the original content being viewed, so that no matter which URL users see, the search engines will always understand the pages of our site correctly.
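As a simplified sketch of the idea, assuming the parameter-free path is always the original page (a simplification; real sites may need finer rules per page type): derive the canonical URL by stripping the query string, and emit the tag in the page head.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_tag(url):
    """Build a rel=canonical tag pointing at the URL without any parameters."""
    parts = urlsplit(url)
    # Drop query string (ordering, affiliate codes...) and fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return '<link rel="canonical" href="%s" />' % clean
```

So `/program?order=date&aff=123` and `/program` both declare `/program` as the canonical page, and the duplicates consolidate instead of competing.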
Therefore, we studied the help that Google offers and a guide for implementing it. This was a difficult decision, carefully thought out and closely monitored, because implementing this tag poorly can be disastrous for the site: it tells Google which pages it should index and which it should not.
6th Use of the rel=next and rel=prev tags
Pagination can give you major headaches, because search engines can give more importance to a secondary page than to the original. To prevent this and to make search engines understand the structure of our site, it is not advisable to use the "noindex" or "canonical" tags on these pages, but rather the "rel=next" and "rel=prev" tags, which indicate the relation between the different pages.
We followed this guide for implementing rel=next and rel=prev.
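The scheme is simple to generate: every page of a paginated list links to its neighbours, the first page having no "prev" and the last no "next". A small sketch (the `?page=N` URL pattern is an assumption for illustration):

```python
def pagination_tags(base_url, page, last_page):
    """Build the rel=prev / rel=next tags for page `page` of a paginated list."""
    tags = []
    if page > 1:
        tags.append('<link rel="prev" href="%s?page=%d" />' % (base_url, page - 1))
    if page < last_page:
        tags.append('<link rel="next" href="%s?page=%d" />' % (base_url, page + 1))
    return tags
```

The chain of prev/next links lets the search engine treat the whole series as one logical list, so the first page keeps the importance.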
Although this is not strictly duplicated content, as a global site strategy we want search engines to understand the structure of our site perfectly, because this helps them avoid errors when rating it.
7th Use of the rel=author tag
Including authorship information of the content in Google search results has certain advantages. From the point of view of Panda, the most important one is that it helps build credibility and authority for the content by associating it with a person. According to Matt Cutts:
"I'm pretty excited about the ideas behind authorship. Basically, if you can move from an anonymous web to a web where you have some notion of identity. And, maybe even reputation of individual author. Then web spam, you kind of get the benefits for free."
You need to follow a series of steps to link an editor's Google account with your site, and there is a tool to check that it is done correctly. All of our program pages include the rel=author tag.
8th Unique content in different languages
When we decided to expand the site and offer more languages, we knew this would be another source of duplicated-content problems: pages in different languages are very similar, with only the text changing; automatic translations cause problems… In addition, search engines can make mistakes when showing users the correct pages according to their language or country.
We decided to handle each new language independently, with its own team of native speakers and without automated translation tools. This was a major investment of time and money, but it helps us offer users the best quality content while also benefiting the portal.
We used the rel=alternate hreflang='x' tag in all our program records to help Google understand the relation between them and thus display them correctly to every user. Then, you can use this tool to test whether the tags are used correctly.
In every program record, we show links tagged with “rel=alternate” to the same program record in the different languages.
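Generating those tags can be sketched as follows, assuming each program record knows the URL of its version in every language (the language codes and URLs are illustrative). Note that every language version emits the full set of alternates, itself included:

```python
def hreflang_tags(alternates):
    """Build rel=alternate hreflang tags from a {lang_code: url} mapping."""
    return [
        '<link rel="alternate" hreflang="%s" href="%s" />' % (lang, url)
        for lang, url in sorted(alternates.items())
    ]
```

With this markup, Google knows the Spanish and English pages are translations of each other rather than near-duplicates, and can serve each user the right one.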
Bonus: if you want to expand into various languages/countries, do not use automated translation tools; the only thing you will accomplish is being tagged as low-quality content. Hire native writers: they will write better and will be able to include more keywords in their texts. Your texts will be much more valuable, and they will achieve better positions in search engines. You can find writers on sites like oDesk.com or eLance.com.
Bonus: if the text is translated, you need to tag it correctly to prevent Google from considering it automatically generated (or low-quality) content. Still, tagging it this way gives these texts less weight than texts written without automated translation tools, so it is best to have people write them manually. Here you have a list of errors in an international SEO strategy.
9th Improve the load speed of the site
The load speed of pages is an important factor for users, and Google also takes it into account when ranking a site.
In our case, page load times were a problem, and we decided to focus on improving them for all the countries in which we work. There are a lot of guides on how to improve the speed of a website, and the majority are very similar. We followed them, making many technical changes that seemed endless: changes at the server level, CSS sprites, caching, etc. The one that worked best for us was using a CDN for static content. You can find very good comparisons of CDN services and lists of well-known providers.
Bonus: there is another, more advanced solution that can give your website more speed: dynamic content acceleration. However, its cost is considerably higher, and we recommend it only for larger portals where speed is a key factor.
Bonus: take into account the limits of tracking and indexing
Many people think that the more pages are indexed, the better. But it is not always so, even less since Panda arrived, when low-quality pages mean a bad reputation. The pages indexed first need to be the most important ones: the ones that should be crawled most and the ones you want to be sure are indexed.
Google assigns every website a limit on how many pages can be crawled per day and another on the pages indexed. This is a dynamic number that depends on the authority of the website. But it is important to take it into account so as not to bombard the Google robot with pages of little relevant content, leaving it unable to reach the pages that provide quality content. For example, if it has to crawl 100 filter and ordering pages… maybe it will not be able to crawl all your product descriptions.
The solution may differ depending on the case. We can use fewer links to secondary pages on our site, or we can use "nofollow", although Cutts recommends against it. In our case, we limited the number of secondary pages to make it easier to index the more important ones.
From PortalProgramas, we recommend thoroughly reviewing the affected site to find all the possible sources of duplicated content; this is the most important step. The work needed to recover from Panda is intensive, sometimes complicated and, at some points, risky. But it can be done.
Today, it is better to write one interesting, quality article per week than one per day that will only be read by your friends. Quality comes first, and every day it becomes harder to trick the search engines. In the long run, the websites that survive are the ones that do things properly: in content, design, usability…
Fill your site with useful content… and do not forget to eliminate the low-quality content. On some occasions, there are difficult decisions to make, because removing low-quality content can hurt website traffic. But keeping this content affects the overall rankings of your site in Google and, for every visit that reaches this poor content, you are probably losing many more through the negative evaluation.
What do we think about Google Panda?
Despite everything, Google Panda has generally been a positive change for the Internet. It was necessary to stop how easily low-quality content, the Made For AdSense (MFA) sites, could be copied and ranked. This change has caused many content farms to disappear from the market, forced others to adapt, and changed the rules for everyone towards creating quality content.
About PortalProgramas... Follow Us!
We are a software portal created in 2003 with programs for Windows, Mac and smartphones. We have a technology observatory where we do studies and interviews in the software industry. If you are interested in apps and you have an Android or an iPhone, you can follow us on Facebook, YouTube, Twitter or Google+.