
Google Crawling HTML Forms IS Harmful to Your Rankings

A couple of months ago, Google officially announced it would be “exploring some HTML forms to try to discover new web pages.” I imagine more than a few SEOs were as baffled by this decision as I was, but probably not too concerned about it, since Google promised us all that “this change doesn’t reduce PageRank for your other pages” and that it would only increase our exposure in the engines.

During April, I began to notice that many of our internal search pages were not only indexed but were outranking the relevant pages for a user’s query. For instance, if you Googled “SubConscious Subs”, the first page to appear in the SERPs would be something like:
http://raleigh.ohsohandy.com/ads/search?q=tables

rather than the page for the establishment:
http://raleigh.ohsohandy.com/review/27571-sub-concious-subs

This wasn’t a random occurrence; it was happening a lot. Besides being far less relevant to the user, these landing pages weren’t optimized for search placement, so they appeared at around position #20 instead of, say, position #6. These local search pages even had PageRank, usually between 2 and 3.
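
The search URL above is what a GET form produces when it is submitted: the form's action URL with the field name and value appended as a query string. Google's announcement said its crawler would only submit GET forms and would fill text boxes with words chosen from the site itself. The sketch below illustrates that mechanism under stated assumptions: the action `/ads/search` and field name `q` are read off the URL in this post, the page text is made up, and `candidate_words` is a hypothetical stand-in, since Google never published how it picks words.

```python
import re
from urllib.parse import urlencode, urljoin


def candidate_words(page_text: str, limit: int = 5) -> list[str]:
    """Pick distinct words (4+ letters) from a page's text.

    Hypothetical heuristic: Google said it fills text boxes with words from
    the site, but not how it chooses them, so this rule is an assumption.
    """
    words: list[str] = []
    for word in re.findall(r"[a-z]{4,}", page_text.lower()):
        if word not in words:
            words.append(word)
            if len(words) == limit:
                break
    return words


def get_form_urls(page_url: str, action: str, field: str, words: list[str]) -> list[str]:
    """Build the URLs a GET form yields: the resolved action URL plus ?field=value."""
    base = urljoin(page_url, action)
    return [f"{base}?{urlencode({field: word})}" for word in words]


# "/ads/search" and "q" come from the URL in the post; the text is invented.
words = candidate_words("Tables and chairs for sale. Tables, lamps, sofas.")
for url in get_form_urls("http://raleigh.ohsohandy.com/", "/ads/search", "q", words):
    print(url)  # first line: http://raleigh.ohsohandy.com/ads/search?q=tables
```

Each word pulled from the site yields one more crawlable URL, which is how thousands of search-result pages can end up in the index without anyone linking to them.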

Hmm, Just How Bad Is This Problem?

Eventually I realized how often I was running into this in Google, noticed my recent slow decline in traffic, and it occurred to me that this might be a real problem. I had never linked to any local search pages on OhSoHandy.com, and I couldn’t see that anyone else had either. I ran a query to find out how many search pages Google had indexed:

[Screenshot: Google submits forms]

Whoa: 5,000+ pages of junk in the index, with PageRank. I slept on it for a night, got up the next morning, and plugged

Disallow: /ads/search?q=*

into robots.txt (and threw in a meta robots noindex on those pages for good measure). Within a week we saw a big improvement in rankings as properly optimized pages trumped the crap, and traffic is up 25% since the change, trending upward week over week again instead of in a slow, stagnant decline.
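
Google documents `*` in a robots.txt path as matching any run of characters and a trailing `$` as anchoring the end of the URL; otherwise each rule is a prefix match against the path plus query string. Google's form-crawling announcement also said Googlebot would not crawl form-generated URLs that robots.txt forbids. The sketch below is a simplified matcher, written by hand because Python's standard `urllib.robotparser` does not implement the `*` and `$` extensions; it ignores `Allow` lines and `User-agent` groups, and the function names are hypothetical.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex.

    Follows Google's documented matching: rules match from the start of the
    path (including the query string), '*' matches any run of characters,
    and a trailing '$' anchors the end of the URL.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow rule matches (simplified: no Allow rules, no user-agent groups)."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)


rules = ["/ads/search?q=*"]  # the rule from the post
print(is_disallowed("/ads/search?q=tables", rules))             # True
print(is_disallowed("/review/27571-sub-concious-subs", rules))  # False
```

Because rules already match by prefix, the trailing `*` is redundant: `Disallow: /ads/search?q=` blocks the same URLs. In a real robots.txt the line also has to sit under a `User-agent` group, such as `User-agent: *`.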

Get outta here!

Bit of Advice

The robots.txt disallow works, but it is slow to remove the URLs from Google’s index. I added the meta noindex tag to the search pages a week later and saw much faster results.
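
One way to apply the noindex tag is in the page template, emitting `<meta name="robots" content="noindex">` only for internal search URLs. This sketch assumes the search pages all live under `/ads/search` (as in the URLs above); `robots_meta_tag` and `render_head` are hypothetical helpers, not part of any framework. One caveat from Google's current documentation: a crawler that obeys a robots.txt `Disallow` never fetches the page, so it cannot see a noindex tag placed on it.

```python
from html import escape

SEARCH_PREFIX = "/ads/search"  # assumed location of the internal search pages


def robots_meta_tag(path: str) -> str:
    """Return a noindex robots meta tag for internal search pages, else ''."""
    if path.startswith(SEARCH_PREFIX):
        return '<meta name="robots" content="noindex">'
    return ""


def render_head(path: str, title: str) -> str:
    """Hypothetical template helper: build <head>, adding the robots tag when needed."""
    parts = [f"<title>{escape(title)}</title>", robots_meta_tag(path)]
    return "<head>" + "".join(p for p in parts if p) + "</head>"


print(render_head("/ads/search?q=tables", "Search: tables"))
print(render_head("/review/27571-sub-concious-subs", "SubConscious Subs"))
```

Putting the check in the template means every new search URL Google invents is covered automatically, with no list of URLs to maintain.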

Categories: Google, Search Engine Optimization
  1. June 4th, 2008 at 12:58

    What are you talking about? A resulting indexed page for “tables” is “oh so” relevant! :-\ I’ve noticed this too on several Drupal sites of mine, and I have no idea why they would automatically populate and rank search pages. Talk about junking up a search index; that’s one way to do it.

  2. June 4th, 2008 at 14:29

    Tony,

    It’s not just search pages. They are even crawling forms that they should never be looking at:

    http://smackdown.blogsblogsblogs.com/2008/05/23/googlebot-creates-pages-instead-of-simply-indexing-them-new-form-crawling-algo-goes-bad/

    This is creating exact duplicates of pages, not just near-duplicate search results.

  3. June 4th, 2008 at 14:36

    As a side note, looking at my comment, you might want to check out the Chunk URL’s plugin for WordPress. :)

    http://www.village-idiot.org/archives/2006/06/29/wp-chunk/

    It automatically shortens URLs so they don’t overflow their containers.

  4. tony
    June 5th, 2008 at 10:44

    Thanks for the plugin tip and the other blog post on form crawling, Michael.

    Funny. A few years ago I intentionally provided easy-to-crawl paths for Googlebot to my internal site search pages (which also had PPC ads on them :) ), and within a week they had indexed 30k of these pages. A week after that, they deindexed the entire site.

    It still boggles my mind that they are doing this intentionally now.

  5. June 7th, 2008 at 01:01

    Oh, it’s boggling, all right. A Googler recently suggested to me that one of the reasons my images directory got deindexed was that I had cached some search pages to support some of the posts I had written… right in the middle of Google screwing up the index with this clutter.