Author Archive

A feature that Yahoo has that I wish Google had:

July 30th, 2008 tony No comments

can't spell

Google, sometimes I know better than you and want to see results without you thinking for me.

Categories: Google Tags:

Ruby Could Replace my Python Crawler Pretty Soon

July 28th, 2008 tony 6 comments

One of my developers just sent me some truly incredible stats about Ruby 1.9 and its threading performance.


20 threads * 100,000 iterations
Ruby 1.9 = 1.54 s.
Ruby Enterprise = 3.01 s.
JRuby 1.1.2 = 5.82 s.
Jython 2.2.1 = 11.86 s.
Python 2.5.2 = 12.32 s.
Ruby 1.8.7 = 22.68 s.

Since Ruby really wasn’t all that much slower than Python when we tested it as a crawler, it will be really interesting to see what happens with Ruby 1.9.

The blog post about the test (it’s in Polish)

Categories: Code, Crawlers Tags:

I’m convinced Twitter needs a complete rewrite

July 3rd, 2008 tony 2 comments

… from scratch. Yeah, the performance is incredibly horrible, and I really feel like I could take a small chunk of that $15 million and immediately make the performance rock, but I am starting to feel like the developers who built it completely cocked it all up. The damn thing can’t even store my replies right, and I’ve heard others complain of this. I take it this is bad AJAX code, and I think they should reconsider using AJAX for basic functions.

Example:
I replied to @benwills’ tweet:

with:

but if you click on the “in reply to benwills” link in my reply, it goes to the wrong tweet.

Categories: Uncategorized Tags:

Twitter is giving Rails a bad name

June 5th, 2008 tony 2 comments

Uggh. Rebuild it already. It’s only a few actions. Rails wasn’t built for this kind of app.

Python, C, Perl, whatever.

Categories: Ruby on Rails, Social Networks Tags:

Google Crawling HTML Forms IS Harmful to Your Rankings

June 3rd, 2008 tony 5 comments

A couple of months ago Google officially announced it would be “exploring some HTML forms to try to discover new web pages”. I imagine more than a few SEOs were as baffled by this decision as I was, but they were probably not too concerned, since Google promised us all that “this change doesn’t reduce PageRank for your other pages” and would only increase your exposure in the engines.

During the month of April I began to notice that a lot of our internal search pages were not only indexed but outranking the relevant pages for a user’s query. For instance, if you Googled “SubConscious Subs”, the first page to appear in the SERPs would be something like:
http://raleigh.ohsohandy.com/ads/search?q=tables

rather than the page for the establishment:
http://raleigh.ohsohandy.com/review/27571-sub-concious-subs

This wasn’t just a random occurrence. It was happening a lot, and in addition to the landing pages being far less relevant for the user, they weren’t optimized for the best placement in the search engines, so they were appearing in position #20 instead of, say, position #6. These local search pages even had PageRank, usually between 2 and 3.

Hmm, Just How Bad Is This Problem?

Eventually I began to realize how often I was running into this in Google, noticed my recent slow decline in traffic, and it occurred to me that this might be a real problem. I’ve never linked to any local search pages on OhSoHandy.com and I couldn’t see that anyone else had either. I queried to find out how many search pages Google had indexed:

Google submits forms

Whoa. 5,000+ pages of junk in the index, with PageRank. I slept on it for a night, got up the next morning and plugged in

Disallow: /ads/search?q=*

in robots.txt (and threw in a meta robots noindex on those pages for good measure). Within a week we saw a big improvement in rankings as properly optimized pages trumped the junk. Traffic is up 25% since the change and is back to trending upwards weekly instead of sitting in a stagnant, slow decline.
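If you want to sanity check a rule like this before waiting on Google, here is a rough sketch of how a wildcard-aware crawler might match it. It assumes Google-style pattern semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL); the function names `rule_to_regex` and `is_blocked` are made up for illustration and this is not a full robots.txt parser.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule into a regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: list[str]) -> bool:
    """Return True if the URL's path plus query matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start, so rules act as prefix patterns.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


rules = ["/ads/search?q=*"]
print(is_blocked("http://raleigh.ohsohandy.com/ads/search?q=tables", rules))  # True
print(is_blocked("http://raleigh.ohsohandy.com/review/27571-sub-concious-subs", rules))  # False
```

The search URL is blocked while the review page stays crawlable, which is the whole point. Under these semantics the trailing `*` is redundant, since the rule already works as a prefix match.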

Get outta here!

Bit of Advice

The robots.txt disallow works, but it is slow to remove the URLs from Google’s index. I added the meta noindex tag to the search pages a week later and saw much faster results.
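To confirm the noindex tag is actually being served on your search pages, something like the sketch below could help. It only uses Python's standard `html.parser`; the class `RobotsMetaParser`, the helper `has_noindex`, and the sample markup are all hypothetical, and it only looks at `<meta name="robots">`, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for a search results page.
sample_page = """
<html><head>
<meta name="robots" content="noindex, follow">
<title>Search results</title>
</head><body></body></html>
"""
print(has_noindex(sample_page))  # True
```

In practice you would feed it the HTML fetched from a handful of your search URLs rather than a hard-coded string.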

Categories: Google, Search Engine Optimization Tags: