Ruby Could Replace my Python Crawler Pretty Soon
Code, Crawlers July 28th, 2008
One of my developers just sent me some truly incredible stats about Ruby 1.9 and its threading performance.
20 threads * 100,000 iterations
Ruby 1.9 = 1.54 s.
Ruby Enterprise = 3.01 s.
JRuby 1.1.2 = 5.82 s.
Jython 2.2.1 = 11.86 s.
Python 2.5.2 = 12.32 s.
Ruby 1.8.7 = 22.68 s.
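
The numbers above come from the Polish post's own test code, which isn't reproduced here. As a rough idea of what a "20 threads * 100,000 iterations" test can look like, here is a minimal Python sketch. The workload (a plain counting loop per thread) and the names `worker` and `run_benchmark` are my assumptions, not the original test, so don't expect it to reproduce the timings listed.

```python
import threading
import time

THREADS = 20          # thread count quoted in the post
ITERATIONS = 100_000  # per-thread iteration count quoted in the post


def worker(results, index):
    # Assumed workload: a plain counting loop. The original test may do
    # something different inside each thread.
    count = 0
    for _ in range(ITERATIONS):
        count += 1
    results[index] = count


def run_benchmark(threads=THREADS):
    # Each thread writes its final count into its own slot of `results`.
    results = [0] * threads
    pool = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(threads)
    ]
    start = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    elapsed, _ = run_benchmark()
    print(f"{THREADS} threads * {ITERATIONS} iterations = {elapsed:.2f} s.")
```

A loop like this is pure CPU work, so it mostly measures thread scheduling and interpreter overhead rather than anything a crawler spends its time on.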
When we tested Ruby as a crawler, it wasn’t all that much slower than Python, so it will be really interesting to see what happens with Ruby 1.9.
The blog post about the test (it’s in Polish)








July 29th, 2008 at 8:05 am
You’re presuming that your bottleneck was the threading.
July 29th, 2008 at 9:58 am
It’s kind of hard to evaluate metrics without seeing the actual code. A lot of microbenchmarks aren’t indicative of real-world performance.
Tim Bray’s Wide Finder project might be a good reference (both yours and his are I/O bound). In the end, programmer proficiency is probably the most important factor in speed.
July 29th, 2008 at 5:21 pm
I don’t get it. Crawlers are network-bound; the speed of your implementation language has virtually no importance.
July 30th, 2008 at 8:53 am
@Phil If your crawler is network bound, then you need more pipe.
July 30th, 2008 at 8:54 am
@Michael You are correct.
July 30th, 2008 at 8:58 am
@Curtis: Here is the Polish guys’ test code. I haven’t run it yet: http://pastie.org/private/jtqfdbloc83wqqnk525mzw