Ruby Could Replace my Python Crawler Pretty Soon

One of my developers just sent me some truly incredible stats about Ruby 1.9 and its threading performance.

20 threads * 100,000 iterations
Ruby 1.9 = 1.54 s.
Ruby Enterprise = 3.01 s.
JRuby 1.1.2 = 5.82 s.
Jython 2.2.1 = 11.86 s.
Python 2.5.2 = 12.32 s.
Ruby 1.8.7 = 22.68 s.
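
I haven't seen the actual test code yet, so here is only a rough sketch of what a "20 threads * 100,000 iterations" benchmark could look like in Python. The loop body (a plain counter increment) and the names `worker` and `run_benchmark` are my assumptions, not the original test, so don't expect the timings to line up with the numbers above.

```python
import threading
import time

THREADS = 20          # the "20 threads" from the results above
ITERATIONS = 100_000  # the "100,000 iterations" from the results above


def worker(counts, index):
    # Assumed workload: a trivial counting loop. The real test's loop
    # body is not shown in this post, so this is only a stand-in.
    total = 0
    for _ in range(ITERATIONS):
        total += 1
    counts[index] = total


def run_benchmark(thread_count=THREADS):
    # One result slot per thread, so the threads never write to the same index.
    counts = [0] * thread_count
    threads = [
        threading.Thread(target=worker, args=(counts, i))
        for i in range(thread_count)
    ]

    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    return elapsed, counts


if __name__ == "__main__":
    elapsed, counts = run_benchmark()
    print(f"{THREADS} threads * {ITERATIONS} iterations = {elapsed:.2f} s.")
```

A loop like this is pure CPU work, so it mostly measures thread scheduling and interpreter locking rather than anything a crawler spends its time on.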

Since our attempt at testing Ruby as a crawler really wasn’t all that much slower than Python, it will be really interesting to see what happens with Ruby 1.9.

The blog post about the test (it’s in Polish)

Categories: Code, Crawlers
  1. Michael Campbell
    July 29th, 2008 at 08:05 | #1

    You’re presuming that your bottleneck was the threading.

  2. July 29th, 2008 at 09:58 | #2

    It’s kind of hard to evaluate metrics without seeing the actual code. A lot of micro benchmarks aren’t indicative of real world performance.

    Tim Bray’s Wide Finder project might be a good reference (both yours and his are I/O bound). In the end, programmer proficiency is probably the most important factor in speed. ;-)

  3. Phil
    July 29th, 2008 at 17:21 | #3

    I don’t get it. Crawlers are network-bound; the speed of your implementation language has virtually no importance.

  4. tony
    July 30th, 2008 at 08:53 | #4

    @Phil If your crawler is network bound, then you need more pipe.

  5. tony
    July 30th, 2008 at 08:54 | #5

    @Michael You are correct.

  6. tony
    July 30th, 2008 at 08:58 | #6

    @Curtis: Here is the Polish guy’s test code. I haven’t run it yet: http://pastie.org/private/jtqfdbloc83wqqnk525mzw