<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>tony spencer &#187; Python</title>
	<atom:link href="http://www.tonyspencer.com/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tonyspencer.com</link>
	<description>It&#039;s Just Links</description>
	<lastBuildDate>Thu, 09 Apr 2009 13:19:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Crazy Python Crawler</title>
		<link>http://www.tonyspencer.com/2008/01/07/crazy-python-crawler/</link>
		<comments>http://www.tonyspencer.com/2008/01/07/crazy-python-crawler/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 18:23:07 +0000</pubDate>
		<dc:creator>tony</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Crawlers]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.tonyspencer.com/2008/01/07/crazy-python-crawler/</guid>
		<description><![CDATA[Someone emailed me doubting that my crawler could operate at the speeds I posted last week, so here is a video I took this morning. I should have waited a few minutes after launching it before starting the video, as it really starts cranking once all the threads get rocking; you can see that near [...]]]></description>
			<content:encoded><![CDATA[<p>Someone emailed me doubting that <a href="http://www.tonyspencer.com/2008/01/03/big-seos-with-crawlers-lets-see-your-stats/">my crawler</a> could operate at the speeds I posted last week, so here is a video I took this morning. I should have waited a few minutes after launching it before starting the video, as it really starts cranking once all the threads get rocking; you can see that near the end of the video. Also notice my streaming internet radio cutting in and out, because the crawler leaves no bandwidth available on my 5Mbps line.<br />
 <img src='http://www.tonyspencer.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>You can also hear a ticking sound. That is my new 1TB drive. It makes these weird ticking noises even when it's not in use. It really sounds like the arm hitting something it's not supposed to hit. Hope it's not defective.</p>
<p><object type="application/x-shockwave-flash" data="http://blip.tv/scripts/flash/showplayer.swf?enablejs=true&#038;file=http%3A//blip.tv/rss/flash/590136&#038;feedurl=http%3A//notsleepy.blip.tv/rss/&#038;autostart=false&#038;brandname=tony%20spencer&#038;brandlink=http%3A//notsleepy.blip.tv/" width="400" height="255" allowfullscreen="true" id="showplayer"><param name="movie" value="http://blip.tv/scripts/flash/showplayer.swf?enablejs=true&#038;file=http%3A//blip.tv/rss/flash/590136&#038;feedurl=http%3A//notsleepy.blip.tv/rss/&#038;autostart=false&#038;brandname=tony%20spencer&#038;brandlink=http%3A//notsleepy.blip.tv/" /><param name="quality" value="best" /></object></p>
<p><a href="http://blip.tv/file/584448/">Video link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tonyspencer.com/2008/01/07/crazy-python-crawler/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Big SEOs with Crawlers: Let&#8217;s See Your Stats</title>
		<link>http://www.tonyspencer.com/2008/01/03/big-seos-with-crawlers-lets-see-your-stats/</link>
		<comments>http://www.tonyspencer.com/2008/01/03/big-seos-with-crawlers-lets-see-your-stats/#comments</comments>
		<pubDate>Fri, 04 Jan 2008 04:07:39 +0000</pubDate>
		<dc:creator>tony</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Crawlers]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.tonyspencer.com/2008/01/03/big-seos-with-crawlers-lets-see-your-stats/</guid>
		<description><![CDATA[OK, I&#8217;m just ecstatic with my new crawler. I think nobody but Google has a better one, and I&#8217;m ready for a good old-fashioned show-and-tell. Multi-threaded programming is a bear to deal with, and I&#8217;ve written several crawlers in different languages. For years I&#8217;ve been plagued by several complex problems:
* Complex code [...]]]></description>
			<content:encoded><![CDATA[<p>OK, I&#8217;m just ecstatic with my new crawler. I think nobody but Google has a better one, and I&#8217;m ready for a good old-fashioned show-and-tell. Multi-threaded programming is a bear to deal with, and I&#8217;ve written several crawlers in different languages. For years I&#8217;ve been plagued by several complex problems:</p>
<p>* Complex code that is difficult to maintain and difficult to set up on a server<br />
* Memory leaks<br />
* Limited configurability</p>
<p>So the latest design is just 192 lines of <a href="http://www.python.org/download/">Python</a> in a single file, has a single configuration file, and takes about 5 minutes to set up on a standard Linux machine. I ran it last night and was delighted with the results:</p>
<p><strong>Test Run</strong><br />
Tested 139,740 URLs<br />
Completed in 2 hrs, 13 mins<br />
3.6 GB of HTML<br />
Average file size: 25.05 KB</p>
<p><strong>Averaging</strong><br />
18.2 URLs/second<br />
<em><font color=darkblue><strong>1.572 million</strong></font></em> URLs/day</p>
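<p>As a rough sanity check, the averages above can be recomputed in a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes the run lasted exactly 2 hrs, 13 mins. With that rounded run time the rate comes out near 17.5 URLs/second, slightly under the 18.2 reported, while the daily figure follows directly from 18.2 URLs/second over 24 hours.</p>

```python
# Figures copied from the test run above.
urls_tested = 139_740
run_seconds = (2 * 60 + 13) * 60  # 2 hrs, 13 mins, assumed exact

# Throughput implied by the rounded run time.
urls_per_second = urls_tested / run_seconds

# Daily projection from the reported average of 18.2 URLs/second.
reported_rate = 18.2
urls_per_day = reported_rate * 24 * 60 * 60

print(f"{urls_per_second:.1f} URLs/second from the rounded run time")
print(f"{urls_per_day / 1e6:.3f} million URLs/day at {reported_rate} URLs/second")
```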
<p><strong>Hardware and Environment</strong><br />
3-year-old Dell PowerEdge SC240<br />
Pentium 4<br />
3.5 GB of RAM<br />
Average CPU load: 0.16<br />
Average physical RAM used: 950 MB<br />
OS: Ubuntu 7.10 (Gutsy Gibbon)<br />
Filesystem: ReiserFS 3</p>
<p><strong>Network connection:</strong><br />
Residential cable modem, 5Mbps down (100% of which is consumed when it's running, so the crawler is likely to be faster on a fatter pipe)</p>
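<p>To put the bandwidth claim next to the test-run figures, here is a small back-of-the-envelope calculation. It is a sketch under stated assumptions: 1 GB is taken as 10**9 bytes, the run time is taken as exactly 2 hrs, 13 mins, and only the stored HTML is counted, so HTTP headers, protocol overhead, and the slower start of the run are left out. On those assumptions the HTML alone averages roughly 3.6 Mbps of the 5Mbps line.</p>

```python
# Figures copied from the test run above.
html_gigabytes = 3.6
run_seconds = (2 * 60 + 13) * 60  # 2 hrs, 13 mins, assumed exact
line_mbps = 5.0

# Assumption: 1 GB = 10**9 bytes; only the HTML payload is counted.
html_bits = html_gigabytes * 10**9 * 8
average_mbps = html_bits / run_seconds / 10**6
utilisation = average_mbps / line_mbps

print(f"{average_mbps:.2f} Mbps of HTML on average")
print(f"{utilisation:.0%} of the {line_mbps:.0f} Mbps line")
```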
<p>Even better, this code is infinitely extensible. We&#8217;ll spread it across as many machines as necessary to download the entire internet.</p>
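<p>The 192-line script itself isn't published in this post, so what follows is not that code. It is only a minimal sketch of the general pattern a multi-threaded crawler can use: worker threads pulling URLs from a shared queue. The names <code>crawl</code>, <code>fetch</code>, <code>num_threads</code> and <code>fake_fetch</code> are hypothetical, and the fetcher is passed in so the example runs without touching the network.</p>

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def crawl(urls: Iterable[str],
          fetch: Callable[[str], bytes],
          num_threads: int = 8) -> Dict[str, bytes]:
    """Fetch every URL using a pool of worker threads (hypothetical helper)."""
    todo = queue.Queue()
    for url in urls:
        todo.put(url)

    results: Dict[str, bytes] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread exits
            try:
                body = fetch(url)
            except Exception:
                continue  # skip failures; a real crawler would log or retry
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real HTTP request so the sketch runs offline.
    return b"page for " + url.encode()


pages = crawl(["http://example.com/a", "http://example.com/b"],
              fake_fetch, num_threads=2)
print(len(pages), "pages fetched")
```

<p>For real use, <code>fake_fetch</code> could be swapped for a function built on the standard library's <code>urllib.request.urlopen</code>. Spreading the work across machines, as described above, would need a shared URL queue in place of the in-process one, which this sketch does not attempt.</p>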
<h1>Big SEOs with Crawlers&#8230; what are your stats?</h1>
]]></content:encoded>
			<wfw:commentRss>http://www.tonyspencer.com/2008/01/03/big-seos-with-crawlers-lets-see-your-stats/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

