Archive for the ‘Code’ Category

Big SEOs with Crawlers: Let’s See Your Stats

January 3rd, 2008 tony 9 comments

OK, I’m just ecstatic with my new crawler. I think nobody but Google has a better one, and I’m ready for a good old-fashioned show-and-tell. Multi-threaded programming is a bear to deal with, and I’ve written several crawlers in different languages. For years I’ve been plagued by several complex problems:

* Complex code that is difficult to maintain and difficult to set up on a server
* Memory leaks
* Poor configurability

So the latest design is just 192 lines of Python in a single file, has a single configuration file, and takes about 5 minutes to set up on a standard Linux machine. I ran it last night and was delighted with the results:

Test Run
Tested 139,740 urls
Completed in 2 hrs, 13 mins
3.6 GB of html
Average filesize: 25.05 KB

18.2 urls/second
1.572 million urls/day
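
For anyone who wants to run the same numbers on their own crawl, the arithmetic can be sketched in a few lines of Python. This is not part of the crawler; the function name `crawl_stats` and the unit conventions in the comments are assumptions made for illustration. The totals above are rounded, so recomputing from them lands near the quoted rates (about 17.5 URLs/second) rather than exactly on them.

```python
def crawl_stats(urls, hours, minutes, avg_kb):
    """Derive throughput figures from a crawl's totals (hypothetical helper)."""
    seconds = hours * 3600 + minutes * 60
    urls_per_second = urls / seconds
    return {
        "urls_per_second": urls_per_second,
        "urls_per_day": urls_per_second * 86400,
        # Assumes 1 KB = 1024 bytes and 1 Mbps = 10**6 bits per second.
        "mbps": urls_per_second * avg_kb * 1024 * 8 / 1e6,
    }


# Totals taken from the test run listed above.
stats = crawl_stats(urls=139_740, hours=2, minutes=13, avg_kb=25.05)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

The `mbps` figure is a rough estimate of how much of the download link the page bodies alone account for; headers and protocol overhead are not counted.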

Hardware and Environment
3-year-old Dell PowerEdge SC240
Pentium 4
3.5 GB of RAM
Average CPU load: 0.16
Average physical RAM used: 950 MB
OS: Ubuntu 7.10 (Gutsy Gibbon)
Filesystem: ReiserFS 3

Network connection:
Residential cable modem, 5 Mbps down (100% of which is consumed when it’s running, so it would likely be faster on a fatter pipe)

Even better, this code is infinitely extensible. We’ll spread it across as many machines as necessary to download the entire internet.
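
The crawler's source isn't shown in this post, so what follows is only a minimal sketch of the two general ideas involved, written for modern Python: fetching many URLs concurrently on a pool of threads, and deciding which machine should handle a URL. The post does not say how work would be divided between machines; hashing the hostname is an assumption here, chosen so that one host always maps to the same machine. All names (`fetch`, `crawl`, `machine_for`, `max_workers`) are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, fetch=fetch, max_workers=50):
    """Fetch every URL on a thread pool.

    Returns (pages, errors): bodies keyed by URL and exceptions keyed by URL.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:  # one bad URL should not stop the crawl
                errors[url] = exc
    return pages, errors


def machine_for(url, machine_count):
    """Pick a machine index for a URL by hashing its hostname.

    hashlib is used instead of the built-in hash() because the latter is
    salted per process, so different machines would disagree on the result.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % machine_count
```

Each machine would keep only the URLs for which `machine_for(url, machine_count)` equals its own index and pass them to `crawl`. Because `fetch` is a parameter, a stub can stand in for it to exercise the pool without touching the network.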

Big SEOs with Crawlers… what are your stats?

Categories: Code, Crawlers, Python Tags:

Ad Blockers can Ruin Your Legitimate Web App that Isn’t Even Serving Ads

November 21st, 2007 tony 2 comments

Since rebranding some of our old classifieds sites and relaunching the system as a newly built Ruby on Rails app, we’ve received a handful of emails complaining about strange behavior that always involved links not appearing for the user.

How do you read the rest of the postings or see any pictures that were uploaded?!?! There are no links on the classifieds to keep reading them. Please help since I am new to the website.

At first I discounted this as user error. “These fools don’t know how to use the internets!” DELETE.

Categories: Code, Ruby on Rails Tags:

New columns not immediately available in migrations

July 4th, 2007 tony 4 comments

Sometimes you add a column to a table in a migration and then want to populate the new column with some data. You run your migration and, while the column has been created in the database, your data does not populate. The problem is that ActiveRecord cached the table’s column information before the column was added, so the new column is not yet accessible through the model. You just need to tell it to reload:

add_column :users, :favorite_beer, :string
User.reset_column_information  # reload the column information ActiveRecord has cached
tony = User.find_by_name "Tony Spencer"
tony.favorite_beer = "Terrapin Rye Pale Ale"
tony.save  # without this the new value never reaches the database

Categories: Code, Ruby on Rails Tags:

Lighthouse Bug Tracking Review

June 21st, 2007 tony 5 comments

We’ve been using Basecamp for some time now to manage multiple projects and I have really enjoyed it except for the lack of integrated issue/bug tracking. I’ve tried hacking to-do lists and categorizing messages but I just can’t make Basecamp work for our issue tracking even though I don’t need fancy features. I just want to rapidly log/assign issues to team members, change status, and reassign back to me when the issue is completed.

For years I’ve been using Mantis and it works, but it’s quirky and rather slow to work with, as the interface isn’t designed all that well. There is also some stupid bug that makes it impossible for me to sort issues by different columns. I’ve just signed up for Lighthouse and here are a few pros and cons I’ve noticed immediately:

  • As a technical manager I like to be able to enter bugs/issues quickly without using the mouse. Basecamp to-do lists are very nice this way, as I can quickly type, tab, and hit the space bar to enter an item and assign it to someone. Lighthouse’s create ticket feature forces me to pick up the mouse and click in several places, which slows things down. It would also be very nice if tickets were created with AJAX, as to-do items in Basecamp are, so I can very quickly fill up people’s queues. (Hey, my guys work fast so I have to enter bugs fast!)
  • It’s not very apparent which project I’m currently managing. Only the small drop-down on the right lets me know. I wish Lighthouse would make the current project name more prominent, like in Basecamp. Also, it would be quicker to bounce around between projects if they were a list of links rather than a select list.
  • There is no issue tracking in Basecamp which is why I am giving this nice looking app a try. However, I would continue to use Basecamp for other aspects of the project. It would be great if they could drop in my URL to a project in Basecamp when I create the project in Lighthouse so it could provide me that link in the right nav so I could jump back there.
  • I like the ability to add an avatar to users in Lighthouse. Helps to make it easier to see who did what and gives it a personal touch.
  • The “feature updates” box is taking up too much of the real estate on every page and never goes away.
  • The top header is a little too big and wastes space above the fold, keeping me from seeing more without scrolling.
  • I like the ability to pay with a PayPal subscription, which got me up and running very quickly.
  • The ability to create a simple “Page” is nice. Currently we have a writeboard in one Basecamp project where we keep all the info about our server setup, such as gems to install, cron jobs, where files live, and how to deploy. The problem is that I can’t share it with everyone without adding everyone to that project, and it really isn’t specific to that one project. Pages solve that in Lighthouse. I will now also add pages for things like coding best practices and Subversion how-tos.

I know I published a lot of negatives here but on the whole I’m liking this hosted app and would love to get away from stinking Mantis and managing my own bug tracking system. I’ll post more updates as we use it more.

Update to Lighthouse Issue Tracking

It looks like they removed the banner that was wasting space which is nice. However, one BIG problem I discovered:

I cannot use a “pre” tag to drop in HTML without the browser rendering it, which makes it very hard for me to show a designer or developer some HTML I want them to use.

Also, I can now tab to the field where you select a user to assign a ticket to, but I still cannot change that field without picking up the mouse and clicking on it.

Damn I wish there were a simple interface for entering bugs that looked something like this :)


Categories: Bug Tracking, Code, Ruby on Rails Tags:

PHP vs. Ruby on Rails

June 9th, 2007 tony 3 comments

Again, not funny if you are stuck in PHP land trying to sync up with your team’s latest database changes:

Categories: Code, PHP, Ruby on Rails Tags: