Author Archive

Duplicate Content *Can* Penalize

February 4th, 2008 tony No comments

For some time now I’ve been telling clients and friends that publishing duplicate content will not earn you a penalty. Instead, Google picks the one copy of a piece of content that it believes to be the authority and refuses to index the other copies. So if you publish a copy of one of my blog posts, Google will likely allow my original copy to rank but yours won’t be found.

I think I’ve discovered that enough duplicate content can actually do harm to a domain.

I had an old site we’ll call oldsite1.com. I was publishing fresh, unique, well-written content there several times a day. oldsite1.com always enjoyed nice rankings for that content, and new content was indexed quickly. I had always intended to eventually 301 redirect all of oldsite1.com’s pages to newsite1.com, which would host identical content. Past experience told me that the 301 would transfer all of oldsite1.com’s backlinks and authority to newsite1.com, and that within days the new site would perform nearly as well as the old one.
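For readers who have not set one up, a site-wide 301 simply answers every request on the old domain with a "moved permanently" response pointing at the same path on the new domain. The following is a minimal sketch using Python's standard library, not the setup used on these sites (in practice this is usually a web server configuration rule). The names `redirect_target`, `PermanentRedirectHandler`, and `serve`, the `http://` scheme, and port 8080 are all assumptions made for illustration; newsite1.com is the placeholder domain from the post.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder domain from the post; the scheme is an assumption.
NEW_ORIGIN = "http://newsite1.com"


def redirect_target(path, new_origin=NEW_ORIGIN):
    """Map a path (plus any query string) on the old site to the new site."""
    if not path.startswith("/"):
        path = "/" + path
    return new_origin.rstrip("/") + path


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD request with a 301 to the new domain."""

    def _redirect(self):
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port=8080):
    """Run the redirector until interrupted (the port is arbitrary)."""
    HTTPServer(("", port), PermanentRedirectHandler).serve_forever()
```

Calling `serve()` on the old host would send a request for `/some-post?page=2` to `http://newsite1.com/some-post?page=2`, keeping the path and query string intact so that each old URL maps to its counterpart.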

Now here is the mistake I made: some time ago I set up newsite1.com to mirror oldsite1.com (for some offline promotional reasons). I had zero backlinks to newsite1.com but it was crawled and indexed anyway. Obviously it was 100% duplicate content and nothing but duplicate content, but I didn’t worry too much about it. The day came to put the 301 redirects in place, and within days traffic plummeted. It’s been several weeks and there has been no recovery.

Categories: Google, Search Engine Optimization Tags:

Firefox Keyboard Commands for Back/Forward and Page Up/Down

January 28th, 2008 tony 3 comments

It drives me nuts that I cannot hit command-left-arrow and command-right-arrow to go back and forward in Firefox when the cursor is in a text field. That is always the case when navigating Google search results, thanks to some JavaScript that sets the focus on the search field. Finally I found an alternative:

Back: command-[
Forward: command-]

If I remember correctly, I first ran into this problem with alt-left-arrow and alt-right-arrow on Firefox for Windows, so maybe alt-[ and alt-] will also be a fix for those of you suffering in Windows.

Also, I’ve always missed having page up and page down buttons on my MacBook Pro and just found these handy substitutes:

Page Down: spacebar
Page Up: shift-spacebar

Categories: Computers Tags:

Crazy Python Crawler

January 7th, 2008 tony 4 comments

Someone emailed me doubting my crawler could operate at the speeds I posted last week, so here is a video I took this morning. I should have waited a few minutes after launching it before starting the video, as it really starts cranking once all the threads get rocking; you can see that near the end of the video. Also notice my streaming internet radio cutting in and out thanks to no bandwidth being left on my 5 Mbps line. :)

You can also hear a ticking sound. That is my new 1TB drive. It makes these weird ticking noises even when it’s not in use. It really sounds like the arm hitting something it’s not supposed to hit. Hope it’s not defective.

Video link

Categories: Code, Crawlers, Python Tags:

Big SEOs with Crawlers: Let’s See Your Stats

January 3rd, 2008 tony 9 comments

OK, I’m just ecstatic with my new crawler. I think nobody but Google has a better one, and I’m ready for a good old-fashioned show-and-tell. Multi-threaded programming is a bear to deal with, and I’ve written several crawlers in different languages. For years I’ve been plagued by several complex problems:

* Complex code that is difficult to maintain and difficult to set up on a server
* Memory leaks
* Limited configurability

So the latest design is just 192 lines of Python in a single file, has a single configuration file, and takes about 5 minutes to set up on a standard Linux machine. I ran it last night and was delighted with the results:

Test Run
Tested 139,740 urls
Completed in 2 hrs, 13 mins
3.6 GB of html
Average filesize: 25.05 KB

Averaging
18.2 urls/second
1.572 million urls/day
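
The averages above are simple arithmetic on the run totals. As a rough check, here is a minimal Python sketch; the function name `crawl_rates` is made up for illustration, and it assumes 1 GB = 1024 × 1024 KB. Plugging in the rounded totals gives about 17.5 urls/second, a little under the 18.2 reported, so the posted averages were presumably computed from unrounded figures or over a slightly different interval.

```python
def crawl_rates(urls, hours, minutes, total_gb):
    """Return (urls per second, urls per day, average KB per url)."""
    seconds = hours * 3600 + minutes * 60
    per_second = urls / seconds
    per_day = per_second * 86400            # seconds in a day
    avg_kb = total_gb * 1024 * 1024 / urls  # assumes binary units
    return per_second, per_day, avg_kb


# Totals taken from the test run above.
per_second, per_day, avg_kb = crawl_rates(139_740, 2, 13, 3.6)
print(f"{per_second:.1f} urls/second")
print(f"{per_day / 1e6:.3f} million urls/day")
print(f"{avg_kb:.2f} KB average filesize")
```

The same function can be pointed at any other run to compare crawlers on equal terms.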

Hardware and Environment
3-year-old Dell PowerEdge SC240
Pentium 4
3.5 GB of RAM
Average CPU load: 0.16
Average physical RAM used: 950 MB
OS: Ubuntu 7.10 (Gutsy Gibbon)
Filesystem: ReiserFS 3

Network connection:
Residential cable modem, 5 Mbps down (100% of which is consumed when it’s running, so it would likely be faster on a fatter pipe)

Even better, this code is infinitely extensible. We’ll spread it across as many machines as necessary to download the entire internet.
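
The 192-line crawler itself is not shown in this post. For anyone curious about the general shape of a multi-threaded fetcher, here is a minimal sketch in modern Python 3 using the standard library's thread pool. It is not the crawler described above: the names `fetch` and `crawl`, the default of 50 workers, and the 10-second timeout are all assumptions made for illustration, and it leaves out link extraction, politeness delays, and the configuration file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, workers=50, fetcher=fetch):
    """Fetch urls concurrently; return (pages, errors) dicts keyed by URL."""
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each pending download back to the URL it belongs to.
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:  # one bad URL should not stop the crawl
                errors[url] = exc
    return pages, errors
```

Because downloads spend most of their time waiting on the network, a pool of threads keeps the connection busy while the CPU stays mostly idle, which fits the low CPU load reported above. Passing a different `fetcher` makes it possible to try the pool without touching the network.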

Big SEOs with crawlers… what are your stats?

Categories: Code, Crawlers, Python Tags:

Google Adwords Testing Icons Next to Ads?

December 12th, 2007 tony No comments

Weirdness. I’ve never seen clip art next to AdWords ads in the SERPs, but look at these funky Christmas tree icons next to the ads when searching for NC LLC Department:

[Image: adwords icons]

Categories: Adwords, Google Tags: