Spammin’ & Jammin’ With Craig’s List

Have you run out of great ideas when it comes to clever ways to get some link love from super trusted .org sites? If you have, here’s a tip. Surf on over to Craig’s List and browse the resume section. Copy the content of the oldest resumes you can find. Then repost those resumes with an off-topic title and text links embedded in the bottom of the page.

Just don’t pick a title that would draw the attention of the only people on the web who have a clue about the purpose of the text links you added. :)

Looking For a Few Good People Part 2

Back in February I posted that we were moving into a new office in March, and that we were looking to hire a few new people. Unfortunately, we ran into a few delays regarding the office space, so we put off the hiring until we got things worked out.

I’m happy to report that the new office will be operational by the end of the week, so we are now ready to do some hiring.

We need to fill the following positions ASAP:

Junior Level LAMP Programmer

You will be developing custom open source web applications for both client and company owned projects. C++, CSS/XHTML experience is a big plus.

CSS/XHTML Guru

You will be in charge of standards-compliant front-end development for both client and company owned projects. Many projects will be brand new start-ups, so strong corporate identity skills are a must.

Senior Level SEO

You will be working directly with me on client consulting projects. You will also oversee the implementation of all search marketing initiatives for company owned properties. Experience with PPC management and ROI tracking is a must.

Junior Level SEO

You will work directly with the Senior SEO and you will be responsible for competitive research, data collection, and general link development related activities.

These are full-time, in-house positions. We are located in Valencia, CA.

If you think you might be a fit, send your resume to:

jobs at webguerrilla.com

Turning Registration Back On

When I originally started this blog, I required users to register in order to comment. However, there seemed to be a bug in WordPress that caused the registration function to fail for some people. Rather than track the problem down, I just switched to running everything in pre-mod mode.

Unfortunately, the level of comment spam has grown to the point that using pre-mod alone has become a serious pain in the ass. So I’m going to go ahead and turn registration back on. If you have a problem registering, drop me an email through the contact form and we’ll try and track down the problem and get it fixed.

Google’s Crawl Caching Proxy

Just in case you missed it, Matt is back from Boston and has posted an in-depth explanation of Google’s new crawl caching proxy (the system that is responsible for Mediabot fetching pages).

Matt’s diagram pretty much matches what we’ve been seeing. I still think there might be some issues regarding robots.txt, but we’re still collecting data on that, so I’m going to wait until that’s done before commenting on it.

Another question that has popped up is whether or not Mediabot has been crawling pages that don’t have AdSense. Anyone else seeing that?

AdSense Bot Part 2

Just a quick follow-up on the whole Mediabot thing...

Many people seem to be jumping to the conclusion that adding AdSense code might be a new submission tool that will help a site get indexed faster. I think most webmasters would love it if that were the case, but I haven’t seen anything in all the examples I’ve looked at that would suggest that there is any truth to it.

Every page from this site that Google now shows cached in our Mediabot template originally appeared in the Google index cached in our Googlebot template. In each case, new pages (which Mediabot crawled within hours of being published) were not included in the Google index until after a visit by the regular Googlebot. In fact, we still have many pages indexed in the Googlebot template, even though Mediabot has visited them since they were originally indexed.

I guess it’s possible that the initial collection of new URLs found by Mediabot might be dumped into the Googlebot hopper, which in turn could potentially lead to Googlebot showing up sooner than it normally would, but I haven’t seen anything that would suggest that is happening.

The only thing I see happening is that content that already exists in Google’s database is occasionally being refreshed by Mediabot. And I don’t really have a problem with that happening. It makes sense from an efficiency standpoint.

What I do have a problem with is the fact that Google didn’t think it was an important enough change to warrant any kind of public disclosure before it was implemented. Several months of BigDaddy discussions and not a single comment on the fact they were planning on making dramatic changes to the way they collect data.

The whole thing reminds me of when they decided to start crawling secure servers without telling anyone. That little fiasco caused all kinds of problems: no one had bothered to exclude secure content, because Google stated in their official documentation that they wouldn’t crawl it.

I would think that four years would be more than enough time to come up with a little better strategy when it comes to things like this. But I guess that’s not the case.

Anyone want to place some bets on how long it will take for Google to edit the AdSense FAQ?

AdSense Bot Working Overtime

During last Tuesday’s Rockstar show, I mentioned that I had been working on a project that got a bit messed up due to the fact that Google’s Mediapartner bot (aka Mediabot) was being used to index content for Google’s database. We had set up some 301s for Googlebot, but had neglected to redirect the Mediabot. The end result was a whole bunch of duplicate content due to the fact that we were serving Mediabot the old URL, and Googlebot the new one. Both were getting indexed and added to the cache.
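For anyone trying to picture how that happens, here is a rough sketch in Python. It is not the code from the project. The paths and the function name are made up for illustration, and the only real details are the "Googlebot" and "Mediapartners-Google" user-agent strings. It shows a redirect rule keyed on the crawler's user agent, and what changes when Mediabot is left off the list.

```python
# Hypothetical mapping of old paths to new ones (not from the actual project).
MOVED = {"/old-post/": "/new-post/"}

# User-agent substrings for the two Google crawlers.
GOOGLEBOT = "Googlebot"
MEDIABOT = "Mediapartners-Google"


def resolve(path, user_agent, redirected_agents):
    """Return (status, path) for a request, redirecting only the listed crawlers."""
    target = MOVED.get(path)
    if target and any(token in user_agent for token in redirected_agents):
        return 301, target
    # Anyone not covered by the rule keeps getting the old URL.
    return 200, path


# The rule as described above: only Googlebot is redirected.
googlebot_only = (GOOGLEBOT,)
# The corrected rule: both crawlers are redirected.
both_crawlers = (GOOGLEBOT, MEDIABOT)

if __name__ == "__main__":
    print(resolve("/old-post/", MEDIABOT, googlebot_only))   # old URL served
    print(resolve("/old-post/", MEDIABOT, both_crawlers))    # redirected
```

With the first rule, the two crawlers end up fetching the same content at two different URLs, which is the duplicate content problem. With the second, both land on the new URL.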

Matt was in the chat room when I made the original comments, and he said that he’d like to see some examples. So I thought I’d post one from this site.

The content of that post got indexed in a template that we only serve to AdSense. It has no navigation and no comments; just the actual post. We built this template to experiment with getting better ads to display. The idea being that it might be possible to get ads other than blog related products to show if we removed all the content that wasn’t part of the actual post.
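A minimal sketch of that kind of template switch, in Python, looks something like the following. This is not our actual WordPress setup. The function name and the sample post data are invented for the example, and it assumes the crawler is identified by the "Mediapartners-Google" string in its user agent.

```python
# Substring that identifies the AdSense crawler in the User-Agent header.
MEDIABOT_TOKEN = "Mediapartners-Google"


def render_post(post, user_agent):
    """Render a post: stripped down for Mediabot, complete for everyone else."""
    if MEDIABOT_TOKEN in user_agent:
        # AdSense-only template: just the actual post.
        return post["body"]
    # Regular template: navigation, the post, then the comments.
    parts = [post["nav"], post["body"]] + post["comments"]
    return "\n".join(parts)


# Hypothetical post data, for illustration only.
example_post = {
    "nav": "Home | Archives | Contact",
    "body": "The actual post text.",
    "comments": ["First comment.", "Second comment."],
}

if __name__ == "__main__":
    print(render_post(example_post, MEDIABOT_TOKEN))
    print(render_post(example_post, "Mozilla/5.0"))
```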

The interesting thing to note about this page is that the post was originally made in January. And for quite some time it had a cached page that was a representation of what Googlebot was given. But then Mediabot visited on April 7th. And the page it was served on that date ended up replacing the Googlebot version in the cache.

The Greatest Real Estate Agent in the World