MSNBot? Or Something Else?

By dwirch | 2010-09-26

I’ve been getting a ton of hits from what appears to be MSNbot. But is it really? Over the last week or so, I’ve logged thousands of hits from this little beastie. Normally I wouldn’t mind getting this much attention from a search engine.

The trouble is that it appears to be generating random filenames and requesting them from the web server. Each request produces a 404 (file not found) response, and every time there is a 404 on the site, I get an email.

Here is an example of what it is requesting (watch for line wrap):

2010-09-26 07:06:28 W3SVC3 GET /httpkbindianaedudataagazhtml cust=94316029283131 80 - 65.55.55.211 HTTP/1.1 msnbot/2.0b+(+http://search.msn.com/msnbot.htm)._ - - www.fortypoundhead.com 404 0 0 6841 255 3203

So in this example (there are tons more entries in the log), the bot purports to be MSNbot 2.0b, the crawler for the Bing search engine, and the IP address falls within a block owned by Microsoft.
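To make the example easier to pick apart, here is a minimal Python sketch that splits the log line above into named fields. The field order is my assumption, inferred from the sample line; a real IIS log states the order in its "#Fields:" header, so check yours before relying on this. The helper names (parse_line, claims_msnbot) are made up for illustration.

```python
# Field order is an ASSUMPTION inferred from the sample line; a real IIS
# W3C log declares the actual order in its "#Fields:" header.
ASSUMED_FIELDS = [
    "date", "time", "s-sitename", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs-version", "cs(User-Agent)",
    "cs(Cookie)", "cs(Referer)", "cs-host", "sc-status", "sc-substatus",
    "sc-win32-status", "sc-bytes", "cs-bytes", "time-taken",
]

# The sample entry from the post, joined back into a single line.
SAMPLE = (
    "2010-09-26 07:06:28 W3SVC3 GET /httpkbindianaedudataagazhtml "
    "cust=94316029283131 80 - 65.55.55.211 HTTP/1.1 "
    "msnbot/2.0b+(+http://search.msn.com/msnbot.htm)._ - - "
    "www.fortypoundhead.com 404 0 0 6841 255 3203"
)


def parse_line(line, fields=ASSUMED_FIELDS):
    """Split one space-delimited log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


def claims_msnbot(entry):
    """True if the user-agent string says it is msnbot (a claim, not proof)."""
    return entry["cs(User-Agent)"].lower().startswith("msnbot/")


entry = parse_line(SAMPLE)
print(entry["c-ip"], entry["sc-status"], entry["cs-uri-stem"])
print("claims to be msnbot:", claims_msnbot(entry))
```

Keep in mind that the user-agent string is whatever the client chooses to send, so this only tells you what the bot claims to be.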

Another thing I have noticed is that the bot doesn’t appear to be honoring robots.txt instructions.  For example, if I have a crawl delay in there, it is ignored, and the bot will crawl around the site for an hour at full speed.
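If you want to put numbers on the crawl-delay problem, something like the following Python sketch could help. It reads a crawl delay from a robots.txt body using the standard library's urllib.robotparser, then flags consecutive requests that arrive closer together than that delay. The robots.txt content, the 10-second value, and the request times are all hypothetical stand-ins, not taken from my site or my logs.

```python
from datetime import datetime
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt body; the 10-second delay is an example value.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
delay = parser.crawl_delay("msnbot")  # seconds, or None if no delay applies

# HYPOTHETICAL request times for one client, in the same date/time format
# as the log entries.
hits = [
    "2010-09-26 07:06:28",
    "2010-09-26 07:06:29",
    "2010-09-26 07:06:45",
]


def too_fast(timestamps, delay_seconds):
    """Return (earlier, later) pairs of consecutive hits closer than the delay."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [
        (a, b)
        for a, b in zip(times, times[1:])
        if (b - a).total_seconds() < delay_seconds
    ]


for earlier, later in too_fast(hits, delay):
    print(f"{later} arrived only {(later - earlier).total_seconds():.0f}s after the previous hit")
```

Feeding it the real timestamps for a single IP address would show how often the delay is actually being ignored.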

I’ve heard that others have experienced the bot completely ignoring directory exclusions as well, but this is only hearsay; I’ve not seen this behavior myself.

So for now, until this naggy little bot gets under control, I’ll just be throwing its class C net block (the /24 containing its address) into the 403 list.
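For anyone wanting to work out the same range, here is a small Python sketch using the standard ipaddress module to find the class C (/24) block containing the address from the log entry. How the block then gets into a deny list depends on your web server, so that part is left out; class_c_block is just a name I picked for the example.

```python
import ipaddress


def class_c_block(ip):
    """Return the /24 network that contains the given IPv4 address."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{ip}/24", strict=False)


block = class_c_block("65.55.55.211")
print(block)                 # 65.55.55.0/24
print(block.num_addresses)   # 256

# Check whether another address would be caught by the same block.
print(ipaddress.ip_address("65.55.55.7") in block)   # True
print(ipaddress.ip_address("65.55.56.1") in block)   # False
```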

Anybody else got any input on this? Seen this weirdness on your web server? Or something worse?

Author: dwirch

Derek Wirch is a seasoned IT professional with an impressive career dating back to 1986. He brings a wealth of knowledge and hands-on experience that is invaluable to those embarking on their journey in the tech industry.
