[TriLUG] MSN bot is pounding my website...
Robert Ryals
rryals at tmio.com
Thu Dec 9 15:14:19 EST 2004
gregbrown at mindspring.com wrote:
>The following is the number of hits from MSN bot, from all MSN bot IP addresses, to my webserver (through ALL historical logs I still have around):
>
> 1227 65.54.188.69
> 58 65.54.188.70
> 42 65.54.188.64
> 18 65.54.188.68
> 4 65.54.188.67
>
>
>If I look at all traffic to my website, MSN bot is still on top:
>
> 1227 65.54.188.69
> 127 192.58.204.226
> 59 65.54.188.70
> 42 65.54.188.64
> 29 64.244.30.79
> 24 66.196.91.227
> 19 65.87.170.103
> 19 129.33.49.251
> 18 65.54.188.68
> 17 66.26.93.162
>
>
>I know it's from MSN because it leaves the following in my log:
>"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
>
>I assume over at MSN they are trying to scrape the Internet to build up their own web search engine. I am curious if others are seeing this same activity.
>
>The commands I used for these queries were (run as root in /var/log/httpd):
>
>For msnbot:
>grep msnbot access_log | awk '{ print $1 }' | sort | uniq -c | sort -gr | head
>
>and
>
>For all hits:
>awk '{ print $1 }' access_log | sort | uniq -c | sort -gr | head
>
>Greg
>
>
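To make the counting pipelines above concrete, here is a minimal sketch that runs them against a tiny, made-up log in Apache's combined format. The sample lines and the 192.0.2.10 visitor are hypothetical; on a real system the input would be /var/log/httpd/access_log.

```shell
#!/bin/sh
# Hypothetical four-line sample log in Apache "combined" format.
log=$(mktemp)
cat > "$log" <<'EOF'
65.54.188.69 - - [09/Dec/2004:10:00:00 -0500] "GET / HTTP/1.0" 200 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
65.54.188.69 - - [09/Dec/2004:10:00:05 -0500] "GET /a.html HTTP/1.0" 200 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
65.54.188.70 - - [09/Dec/2004:10:01:00 -0500] "GET /b.html HTTP/1.0" 200 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
192.0.2.10 - - [09/Dec/2004:10:02:00 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# grep keeps only lines mentioning msnbot, awk prints field 1 (the client IP),
# sort groups identical IPs, uniq -c prefixes each IP with its count,
# sort -gr puts the largest count first, and head keeps the top ten.
msn=$(grep msnbot "$log" | awk '{ print $1 }' | sort | uniq -c | sort -gr | head)

# Same pipeline without the grep: counts every client IP.
all=$(awk '{ print $1 }' "$log" | sort | uniq -c | sort -gr | head)

printf 'msnbot hits:\n%s\n' "$msn"
printf 'all hits:\n%s\n' "$all"
rm -f "$log"
```

The only assumption the pipelines make about the log is that the client IP is the first field, which holds for the common and combined log formats.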
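You can block it by adding a few lines to your Apache config file (adjust the Directory path to match your DocumentRoot). Any request whose User-Agent contains "msnbot", matched case-insensitively, will then get a 403 Forbidden:

<Directory /var/www/htdocs>
    SetEnvIfNoCase User-Agent "msnbot" bad_bot
    Deny from env=bad_bot
</Directory>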
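After reloading Apache, one way to check that the block is working is to look at the status codes msnbot gets back: denied requests are logged as 403, and in the common and combined log formats the status code is the ninth whitespace-separated field. A minimal sketch against a hypothetical sample log (the lines are made up; a custom LogFormat may put the status elsewhere):

```shell
#!/bin/sh
# Hypothetical sample: one msnbot hit before the change (200), two after it (403).
log=$(mktemp)
cat > "$log" <<'EOF'
65.54.188.69 - - [09/Dec/2004:15:00:00 -0500] "GET / HTTP/1.0" 200 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
65.54.188.69 - - [09/Dec/2004:16:00:00 -0500] "GET / HTTP/1.0" 403 290 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
65.54.188.70 - - [09/Dec/2004:16:05:00 -0500] "GET /a.html HTTP/1.0" 403 290 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
EOF

# grep -i matches "msnbot" case-insensitively, like SetEnvIfNoCase does;
# awk prints field 9, the HTTP status code in common/combined format;
# sort | uniq -c counts how many msnbot requests received each status.
status=$(grep -i msnbot "$log" | awk '{ print $9 }' | sort | uniq -c | sort -gr)
printf '%s\n' "$status"
rm -f "$log"
```

Run against the real access_log, new msnbot entries should show 403 rather than 200 once the block is active.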