06-27-2007, 11:46 AM
josephx (WebProWorld New Member)
Re: Search Bots Eating Bandwidth

Thank you all for the suggestions and advice!

@Sheriff: When I add the line RewriteRule ^.* - [F,L], I get an internal server error, so I removed the F. I hope removing it doesn't make the code useless?
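
A note on what the flags do, since it bears on the question above: with a substitution of - and only [L], the rule changes nothing and simply stops further rewriting, so a matched robot would still be served the page. The F flag is what sends the 403 Forbidden response. [F,L] is valid syntax in itself, so the 500 error probably has another cause; the server error log should name it, and one common culprit is a space inside the brackets, as in [F, L]. A minimal sketch, assuming mod_rewrite is available on the host and using ExampleBot as a made-up user-agent token:

```apache
RewriteEngine On

# ExampleBot is a hypothetical placeholder, not a real crawler name
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]

# "-" leaves the URL untouched; F answers with 403 Forbidden
# no space is allowed between the brackets
RewriteRule ^.* - [F]
```

Once a request is forbidden there is nothing left to rewrite, so [F] on its own should be enough here.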

@seo4china: Yes, my content is in both Chinese and English, and Baidu crawls my site heavily. I also need to block Yahoo! Slurp China, but when I add
RewriteCond %{HTTP_USER_AGENT} ^Yahoo! slurp China [OR] I get a 500 internal server error.
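
The likely reason for that 500 error is the unescaped spaces: RewriteCond splits its arguments on whitespace, so "slurp" and "China" are read as extra arguments and the directive fails to parse. Escaping each space with a backslash, or quoting the whole pattern, avoids this. A sketch of both forms follows, assuming the crawler's user-agent string contains this text somewhere rather than starting with it (which is why the ^ anchor is dropped):

```apache
RewriteEngine On

# Option 1: escape each space with a backslash
RewriteCond %{HTTP_USER_AGENT} Yahoo!\ Slurp\ China [NC,OR]

# Option 2: quote the whole pattern (either option alone is enough)
RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp China" [NC]

RewriteRule ^.* - [F]
```

The NC flag makes the match case-insensitive, so "slurp" and "Slurp" are treated alike. The last condition before the RewriteRule must not carry OR.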

I submitted my site to a B2B search engine (Jayde.com). It did help my page ranking, but I am losing a lot of bandwidth.

Now I have these in my .htaccess:

########## Block unwanted robots ##########
RewriteCond %{HTTP_USER_AGENT} ^Twiceler-0.9 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider+ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^YodaoBot/1.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5.0 [OR]
RewriteCond %{HTTP_REFERER} ^baidu\.com
RewriteRule ^.* - [L]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.mysite.com/image.jpg [R,NC]

Could somebody please check whether anything is wrong with the code? I just hope the bots won't eat up my remaining 500 MB of bandwidth, otherwise I will need to move to paid hosting.
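
For comparison, here is a hedged sketch of how the block might look with the more obvious problems addressed. It assumes mod_rewrite is enabled, that each listed crawler includes its name somewhere in its user-agent string, and that blocking with a 403 is the goal; it is a starting point to test, not a verified configuration.

```apache
RewriteEngine On

########## Block unwanted robots ##########
# Unanchored, case-insensitive substring matches; version numbers are
# dropped so a crawler upgrade does not slip past the rule.
RewriteCond %{HTTP_USER_AGENT} Twiceler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YodaoBot [NC,OR]
# Last condition: no OR flag here
RewriteCond %{HTTP_USER_AGENT} Yahoo!\ Slurp\ China [NC]
# F sends 403 Forbidden; without it the rule blocks nothing
RewriteRule ^.* - [F]
```

Several lines from the original block are left out on purpose. The ^Mozilla/4.0 and ^Mozilla/5.0 conditions match nearly every ordinary browser, so with F in place they would lock out human visitors. In ^Baiduspider+ the + is a regex quantifier rather than a literal plus sign, and the plain Baiduspider pattern already covers it. The HTTP_REFERER condition cannot match, because a referer is a full URL beginning with http://, and it would target people arriving from Baidu rather than the crawler. The image rule redirects every image request, including the one for image.jpg itself, which would loop. A 403 response still uses a little bandwidth per request, though far less than serving the page.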

My Robots/Spiders Visitors:

Robot | Hits | Bandwidth | Last visit
Unknown robot (identified by 'spider') | 181016+69 | 2.73 GB | 27 Jun 2007 - 03:34
Yahoo! Slurp China | 24489+45 | 707.65 MB | 27 Jun 2007 - 03:16
Googlebot | 12582+21 | 308.10 MB | 27 Jun 2007 - 03:33
Unknown robot (identified by 'bot/' or 'bot-') | 4529+98 | 119.71 MB | 26 Jun 2007 - 15:41
Unknown robot (identified by 'robot') | 4219+5 | 93.20 MB | 26 Jun 2007 - 12:18
Yahoo Slurp | 1591+516 | 39.52 MB | 27 Jun 2007 - 03:32
Ask | 1005+20 | 26.21 MB | 21 Jun 2007 - 16:58
MSNBot | 574+313 | 14.50 MB | 26 Jun 2007 - 23:03
Unknown robot (identified by 'crawl') | 438+8 | 10.84 MB | 27 Jun 2007 - 02:18
MSNBot-media | 92+4 | 2.29 MB | 26 Jun 2007 - 22:52
Unknown robot (identified by hit on 'robots.txt') | 0+10 | 2.77 KB | 24 Jun 2007 - 09:38
Alexa (IA Archiver) | 1+2 | 29.33 KB | 16 Jun 2007 - 06:26

Last edited by josephx; 06-27-2007 at 12:25 PM.