WebProWorld: Internet Security Discussion Forum

Search Bots Eating Bandwidth

#1 josephx, 06-26-2007, 07:52 AM

Hello everybody,

I submitted my site to a search engine a few months ago, but lately unknown robots have been eating up my 2.5 GB of bandwidth in just 10-20 days. I only have about 2 short videos and about 2-30 images on my site, so there's no reason why it would eat up that amount of bandwidth.

I checked my access log and found:
38.99.13.123 - - [23/Jun/2007:20:44:45 +0900] "GET /t/imagery/ HTTP/1.0" 404 17507 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"

61.135.162.52 - - [23/Jun/2007:20:44:53 +0900] "HEAD /folder/folder/folder/content.html HTTP/1.1" 200 0 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

They were crawling my site every minute!

May I know how to block these robots using .htaccess? Please let me know if this is not the right forum to discuss this issue.

Thank you.

#2 Webnauts, 06-26-2007, 11:14 AM

Add the following lines to your .htaccess file:

order allow,deny
deny from 38.99.13.123
deny from 61.135.162.52
allow from all

#3 chrisJumbo, 06-26-2007, 01:57 PM

Webnauts, I know you posted a very lengthy .htaccess file on another thread. Does having such a large file slow things down, such that good users are hindered?

We seem to be having quite a few different IP addresses attempting to attack our form posting (I made a post about this in a different thread). Each IP address only appears once and then a different address is used. Seems like it would be incredibly difficult to stop all of them.

Plus, couldn't everyone who uses a large provider like AOL or Yahoo/SBC/ATT show up under a single IP address, so that blocking it would keep the good people from seeing our site?

Oh, the joys of "success". :O)

cd :O)
__________________
CD Rates | CD Rates Blog | Banking Online

#4 incrediblehelp, 06-26-2007, 02:17 PM

Quote:
Originally Posted by chrisJumbo
Webnauts, I know you posted a very lengthy .htaccess file on another thread. Does having such a large file slow things down, such that good users are hindered?
Not at all.

#5 bj, 06-26-2007, 02:28 PM

You can also use this in robots.txt, which should help:

User-agent: * #put spider name in here, or leave it as wildcard
Crawl-delay: 10

And some spiders will recognize this, which is part of the proposed new robots exclusion spec:
User-agent: * #put spider name in here, or leave it as wildcard
Request-rate: 1/5 # maximum rate is one page every 5 seconds
Visit-time: 0600-0845 # only visit between 6:00 AM and 8:45 AM UT (GMT)
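
One way to sanity-check directives like these before uploading them is to feed the text to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` on a combined version of the two snippets above; the bot name `ExampleBot` is made up for illustration. As far as I know this parser reads `Crawl-delay` and `Request-rate` but simply ignores `Visit-time`, and of course it says nothing about whether a particular spider honors any of them.

```python
from urllib.robotparser import RobotFileParser

# The directives from the post, merged into a single wildcard group.
ROBOTS_TXT = """\
User-agent: *  # put spider name in here, or leave it as wildcard
Crawl-delay: 10
Request-rate: 1/5  # maximum rate is one page every 5 seconds
Visit-time: 0600-0845
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)
delay = parser.crawl_delay("ExampleBot")   # "ExampleBot" is a hypothetical name
rate = parser.request_rate("ExampleBot")

print("Crawl-delay:", delay)
print("Request-rate:", rate.requests, "request(s) per", rate.seconds, "seconds")
```

If a directive is mistyped, the corresponding call returns `None`, which makes typos easy to spot.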


#6 bj, 06-26-2007, 02:30 PM

Oops, forgot this-- it's a robots.txt validator

#7 mtheory, 06-26-2007, 02:36 PM

All aspects of the internet are evolving toward a rich-media, high-bandwidth environment.

You can either drive yourself crazy nickel-and-diming bandwidth or you can move with the times and get a host with more bandwidth.

2.5 gigs is nothing today.

#8 Matteo, 06-26-2007, 05:30 PM

If you adapted your robots.txt page to disallow the video, wouldn't this keep the bandwidth use down?
__________________
There is a time for every purpose under heaven.
http://www.expresspools.com http://www.sjvwd.com

#9 phillypleez, 06-26-2007, 05:50 PM

I am having the same problem. I tried blocking them in .htaccess, but new search bots with slightly different IP addresses would start bombarding the site. Any advice?

phil

#10 timmathews.com, 06-26-2007, 05:51 PM

I was having an issue with this a couple of years ago... Altavista & Lycos spiders ate up our bandwidth with a vengeance.
We blocked the spiders' IPs (about 50 altogether), and 2 weeks later we were dropped from Altavista and Lycos.
Mind you, no one would care today, but 6-7 years ago Altavista & Lycos were HUGE.

I know bandwidth is expensive when you don't host your own sites from a datacenter or the like, but if spiders are crawling the crap out of your site, it is for a reason.
I let them go.


#11 zbatia, 06-26-2007, 08:12 PM

I recently had to pay for extra bandwidth usage as well. I don't think it was because of bots (and I see them regularly, especially Yahoo's) but because of the amount of spam. I was sick and tired of cleaning up thousands(!) of spam e-mails, and not only e-mails. The bulletin board was suffering from spam messages posted by bots built specifically for phpBB.

I decided to gather statistics on who the biggest spammers were. For one month I checked every spam e-mail and wrote down the IPs from the headers. Then I blocked whole ranges of IPs, in some cases entire class A networks. Believe it or not, I cut spam by ~80%!

I was so angry that I got carried away and mistakenly blocked the 65.x IP range where Google's bot lives. You can imagine what happened to a web site with a PR of 6...
The number 6 turned to 0! Our sales stopped completely for 2 months. It took me a while (many hours of hard work and web site optimization) to get back on track.

The positive thing is that I am now working in the right direction by allowing IP addresses to reach my site only from the areas I want. It's like a top-down approach to security.
If you want, get my results file here (some people asked me to post it):
http://www.800-security.com/tech/SPAMaddresses.txt
Please be careful, and verify your restrictions. The biggest sources of spam are Poland, Russia, and the Asian region. There are some in America as well.
Use the following site to check WHOIS and similar services:
Information Security Resources and Links. Security Certifications, Firewalls, IDS, Microsoft Security, CISSP, Security+

The answer to your problem is to gather statistics. The bots usually use the same IPs (no more than a handful of addresses). Restrict the bandwidth eaters, but again: be careful.
Use the Control Panel to restrict the addresses.
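
To make "verify your restrictions" concrete, here is a minimal sketch using Python's standard `ipaddress` module. It takes a list of blocked ranges and reports which of the addresses you want to keep reachable would be caught. The 65.x range is the one mentioned above; `65.10.20.30` is a made-up address standing in for a visitor you care about, and `38.99.13.123` is the Twiceler address from the first post, used here only as an address outside that range.

```python
import ipaddress

# The "65.x" block from the post, written as a CIDR range.
BLOCKED_RANGES = [ipaddress.ip_network("65.0.0.0/8")]


def blocked_by(ip: str, networks=BLOCKED_RANGES) -> list:
    """Return the blocked ranges (as strings) that contain the given address."""
    address = ipaddress.ip_address(ip)
    return [str(network) for network in networks if address in network]


# How many addresses does a single class A sized block cover?
print(BLOCKED_RANGES[0].num_addresses)

# Hypothetical addresses to check before deploying the block list.
for ip in ("65.10.20.30", "38.99.13.123"):
    print(ip, "->", blocked_by(ip) or "not blocked")
```

Running the addresses of crawlers and customers from your own logs through a check like this is one way to catch an overly broad range before it goes live.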
__________________
The Cyber Teacher
http://www.rtek2000.com
http://www.800-webdesign.com/web-master-links.html -Free Web Master's Resources
_________________

#12 jtracking, 06-26-2007, 08:34 PM

Quote:
Originally Posted by timmathews.com
...I know bandwidth is expensive when you don't host your own sites from a datacenter or the like, but if spiders are crawling the crap out of your site, it is for a reason.
I let them go.

I second the motion. I believe spiders are visiting our sites for a number of reasons and one of them is to list us in the directories etc. I usually try to find out where they came from and then see what they're about...most of the time they've got me on their site. : )
__________________
Post as-it-happens crime stories of criminal behaviour at crimedigg.com

#13 jtracking, 06-26-2007, 08:40 PM

Quote:
Originally Posted by zbatia
I decided to gather statistics on who the biggest spammers were. For one month I checked every spam e-mail and wrote down the IPs from the headers. Then I blocked whole ranges of IPs, in some cases entire class A networks. Believe it or not, I cut spam by ~80%!

I was so angry that I got carried away and mistakenly blocked the 65.x IP range where Google's bot lives. You can imagine what happened to a web site with a PR of 6...
The number 6 turned to 0! Our sales stopped completely for 2 months. It took me a while (many hours of hard work and web site optimization) to get back on track.
lol I did the same thing and wrote a utility that would redirect the spammer to Fight Spam on the Internet!

Terrible of me, but maybe somehow they can find the dude who's spamming me. Anyway, in the end I found out that the more IP addresses you block, the greater the chance that you'll block legitimate visitors.

I think I have an idea on how to easily block the spammers...i'll post it if it works.

#14 timmathews.com, 06-26-2007, 11:29 PM

Quote:
Originally Posted by jtracking
I second the motion. I believe spiders are visiting our sites for a number of reasons and one of them is to list us in the directories etc. I usually try to find out where they came from and then see what they're about...most of the time they've got me on their site. : )
EXACTLY!
DO NOT BLOCK THEM BECAUSE IT IS COSTING YOU MONEY, ALLOW THEM AND MONITOR THEM BECAUSE THEY WILL ESSENTIALLY MAKE YOU MONEY!

Think about it: your car requires FUEL to move forward. Are you going to not put gas in it because it costs you money?

I know that statement is broad, but that is the broadest (most broad?) analogy I could think of to get people to understand.

Hit me.

#15 Webnauts, 06-27-2007, 01:47 AM

I think that by joining this free project you can get more information about who is good for you and who is not: Distributed Spam Harvester Tracking Network | Project Honey Pot

I am using that myself and it is just incredible.

#16 harishkumar09, 06-27-2007, 01:53 AM

I have a successful forum dedicated to the film industry of one of the states of India. I know a lot of people are visiting the forum, and so are a lot of spam bots. I want to know how many of the hits are actually due to humans and how many are due to spam bots. Is there any way of finding that out?
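
One rough way to approach this is to tally the access log by user agent. The sketch below assumes the log is in the combined format shown in the first post, and it guesses "bot" from a few substrings in the User-Agent (`BOT_HINTS` is my own illustrative list, not a standard one). Bots that pretend to be browsers will be counted as human, so treat the result as a lower bound on bot traffic. The third sample line is a made-up browser request.

```python
import re
from collections import defaultdict

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Illustrative substrings only; extend this with names seen in your own logs.
BOT_HINTS = ("bot", "spider", "crawl", "slurp", "twiceler")


def is_probable_bot(agent: str) -> bool:
    """Heuristic: the User-Agent contains one of the known hint substrings."""
    lowered = agent.lower()
    return any(hint in lowered for hint in BOT_HINTS)


def tally(lines):
    """Sum hits and response bytes separately for probable bots and everyone else."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        kind = "bot" if is_probable_bot(match.group("agent")) else "human"
        size = match.group("bytes")
        totals[kind]["hits"] += 1
        totals[kind]["bytes"] += 0 if size == "-" else int(size)
    return dict(totals)


SAMPLE = [
    '38.99.13.123 - - [23/Jun/2007:20:44:45 +0900] "GET /t/imagery/ HTTP/1.0" 404 17507 "-" '
    '"Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"',
    '61.135.162.52 - - [23/Jun/2007:20:44:53 +0900] "HEAD /folder/folder/folder/content.html HTTP/1.1" 200 0 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    # Hypothetical browser request, added for contrast.
    '192.0.2.10 - - [23/Jun/2007:20:45:01 +0900] "GET /index.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]

print(tally(SAMPLE))
```

To run it on a real log, pass an open file instead of `SAMPLE`, for example `tally(open("access.log"))`, where the file name is whatever your host uses.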

#17 Sheriff, 06-27-2007, 02:21 AM

Blocking robots with robots.txt or by IP address are both bad ideas.
Bad robots generally do not pay attention to robots.txt.
Blocking IP addresses, as some have suggested, has all kinds of repercussions.
Normally bots will not change their name very often, so use the following in the
.htaccess file in your root directory, and deny all access to inner directories except from your local IPs.

Using mod_rewrite (Apache)

If the string or regular expression matches the User-Agent HTTP header, the request is answered with a 403 Forbidden page.

RewriteEngine On
# No ^ anchor for Twiceler: its User-Agent starts with "Mozilla/5.0 (Twiceler-0.9 ..."
RewriteCond %{HTTP_USER_AGENT} Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider
RewriteRule ^.* - [F,L]

You can change the RewriteRule and send them somewhere else, like a non-linked page
that records hits and user agents, letting you know how many bad bots are taking
the bait! You will have to use PHP and MySQL if you do not want to save it in a file.

If you do not have mod_rewrite, the following should help.

SetEnvIfNoCase User-Agent "Twiceler" bad_bot=1
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bad_bot=1
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot=1
<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</FilesMatch>

#18 seo4china, 06-27-2007, 02:43 AM

Quote:
Originally Posted by josephx
Hello everybody,

I submitted my site to a search engine a few months ago, but lately unknown robots have been eating up my 2.5 GB of bandwidth in just 10-20 days. I only have about 2 short videos and about 2-30 images on my site, so there's no reason why it would eat up that amount of bandwidth.

I checked my access log and found:
38.99.13.123 - - [23/Jun/2007:20:44:45 +0900] "GET /t/imagery/ HTTP/1.0" 404 17507 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"

61.135.162.52 - - [23/Jun/2007:20:44:53 +0900] "HEAD /folder/folder/folder/content.html HTTP/1.1" 200 0 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

They were crawling my site every minute!

May I know how to block these robots using .htaccess? Please let me know if this is not the right forum to discuss this issue.

Thank you.
Do you have Chinese-language content? Does your site target users located in mainland China? If so, I would not mind the heavy Baidu crawl but would rather find ways to accommodate it.

#19 versuri32, 06-27-2007, 04:54 AM

MSN Bot Eating Up ALL my bandwidth!!! MSN Search Optimization forum discussing the Microsoft search engine for MSN and Live.com.

Can I do something?

#20 sjachille, 06-27-2007, 04:54 AM

IMO the issue is much simpler than many of the replies I have read so far in this thread. I gave a rundown on my blog here:

Robots.txt >>Search Engine Robots generating too much traffic on your site ?

In a nutshell, you can reduce robot bandwidth consumption to an absolute minimum by simply putting an empty robots.txt file on your website or blog.

#21 NetProwler, 06-27-2007, 06:28 AM

>>38.99.13.123 - - [23/Jun/2007:20:44:45 +0900] "GET /t/imagery/ HTTP/1.0" 404 17507 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"

That was a 404 for which the server returned something worth 17507 bytes. It might be a custom Page Not Found error page. In a large site where the bandwidth is a worry, avoid using a custom error page with anything more than a couple of KB - if possible. They all add up eventually.
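
To put a number on "they all add up", here is a back-of-the-envelope calculation. It assumes, purely for illustration, one 404 response per minute around the clock (the first post says the bots came every minute, but not that every hit was a 404) and compares the 17,507-byte error page with a hypothetical 2 KB one.

```python
ERROR_PAGE_BYTES = 17507      # size of the 404 response in the quoted log line
SMALL_PAGE_BYTES = 2 * 1024   # hypothetical slimmed-down error page (2 KB)
REQUESTS_PER_DAY = 60 * 24    # assumption: one request per minute, all day


def monthly_bytes(bytes_per_response: int,
                  requests_per_day: int = REQUESTS_PER_DAY,
                  days: int = 30) -> int:
    """Total bytes served in a month for a fixed response size and request rate."""
    return bytes_per_response * requests_per_day * days


heavy = monthly_bytes(ERROR_PAGE_BYTES)
light = monthly_bytes(SMALL_PAGE_BYTES)

print(f"17,507-byte error page: {heavy / 1_000_000:.1f} MB per month")
print(f"2 KB error page:        {light / 1_000_000:.1f} MB per month")
print(f"difference:             {(heavy - light) / 1_000_000:.1f} MB per month")
```

Under those assumptions the large error page alone accounts for roughly three quarters of a gigabyte per month, which is a noticeable share of a 2.5 GB quota.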

If it is the images that are being crawled in excess (the exact figure will vary depending upon individual circumstances), you can do something to avoid that.

It might be too cumbersome to reproduce it here. But you can see how it is done here:
Targetwoman Blog » Saving Bandwidth in servers

Incidentally, I did not write that blog, nor do I have anything to do with it. I just happened to see it.

#22 josephx, 06-27-2007, 10:46 AM

Thank you all for the suggestions and advice!

@Sheriff: When I add the line RewriteRule ^.* - [F,L] it gives me an internal server error, so I just removed the F. I hope removing it doesn't make the code useless?

@seo4china: Yes, my content is in both Chinese and English. Baidu crawls my site heavily. I also need to block Yahoo! Slurp China, but when I add
RewriteCond %{HTTP_USER_AGENT} ^Yahoo! slurp China [OR] it gives me a 500 internal server error.

I submitted my site to a B2B search engine (Jayde.com). It did help increase my page ranking, but so much bandwidth is lost.

Now I have these in my .htaccess:

########## Block unwanted robots ##########
RewriteCond %{HTTP_USER_AGENT} ^Twiceler-0.9 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider+ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^YodaoBot/1.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5.0 [OR]
RewriteCond %{HTTP_REFERER} ^baidu\.com
RewriteRule ^.* - [L]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ http://www.mysite.com/image.jpg [R,NC]

Could somebody please check if anything's wrong with the code? I just hope they won't eat up my remaining 500 MB of bandwidth; otherwise I'll need to get paid hosting.
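
A quick way to see what the patterns above actually match is to run them against the User-Agent strings from the access log. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine, on the assumption that the two behave the same for patterns this simple. `BROWSER_UA` is a sample Firefox-style string added for contrast; it does not come from the log.

```python
import re

# The RewriteCond patterns from the .htaccess above, copied as written.
PATTERNS = [
    r"^Twiceler-0.9",
    r"^Baiduspider+",
    r"^Baiduspider",
    r"^Xaldon\ WebSpider",
    r"^YodaoBot/1.0",
    r"^Mozilla/4.0",
    r"^Mozilla/5.0",
]

TWICELER_UA = "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
BAIDU_UA = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
# Sample browser User-Agent for comparison (not taken from the log).
BROWSER_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) "
              "Gecko/20070515 Firefox/2.0.0.4")


def matching_patterns(user_agent: str, patterns=PATTERNS) -> list:
    """Return the patterns that match the given User-Agent string."""
    return [pattern for pattern in patterns if re.search(pattern, user_agent)]


for name, agent in [("Twiceler", TWICELER_UA), ("Baidu", BAIDU_UA), ("browser", BROWSER_UA)]:
    print(name, "->", matching_patterns(agent))

# Without the ^ anchor, the bot's name is found inside the longer string.
print(bool(re.search(r"Twiceler", TWICELER_UA)))
```

Two things stand out. The `^` anchor means `^Twiceler-0.9` never matches, because that bot's User-Agent begins with `Mozilla/5.0`; and `^Mozilla/5.0` catches the bot only because it also catches ordinary browsers. Separately from the patterns, a rule ending in `[L]` alone stops further rewriting but, as far as I know, does not refuse the request; `[F]` is the flag that returns 403 Forbidden.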

My Robots/Spiders Visitors:

Robot/Spider                                       Hits        Bandwidth   Last visit
Unknown robot (identified by 'spider')             181016+69   2.73 GB     27 Jun 2007 - 03:34
Yahoo! Slurp China                                 24489+45    707.65 MB   27 Jun 2007 - 03:16
Googlebot                                          12582+21    308.10 MB   27 Jun 2007 - 03:33
Unknown robot (identified by 'bot/' or 'bot-')     4529+98     119.71 MB   26 Jun 2007 - 15:41
Unknown robot (identified by 'robot')              4219+5      93.20 MB    26 Jun 2007 - 12:18
Yahoo Slurp                                        1591+516    39.52 MB    27 Jun 2007 - 03:32
Ask                                                1005+20     26.21 MB    21 Jun 2007 - 16:58
MSNBot                                             574+313     14.50 MB    26 Jun 2007 - 23:03
Unknown robot (identified by 'crawl')              438+8       10.84 MB    27 Jun 2007 - 02:18
MSNBot-media                                       92+4        2.29 MB     26 Jun 2007 - 22:52
Unknown robot (identified by hit on 'robots.txt')  0+10        2.77 KB     24 Jun 2007 - 09:38
Alexa (IA Archiver)                                1+2         29.33 KB    16 Jun 2007 - 06:26


#23 Orion, 06-27-2007, 01:20 PM

Blocking all access to the video(s) using the robots.txt file might be a decent idea (this would cut down on bandwidth), but it will end the listing of those videos in the search engines.

Another option: there are a TON of free video hosting solutions out there on the net (YouTube etc.). You could host the video(s) there and then embed them in your site. That way the actual video will get more play and not cost you anything in bandwidth or $. It will also probably play and stream much better than if you hosted it with the rest of your site, as most general web hosting is not properly optimized for true video streaming.

Just my 2 cents.

#24 Webnauts, 06-28-2007, 03:51 AM

Quote:
Originally Posted by versuri32
MSN Bot Eating Up ALL my bandwidth!!! MSN Search Optimization forum discussing the Microsoft search engine for MSN and Live.com.

Can I do something?
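You may try adding these two lines to your .htaccess file:

RewriteRule .* - [E=HTTP_IF_MODIFIED_SINCE:%{HTTP:If-Modified-Since}]
RewriteRule .* - [E=HTTP_IF_NONE_MATCH:%{HTTP:If-None-Match}]

They pass the If-Modified-Since and If-None-Match request headers through to your scripts as environment variables. If your scripts then answer unchanged pages with 304 Not Modified, that can save you and the search engines a lot of bandwidth.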
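
For anyone wondering where the saving comes from: a crawler that already has a copy of a page sends an If-Modified-Since header, and a server or script that sees the page is unchanged can reply 304 Not Modified with no body. Here is a minimal sketch of that decision in Python. The function name `conditional_status` and the sample dates are my own, and real scripts would also need to handle If-None-Match and ETags.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if not if_modified_since:
        return 200  # no conditional header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


# Hypothetical page that last changed on 20 June 2007 at 12:00 UTC.
page_changed = datetime(2007, 6, 20, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)

print(header)
print(conditional_status(header, page_changed))
print(conditional_status("Tue, 19 Jun 2007 12:00:00 GMT", page_changed))
print(conditional_status(None, page_changed))
```

Static files are normally handled this way by the web server itself; the point of passing the headers through is so dynamic pages can make the same decision.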

#25 Webnauts, 06-28-2007, 03:53 AM

And when MSNBot is crawling your site, it generally does not try to access your site more frequently than once every few seconds. If MSNBot determines that your site has a slow connection, it automatically adjusts the frequency. To specify a minimum delay between requests (in seconds), use the Crawl-delay parameter in the robots.txt file:
User-agent: msnbot
Crawl-delay: 120

#26 tombstoneweb, 11-30-2007, 11:37 AM

Quote:
Originally Posted by Webnauts
I think that by joining this free project you can get more information about who is good for you and who is not: Distributed Spam Harvester Tracking Network | Project Honey Pot

I am using that myself and it is just incredible.

Hey, great thread.

Just for my information, how is it incredible? It sounds interesting to me.
__________________
Invent the possibilities, not the obstacles.
Tombstone Arizona - Tombstone Arizona History - Tombstone Arizona Souvenirs

#27 asimegusta, 01-15-2008, 05:41 AM

After reading this thread as well as various other forums on bandwidth-sucking bots, I'm almost convinced that the best approach is to add bandwidth and let them run.
Six months ago I had to add bandwidth because my site got shut down for going over my quota. The problem arose again in December, when bandwidth consumption doubled compared with previous months. I didn't upgrade to more bandwidth because I noticed that it was the robots that were using it, although I did also have an increase in visitors. Anyway, I got several warnings from my host, but in the end they didn't shut me down even though I exceeded 120% of my quota.

I have a dynamic shopping cart that, besides the PHP pages, generates an equivalent HTML catalog of over 2,000 pages. I have on average just 240 visitors per day.
So far this month I have used 17,000 MB of my 20,000 MB bandwidth quota.

Okay, now in January I'm seeing the phenomenon continue. Here's what the biggest crawlers sucked today.
Googlebot 2.98 GB
Unknown robot 2.38 GB
Inktomi Slurp 238.53 MB

In addition, the following IPs are big consumers; I'm not sure what they do.
85.17.216.133 - 3.47 GB 14 Jan 2008
85.17.187.8 - 2.50 GB 15 Jan 2008
85.17.211.73 - 2.36 GB 08 Jan 2008
85.17.211.77 - 723.22 MB 11 Jan 2008

Are these average numbers for bandwidth consumption by robots?

Thanks for your time.
__________________
Bordados: www.bordadosdistintivos.com
Alicante Escapade: www.alicante-escapade.com

#28 bj, 01-15-2008, 07:47 AM

Quote:
Are these average numbers for bandwidth consumption by robots?
Kandoo, that sounds high. Have you ever installed a Google sitemap? The reason I ask is that if you have a Google sitemap spider crawl the site, you might see why the search engine spiders are sucking up that much bandwidth. Sometimes dynamic programs have more than one way to access the same information, which will be pretty evident if you let the sitemap spider run and then view the results for the first time.

For instance, on one of my directory sites there was a choice for people to see the sites listed on the page sorted in alphabetical order or in Google PR order. Having spiders crawl the same info three times (the original "stock" page, plus the two sort options) was really counterproductive. When stuff like this is the problem, there is usually a common way the pages are being generated, with a corresponding construction appended to the end of the URL, and a filter can be set so the sort pages are not crawled. I did this, submitted the sitemap to Google and Yahoo, and saw my spider bandwidth drop, while the frequency of spider visits did not.

Hope that helps!

#29 wige, 01-15-2008, 09:21 AM

Those 85.17.x.x IPs don't seem to resolve to any known search engines that I can find. The IP address block is registered through RIPE, and I can't see a legitimate reason for that much bot activity from it. Most search engines set their bots' IPs to resolve back to their own domain name. Additionally, searching Google for a few of those IP addresses, I find several web log files showing them at the top of some sites' traffic reports.

You might be able to save some bandwidth by blocking these bots, although you might want to capture the associated user agent string and create a block based on that as well. As you catch these bad bots, you can then add them to your firewall so that the requests are denied and use virtually no bandwidth, or serve a very light (small file size) 403 Forbidden error message.
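
The "resolve back to their own domain name" check can be automated with a reverse lookup followed by a forward lookup. Below is a minimal sketch using Python's standard `socket` module. The function name `is_verified_crawler` is my own, and the list of allowed suffixes is something you would have to fill in from each search engine's documentation; `.googlebot.com` in the usage note is given only as an example.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """A lookup: host name to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip: str,
                        allowed_suffixes: Iterable[str],
                        reverse: Callable[[str], str] = reverse_lookup,
                        forward: Callable[[str], List[str]] = forward_lookup) -> bool:
    """True only if the IP's host name ends in an allowed suffix AND resolves back to the IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

A call might look like `is_verified_crawler(ip_from_log, (".googlebot.com",))`. The forward step matters because anyone who controls the reverse DNS for their own address block can make it claim any host name. The `reverse` and `forward` parameters exist so the logic can be tried with canned data instead of live DNS.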
__________________
The best way to learn anything, is to question everything.
Interestingly Average Security Blog

#30 asimegusta, 01-15-2008, 02:48 PM

Thanks for the help. I am going to try to block those IPs.
Also, I don't have a Google sitemap set up on this site.

All times are GMT -4.