WebProWorld
Yahoo! Discussion Forum Yahoo Search discussion. Any topic or subject specific to Yahoo should go here. You will also find a subforum dedicated to YPN & Panama.

  #1 (permalink)  
Old 07-03-2007, 06:43 AM
WebProWorld Member
 
Join Date: Aug 2004
Location: Australia
Posts: 81
glinted RepRank 0
Slurp Chewing up 20+ gigs per Month per Site

I have just found out that the bandwidth on one of my Windows servers has quadrupled in the last few weeks because Slurp has gone crazy, taking 20+ gigs per month on each of several sites that have a lot of dynamic product pages. Unfortunately this is not turning into a matching increase in traffic and sales; the only increase I'm getting is in my bandwidth bills.

I don't want to shut them out completely, as I'm sure some of my buying visitors come from Yahoo, but I do have to work out a solution: 20 gigs per site per month is just way too much for a spider, and it is also going to put excess stress on the server.

I have found that Yahoo offers a crawl restriction that I can add to my robots.txt, so I'm going to try this. I have no idea which number to use, though, as nothing in their documentation that I can see says what the numbers mean specifically, only that a higher number means fewer visits from Slurp.

They give this as an example:

User-agent: Slurp
Crawl-delay: 10

So if my website has 30,000 product pages, what would be a good number to set this to? I would like to restrict them to visiting only once every 10 days, and hopefully they would then use only around 1-2 gigs a month as opposed to 20.

Any feedback and ideas on what number I should use etc would be helpful.

Unfortunately this host is more expensive than your typical Unix host that offers thousands of gigs per month. They only offer 20 gigs of bandwidth with my package, but I'm going to ask them to increase this to be in line with other Windows hosts, which offer around 30-40 gigs for a multi-domain reseller package.

Last edited by glinted; 07-03-2007 at 06:47 AM.
  #2 (permalink)  
Old 07-03-2007, 04:46 PM
bj
WebProWorld 1,000+ Club
 
Join Date: Apr 2005
Location: Delaware Valley, PA
Posts: 1,208
bj RepRank 2
Re: Slurp Chewing up 20+ gigs per Month per Site

I'm not sure, but I believe the 10 in Crawl-delay: 10 is a number of seconds, not days.

Google's Matt Cutts had this to say about crawl delay, which leads me to suspect I'm right. "I asked the crawl team about this a while ago, and there’s a good reason. It turns out that a lot of webmasters give crawl-delay values that are way out of whack, in the sense that we’d only be able to crawl 15-20 urls from a site in an entire day. I’ll try to post more details about that sometime in the future. The crawl guys are interested in allowing people to give some sort of hostload hint, but it’s their opinion that crawl-delay isn’t the best way to do it."
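For a rough sense of scale, assuming bj's reading is right and the value is a number of seconds to wait between fetches, the arithmetic can be sketched in Python. The function names are made up for illustration, and the figures are upper bounds for a single crawler making one request at a time; the sketch says nothing about how Slurp actually schedules its requests.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_fetches_per_day(crawl_delay_s: float) -> int:
    """Upper bound on fetches per day if the bot waits crawl_delay_s between requests."""
    return int(SECONDS_PER_DAY // crawl_delay_s)


def days_for_full_crawl(pages: int, crawl_delay_s: float) -> float:
    """Days needed to fetch every page once at the given delay."""
    return pages * crawl_delay_s / SECONDS_PER_DAY


def delay_for_target(pages: int, target_days: float) -> float:
    """Delay in seconds that spreads one full crawl over target_days."""
    return target_days * SECONDS_PER_DAY / pages


if __name__ == "__main__":
    pages = 30_000  # product page count from the first post
    for delay in (5, 10, 50, 200):
        print(
            f"Crawl-delay {delay}: up to {max_fetches_per_day(delay)} fetches/day, "
            f"full crawl in {days_for_full_crawl(pages, delay):.1f} days"
        )
    print(f"One crawl per 10 days needs a delay of {delay_for_target(pages, 10):.1f} s")
```

Under that assumption, a value of 10 allows up to 8,640 fetches a day, so 30,000 pages could be covered in about 3.5 days, and spreading them over 10 days works out to a delay of about 28.8 seconds. The 15-20 URLs a day mentioned in the quote would correspond to a delay of well over an hour.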

It's possible that you might be able to do something via .htaccess.

See thread here:
Search Bots Eating Bandwidth
  #3 (permalink)  
Old 07-03-2007, 06:21 PM
incrediblehelp
Moderator
WebProWorld Moderator
 
Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,743
incrediblehelp RepRank 4
Re: Slurp Chewing up 20+ gigs per Month per Site

You're not the only one:

Yahoo! Slurp Crawling Wild?
  #4 (permalink)  
Old 07-03-2007, 08:06 PM
WebProWorld New Member
 
Join Date: Dec 2005
Location: Florida
Posts: 1
meriweather RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

Gee guys, & gals,

I feel a little outclassed here, with Slurp only gobbling 35.23 megs last month (per site, on average), but that's more than all 13 of the other regular SE bots combined. I got just over 7,000 uniques from Google for their 2.74 megs of crawler bandwidth, as opposed to 778 uniques from Yahoo.

While I don't have the dynamic shopping cart problem, I do have a lot of websites which seem to get trawled way too heavily by the hungry Inktomi spider in relation to the amount of traffic I get from Yahoo. It hasn't gotten so bad that I want to exclude Yahoo traffic altogether, but there's got to be a way they can get the info they need to do a creditable job of indexing without hammering a site every time they come to get it.

Will keep an eye to this thread to see what others have gleaned or suspect.

Cheers,
Doc
  #5 (permalink)  
Old 07-03-2007, 08:29 PM
bj
WebProWorld 1,000+ Club
 
Join Date: Apr 2005
Location: Delaware Valley, PA
Posts: 1,208
bj RepRank 2
Re: Slurp Chewing up 20+ gigs per Month per Site

I just checked the stats on my sites, and they've been hammering me too, along with all my client sites. Maybe Yahoo needs to be made aware of this. A lot. They say the squeaky wheel gets the grease.

Hmm, ya gotta wonder what's happening at Yahoo to their servers and databases that they feel the need to crank up the arachnids this high . . .
  #6 (permalink)  
Old 07-04-2007, 12:22 AM
NetProwler
WebProWorld Member
 
Join Date: Jan 2007
Posts: 74
NetProwler RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

Yahoo is one of the most voracious bots on the Net. However, there are a few things one should do to keep the ravenous appetite of Yahoo in check.

1. Restrict the size of the custom 404 error page. Most of the time Yahoo bombards the site looking for nonexistent pages. If the 404 error page is less than a few KB, it lowers the bandwidth consumed to some extent.

2. If your site serves dynamic pages, try to implement some form of control (like mod_throttle) that slows down the serving of pages and keeps your server from being brought to its knees.

3. If Yahoo is showing undue interest in your images (a reason for massive bandwidth consumption), deny Yahoo access to the image directory.
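Before applying the tips above, it may help to see where Slurp's bandwidth is actually going. The following Python sketch totals the bytes served to Slurp, split into 404 responses, images and everything else. It assumes an Apache-style Combined Log Format; the IIS logs on a Windows server use a different layout, so the pattern would need adapting. The function name, the sample lines and the extension list are illustrative only.

```python
import re
from collections import defaultdict

# Assumed layout (Combined Log Format):
# ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative list; extend it to match the image types a site serves.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def slurp_bandwidth(lines):
    """Return bytes served to Slurp, grouped as 'not_found', 'images' or 'pages'."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "slurp" not in match["agent"].lower():
            continue  # unparseable line, or a different visitor
        size = 0 if match["size"] == "-" else int(match["size"])
        path = match["path"].split("?", 1)[0].lower()  # drop the query string
        if match["status"] == "404":
            bucket = "not_found"
        elif path.endswith(IMAGE_EXTENSIONS):
            bucket = "images"
        else:
            bucket = "pages"
        totals[bucket] += size
    return dict(totals)


SLURP_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

# Made-up sample lines using documentation IP addresses.
SAMPLE = [
    f'203.0.113.5 - - [03/Jul/2007:06:43:00 +1000] "GET /product.asp?id=1 HTTP/1.0" 200 18000 "-" "{SLURP_UA}"',
    f'203.0.113.5 - - [03/Jul/2007:06:43:10 +1000] "GET /images/widget.jpg HTTP/1.0" 200 42000 "-" "{SLURP_UA}"',
    f'203.0.113.5 - - [03/Jul/2007:06:43:20 +1000] "GET /old-page.asp HTTP/1.0" 404 9000 "-" "{SLURP_UA}"',
    '198.51.100.7 - - [03/Jul/2007:06:43:30 +1000] "GET /product.asp?id=2 HTTP/1.1" 200 17500 "-" "Googlebot/2.1"',
]

if __name__ == "__main__":
    for bucket, total in sorted(slurp_bandwidth(SAMPLE).items()):
        print(f"{bucket}: {total} bytes")
```

If most of the bytes land in the 404 bucket, the first tip is the one to try; if they land in the images bucket, the third.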
  #7 (permalink)  
Old 07-04-2007, 04:46 AM
WebProWorld New Member
 
Join Date: Feb 2007
Location: Essex, UK
Posts: 1
hitchman RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

I had problems last year with Slurp following an events calendar on one of our sites. Even though there were very few events listed it was finding all the links to dates, and previous/next month links, and had followed them through many decades past and future before I started to get bandwidth warnings on the site.

They did sort the problem out and replied very quickly. I suspect they may have just restricted crawling on that particular site though.
  #8 (permalink)  
Old 07-04-2007, 09:15 AM
WebProWorld Veteran
 
Join Date: Aug 2006
Location: Burlington, Ontario, Canada.
Posts: 410
jtracking RepRank 1
Re: Slurp Chewing up 20+ gigs per Month per Site

Quote:
Originally Posted by NetProwler View Post
Yahoo is one of the most voracious bots on the Net. However, there are a few things one should do to keep the ravenous appetite of Yahoo in check.

1. Restrict the size of the custom 404 error page. Most of the time Yahoo bombards the site looking for nonexistent pages. If the 404 error page is less than a few KB, it lowers the bandwidth consumed to some extent.

2. If your site serves dynamic pages, try to implement some form of control (like mod_throttle) that slows down the serving of pages and keeps your server from being brought to its knees.

3. If Yahoo is showing undue interest in your images (a reason for massive bandwidth consumption), deny Yahoo access to the image directory.

Interesting. Thanks for that little tidbit. I too am feeling Yahoo bombardments, with tons of 404 error page notifications sent to me, essentially wasting my time. I think it might be time to tame the lesser giant.
__________________
Post as-it-happens crime stories of criminal behaviour at crimedigg.com
  #9 (permalink)  
Old 07-05-2007, 09:58 AM
WebProWorld Member
 
Join Date: Aug 2004
Location: Australia
Posts: 81
glinted RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

Quote:
Originally Posted by NetProwler View Post
Yahoo is one of the most voracious bots on the Net. However, there are a few things one should do to keep the ravenous appetite of Yahoo in check.

1. Restrict the size of the custom 404 error page. Most of the time Yahoo bombards the site looking for nonexistent pages. If the 404 error page is less than a few KB, it lowers the bandwidth consumed to some extent.

2. If your site serves dynamic pages, try to implement some form of control (like mod_throttle) that slows down the serving of pages and keeps your server from being brought to its knees.

3. If Yahoo is showing undue interest in your images (a reason for massive bandwidth consumption), deny Yahoo access to the image directory.

I also agree that these are some great tips. Would anyone know offhand how I would restrict Slurp specifically to my image folder? I think this could help!

Quote:
I'm not sure but I believe that Crawl-delay: 10 refers to the timespan "seconds" and not days.

Oh I see, that makes sense now. As for what you said about the .htaccess file, I'm on a Windows server for these sites, so it doesn't have one.
  #10 (permalink)  
Old 07-05-2007, 10:18 AM
bj
WebProWorld 1,000+ Club
 
Join Date: Apr 2005
Location: Delaware Valley, PA
Posts: 1,208
bj RepRank 2
Re: Slurp Chewing up 20+ gigs per Month per Site

Oops, sorry. The mind has trouble believing anyone prefers using a Windoze server in this day and age . . .

I assume you meant keep arachnids out of your image folder?

User-agent: *
Disallow: /

That's the basic format (as written, it shuts every spider out of the whole site), so if you want to keep a certain spider out of a certain directory, you'd do this:

User-agent: Slurp
Disallow: /images/

More info here:
Yahoo! Help -
  #11 (permalink)  
Old 07-06-2007, 05:18 AM
NetProwler
WebProWorld Member
 
Join Date: Jan 2007
Posts: 74
NetProwler RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

As bj pointed out, you can put a robots.txt file in the document root of your server with these lines:
User-agent: Slurp
Disallow: /images/
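One way to sanity-check a rule like this before uploading it is to run it through a robots.txt parser. This sketch uses Python's standard urllib.robotparser as a stand-in; it shows how that parser reads the rule, not how Slurp itself does. The helper name and the test paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /images/
"""


def is_allowed(agent: str, path: str) -> bool:
    """Return True if the rules above let the given user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot"):
        for path in ("/images/logo.gif", "/product.asp?id=1"):
            print(agent, path, is_allowed(agent, path))
```

With this parser the rule keeps Slurp out of /images/ only: product pages stay open to Slurp, and other spiders are not affected at all.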
  #12 (permalink)  
Old 07-07-2007, 03:03 AM
WebProWorld Member
 
Join Date: Aug 2004
Location: Australia
Posts: 81
glinted RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

Quote:
User-agent: Slurp
Disallow: /images/

ok thanks! ;0

Quote:
Oops, sorry. The mind has trouble believing anyone prefers using a Windoze server in this day and age . . .

I have over 500 websites. Some use PHP, some use ASP and some use .NET. I love my ASP sites, as they do some slick things I have not got around to doing in PHP yet.

Last edited by glinted; 07-07-2007 at 03:13 AM.
  #13 (permalink)  
Old 07-08-2007, 11:55 PM
WebProWorld Member
 
Join Date: Sep 2006
Location: Copenhagen
Posts: 69
Mads Dam RepRank 0
Re: Slurp Chewing up 20+ gigs per Month per Site

Just looked at my stats. Seems like Yahoo is everywhere: 63% of all robots this month were Yahoo slurping.

Does anyone know when this started?
__________________
Photo + Graphic + Animation
www.madsdam.net
  #14 (permalink)  
Old 07-10-2007, 04:40 AM
WebProWorld Member
 
Join Date: Aug 2004
Location: Australia
Posts: 81
glinted RepRank 0
Thumbs down Re: Slurp Chewing up 20+ gigs per Month per Site

Well, nothing has stopped it. I added this to the robots.txt file last week:

User-agent: Slurp
Disallow: /graphics/

User-agent: Slurp
Crawl-delay: 5 (was 50)

Slurp has already chewed up 5 gigs on this one website this month alone.

I'm now going to try it at 200
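Two details of the file itself may be worth ruling out before raising the number. If a note such as "(was 50)" is literally in robots.txt after the value, the value is no longer a plain number and a parser may ignore the line. Also, the rules are split across two separate Slurp records, and a parser that stops at the first matching record would never see the Crawl-delay in the second. Whether Slurp behaves that way is not established anywhere in this thread, so treat it as an assumption. The sketch below uses Python's standard urllib.robotparser as a stand-in to compare the split layout with a single merged record; the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Layout from the post above, without the "(was 50)" note.
SPLIT = """\
User-agent: Slurp
Disallow: /graphics/

User-agent: Slurp
Crawl-delay: 5
"""

# Same rules in one record, so they cannot be separated by a parser.
MERGED = """\
User-agent: Slurp
Crawl-delay: 5
Disallow: /graphics/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text with Python's standard parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    for name, text in (("split", SPLIT), ("merged", MERGED)):
        rules = load_rules(text)
        # CPython's parser uses the first record that matches "Slurp",
        # so the two layouts need not report the same delay.
        print(name, rules.crawl_delay("Slurp"), rules.can_fetch("Slurp", "/graphics/a.gif"))
```

Keeping everything for Slurp in one record costs nothing and removes that doubt, whatever Slurp's own parser does.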