  #1 (permalink)  
Old 01-25-2005, 03:35 PM
WebProWorld Member
 
Join Date: Sep 2003
Location: Rochester, New York
Posts: 50
PunkyLZ RepRank 0
Default Robots.txt File Questions

I've been reading up on the use of the Crawl-delay syntax in the robots.txt file. I have a potential client who is employing this method. Below is a snippet of their robots.txt file:

User-agent: Slurp
Crawl-delay: 20

User-agent: msnbot
Crawl-delay: 20

User-agent: YahooSeeker
Crawl-delay: 20

I'm wondering the following:
1 - Has anyone ever used this and had success?
2 - Has it had any impact on spiders indexing your site?
3 - Isn't a crawl-delay of 20 a little excessive?
4 - Do you think I should recommend that the client move to a new hosting provider who can handle the spider traffic?

Any other information on the use of the crawl-delay would be greatly appreciated.

Thanks!
__________________
Let BizWonk handle your Custom Web Design, Search Engine Optimization and Social Media Marketing Needs.
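
To put question 3 in numbers, here is a minimal sketch that reads the snippet above with Python's standard-library `urllib.robotparser` and works out the most pages a bot could request per day if it strictly waited 20 seconds between requests. The helper names (`crawl_delays`, `max_fetches_per_day`) are made up for illustration, and the result reflects how Python's parser reads the file, not necessarily how Slurp or msnbot schedule their crawls.

```python
from urllib.robotparser import RobotFileParser

# The client's robots.txt snippet, as quoted in the post above.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 20

User-agent: msnbot
Crawl-delay: 20

User-agent: YahooSeeker
Crawl-delay: 20
"""

SECONDS_PER_DAY = 24 * 60 * 60


def crawl_delays(robots_txt, agents):
    """Return the Crawl-delay (in seconds, or None) that applies to each agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


def max_fetches_per_day(delay_seconds):
    """Upper bound on requests per day for one bot that waits delay_seconds between requests."""
    return SECONDS_PER_DAY // delay_seconds


delays = crawl_delays(ROBOTS_TXT, ["Slurp", "msnbot", "YahooSeeker", "Googlebot"])
for agent, delay in delays.items():
    if delay is None:
        print(f"{agent}: no Crawl-delay applies")
    else:
        print(f"{agent}: {delay}s delay, at most {max_fetches_per_day(delay)} requests/day")
```

Under that assumption a 20-second delay caps each listed bot at 4,320 requests per day, so whether it is excessive depends largely on how many pages the shopping site has and how often they change.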
  #2 (permalink)  
Old 01-25-2005, 04:03 PM
WebProWorld Pro
 
Join Date: May 2004
Posts: 150
AndrewX RepRank 0
Default

Never used it, but that does seem like a lifetime as far as a delay goes, and as far as the server being able to handle it.

If they really need that delay, I hope they don't have human visitors! ;) Just kidding. Are you sure that is needed?
__________________
Money Talk || SEO + Directory = SEOMA | SEO 1 | Link Vault
  #3 (permalink)  
Old 01-25-2005, 04:43 PM
WebProWorld Member
 
Join Date: Sep 2003
Location: Rochester, New York
Posts: 50
PunkyLZ RepRank 0
Default

Personally, I don't think they need that big of a delay, but then again, we are not hosting the website. I'm not sure whether the hosting company can put some type of bandwidth limitation on the client's site. After all, it is a shopping site. But if they did put limitations on the amount of traffic at any given time, wouldn't they lose customers?
  #4 (permalink)  
Old 01-25-2005, 08:29 PM
WebProWorld Veteran
 
Join Date: Jul 2003
Location: United Kingdom
Posts: 477
Keimos RepRank 0
Default Robots

Hi PunkyLZ,

Before switching, I would ask them why they are doing it, as I would on seeing that robots.txt file.

It goes against everything I have read on other forums.
If it is working, and we have some idea why, then this info can be passed on to others.

I just say this because every site wants the spiders to be there in the first place.

As said above, is this a hosting problem?

Ask the questions.
__________________
Keimos - Always learning something new each day
www.keimos.co.uk , www.keimos.net , www.selfpacedit.co.uk
  #5 (permalink)  
Old 01-25-2005, 09:38 PM
WebProWorld New Member
 
Join Date: Jul 2003
Posts: 8
rptasiuk RepRank 0
Default Crawl-delay

I do not see this causing any problems. All you are doing is telling the visiting spider to wait 20 seconds between successive requests as it works through the links it finds on your pages. I can see this as a good thing for some of the more aggressive spiders. There must have been just cause for some of the search engines to build it into their bots; maybe on very busy sites it can make a difference in bandwidth usage during the spider's visit.
  #6 (permalink)  
Old 01-25-2005, 10:36 PM
WebProWorld Pro
 
Join Date: Jul 2003
Location: San Diego, CA, USA
Posts: 116
GSO RepRank 0
Default Crawl-delay

Hi PunkyLZ

Yahoo says:
You can add a "Crawl-delay: xx" instruction, where "xx" is the minimum delay in seconds between successive crawler accesses. If the crawler rate is a problem for your server, you can set the delay up to 60 or 300 or whatever value is comfortable for your server.

Setting a crawl-delay of 20 seconds for Yahoo! Slurp would look something like:

User-agent: Slurp
Crawl-delay: 20

Since most of the major search engines are using this instruction, why not wildcard the user-agent and let the smaller search engines play catch-up? That will save you a lot of work.
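
For illustration, here is a minimal sketch of what the wildcard version would look like, checked with Python's standard-library `urllib.robotparser`. The helper name `delay_for` is hypothetical, and the output only shows how Python's parser interprets the file; whether a given search engine's bot honours a wildcard Crawl-delay is up to that bot.

```python
from urllib.robotparser import RobotFileParser

# One wildcard record instead of one record per bot.
WILDCARD_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
"""


def delay_for(robots_txt, agent):
    """Return the Crawl-delay Python's parser applies to the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


wildcard_delays = {
    agent: delay_for(WILDCARD_ROBOTS_TXT, agent)
    for agent in ("Slurp", "msnbot", "Googlebot")
}
print(wildcard_delays)
```

As parsed here, the wildcard record applies the same 20-second delay to every user-agent, including bots that were not causing a problem.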

You can validate your robots.txt file here: http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
__________________
GSO
http://www.GlobalSpecialOperations.com/
-------------------------------------
  #7 (permalink)  
Old 01-26-2005, 02:27 AM
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2
Default

Some people on certain servers have found Yahoo's Slurp in particular to be quite greedy about bandwidth and this can cause problems with other bots and human visitors -- for those sites, the crawl-delay instruction is a good idea, although I'd probably be more inclined to use 5 or 10 as a max.

I would not recommend using the wildcard as suggested above. First, Google hits pages about one per second or so and thus doesn't create the problem that Slurp does. Why slow down bots that are already behaving? Second, I don't believe it's true that all spiders even recognize the limiting instruction.
  #8 (permalink)  
Old 01-26-2005, 03:50 AM
WebProWorld Pro
 
Join Date: Jul 2003
Location: San Diego, CA, USA
Posts: 116
GSO RepRank 0
Default Slowing Robots

I agree that some people whose web sites are listed on Google and Yahoo, who have those search engine robots coming in every day and using up their bandwidth because they update every day, and who have 190K page views per month and 80 gigs per month of bandwidth provided by their web hosting service, should be concerned about the rate at which those bots request pages after spending all that time to get listed, and would really like to risk that the robots will take the path of least resistance and leave their site for an easier one.
  #9 (permalink)  
Old 01-26-2005, 04:25 AM
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2
Default

There's no need to try to discourage the bots from spidering your site. You just slow them down a bit.

That's exactly what the crawl-delay instruction does: slows them down to once every 2 or 3 or 5 seconds. You still get the pages indexed that way, which I assume is a desirable thing, no?
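
As a rough sketch of the idea, this is how a well-behaved crawler might honour a crawl delay: every page still gets fetched, just with a pause between requests. The function `crawl_politely` and its parameters are hypothetical and not taken from any real search engine's crawler; `fetch` and `sleep` are passed in so the loop can be tried without touching the network.

```python
import time


def crawl_politely(urls, fetch, delay_seconds, sleep=time.sleep):
    """Fetch every URL in order, pausing delay_seconds between successive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


# Dry run: record the pauses instead of actually sleeping or fetching.
pauses = []
fetched = crawl_politely(
    ["/index.html", "/products.html", "/contact.html"],
    fetch=lambda url: f"fetched {url}",
    delay_seconds=5,
    sleep=pauses.append,
)
print(fetched)
print(pauses)
```

All three pages are still retrieved; the delay only spaces the requests out.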
  #10 (permalink)  
Old 01-26-2005, 05:05 AM
WebProWorld Pro
 
Join Date: Jul 2003
Location: San Diego, CA, USA
Posts: 116
GSO RepRank 0
Default Crawl-delay Instruction

I agree that the name of the game is to do what is necessary to get the robots to crawl your site, especially new pages for listing. I just don't understand why you would want to try to control an already controlled robot with a default delay written into the program based on many criteria, including server load at the time the bot begins loading pages. If you check your logs and statistics reports, you can find any rogue robots and then deal with them on an individual basis, but to list them all is, in my opinion, not a good practice.
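
One way to do that log check, sketched under the assumption that the server writes access logs in the common "combined" format with the user-agent as the last quoted field. The log lines below are invented purely for illustration (they are not from any real site), and `requests_per_agent` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the last double-quoted field on a line (the user-agent in combined log format).
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines, for illustration only.
SAMPLE_LOG = """\
10.0.0.1 - - [25/Jan/2005:15:35:01 -0500] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
10.0.0.1 - - [25/Jan/2005:15:35:01 -0500] "GET /b.html HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
10.0.0.2 - - [25/Jan/2005:15:35:02 -0500] "GET /a.html HTTP/1.1" 200 512 "-" "msnbot/1.0"
10.0.0.1 - - [25/Jan/2005:15:35:02 -0500] "GET /c.html HTTP/1.1" 200 298 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
"""


def requests_per_agent(log_text):
    """Count requests per user-agent string in combined-format log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = USER_AGENT_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


agent_counts = requests_per_agent(SAMPLE_LOG)
for agent, count in agent_counts.most_common():
    print(f"{count:5d}  {agent}")
```

Sorting by request count puts the busiest bots at the top, which is where a misbehaving one would show up first.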
  #11 (permalink)  
Old 01-26-2005, 12:05 PM
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2
Default Re: Crawl-delay Instruction

Quote:
Originally Posted by GSO
I just don't understand why you would want to try to control an already controlled robot with a default delay
See above. I've already stated that as one of my arguments for not using wildcards for crawl-delay. First, Googlebot already behaves, so leave it alone. Second, Yahoo! Slurp and MSNbot do not always behave, so if that is a problem for your site, use crawl-delay to make them behave, since they both state clearly that they will obey that instruction.

Other, smaller bots that misbehave can just be banned if you like. But it seems to me to be SE suicide to ban one of the big three.
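
A minimal sketch of that per-bot approach, again checked with Python's standard-library `urllib.robotparser`: a modest delay for Slurp and msnbot, an outright ban for one misbehaving bot, and nothing at all for Googlebot. `ExampleRogueBot`, the delay of 5, the example URL, and the helper `policy` are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Per-bot rules: slow down two named bots, ban one hypothetical rogue bot,
# and leave every other bot (including Googlebot) unrestricted.
PER_BOT_ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 5

User-agent: msnbot
Crawl-delay: 5

User-agent: ExampleRogueBot
Disallow: /
"""

per_bot_parser = RobotFileParser()
per_bot_parser.parse(PER_BOT_ROBOTS_TXT.splitlines())


def policy(agent, url="http://www.example.com/products.html"):
    """Summarise what the parsed robots.txt says for one user-agent."""
    return {
        "allowed": per_bot_parser.can_fetch(agent, url),
        "crawl_delay": per_bot_parser.crawl_delay(agent),
    }


for agent in ("Googlebot", "Slurp", "msnbot", "ExampleRogueBot"):
    print(agent, policy(agent))
```

Googlebot is untouched, the two named bots are slowed rather than blocked, and only the rogue bot is shut out.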
All times are GMT -4.