WebProWorld
Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

  #1 (permalink)  
Old 05-23-2007, 07:36 AM
WebProWorld New Member
 
Join Date: May 2007
Location: UK
Posts: 21
Littlemansearch RepRank 0
Question Why is googlebot ignoring my robots.txt file?

For the past fortnight Googlebot has been on my site trying to index non-existent URLs without first trying to obtain the robots.txt file. Is their bot turning rogue or what? Here is a sample of my server log for today, 23 May:
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:01:27 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:04:27 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:07:38 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:10:41 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:14:22 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:17:25 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:20:31 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:23:42 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:26:52 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:30:03 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:33:34 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:36:44 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:40:00 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:43:24 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:46:34 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:49:44 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:52:54 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:56:04 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:59:14 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:02:33 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:05:43 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:09:01 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:12:16 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:15:37 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:18:47 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:21:57 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:25:14 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:04:28:24 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -

Make up your own minds. I have emailed them about this but received no reply, and they continue trying to index these URLs that do not exist. The search URL is incorrect for my search engine; all this does is slow the processes down on my server. They continuously try to index the same links every 6 to 7 minutes. Does anyone have any ideas? I would be grateful for suggestions. I have modified my robots.txt file to ban the indexing of these URLs. Also, when a 404 is found, isn't it automatically removed from Google's index? It is from mine. Like I said, the above is an actual sample from my log today, but this has been ongoing for over a fortnight.
Thanks
Sincerely

http://www.littlemansearch.co.uk/index.html
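A quick way to check a log like the one above is to count requests per URL and look for any fetch of /robots.txt. The following is a minimal sketch in Python, assuming the log uses the common log format shown in the excerpt; the `summarize` helper and the `LINE_RE` pattern are illustrative names, not part of any server software.

```python
import re
from collections import Counter

# Three lines copied from the log excerpt in this post.
SAMPLE_LOG = '''\
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:01:27 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:04:27 +0100] "GET /search/search.pl?q=bebo.com HTTP/1.1" 404 -
crawl-66-249-65-198.googlebot.com - - [23/May/2007:03:07:38 +0100] "GET /search/search.pl?q=cars HTTP/1.1" 404 -
'''

# Assumed layout: host, two ignored fields, [timestamp], "METHOD path protocol", status, size.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)

def summarize(log_text):
    """Count (path, status) pairs and note whether /robots.txt was requested."""
    hits = Counter()
    robots_requested = False
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are in a different format
        path = match.group("path")
        hits[(path, int(match.group("status")))] += 1
        # Compare the path without its query string against /robots.txt.
        if path.split("?", 1)[0] == "/robots.txt":
            robots_requested = True
    return hits, robots_requested

hits, robots_requested = summarize(SAMPLE_LOG)
for (path, status), count in hits.most_common():
    print(count, status, path)
print("robots.txt requested:", robots_requested)
```

Run over a full day's log rather than a short excerpt, a summary like this would show whether the crawler fetched robots.txt at any point, which a short sample cannot rule out.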
  #2 (permalink)  
Old 05-23-2007, 07:52 AM
SemAdvance's Avatar
WebProWorld Veteran
 
Join Date: Dec 2005
Location: In Your Mind
Posts: 788
SemAdvance RepRank 3
Default Re: Why is googlebot ignoring my robots.txt file?

Hi

It's not ignoring your robots.txt file.

Your file is incorrect and contains quite a few errors. Therefore, Googlebot and others will not download the file.

Use the URL below to find the errors and how to resolve them.

http://tool.motoricerca.info/robots-checker.phtml

You should enter the full URL of your robots.txt file.

Peace
  #3 (permalink)  
Old 05-23-2007, 08:22 AM
WebProWorld New Member
 
Join Date: May 2007
Location: UK
Posts: 21
Littlemansearch RepRank 0
Question Re: Why is googlebot ignoring my robots.txt file?

Yahoo indexed my site yesterday, and first it requested the robots.txt file as it should. Googlebot used to, but one day I was checking my logs and noticed that it wasn't requesting the robots.txt file, and nothing has changed. Almost every other crawler requests the robots.txt file.


http://www.littlemansearch.co.uk
  #4 (permalink)  
Old 05-23-2007, 08:55 AM
WebProWorld New Member
 
Join Date: May 2007
Location: UK
Posts: 21
Littlemansearch RepRank 0
Lightbulb Re: Why is googlebot ignoring my robots.txt file?

Thanks for the tip. I followed the link and amended the few errors; let's just see if it works now. Great blog, by the way. I would normally agree, but I run a search engine myself. Visit http://www.awagawag.co.uk/search/search.pl?Mode=AnonAdd to add your website for crawling; it's free.

Sincerely
http://www.littlemansearch.co.uk

Last edited by Littlemansearch; 05-23-2007 at 09:05 AM. Reason: extra information
  #5 (permalink)  
Old 05-23-2007, 11:15 AM
wige's Avatar
Moderator
WebProWorld Moderator
 
Join Date: Jun 2006
Location: United States
Posts: 2,648
wige RepRank 9
Exclamation Re: Why is googlebot ignoring my robots.txt file?

I noticed two things in your robots.txt file. The first is that you may be too specific. Google uses more than one bot in its searches, and they suggest
Code:
User-agent: Googlebot
as opposed to your code, which only restricts one specific bot. Also, you might want to make the code User-agent: * and combine the entries. There is likely a bad link somewhere on the web that caused Google to find these bad pages, and other spiders may eventually start looking as well.

As a side note from my Black Hat side, it looks like you are blocking access to certain settings files through your robots.txt file. If that is the case, this is usually a bad idea. You would be better served using a meta tag on the actual settings page.
__________________
The best way to learn anything, is to question everything.
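The combined `User-agent: *` entry suggested above can be checked offline before it goes live. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt text, the example.com host, and the bot names other than Googlebot are assumptions for illustration, since the original file is not shown in this thread. Python's parser is also only an approximation of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one combined entry for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/search.pl
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The disallowed script (with a query string) and an ordinary page.
blocked_url = "http://www.example.com/search/search.pl?q=cars"
open_url = "http://www.example.com/index.html"

for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    print(agent, "may fetch search.pl:", parser.can_fetch(agent, blocked_url))
print("Googlebot may fetch index.html:", parser.can_fetch("Googlebot", open_url))
```

With a single wildcard entry, every agent in the loop is refused the search script while ordinary pages stay fetchable, which is the behaviour the combined entry is meant to give.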
  #6 (permalink)  
Old 05-23-2007, 12:13 PM
WebProWorld New Member
 
Join Date: May 2007
Location: UK
Posts: 21
Littlemansearch RepRank 0
Unhappy Re: Why is googlebot ignoring my robots.txt file?

Thanks for your reply.
I have used the noindex, nofollow rule in the areas of my web pages that I don't want indexed, but the problem remains that Google keeps trying to index the same link every 12 hours or so, tying up my server, as it is currently only on a normal PC and not on a dedicated server. They have been trying to index the same URLs for at least a fortnight. The thing is, the URLs do not exist, and according to my server logs it is not even requesting the robots.txt file, which is the first thing it should do. It's like walking into someone else's house uninvited.

Thanks for the tip about Google using more than one bot.

Sincerely

Littleman Search

http://www.littlemansearch.co.uk/index.html

Last edited by Littlemansearch; 05-23-2007 at 12:15 PM.
  #7 (permalink)  
Old 05-23-2007, 04:01 PM
wige's Avatar
Moderator
WebProWorld Moderator
 
Join Date: Jun 2006
Location: United States
Posts: 2,648
wige RepRank 9
Default Re: Why is googlebot ignoring my robots.txt file?

Google does not recheck the robots.txt file before every file is retrieved. It checks at most daily, and I think the average is closer to weekly. Based on the number of requests, it sounds like Google found a large number of links to this URL somehow. Is there any way you could create a 301 redirect to somewhere else? That should get Google to drop the page a lot faster.

Google looks at 404 error messages as the server saying, "For some unknown reason, I can't find what you are telling me should be here." Google then keeps saying "Find it yet? How about now?" Especially if a lot of other pages say it should be there. A 301 message tells Google "What you want was moved over there, and nothing will ever be here again. Stop asking."

I think Google still crawls pages that are forbidden by robots.txt, to see if there are links on the resulting page that it can follow or index, but the page is not added to the index or cached.

I have had 404s that Google kept requesting for over a year, until I put up 301 redirects. You can do the same thing in your server configuration or .htaccess, depending on your server software.
__________________
The best way to learn anything, is to question everything.
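The difference between answering with a 404 and a 301 can be sketched in a few lines. This is a minimal illustration in Python's standard WSGI interface, not the poster's server setup; the redirect target `/index.html` and the `probe` helper are assumptions, and on a real site the same effect would normally come from the server configuration or .htaccess as described above.

```python
from wsgiref.util import setup_testing_defaults

STALE_PATH = "/search/search.pl"  # the path Googlebot keeps requesting in the log
NEW_LOCATION = "/index.html"      # hypothetical redirect target

def app(environ, start_response):
    """Answer the stale path with a permanent redirect, everything else with 404."""
    if environ.get("PATH_INFO") == STALE_PATH:
        # 301 tells the client the resource has moved for good.
        start_response("301 Moved Permanently", [("Location", NEW_LOCATION)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found\n"]

def probe(path):
    """Call the app directly with a fake request and return (status, headers)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app(environ, start_response)
    return captured["status"], captured["headers"]

print(probe("/search/search.pl"))
print(probe("/missing.html"))
```

The query string (`?q=cars`) does not affect the match, because WSGI passes it separately from `PATH_INFO`.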
  #8 (permalink)  
Old 05-24-2007, 06:36 AM
WebProWorld New Member
 
Join Date: May 2007
Posts: 9
Gracia RepRank 0
Default Re: Why is googlebot ignoring my robots.txt file?

This post may help clear this up a bit:
http://googlewebmastercentral.blogsp...about-googlebo...


Particularly this part:


"If my robots.txt file contains a directive for all bots as well as a
specific directive for Googlebot, how does Googlebot interpret the
line addressed to all bots?
If your robots.txt file contains a generic or weak directive plus a
directive specifically for Googlebot, Googlebot obeys the lines
specifically directed at it."


If your file includes a user-agent: Googlebot line, Googlebot will
obey that line and ignore the user-agent: * line. If your file does
not include a user-agent: Googlebot line, then Googlebot obeys the
user-agent: * line.
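The precedence rule quoted above can be tried out with Python's standard `urllib.robotparser`, which applies the same "specific entry wins over `*`" logic. This is a minimal sketch; the robots.txt text, the paths, and the `OtherBot` name are made up for illustration, and Python's parser only approximates Googlebot's own behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a generic entry plus an entry aimed at Googlebot only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/

User-agent: Googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

search_url = "http://www.example.com/search/search.pl?q=cars"
private_url = "http://www.example.com/private/page.html"

# Googlebot is matched by its own entry, so the * entry is not consulted.
print("Googlebot, /search/ :", parser.can_fetch("Googlebot", search_url))
print("Googlebot, /private/:", parser.can_fetch("Googlebot", private_url))

# A crawler with no entry of its own falls back to the * entry.
print("OtherBot, /search/  :", parser.can_fetch("OtherBot", search_url))
print("OtherBot, /private/ :", parser.can_fetch("OtherBot", private_url))
```

In this sketch Googlebot is allowed into /search/ even though the `*` entry disallows it, so any rule meant for every crawler has to be repeated inside the Googlebot entry as well.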
  #9 (permalink)  
Old 05-24-2007, 05:40 PM
WebProWorld MVP
WebProWorld MVP
 
Join Date: Jul 2004
Location: Omaha
Posts: 2,714
brian.mark RepRank 3
Default Re: Why is googlebot ignoring my robots.txt file?

If a request every 3 minutes ties up the machine, you may want to consider a bigger machine. That's not much, although I'd agree that fixing your robots.txt should make them quit hitting a 404 (which shouldn't take much, if any, resources from your machine.)

Brian.
__________________
ToolBarn.com, an Internet Retailer Top 500 and Inc. 500 Company | Tool Parts | Pet Supplies
All times are GMT -4.