WebProWorld
Search Engine Optimization Forum: SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non-engine-specific SEO or SEM topics should go here.

#1 | 05-10-2004, 03:19 PM
StripersOnline (WebProWorld New Member)
Join Date: Apr 2004
Location: NJ
Posts: 9

Why do search engine "bots" appear to read error logs?

Greetings. This has been bothering me for some time now. I've noticed this in the past, mainly with Googlebot, but now I'm seeing it from other bots as well.

It appears that some bots are particularly interested in error logs. I may be 100% off base; this is just the only way I can explain what I've been seeing. If I'm off base, please accept my apologies in advance. I'm open to suggestions :)

Going through my error log, I would often see oddball errors followed almost immediately by the same error from a different IP. I started looking into some of the follow-up IP addresses, and they all belong to search engine bots. Here's an example: last week I made a mistake entering an image URL, and it took me three tries to get it right. Below are the error log entries, with my IP mostly removed (I left enough so you can see it differs from the follow-up). Notice that shortly after I made the mistakes, Googlebot requested the same URLs. I've tried to get Googlebot to follow successful requests the same way, but have not noticed any response; it only follows the ones that end up in the error log:



I'd seen that before and always wondered why :) Then today, 1 1/2 days later, I noticed Alexa crawling the exact same URLs:



I cannot figure out any other way that Alexa would know to crawl those exact "not found" URLs other than snooping through my error log.

Anyone know why they are interested in error logs? Is there some other explanation? Thanks for your help :)
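The pattern described in this post (a missing URL logged from one IP, then the same missing URL logged from a different IP shortly afterward) can be pulled out of an error log automatically. A minimal sketch, assuming the classic Apache 1.3/2.0 error-log line format shown in the code comment; the function name, the one-hour window, and the sample paths and IPs are all hypothetical:

```python
import re
from datetime import datetime

# Assumed log line shape (classic Apache 1.3/2.0 error log), e.g.
# [Mon May 10 14:17:37 2004] [error] [client 192.0.2.10] File does not exist: /var/www/html/img/x.jpg
ERROR_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[error\] \[client (?P<ip>[^\]]+)\] "
    r"File does not exist: (?P<path>\S+)"
)


def find_followups(lines, window_seconds=3600):
    """Return (path, first_ip, later_ip, delay_seconds) tuples for missing
    paths that a *different* IP requested within window_seconds of the
    first logged request for that path."""
    first_seen = {}  # path -> (timestamp, ip) of the earliest logged miss
    followups = []
    for line in lines:
        m = ERROR_RE.match(line)
        if not m:
            continue  # skip lines in any other format
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        ip, path = m.group("ip"), m.group("path")
        if path not in first_seen:
            first_seen[path] = (ts, ip)
            continue
        first_ts, first_ip = first_seen[path]
        delay = (ts - first_ts).total_seconds()
        if ip != first_ip and 0 <= delay <= window_seconds:
            followups.append((path, first_ip, ip, delay))
    return followups


if __name__ == "__main__":
    sample = [
        "[Mon May 10 14:17:37 2004] [error] [client 192.0.2.10] "
        "File does not exist: /var/www/html/img/fish.jpg",
        "[Mon May 10 14:17:45 2004] [error] [client 198.51.100.7] "
        "File does not exist: /var/www/html/img/fish.jpg",
    ]
    for row in find_followups(sample):
        print(row)
```

Running this over a real log and then looking up who owns each `later_ip` is a quicker way to spot the bot follow-ups than scanning the log by eye.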
#2 | 05-10-2004, 03:35 PM
flood6 (WebProWorld 1,000+ Club)
Join Date: Sep 2003
Location: Texas
Posts: 1,156

I'm not sure why Alexa would know to look for that image unless it had seen that path before and went looking for it automatically. Same thing with Google: if it indexed your site once with the broken image path, it might hit your site expecting to see the same image, so it requests it? Just guesses.

However, I've never seen logs located in a directory that could be indexed. If yours are, you should move them for security and your visitors' privacy.
#3 | 05-10-2004, 03:55 PM
StripersOnline (WebProWorld New Member)
Join Date: Apr 2004
Location: NJ
Posts: 9

I assure you, the logs are not in a directory where they can be indexed :) Look closely at the times and the order in which Google requested the exact pages I had entered in error. The page containing the incorrect link was never indexed: the bad link was only up for a couple of seconds while I tried to get the URL right. As soon as I published the page and noticed the broken image, I corrected it and published it again; you can tell by looking at the times of the first three entries in the image. The entire ordeal took a total of 12 seconds. I know Google is good, but there is no way Google grabbed the same page 3 times in 12 seconds, catching all three "versions" of my mistake ;)
#4 | 05-11-2004, 01:30 AM
Mel (WebProWorld 1,000+ Club)
Join Date: Jul 2003
Posts: 1,903

Why do you think Googlebot (or other bots) are interested in your error logs?

These errors come about in the normal process of crawling a site that has broken links. Bots follow every link on a page, so if you do have mistakes on your page, the errors will show up in your error logs every time the page is spidered. Alexa will find the same errors because it spiders in the same way.
__________________
Mel Nelson
Expert SEO | Cheap used cars
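Mel's point is that any spider fetching a page will also request every link and image on it, broken ones included. Roughly how a crawler gathers those URLs can be sketched with Python's standard `html.parser`; this is a simplification that only looks at `<a href>` and `<img src>`, and the page URL and markup are made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Which attribute to read for each tag (a simplification: real crawlers
# also look at <link>, <frame>, <script>, CSS and so on).
LINK_ATTRS = {"a": "href", "img": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = '<p>Nice fish! <img src="imgs/striper.jpg"> <a href="/forum/">Back</a></p>'
    for url in extract_links(page, "http://www.example.com/forum/post.html"):
        print(url)
```

Every URL this returns would be requested on each crawl, so a typo in an `src` attribute keeps producing 404s until the page is fixed and re-crawled.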
#5 | 05-11-2004, 10:56 AM
mhalloran (WebProWorld Member)
Join Date: Aug 2003
Location: Kirkwood MO
Posts: 85

Googlebot & broken links

Also, once Google has crawled your site with bad image links, the cached copy of the page will, whenever someone brings it up, try to load the images from the "cached" bad links. So until the pages with bad links are re-crawled, every view of the cached page will request those images from the broken URLs.

mark
#6 | 05-11-2004, 02:30 PM
jestep (WebProWorld 1,000+ Club, WebProWorld MVP)
Join Date: May 2004
Location: Austin, TX
Posts: 1,229

I have seen this a few times. It happened when using PHP-based includes and the like, when I linked to images as ./images/img.jpg instead of using an absolute path or plain images/img.jpg. With the ./ form I often get errors only from spiders: normal web browsers can still load the image, but spiders see a broken link and therefore produce an error.
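Under RFC 3986, `./images/img.jpg` and `images/img.jpg` resolve to the same URL, and both are relative to the URL of the page that was requested, not to the location of the PHP include file on disk. One way to check what a spider *should* have requested is Python's `urllib.parse.urljoin`, which removes `./` segments in Python 3.5 and later. A minimal sketch; the page URL and the `resolve_all` helper are hypothetical:

```python
from urllib.parse import urljoin

# Hypothetical page that contains the image reference.
PAGE = "http://www.example.com/articles/page.php"


def resolve_all(page_url, refs=("./images/img.jpg", "images/img.jpg", "/images/img.jpg")):
    """Resolve each image reference against the page URL (RFC 3986 rules)."""
    return {ref: urljoin(page_url, ref) for ref in refs}


if __name__ == "__main__":
    for ref, absolute in resolve_all(PAGE).items():
        print(f"{ref:20} -> {absolute}")
```

If the 404s a spider leaves in the log don't match what this prints for the page it fetched, comparing the two is a reasonable first step toward working out whether the page URL (for example a missing trailing slash) or the spider's own resolution is at fault.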
#7 | 05-12-2004, 05:26 PM
StripersOnline (WebProWorld New Member)
Join Date: Apr 2004
Location: NJ
Posts: 9

Quote:
Originally Posted by Mel
Why do you think Googlebot (or other bots) are interested in your error logs?

These errors come about in the normal process of crawling a site where there are broken links. Bots will follow every link on a page, so if you do have mistakes on your page every time the page is spidered the errors will show up in your error logs. Alexa will find the same errors because it spiders in the same way.
I think perhaps I'm not explaining myself clearly :)

1. The broken version of this page was only published for 12 seconds, during which it took me 3 tries to correct the error. Does anyone think Google indexed it 3 separate times, catching all three versions of the error, in those 12 seconds?

It has nothing to do with any spider following any link on the page. The error was only on the page for 12 seconds, the page was not indexed (it was a single post in a thread), and I edited the post 3 times in 12 seconds to get it right. There's no way it was indexed by two different bots catching all three versions of the error in 12 seconds :)

Is there another answer? I assure you, it has nothing to do with spidering. Here's another example - this should clarify what I'm trying to say:



This was the first instance, and this example proves that it has nothing to do with any spidering or indexing of pages. That day, I made a typo while entering a URL in my browser, except I didn't know it was a typo, so when the page failed I checked my error log. I was shocked to see not only my entry with the typo but, immediately below it, another IP that had requested the same exact typo! The above image shows me intentionally typing URLs into my browser that do not exist and have never existed, as a test. Google followed up by requesting the same exact nonsensical URLs that I had typed. If that's not proof that it has nothing to do with spidering or indexing, then I dunno what is :)

Any ideas? Thanks, it's driving me nuts not knowing :)
#8 | 05-12-2004, 05:54 PM
flood6 (WebProWorld 1,000+ Club)
Join Date: Sep 2003
Location: Texas
Posts: 1,156

Toolbar?

I could not confirm 64.68.87.69 as a googlebot IP until I found it here:

http://www.canufly.net/~georgegg/google/

So I guess it really is a Googlebot. IPs are hard to spoof in situations like this.

Shot in the dark, do you have any toolbars installed?

I love a good mystery...
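Instead of finding an IP in a third-party list, a crawler's IP can be checked with a reverse DNS lookup followed by a forward lookup of the returned hostname; this is the approach Google documents today for verifying Googlebot. A minimal sketch, assuming Googlebot hosts end in `.googlebot.com` or `.google.com`; the `is_googlebot` name is hypothetical, and the lookup functions are parameters only so the check can be tested without network access:

```python
import socket
import sys

# Hostname suffixes accepted as Google's crawler (assumption; adjust the
# tuple for other bots).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname suffix, then forward-resolve
    that hostname and confirm it maps back to the same ip."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no PTR record
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # A spoofed PTR record fails here: the name won't resolve back to ip.
    return ip in addresses


if __name__ == "__main__":
    for candidate in sys.argv[1:]:
        print(candidate, is_googlebot(candidate))
```

The forward lookup matters because whoever controls an IP block can set its reverse DNS to any name, including one ending in `googlebot.com`.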
#9 | 05-12-2004, 06:49 PM
StripersOnline (WebProWorld New Member)
Join Date: Apr 2004
Location: NJ
Posts: 9

Re: Toolbar?

Quote:
Originally Posted by flood6
Shot in the dark, do you have any toolbars installed?

I love a good mystery...
Interestingly enough, the two bots that I've seen respond to keyboard entry like the above example are Google and Alexa - and yes, I have both Google and Alexa toolbars installed.

At least now it makes sense, thank you :-) I still don't understand why they track manually entered URLs like that - no clicking, no link, just text typed into a browser. Strange...

TimS
#10 | 05-12-2004, 10:16 PM
Mel (WebProWorld 1,000+ Club)
Join Date: Jul 2003
Posts: 1,903

In this case (the error only being online for 12 seconds, though I do wonder whether you normally use a stopwatch when entering things on your site), I would opt for Google finding the page through the toolbar, since you now mention that it is installed.

The toolbar has to take the URL entered in the browser and send it to the Google servers in order to retrieve and show the PageRank for the page being displayed, and it could be that if Google cannot find the URL in its database, it immediately spiders that URL.

If this is indeed the case we now have an answer for those people who are concerned about their pages not being indexed by Google. Could be an interesting test.
#11 | 05-13-2004, 10:06 AM
StripersOnline (WebProWorld New Member)
Join Date: Apr 2004
Location: NJ
Posts: 9

Quote:
Originally Posted by Mel
In this case (the error only being online for 12 seconds - though I wonder do you normally use a stopwatch when entering things in your site?)
No need to wonder, Mel: I had no idea how long the error was online until I looked at the error log images posted above and noticed the first error was at 14:17:37 and the last error was at 14:17:49 ;-)

Quote:
Originally Posted by Mel
If this is indeed the case we now have an answer for those people who are concerned about their pages not being indexed by Google. Could be an interesting test.
I agree; I was intrigued by this idea when I first saw this. I called it "feeding a Googlebot", but the results were inconclusive. 100% of the time, if there was a Googlebot on my site, it would follow a manually entered URL that produced either a 404 or a 500 error. I didn't check any others. But I could not reliably "feed" it manually entered URLs that didn't produce an error. I guess it might have something to do with the page not already being in the index, as you suggested.

I found it interesting, thank you and flood6 for your help in understanding it :-)

TimS
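The "feeding a Googlebot" test can be repeated more systematically: type a few unique, never-linked test URLs into the browser, then search the access log for bot requests to exactly those paths and note the status each got. A minimal sketch, assuming Apache's "combined" log format and that Google's and Alexa's crawlers send User-Agent strings containing `Googlebot` and `ia_archiver`; the `bot_hits` name and the test paths are hypothetical. User-Agent strings can be faked, so an IP check on the hits is still worthwhile:

```python
import re
import sys

# Apache "combined" log format (assumed): captures client IP, request
# path, status code and User-Agent.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# User-Agent substrings to watch for (assumption).
BOT_MARKERS = ("Googlebot", "ia_archiver")


def bot_hits(lines, test_paths):
    """Map each test path to a list of (bot marker, HTTP status, ip) hits."""
    hits = {path: [] for path in test_paths}
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m or m.group("path") not in hits:
            continue
        for marker in BOT_MARKERS:
            if marker in m.group("agent"):
                hits[m.group("path")].append(
                    (marker, int(m.group("status")), m.group("ip")))
    return hits


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python bot_hits.py ACCESS_LOG /test-path-1 /test-path-2 ...
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for path, seen in bot_hits(log, sys.argv[2:]).items():
            print(path, seen or "no bot requests")
```

Mixing test URLs that return 404 with test URLs that return 200 would show directly whether only the error-producing ones get followed.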
All times are GMT -4.