| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
|
|||
|
Greetings. This has been bothering me for some time now. I've noticed this in the past, primarily for Googlebots...but now am seeing it from others as well.
It appears that some bots are particularly interested in error logs. I may be 100% off base; this is just the only way I can explain what I've been seeing. If I'm off base, please accept my apologies in advance. I'm open to suggestions :)

Going through my error log, I would often see oddball errors followed almost immediately by the same error from a different IP. I started looking into some of the follow-up IP addresses, and they are all from search engine bots.

Here's an example. Last week I made a mistake entering an image URL, and it took me three tries to get it right. Below are the error log entries, with my IP mostly removed; I left enough so you can see it's different from the follow-up. Notice that shortly after I made the mistakes, Googlebot attempted the same URLs. I've tried to get Googlebot to follow successful requests the same way, but have not noticed any response, only for those in the error log. I've seen that before and always wondered why :)

Then today, 1 1/2 days later, I noticed Alexa crawling the same exact URLs:

[Image: error log screenshot]

I cannot figure out any other way that Alexa would know to crawl those exact "not found" URLs other than snooping through my error log. Does anyone know why they are interested in error logs? Is there some other explanation? Thanks for your help :)
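For anyone who wants to check their own logs for the same pattern, here is a minimal sketch. It assumes the error log has already been parsed into `(timestamp, ip, url)` tuples; the function name `find_followups`, the 10-minute window, and the sample entries (which use reserved documentation IP ranges and made-up times) are all hypothetical and not taken from the real log.

```python
from datetime import datetime, timedelta


def find_followups(entries, window=timedelta(minutes=10)):
    """Pair each failed request with later requests for the same URL
    that came from a different IP within `window`."""
    entries = sorted(entries, key=lambda e: e[0])
    pairs = []
    for i, (t1, ip1, url1) in enumerate(entries):
        for t2, ip2, url2 in entries[i + 1:]:
            if t2 - t1 > window:
                break  # entries are sorted, so nothing later can match
            if url2 == url1 and ip2 != ip1:
                pairs.append(((t1, ip1, url1), (t2, ip2, url2)))
    return pairs


# Made-up sample data: one visitor mistypes two URLs, and a second
# IP requests the same two URLs less than a minute later.
sample = [
    (datetime(2009, 1, 5, 10, 0, 0), "203.0.113.7", "/images/typo1.jpg"),
    (datetime(2009, 1, 5, 10, 0, 6), "203.0.113.7", "/images/typo2.jpg"),
    (datetime(2009, 1, 5, 10, 0, 40), "198.51.100.9", "/images/typo1.jpg"),
    (datetime(2009, 1, 5, 10, 0, 41), "198.51.100.9", "/images/typo2.jpg"),
]

for first, followup in find_followups(sample):
    delay = (followup[0] - first[0]).total_seconds()
    print(f"{first[2]}: {first[1]} then {followup[1]} after {delay:.0f}s")
```

The follow-up IPs it reports can then be looked up to see whether they belong to a search engine.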
|
|||
|
I'm not sure how Alexa would know to look for that image unless it has seen that path before and went looking for it automatically. Same thing with Google. If it indexed your site once with the broken image path, it might hit your site expecting to see the same image, so it requests it? Just guesses.
However, I've never seen logs located in a directory that could be indexed. If yours are, you should move them for security and your visitors' privacy. |
|
|||
|
I assure you, the logs are not in a directory where they can be indexed :) Look closely at the times and order in which Google requested the same exact pages that I had entered in error. The page containing the incorrect link was never indexed. It was only a link for a couple of seconds while I tried to get the URL correct. As soon as I published the page and noticed the broken image, it was corrected and published again; you can tell by looking at the times of the first three entries in the image. The entire ordeal took a total of 12 seconds. I know Google is good, but there isn't any way Google grabbed the same page 3 times in 12 seconds, catching all three "versions" of my mistake ;)
|
|
|||
|
Why do you think Googlebot (or other bots) are interested in your error logs?
These errors come about in the normal process of crawling a site where there are broken links. Bots will follow every link on a page, so if you do have mistakes on your page every time the page is spidered the errors will show up in your error logs. Alexa will find the same errors because it spiders in the same way. |
|
|||
|
Also, once Google has crawled your site with bad image links, the cached page, whenever it is brought up, will attempt to load the images from the "cached" bad link. So, until the pages with a bad link are re-crawled, the cached page will keep attempting to load those images from the broken links.
mark |
|
|||
|
I have seen this a few times. It happened when using PHP-based includes etc., and linking to images using ./images/img.jpg instead of an absolute path or images/img.jpg. When using the ./ method to link to images, I often get errors only with spiders. Normal web browsers can still read this, but spiders see a broken link and therefore produce an error.
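To illustrate how these link styles resolve, here is a small sketch using Python's standard `urljoin`. The `example.com` URLs are hypothetical. Under standard URL resolution, `./images/img.jpg` and `images/img.jpg` point at the same place, so a client that fails on only one of them is probably not resolving relative paths the standard way. Both forms do break when the same included fragment is served from a page in a different directory, which may be what happens with includes.

```python
from urllib.parse import urljoin

# Hypothetical pages that pull in the same included fragment.
page = "http://example.com/forum/thread.php"
nested_page = "http://example.com/forum/archive/old.php"

dot_relative = urljoin(page, "./images/img.jpg")
plain_relative = urljoin(page, "images/img.jpg")
root_relative = urljoin(page, "/images/img.jpg")

# Same fragment, different directory: the relative link moves with the page.
from_nested = urljoin(nested_page, "./images/img.jpg")

print(dot_relative)    # http://example.com/forum/images/img.jpg
print(plain_relative)  # http://example.com/forum/images/img.jpg
print(root_relative)   # http://example.com/images/img.jpg
print(from_nested)     # http://example.com/forum/archive/images/img.jpg
```

A root-relative or fully absolute path avoids the problem because it does not depend on which page includes the fragment.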
|
|
|||
|
Quote:
1. This page was only published incorrectly for 12 seconds before I corrected the error, 3 times. Does anyone think Google indexed it 3 separate times, catching all three versions of the error, in those 12 seconds? It has nothing to do with any spider following any link on the page. The error was only on the page for 12 seconds, and the page was not indexed. It was a single post on a thread, and I edited the post 3 times in 12 seconds to get it right. No way it was indexed by two different bots catching all three different versions of the error in 12 seconds :)

Is there another answer? I assure you, it has nothing to do with spidering. Here's another example that should clarify what I'm trying to say:

[Image: error log screenshot]

This was the first instance, and this example proves that it has nothing to do with any spidering or indexing of pages. That day, I made a typo while entering a URL in my browser, except I didn't know it was a typo, so when the page failed, I checked my error log. I was shocked to see not only my entry with the typo but, immediately below it, another IP that had entered the same exact typo!

The above image shows me intentionally typing into my browser URLs that do not exist and have never existed. It was a test. Google followed up by requesting the same exact nonsensical URLs that I had typed. If that's not proof that it has nothing to do with spidering or indexing, then I dunno what is :)

Any ideas? Thanks, it's driving me nuts not knowing :) |
|
|||
|
I could not confirm 64.68.87.69 as a Googlebot IP until I found it here:
http://www.canufly.net/~georgegg/google/ So I guess it really is a Googlebot. IPs are hard to spoof in situations like this. Shot in the dark: do you have any toolbars installed? I love a good mystery... |
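Another way to check an IP without relying on a third-party list is a reverse DNS lookup followed by a forward lookup to confirm it. The sketch below is only an illustration: the function name `looks_like_googlebot` and the accepted hostname suffixes are assumptions here, and the demo uses fake resolvers with a made-up hostname and a documentation IP so it runs without touching the network.

```python
import socket


def looks_like_googlebot(ip,
                         reverse_lookup=socket.gethostbyaddr,
                         forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the hostname suffix, then
    forward-resolve the hostname and confirm it maps back to the IP."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False
    # Assumed suffixes; adjust to whatever the crawler operator documents.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in addresses


# Fake resolvers with made-up data, standing in for real DNS.
def fake_reverse(ip):
    return ("crawl-203-0-113-7.googlebot.com", [], [ip])


def fake_forward(hostname):
    return (hostname, [], ["203.0.113.7"])


print(looks_like_googlebot("203.0.113.7", fake_reverse, fake_forward))   # True
print(looks_like_googlebot("198.51.100.9", fake_reverse, fake_forward))  # False
```

Calling `looks_like_googlebot(ip)` with the default arguments performs real DNS lookups. The forward confirmation matters because anyone who controls reverse DNS for their own IP range can set a hostname that merely looks official.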
|
|||
|
Quote:
At least now it makes sense, thank you :-) I still don't understand why they track manually entered URLs like that - no clicking, no link, just text typed into a browser. Strange... TimS |
|
|||
|
In this case (the error only being online for 12 seconds, though I wonder, do you normally use a stopwatch when entering things on your site?) I would opt for Google finding the page through the toolbar, since you now mention that it is installed.
The toolbar has to take the URL entered in the browser and send it to the Google servers in order to retrieve and show PageRank for the page being displayed. It could be that if Google cannot find the URL in its database, it will immediately spider that URL. If this is indeed the case, we now have an answer for those people who are concerned about their pages not being indexed by Google. Could be an interesting test. |
|
|||
|
Quote:
Quote:
I found it interesting, thank you and flood6 for your help in understanding it :-) TimS |
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |