I'll admit, I don't go through my raw web site error logs as often as I should. I have the server set up to show me only the errors that are caused by users, because spambots and HackerSafe scans generate a lot of useless errors that don't affect the site. This condensed report is what I monitor.
However, I was doing an experiment in another thread to see if Google would try to crawl a link, and as I was reading through the raw data, I realized that there were hundreds of errors caused by Yahoo looking for nonexistent files.
The problem I am seeing is as follows:
Suppose I have a page at
http://www.mysite.com/section/subsection/page.php. Most search engines will also check
http://www.mysite.com/section/ and
http://www.mysite.com/section/subsection/ to look for index files that may have been missed. Yahoo, however, is looking for
http://www.mysite.com/section and
http://www.mysite.com/section/subsection (without the trailing slash). This generates a huge number of errors, and I want to find out if there is a way to stop Yahoo from doing this. I can't even see a reason why the spider would be looking for files structured like this, because they are unlikely to exist.
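To make the pattern concrete, here is a minimal Python sketch that lists the directory URLs a crawler could derive from a page URL, with and without the trailing slash. The function name `parent_directory_urls` is my own invention for illustration; it is not anything Yahoo or the other engines document, and it only mirrors the behavior I am describing from my logs.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directory_urls(page_url, trailing_slash=True):
    """Return the URL of every directory above page_url, shallowest first."""
    parts = urlsplit(page_url)
    # Split the path into segments and drop the last one (the file name).
    segments = [s for s in parts.path.split("/") if s][:-1]
    urls = []
    for depth in range(1, len(segments) + 1):
        path = "/" + "/".join(segments[:depth])
        if trailing_slash:
            path += "/"
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return urls


page = "http://www.mysite.com/section/subsection/page.php"

# What most search engines request (with the trailing slash).
print(parent_directory_urls(page, trailing_slash=True))

# What I see Yahoo requesting (without the trailing slash).
print(parent_directory_urls(page, trailing_slash=False))
```

The first list is the form that resolves to an index file; the second is the form that shows up as errors in my raw logs.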
Has anyone else encountered this issue? More important, does anyone have any suggested fixes?
I have considered that, because my site is dynamic, there might be bad links somewhere. But this seems to happen in every single directory and subdirectory on the site, and the only search engine bot acting this way is Yahoo. Yahoo also seems to have a lower crawl rate than any of the other engines, so if a bad link were the cause I would expect Google or MSN to have picked it up first.
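One way to check the "only Yahoo does this" observation against the raw log is to tally the slashless directory errors by crawler. This is a rough sketch and rests on several assumptions of mine: that the log is in Apache's "combined" format, that any 4xx status counts as an error, that a last path segment with no dot in it is a directory name, and that the user-agent substrings in `CRAWLERS` are enough to label each bot. The sample lines are made up for illustration and are not taken from my real logs.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: these user-agent substrings identify each crawler.
CRAWLERS = {"Slurp": "Yahoo", "Googlebot": "Google", "msnbot": "MSN"}


def looks_like_slashless_directory(path):
    """True if the last path segment is non-empty and has no file extension."""
    last = path.split("?", 1)[0].rsplit("/", 1)[-1]
    return last != "" and "." not in last


def count_slashless_errors(lines):
    """Count 4xx responses to slashless directory requests, grouped by crawler."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("status").startswith("4"):
            continue
        if not looks_like_slashless_directory(m.group("path")):
            continue
        agent = m.group("agent")
        label = next(
            (name for key, name in CRAWLERS.items() if key in agent), "other"
        )
        counts[label] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [10/Oct/2007:13:55:36 -0700] "GET /section HTTP/1.0" 404 209 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '1.2.3.4 - - [10/Oct/2007:13:55:40 -0700] "GET /section/subsection HTTP/1.0" 404 209 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '5.6.7.8 - - [10/Oct/2007:13:56:02 -0700] "GET /section/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(count_slashless_errors(sample))
```

If a bad link on the site were the cause, I would expect the counts for Google and MSN to be nonzero as well, not just the one for Yahoo.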