Quote:
Originally Posted by wige
First I would try to find out where the traffic is coming from, to determine whether it might be a spider or other legitimate but erroneous traffic. Try doing a reverse lookup on the IP address (from a Windows command prompt, enter ping -a -n 1 0.0.0.0, where 0.0.0.0 is the IP address). Then check the site behind the resulting hostname to see if it is a directory, a search engine, or something else.
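If you want to check a batch of addresses from the logs, a reverse (PTR) lookup can also be done without ping. This is a minimal sketch using Python's standard socket module; the function name reverse_lookup is hypothetical. Note that a reverse DNS lookup does not depend on the remote host answering ping.

```python
import ipaddress
import socket
from typing import Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the hostname registered for `ip` via reverse DNS, or None."""
    try:
        ipaddress.ip_address(ip)  # reject malformed input before touching DNS
    except ValueError:
        return None
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # socket.herror / socket.gaierror are OSError subclasses:
        # no PTR record, or the resolver could not be reached.
        return None
    return hostname
```

For example, reverse_lookup("203.0.113.7") returns the registered hostname if one exists, or None if the address has no reverse DNS entry.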
Also, if a search engine or other spider is picking up old links, you may be able to locate the source of those links by searching Google for "yoursite.com/path/to/the.page", in quotes, exactly as it appears in your error logs. This should show you any page that lists that URL. You can also use the link: search operator to find pages that actually link to the pages in question. It is possible that there are links to the old pages, possibly even on a foreign-language web site, that some other search engine indexes every few months, causing this periodic spike in traffic.
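To avoid copying URLs out of the error log by hand, something like the following could pull out the distinct 404 paths and build a quoted-phrase search URL for each. This is a sketch that assumes your server writes the Apache/Nginx "combined" log format; the function names are hypothetical, and the www.google.com/search?q= URL is simply the ordinary search address, not an API.

```python
import re
from typing import Iterable, List
from urllib.parse import quote_plus

# Assumes "combined" log format, e.g.:
# 203.0.113.7 - - [10/Oct/2008:13:55:36 -0700] "GET /old/page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def missing_paths(lines: Iterable[str]) -> List[str]:
    """Distinct request paths that returned 404, in first-seen order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            seen.setdefault(m.group("path"), None)
    return list(seen)


def exact_phrase_search_url(site: str, path: str) -> str:
    """Search URL for the quoted phrase site+path, e.g. "yoursite.com/path/to/the.page"."""
    phrase = f'"{site}{path}"'
    return "https://www.google.com/search?q=" + quote_plus(phrase)
```

You could then open exact_phrase_search_url("yoursite.com", p) for each p in missing_paths(open("access.log")).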
When I redesigned my primary site, after setting up all the redirects I set up a logging system to track access attempts to the old addresses, so I could track down old links that had not been updated. I found that traffic to the old links would sometimes spike as search engines checked some pages, found the redirects, and then quickly ramped up their crawling. I noticed this mostly with Yahoo, which after the redesign also requested randomized URLs to force the server to return error messages. From the logs, it seemed that once a spider found a few redirects, its traffic increased as it explored the new structure.
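A logging setup like the one described can be approximated after the fact from an ordinary access log. This sketch, assuming the "combined" log format and a hypothetical set of retired paths you supply, counts requests for old URLs per day and per user agent, which is enough to see which spider is behind a spike.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Set

# Assumes "combined" log format: ip ident user [timestamp] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def legacy_hits_by_day(lines: Iterable[str], old_paths: Set[str]) -> Counter:
    """Count requests for retired URLs, keyed by (date, user agent)."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") not in old_paths:
            continue
        # Timestamps look like 10/Oct/2008:13:55:36 -0700
        day = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z").date()
        counts[(day, m.group("agent"))] += 1
    return counts
```

Sorting the result with counts.most_common() puts the busiest (day, agent) pairs first.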
Depending on how you handled the removal of the old pages (404, 301, 302...), some databases may store the non-existent URL and recheck it periodically. Supposedly, if you simply delete the old page, Google sees that it is gone but periodically checks back (anywhere from monthly to semi-annually, depending on a range of factors) to see if the page has returned. If you use a permanent (301) redirect, Google retains the old URL for a much shorter time. I think many other search engines do the same, but I have more information about Google, from my logs and from hearsay, than about any of the others.
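The difference between the two approaches comes down to the status code returned for an old URL. As a rough sketch only, here is how that decision might look with Python's built-in http.server; the MOVED map and its paths are hypothetical, and a real site would do this in its web server or application configuration instead.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional, Tuple

# Hypothetical map of retired URLs to their replacements (not from the original post).
MOVED: Dict[str, str] = {
    "/products/widget-2006.html": "/products/widget.html",
}


def status_for(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location value or None) for a request to a retired URL."""
    if path in MOVED:
        # 301 = moved permanently; a 302 here would signal a temporary move instead.
        return 301, MOVED[path]
    # No replacement: report the page as not found.
    return 404, None


class RetiredURLHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.path includes any query string, so "/a.html?x=1" would not match MOVED.
        status, location = status_for(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()
```

To try it locally, you could run HTTPServer(("127.0.0.1", 8000), RetiredURLHandler).serve_forever() (HTTPServer is also in http.server) and request an old URL with curl -I.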
Also, there are software programs that let users store offline copies of web sites; IE5, for example, had this feature built in. The requests could be coming from a feature like that.
Thank you for your help. I tried the following and here is what happened:
I tried the ping command on the IP addresses from the last 5 attempts, and all 5 timed out. Just to make sure the ping command was working, I tried known good IPs, and it worked fine.
I searched Google for 10 random URLs from these errors, in quotes, and all 10 came up with nothing.
Normally we serve custom 404 error pages, but on product pages we redirect to a new page with product suggestions. Since these old pages are no longer indexed in Google, I think somebody is getting them from somewhere else, perhaps from a third-party website archive.
Could somebody be requesting these old pages to find any that we no longer use but might still have on our server, maybe to look for possible security loopholes? At this point I don't think it is a search bot.
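One way to test the scanning theory is to look for single IP addresses that request many different missing URLs. This sketch assumes the "combined" log format; the threshold of 20 distinct URLs is an arbitrary, hypothetical default. Because old product URLs here are redirected rather than returning 404, you would likely want to pass the redirect status codes too.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumes "combined" log format; captures client IP, request path and status code.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def suspected_scanners(
    lines: Iterable[str],
    min_distinct: int = 20,
    statuses: Tuple[str, ...] = ("404",),
) -> List[str]:
    """IPs that hit at least `min_distinct` different URLs with the given statuses, busiest first."""
    paths_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") in statuses:
            paths_by_ip[m.group("ip")].add(m.group("path"))
    flagged = [ip for ip, paths in paths_by_ip.items() if len(paths) >= min_distinct]
    return sorted(flagged, key=lambda ip: len(paths_by_ip[ip]), reverse=True)
```

For example, suspected_scanners(open("access.log"), statuses=("404", "301", "302")) would also count the redirected product URLs. A legitimate crawler following old links tends to identify itself in the user agent; a scanner often does not.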