Re: Old Pages Being Indexed By Third Party
First I would try to find out where the traffic is coming from, to determine whether it might be a spider or other legitimate but erroneous traffic. Try doing a reverse lookup on the IP address (from a Windows command prompt, enter ping -a -n 1 0.0.0.0, where 0.0.0.0 is the IP address). Then check that site to see whether it is a directory, a search engine, or something else.
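If there are many addresses to check, the reverse lookup can be scripted. Below is a minimal Python sketch that assumes you have already pulled the IP addresses out of your logs. `socket.gethostbyaddr` does the reverse (PTR) lookup. The forward check guards against a PTR record that merely claims to belong to a search engine. As written it handles IPv4 only, and `reverse_lookup` is a hypothetical helper name.

```python
import socket
from typing import Callable, List, Optional, Tuple

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)

def reverse_lookup(
    ip: str,
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the hostname for `ip` only if reverse and forward DNS agree."""
    try:
        hostname = reverse(ip)[0]            # PTR lookup: IP -> hostname
        addresses = forward(hostname)[2]     # A lookup: hostname -> list of IPv4 addresses
    except OSError:                          # socket.herror / socket.gaierror subclass OSError
        return None
    # Anyone can publish a PTR record naming a big search engine,
    # so only trust the name if it resolves back to the same address.
    return hostname if ip in addresses else None
```

For example, `reverse_lookup("203.0.113.7")` returns the confirmed host name, or `None` when there is no usable reverse record.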
Also, if a search engine or other spider is picking up old links, you may be able to locate the source of those links by searching Google for "yoursite.com/path/to/the.page", in quotes, exactly as it appears in your error logs. This should show you any page that lists that URL. You can also use the link: operator to find actual links to the pages in question. It is possible that there are links to the old pages, perhaps even on a foreign-language website, that some other search engine indexes every few months, causing this periodic spike in traffic.
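To get the exact strings to search for, the missing paths can be pulled out of the access log first. Below is a minimal Python sketch that assumes Apache's common or combined log format; other servers will need a different pattern. `missing_paths` and `search_phrase` are hypothetical helper names, and yoursite.com is the placeholder used above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request line and status code of an Apache common/combined log entry (assumed format).
LOG_RE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

def missing_paths(lines: Iterable[str]) -> Counter:
    """Count how often each requested path returned 404."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == "404":
            counts[m.group("path")] += 1
    return counts

def search_phrase(site: str, path: str) -> str:
    """Build the quoted phrase to paste into the search box."""
    return f'"{site}{path}"'
```

`missing_paths(open("access.log")).most_common(10)` lists the most-requested missing paths. Pass each one to `search_phrase` and search for the result.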
When I redesigned my primary site, after setting up all the redirects, I set up a logging system to track access attempts to the old addresses so I could track down old links that had not been updated. I found that traffic to the old links would sometimes spike: search engines checked some pages, found the redirects, and then quickly ramped up their crawling. I noticed this mostly with Yahoo, which after the redesign also requested randomized URLs to force the server to return error messages. From the logs, it seemed that once a spider found a few redirects, its traffic increased as it explored the new structure.
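A logging redirect of this kind can be sketched as Python WSGI middleware. This is only an illustration and assumes your site runs as a WSGI application. The URL mapping and the `old-urls` logger name are hypothetical. Logging the Referer header is the useful part for tracking down stale links, because when a browser sends it, the header names the page that still points at the old address.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical mapping of retired URLs to their replacements.
OLD_TO_NEW: Dict[str, str] = {
    "/old/page.html": "/new/page/",
    "/products.htm": "/catalog/",
}

# Route this logger to its own file so old-URL hits are easy to review.
log = logging.getLogger("old-urls")

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]

def old_url_middleware(app: WSGIApp, mapping: Dict[str, str] = OLD_TO_NEW) -> WSGIApp:
    """Redirect retired URLs with a 301 and log who asked for them."""
    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        new = mapping.get(path)
        if new is None:
            return app(environ, start_response)  # not a retired URL: serve as usual
        # Record the requester and, if present, the page that linked here.
        log.info("old=%s new=%s ip=%s ua=%r referer=%s",
                 path, new,
                 environ.get("REMOTE_ADDR", "-"),
                 environ.get("HTTP_USER_AGENT", "-"),
                 environ.get("HTTP_REFERER", "-"))
        # A relative Location is allowed by RFC 7231; very old clients may expect an absolute URL.
        start_response("301 Moved Permanently",
                       [("Location", new), ("Content-Type", "text/plain")])
        return [b"Moved permanently\n"]
    return wrapped
```

To use it, wrap the existing application (`application = old_url_middleware(application)`) and point the `old-urls` logger at a separate file.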
Depending on how you handled the removal of the old pages (404, 301, 302...), some databases may store the nonexistent URL and recheck it periodically. Supposedly, if you simply delete the old page, Google will see that it is gone but will periodically check back (anywhere from monthly to semi-annually, depending on a range of factors) to see whether the page has returned. If you use a permanent (301) redirect, Google will retain the old URL for a much shorter time. I think many other search engines behave the same way, but I have more information about Google, from my logs and from hearsay, than about any of the others.
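To see whether old URLs are being rechecked on a schedule, the hits can be grouped by month, crawler, and the status code your server returned. Below is a minimal Python sketch that assumes the Apache combined log format. The crawler tokens are only examples of user-agent substrings (Googlebot, Yahoo's Slurp, MSN's msnbot), and `monthly_rechecks` is a hypothetical name.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Set

# Timestamp, request path, status and user agent from a combined-format line (assumed format).
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
CRAWLERS = ("Googlebot", "Slurp", "msnbot")  # example tokens only; extend as needed

def monthly_rechecks(lines: Iterable[str], old_paths: Set[str]) -> Counter:
    """Count hits on old URLs per (YYYY-MM, crawler, status)."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("path") not in old_paths:
            continue
        when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        bot = next((c for c in CRAWLERS if c in m.group("ua")), "other")
        counts[(when.strftime("%Y-%m"), bot, m.group("status"))] += 1
    return counts
```

Printing `sorted(monthly_rechecks(open("access.log"), old_paths).items())` gives a month-by-month view. Comparing the rows for 404 and 301 responses is one way to check whether the recheck pattern described above appears on your own server.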
Also, there are software programs that let users store offline copies of websites; IE5, for example, had this feature built in. The traffic you are seeing could come from a feature like that.
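If you suspect an offline-browsing tool, the user-agent field in the log is the first place to look. Below is a minimal Python sketch. The token list is an assumption and far from complete; as far as I know, MSIECrawler is what IE's offline favorites reported. Many tools also let users change their user agent, so a miss proves nothing. `offline_tool` is a hypothetical name.

```python
from typing import Optional

# Assumed, incomplete list of user-agent substrings used by offline/site-copying tools.
OFFLINE_TOKENS = {
    "MSIECrawler": "Internet Explorer offline favorites",
    "HTTrack": "HTTrack website copier",
    "Wget": "GNU Wget",
    "Teleport Pro": "Teleport Pro",
}

def offline_tool(user_agent: str) -> Optional[str]:
    """Return the tool name if the user agent matches a known token, else None."""
    ua = user_agent.lower()  # case-insensitive match, since capitalization varies
    for token, name in OFFLINE_TOKENS.items():
        if token.lower() in ua:
            return name
    return None
```

A cluster of matches from one IP address, fetching many pages in quick succession, would fit the offline-copy explanation better than a single hit.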
Last edited by wige; 07-11-2007 at 05:19 PM.