07-11-2007, 02:15 PM
briscoe98
WebProWorld New Member
 
Old Pages Being Indexed By Third Party

Within the past six weeks we have been getting many page errors for old products that we no longer carry; in some cases we haven’t carried them for a long time. I assume they come from some sort of robot or script that crawls the entire site and errors out when it hits the old product pages. We get a burst of page errors (about 50-75 at a time) within a few minutes from the same IP address, and then nothing for a couple of days.
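To make the pattern concrete, this is roughly how the bursts could be picked out of the error log. It is only a sketch in Python, assuming the errors have already been parsed into (timestamp, ip, url) tuples; the function name, the five-minute window and the threshold of 50 are illustrative choices, not something taken from our server setup.

```python
from collections import defaultdict
from datetime import timedelta

def find_error_bursts(errors, window=timedelta(minutes=5), threshold=50):
    """Return {ip: [(first_ts, last_ts, count), ...]} for bursts of errors.

    `errors` is an iterable of (timestamp, ip, url) tuples. This input
    format is hypothetical; adapt it to however the log is parsed.
    """
    # Collect the error timestamps seen for each client IP.
    by_ip = defaultdict(list)
    for ts, ip, _url in errors:
        by_ip[ip].append(ts)

    bursts = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        found = []
        i, n = 0, len(stamps)
        while i < n:
            # Extend j to the last error within `window` of stamps[i].
            j = i
            while j + 1 < n and stamps[j + 1] - stamps[i] <= window:
                j += 1
            count = j - i + 1
            if count >= threshold:
                found.append((stamps[i], stamps[j], count))
                i = j + 1  # skip past this burst
            else:
                i += 1
        if found:
            bursts[ip] = found
    return bursts
```

Run over a few weeks of logs, this would list which IP addresses produced a burst and when, which makes it easier to compare one incident with the next.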

When it happens, the IP addresses almost always resolve to international locations. I don’t know whether it is one person masking their IP address or whether the requests really come from different places. The locations include Denmark, Norway, Hong Kong, Canada and, every once in a while, the United States. The IP address never belongs to a common robot such as Google, MSN, AOL, Lycos or Yahoo.
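One way to check whether an IP address belongs to a common robot is a reverse DNS lookup followed by a forward lookup on the returned hostname. The sketch below is Python and makes some assumptions: the hostname suffixes are examples that should be checked against each search engine's own documentation, and the function name and the injectable lookup parameters are hypothetical, added only so the logic can be tried without network access.

```python
import socket

# Example suffixes only (an assumption); confirm against each engine's docs.
KNOWN_CRAWLER_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".search.msn.com",
    ".crawl.yahoo.net",
)

def is_known_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a known crawler hostname
    and that hostname resolves forward to the same IP again."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(KNOWN_CRAWLER_SUFFIXES):
            return False
        # The forward check guards against spoofed reverse DNS records.
        return ip in forward_lookup(host)
    except OSError:
        # No DNS record or lookup failure: treat as not a known crawler.
        return False
```

An address that fails this check is not necessarily malicious; it only means it is not one of the listed search engines.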

Another thing: the URL in these cases is always http://sitename.com and never http://www.sitename.com. The common robots always use www in our URL. Going back four years, I had never seen consecutive errors on a non-www URL until about six weeks ago.

Since this is not coming from common robots, and the IP addresses are in various countries, I am worried that it might be something malicious. Does anybody have suggestions on what this might be and what can be done to prevent it? I know I can use an ISAPI Rewrite rule to handle the non-www issue, but I am more concerned about why old, nonexistent pages keep getting hit by something out there.
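For reference, the non-www fix amounts to the logic below. This is a Python sketch of the rule, not actual ISAPI Rewrite syntax; the function name is made up for illustration, and a real rule would normally answer with a 301 redirect to the returned URL.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(url, bare_host="sitename.com"):
    """Return the www version of `url` if it uses the bare host, else None.

    `bare_host` defaults to the placeholder domain used in this post.
    """
    parts = urlsplit(url)
    # `hostname` is already lowercased and excludes any port.
    if parts.hostname != bare_host:
        return None
    netloc = "www." + bare_host
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Keep the path, query string and fragment unchanged.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

A request for http://sitename.com/some/page would be sent to the same path on www.sitename.com, while requests that already use www are left alone.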

Last edited by briscoe98; 07-11-2007 at 02:28 PM.