| Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects. |
|
Within the past six weeks we have had many page errors for old products that we no longer carry; in some cases, we haven’t carried them for a long time. I assume they come from some sort of robot or script that crawls the entire site and errors out when it hits the old product pages. We get a batch of page errors (about 50-75 at a time) within a few minutes from the same IP address, and then it is quiet for a couple of days.
When it happens, the IP addresses almost always come from international locations. I don’t know whether it is one person masking their IP address or whether the requests really come from different places. The locations include Denmark, Norway, Hong Kong, Canada and, every once in a while, the United States. The IP address never belongs to a common robot such as Google, MSN, AOL, Lycos or Yahoo. Also, the URL in these cases is always http://sitename.com and not http://www.sitename.com, whereas the common robots always use www in our URL. Going back four years, I had never seen consecutive errors coming from a non-www URL until about six weeks ago. Since this is not coming from common robots, I am worried that it might be something malicious, especially given the various international IP addresses. Does anybody have suggestions on what this might be and what can be done to prevent it? I know I can use ISAPI Rewrite to fix the non-www issue, but I am more concerned about why old, nonexistent pages keep getting hit by something out there. Last edited by briscoe98; 07-11-2007 at 02:28 PM. |
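One way to confirm the pattern described in this post (50-75 errors from a single IP within a few minutes) is to group the 404 entries by IP and look for bursts. The following is only a sketch: the `(timestamp, ip, host, status)` tuple layout, the function name, and the 50-error / 5-minute thresholds are assumptions chosen to match the numbers in the post, so the parsing would need adapting to the real log format.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_error_bursts(entries, min_errors=50, window=timedelta(minutes=5), host=None):
    """Return {ip: [(first_hit, last_hit, count), ...]} for bursts of 404s.

    `entries` is an iterable of (timestamp, ip, host, status) tuples. That
    layout is an assumption for this sketch, not a real log format.
    If `host` is given, only requests for that hostname are counted.
    """
    hits_by_ip = defaultdict(list)
    for timestamp, ip, requested_host, status in entries:
        if status != 404:
            continue
        if host is not None and requested_host != host:
            continue
        hits_by_ip[ip].append(timestamp)

    bursts = {}
    for ip, times in hits_by_ip.items():
        times.sort()
        found = []
        start = 0
        while start < len(times):
            # Extend the run while hits stay within `window` of its first hit.
            end = start
            while end + 1 < len(times) and times[end + 1] - times[start] <= window:
                end += 1
            count = end - start + 1
            if count >= min_errors:
                found.append((times[start], times[end], count))
            start = end + 1
        if found:
            bursts[ip] = found
    return bursts
```

Calling it with `host="sitename.com"` restricts the check to the non-www requests mentioned above, which makes it easy to see whether the bursts and the non-www hits are the same traffic.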
|
||||
|
You can also use DNS tools to look up the IPs, their owners, etc.
If it's OK traffic and not just someone trying to waste your bandwidth, I'd consider 301'ing the majority of those pages in your .htaccess file to either similar products or your main catalogue page(s). If it's a bot, hopefully it will pick up on that and fix the links it is following in its database.
__________________
Ron Boyd website consulting (design, optimization, marketing) :: Follow Me: @orionsweb |
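For the DNS lookup suggested in this reply, a reverse lookup followed by a forward check is a common way to see whether an IP really belongs to the hostname it claims. This is a minimal sketch using Python's standard `socket` module; the function name `describe_ip` and the injectable `reverse`/`forward` parameters are assumptions made here so the logic can be tried without network access.

```python
import socket


def describe_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (hostname, confirmed) for an IP address.

    hostname is None when the IP has no reverse (PTR) record.
    confirmed is True only when the hostname resolves back to the same IP.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, ip_addresses).
        hostname = reverse(ip)[0]
    except OSError:
        return None, False
    try:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses).
        addresses = forward(hostname)[2]
    except OSError:
        return hostname, False
    return hostname, ip in addresses
```

An IP with no reverse record, or one whose hostname does not resolve back to it, is not proof of anything malicious, but it does make it less likely that the visitor is one of the well-known search robots.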
|
|||
|
Quote:
I tried the ping command on the IPs from the last 5 attempts and all 5 timed out. Just to make sure the ping command was working, I tried known good IPs and it worked fine. I searched Google for 10 random links from these errors, in quotes, and all 10 came up with nothing. Normally we serve custom 404 errors, but on product pages we redirect to a new page with product suggestions. Since these old pages are no longer indexed in Google, I am thinking that somebody is getting them from somewhere else, maybe a third-party website archive. Could somebody be scanning these old URLs to look for old pages that we might still have on our server but no longer use, maybe to look for possible security loopholes? At this point I am thinking that it is not a search bot. |
|
||||
|
So are you worried about this because you don't want to see all the errors in your log files, or because you just don't want some weird bot crawling your pages and taking bandwidth? Sounds like just a crappy scraper bot to me.
|
|
||||
|
If a page or pages of your site are no longer active but still getting traffic, you are losing that traffic by letting it go to a 404 error. Create a custom 404 error page that still lets the visitor reach the rest of your site. Also, as mentioned, use 301 redirects for such pages.
__________________
DrTandem's San Diego Web Page Design, drtandem.com |
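The 301 advice in this thread can be sketched as a small lookup that decides where a request should be sent. Everything below is illustrative: the product paths and the `RETIRED_PRODUCTS` mapping are hypothetical, and only the two hostnames come from the original post. On a real server the same rules would normally live in the rewrite configuration (ISAPI Rewrite or .htaccess, as mentioned above) rather than in application code.

```python
from typing import Optional

CANONICAL_HOST = "www.sitename.com"  # hostname taken from the thread

# Hypothetical mapping of retired product pages to their closest replacements.
RETIRED_PRODUCTS = {
    "/products/old-widget": "/products/new-widget",
    "/products/old-gadget": "/catalog",
}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to 301 to, or None if the request needs no redirect."""
    replacement = RETIRED_PRODUCTS.get(path)
    if replacement is not None:
        # Retired product: send visitors and bots to the replacement page.
        return f"http://{CANONICAL_HOST}{replacement}"
    if host.lower() != CANONICAL_HOST:
        # Non-www request: same path on the canonical hostname.
        return f"http://{CANONICAL_HOST}{path}"
    return None
```

A request that gets `None` back is handled normally, which for an unknown path means the custom 404 page.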
|
|||
|
Just a thought: I write spiders daily for my job. It is possible that someone has paid to have a custom spider created to scrape the product information from your site.
If this is the case, more than likely it would come from many different IPs. This type of bot does not read robots.txt and can read Java/AJAX, captcha images, encoded emails, pretty much everything that you think is safe from a bot. On a good note, if it is a professional spider, the dead pages will show up as errors for its owner, and it will then be modified to go after relevant pages. Hope this helps. Last edited by shannonlp; 07-11-2007 at 10:03 PM. Reason: spelling oops |
|
|||
|
Quote:
Chances are, if they are stealing keywords, it is one of those sites that, when you click on the result in Google, has nothing to do with what the listing says it is. Somebody is just trying to gain rankings. Last edited by briscoe98; 07-12-2007 at 12:18 PM. |
|
|||
|
Sorry about that. I mean pull content for products manufactured by us.
|
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |