05-12-2008, 12:48 PM
wige
WebProWorld Moderator

Join Date: Jun 2006
Location: United States
Posts: 1,782
Deluge of the bots

Last Thursday, I developed and implemented a new logging system to identify which products on my site are the most popular. It is meant to be robust enough both to show visitors a live list of what is popular and to show users customized recommendations based on their browsing history. It records not only the product viewed but also the visitor's IP address and user agent, and I had it record everything so that I would have as much data as possible to test the system with.

I came in today and started analyzing the recorded data. In addition to the expected bots from Google/MSN/Yahoo, there were the usual (and easily filtered) spambots. I also noticed a lot of visits from disguised bots that I probably would never have found if not for the filters I added to this system. The following is the traffic from one such bot:

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
220.95.108.239
2008-05-09 08:24:08
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
220.95.108.239
2008-05-09 08:31:45
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
220.95.108.239
2008-05-09 08:38:27
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
220.95.108.239
2008-05-09 08:44:52
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
220.95.108.239
2008-05-09 08:55:01
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5
220.95.108.239
2008-05-09 09:23:43
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 (KHTML, like Gecko) Safari/419.3
220.95.108.239
2008-05-09 09:31:08
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
220.95.108.239
2008-05-09 09:36:59
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
220.95.108.239
2008-05-09 09:43:15
Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
220.95.108.239
2008-05-09 09:48:44
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
220.95.108.239
2008-05-09 09:56:14
Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
220.95.108.239
2008-05-09 09:59:23
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
220.95.108.239
2008-05-09 10:02:03
Googlebot/2.1 (+http://www.google.com/bot.html)
220.95.108.239
2008-05-09 10:04:41

Of course, this is only the traffic from that one bot, on one product page. There are hits from literally hundreds of similar bots on the same page, and the same thing is happening across thousands of product pages.

Is there any way to filter these bad bots out programmatically? Even just an SQL query that could spot them somewhat reliably and list them would help, so that I can remove them from the popularity system and add them to my deny list.
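
To make the question concrete, here is a rough sketch of the kind of query I am picturing, wrapped in Python with an in-memory SQLite database so it can be run as-is. The table name product_views, its columns, the suspect_ips helper and the threshold of 3 are all placeholders invented for illustration, not my real schema, and the 192.0.2.10 rows are made-up "normal visitor" data. The idea is simply to flag any IP that shows up with several different user agents inside a time window, which is the pattern in the log above.

```python
import sqlite3

# Hypothetical schema: one row per logged product view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_views ("
    " product_id INTEGER, ip TEXT, user_agent TEXT, viewed_at TEXT)"
)

FIREFOX = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) "
           "Gecko/20070515 Firefox/2.0.0.4")
rows = [
    # A few of the hits from the log above (same IP, changing user agent).
    (1, "220.95.108.239",
     "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
     "2008-05-09 08:31:45"),
    (1, "220.95.108.239",
     "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
     "2008-05-09 08:44:52"),
    (1, "220.95.108.239", FIREFOX, "2008-05-09 08:55:01"),
    (1, "220.95.108.239",
     "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 "
     "(KHTML, like Gecko) Safari/419.3",
     "2008-05-09 09:31:08"),
    # Made-up ordinary visitor: one IP, one user agent, two products.
    (1, "192.0.2.10", FIREFOX, "2008-05-09 09:00:00"),
    (2, "192.0.2.10", FIREFOX, "2008-05-09 09:02:30"),
]
conn.executemany("INSERT INTO product_views VALUES (?, ?, ?, ?)", rows)

# Arbitrary cut-off; it would need tuning against real traffic.
MIN_DISTINCT_AGENTS = 3

SUSPECT_SQL = """
SELECT ip,
       COUNT(DISTINCT user_agent) AS agents,
       COUNT(*)                   AS hits
FROM product_views
WHERE viewed_at >= ? AND viewed_at < ?
GROUP BY ip
HAVING COUNT(DISTINCT user_agent) >= ?
ORDER BY agents DESC
"""


def suspect_ips(conn, start, end, min_agents=MIN_DISTINCT_AGENTS):
    """Return (ip, distinct_agents, hits) for IPs with many user agents."""
    return conn.execute(SUSPECT_SQL, (start, end, min_agents)).fetchall()


suspects = suspect_ips(conn, "2008-05-09 00:00:00", "2008-05-10 00:00:00")
for ip, agents, hits in suspects:
    print(ip, agents, hits)
```

I realize a large proxy or an office behind one NAT address could also show several user agents from a single IP, so the window and threshold would matter a lot, and a check like this would only ever be "somewhat reliable". The list it produces could feed both the cleanup of the popularity data and the deny list.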
__________________
The best way to learn anything, is to question everything.