Do you know this robot?



greeneagle
12-12-2003, 02:17 AM
Unknown robot (identified by 'crawl')

Thanks,
Ken

le_gber
12-12-2003, 04:05 AM
I think you are using a stats package that just didn't recognize this robot, that's all. Sometimes it also reports:

Unknown robot (identified by 'robot')

leo

Whit
12-12-2003, 04:21 AM
(Assuming that you are looking at your raw logs and not the formatted output of a stats package.)

That sounds like a home-made job to me. I wouldn't worry too much about it. I actually found the following in my log files tonight:
"Agent: epbtatutgpxvbg evtxan 2nmeelhh"

You might consider IP banning them if it proves to be a continuing problem and a drain on resources.
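IP banning is normally done in the web server or firewall configuration rather than in code. Purely to illustrate the matching logic (single addresses plus whole ranges), here is a minimal Python sketch using the standard `ipaddress` module. The `BANNED` list and the `is_banned` helper are hypothetical, and the entries come from address ranges reserved for documentation.

```python
import ipaddress

# Hypothetical blocklist: one single address (/32) and one whole range (/24).
# Both come from documentation-reserved ranges (RFC 5737), not real visitors.
BANNED = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.23/32")]

def is_banned(client_ip: str) -> bool:
    """Return True if client_ip falls inside any banned network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False (not an error) when IPv4/IPv6 versions differ.
    return any(addr in net for net in BANNED)

print(is_banned("192.0.2.45"))     # True  (inside the /24 range)
print(is_banned("198.51.100.23"))  # True  (exact /32 match)
print(is_banned("198.51.100.24"))  # False
```

Listing ranges as well as single addresses matters because a persistent home-made bot may not always come from the same IP.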

vfaulkner
12-14-2003, 09:36 PM
http://www.robotstxt.org/wc/active/html/index.html
is a pretty good list of robots, their attributes, and purposes.

minstrel
12-14-2003, 09:46 PM
Vicki:

What Leo was saying is that he found the entry

"Unknown robot (identified by 'robot')"

in his logs. Your URL won't help because the robot is not identified, presumably because it is either a "home-made" robot or at the very least a non-standard one that couldn't or didn't identify itself in the logs.

minstrel
12-14-2003, 10:05 PM
It may not be anything to worry about. Searching Google for "Unknown robot (identified by 'robot')", with the quote marks included, I found several questions about similar entries, including this thread (http://www.webmasterworld.com/forum11/2388.htm) in another forum:


Since the 9th of this month, one of my sites has been crawled by a bot that my AWStats identifies like this:

unknown robot (identified by 'spider') .. on 09 Oct 2003 - 17:35
unknown robot (identified by 'robot') .. on 18 Oct 2003 - 16:26
unknown robot (identified by 'crawl') .. on 20 Oct 2003 - 05:14



203.219.86.6 - - "GET / HTTP/1.1" "-" "unknown (compatible; unknown; unknown)"
203.219.86.6 - - "GET /some/image_on.gif HTTP/1.1" "http://www.somesite.com/" "unknown (compatible; unknown; unknown)"
203.219.86.6 - - "GET /some/image_off.gif HTTP/1.1" "http://www.somesite.com/" "unknown (compatible; unknown; unknown)"
Total of 87 hits in under 2 minutes (mostly images). Took the front page of the site with images (including rollover images), then a few other pages. Two pages were requested twice. No request for robots.txt. Looks like a browser to me.


If it's a regular surfer you will see a hit in your logs for each graphic on the page. Did he pull any graphics outside the few pages accessed?


No, just those associated with the pages. Whatever it was had to process JavaScript to load the mouseover images, so if it's not a browser then it's some kind of intelligent agent.
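The reasoning in this exchange (a browser pulls the images on each page it views and never asks for robots.txt, while a polite crawler does request robots.txt) can be checked mechanically. Below is a minimal Python sketch that summarizes hits per IP from log lines in the abbreviated format quoted above. The regex assumes exactly that shortened format (timestamps, status codes and sizes removed), and `summarize` and `looks_like_browser` are hypothetical helpers; this is a rough heuristic, not a reliable bot detector.

```python
import re
from collections import defaultdict

# Matches the abbreviated lines quoted above:
# IP - - "METHOD PATH PROTOCOL" "REFERER" "USER-AGENT"
LINE_RE = re.compile(r'^(\S+) - - "(\S+) (\S+) [^"]*" "([^"]*)" "([^"]*)"$')
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")

def summarize(lines):
    """Per-IP counts: total hits, image hits, and whether robots.txt was fetched."""
    stats = defaultdict(lambda: {"hits": 0, "images": 0, "robots_txt": False})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines in any other format
        ip, _method, path, _referer, _agent = m.groups()
        s = stats[ip]
        s["hits"] += 1
        if path.lower().endswith(IMAGE_EXTS):
            s["images"] += 1
        if path == "/robots.txt":
            s["robots_txt"] = True
    return dict(stats)

def looks_like_browser(s):
    # Heuristic from the thread: pulls page images, never asks for robots.txt.
    return s["images"] > 0 and not s["robots_txt"]

log = [
    '203.219.86.6 - - "GET / HTTP/1.1" "-" "unknown (compatible; unknown; unknown)"',
    '203.219.86.6 - - "GET /some/image_on.gif HTTP/1.1" "http://www.somesite.com/" "unknown (compatible; unknown; unknown)"',
    '203.219.86.6 - - "GET /some/image_off.gif HTTP/1.1" "http://www.somesite.com/" "unknown (compatible; unknown; unknown)"',
]
summary = summarize(log)
print(summary)
# {'203.219.86.6': {'hits': 3, 'images': 2, 'robots_txt': False}}
print(looks_like_browser(summary["203.219.86.6"]))  # True
```

As the quoted poster notes, an agent that also runs JavaScript to load rollover images would pass this check too, so a "browser-like" result only means the traffic did not behave like a typical polite crawler.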



> unknown robot (identified by 'crawl')
Since AWStats doesn't know about every bot out there, the developer has taken a pretty neat route: it analyzes the UA (user-agent string) for a few keywords ("spider", "robot" and "crawl") and tries to identify (unknown) bots that way.

To use one (common) example, AWStats doesn't identify LookSmart's "grub" crawler by name, but because of the use of the word "crawl" in the UA, AWStats catches grub that way...

Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http*://grub.org)
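The fallback described above (try known bots by name first, then scan the UA for generic keywords) can be sketched in a few lines of Python. This is only an illustration of the idea and is not AWStats's actual code. The `KNOWN_ROBOTS` table, the keyword order and the `classify_agent` helper are assumptions made for this sketch.

```python
# Hypothetical table of bots recognized by name (a real stats package
# ships a much longer list).
KNOWN_ROBOTS = {"googlebot": "Googlebot"}

# Generic keywords used as a fallback, as described for AWStats above.
GENERIC_KEYWORDS = ("spider", "robot", "crawl")

def classify_agent(user_agent):
    """Return a robot label for the UA, or None if it looks like a visitor."""
    ua = user_agent.lower()  # case-insensitive, so "Crawl" matches "crawl"
    for key, name in KNOWN_ROBOTS.items():
        if key in ua:
            return name
    for word in GENERIC_KEYWORDS:
        if word in ua:
            return f"Unknown robot (identified by '{word}')"
    return None

grub = ("Mozilla/4.0 (compatible; grub-client-1.5.3; "
        "Crawl your own stuff with http*://grub.org)")
print(classify_agent(grub))  # Unknown robot (identified by 'crawl')
print(classify_agent("unknown (compatible; unknown; unknown)"))  # None
```

The second example shows the limit of this approach: a UA that contains none of the keywords, such as the "unknown (compatible; unknown; unknown)" string quoted earlier, is not flagged as a robot at all.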