View Full Version : new yahoo spider?
sadiq1133
07-15-2004, 09:21 AM
I found this in my log file:
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
Is this a new Yahoo spider? Any info on this?
Elite Skills
07-15-2004, 04:33 PM
The IP looks to be from Fast. I had it too. It looks like it's picking at the images on my site, so MM probably means multimedia. Yahoo image search?
TrafficProducer
07-15-2004, 04:55 PM
Database of Web Robots
This may or may not help you out:
http://www.robotstxt.org/wc/active.html
ronniethedodger
07-15-2004, 05:23 PM
Elite Skills wrote: "IP looks to be from fast. I had it too. It looks like it's picking at the images on my site so MM probably means multimedia. yahoo images search?"
Are you sure it is from Fast, or just guessing?
I did a search of Internet logs and came across this one here (http://216.109.117.135/search/cache?p=yahoo-inc.com+mmcrawler&ei=UTF-8&n=100&fl=0&u=www.aprilladeville.com/access_log.1&w=yahoo+inc+.com+mmcrawler&d=8D605469EA&c=482&yc=3172&icp=1). There are numerous entries from the crawler in this log.
The crawler resolves to mmcrmX.search.scd.yahoo.com, where X is the number of the crawler. Here are some more sightings (http://www.google.com/search?num=50&hl=en&lr=&ie=UTF-8&c2coff=1&q=mmcrm8+OR+mmcrm2+OR+mmcrm1) of the crawler.
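If you want to check your own log entries against that hostname pattern, here is a rough Python sketch. The pattern is only what the sightings above suggest, not anything Yahoo has published, and the function names are made up for illustration. The reverse lookup needs network access and can fail or return nothing useful, so treat a match as a hint rather than proof.

```python
import re
import socket
from typing import Optional

# Assumption: crawler hosts follow the mmcrmX.search.scd.yahoo.com form
# seen in the logs above, with X being one or more digits.
MMCRAWLER_HOST = re.compile(r"^mmcrm\d+\.search\.scd\.yahoo\.com$", re.IGNORECASE)


def looks_like_mmcrawler_host(hostname: str) -> bool:
    """Return True if the hostname matches the observed crawler naming pattern."""
    return MMCRAWLER_HOST.match(hostname.rstrip(".")) is not None


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse-resolve an IP address; returns None if no PTR record is found."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None
```

You would feed the IP from your log line to reverse_lookup() and pass the result to looks_like_mmcrawler_host(). A PTR record can be set to anything by whoever controls the IP block, so a careful check would also resolve the returned name forward and confirm it maps back to the same IP.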
It does appear to be requesting image files (.jpg, .png, .gif) as well as the index of the directory those files reside in.
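To see what the crawler is asking for on your own site, something like the following Python sketch would tally the file types it requests. It assumes your server writes the common "combined" log format with the user agent as the last quoted field; adjust the regex if yours differs. The two sample lines are invented purely to show the shape of the input and are not from any real log.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumption: "combined" log format, i.e.
# ... "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def requested_extensions(lines, agent_substring="Yahoo-MMCrawler"):
    """Count requested file extensions for log lines whose agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or agent_substring not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]  # drop any query string
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(directory or no extension)"] += 1
    return counts


# Invented sample lines (192.0.2.x is a documentation-only address range).
SAMPLE = [
    '192.0.2.10 - - [15/Jul/2004:09:21:00 -0400] "GET /images/a.jpg HTTP/1.0" 200 1234 "-" "Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)"',
    '192.0.2.10 - - [15/Jul/2004:09:21:05 -0400] "GET /images/ HTTP/1.0" 200 512 "-" "Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)"',
]
```

Calling requested_extensions(open("access_log")) on a real log file should work the same way, since a file object yields lines when iterated.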
If you want to disallow access to your images directory, I would put an exclusion for it in your robots.txt file. To stop them from requesting a directory index of your images, put a blank index file (for example index.htm, depending on how your server is configured) in the images directory. The server will then serve up that blank page instead of the directory listing.
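Here is a rough idea of what such a robots.txt exclusion could look like, wrapped in a small Python check using the standard library's urllib.robotparser so you can test the rules before uploading them. The /images/ path and the choice of "Yahoo-MMCrawler" as the user-agent token are assumptions taken from the user-agent string in this thread; I have not seen any documentation confirming which token this crawler honors, or that it obeys robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The /images/ directory name and the
# "Yahoo-MMCrawler" token are assumptions, not documented values.
ROBOTS_TXT = """\
User-agent: Yahoo-MMCrawler
Disallow: /images/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_allowed("Yahoo-MMCrawler/3.x", "/images/photo.jpg") comes back False, while other robots and other paths stay allowed. Keep in mind that robots.txt is only a request; a crawler that ignores it has to be blocked at the server instead.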
ronniethedodger
07-15-2004, 05:39 PM
It appears that this robot is the old AltaVista image robot, which identified itself as vscooter in the user-agent field.
It does appear to be operated by Fast, though, as Elite Skills suggested. Look here (http://www.pgts.com.au/cgi-bin/psql?robot_info=15054).
cooper
07-15-2004, 06:02 PM
I have noticed the following spiders in one of my client's access logs:
msnbot/0.11 ( http://search.msn.com/msnbot.htm)
Googlebot/2.1 ( http://www.googlebot.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
ia_archiver
check_http/1.24.2.4 (nagios-plugins 1.3.1)
NPBot (http://www.nameprotect.com/botinfo.html)
UCmore
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; MSIECrawler)
LinkWalker
sohu-search
Yahoo-MMCrawler
Baiduspider ( http:
Gigabot/1.0
NaverBot-1.0 (NHN Corp. / 82-2-3011-1954 / nhnbot@naver.com)
mozDex
IlTrovatore-Setaccio/1.2 (Indexing; http://www.iltrovatore.it/bot.html; bot@iltrovatore.it)
JoeDog/1.00 [en] (X11; I; Siege 2.59)
QuepasaCreep ( crawler@quepasacorp.com )
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; MSIECrawler)
TAMU_CS_IRL_CRAWLER
CosmixCrawler
Szukacz
So that's two that identify themselves as Yahoo!
Most of the others I don't even recognize. There were some entries for our own WebTrends reporter, but that's the gist of it for July so far.
I think that Elite Skills has a good guess for what "Yahoo-MMCrawler" means.
dkginternet
07-17-2004, 01:00 AM
Hello,
You might consider reserving an additional domain and hosting it on another server, with a copy of the site that you want to back up.
Of course it won't do much good if it's not marketed, but it would give you an option when the main site is down and the phone starts ringing.
HTH,
Danny