Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here.

  #1 (permalink)  
Old 02-16-2004, 12:35 AM
WildSeeker's Avatar
WebProWorld Member
 
Join Date: Nov 2003
Location: New York City
Posts: 30
WildSeeker RepRank 0
Default Identifying IP Addresses

Greetings!

I have noticed that a particular IP address (198.64.149.243) is crawling my site in a way that is much more significant than what I would normally expect. What I am wondering is ... are there any tools available to help identify the source of the robot/spider? I am very interested in finding out who/what is looking at my site in such detail.

I have done very little work in the way of limiting spiders from crawling my site, much less certain pages ... I was just hoping that they would come. Well, now they seem to have found the site, and this one seems to be doing a bit of extra work. Any tips or suggestions on how best to manage spiders would be great. Do some of you just allow certain spiders, and block the rest ... if so, which are you choosing to allow?

Any ideas here would be greatly appreciated!
__________________
Regards,

Tim
WildSeeker
The Adventure Travel Resource
  #2 (permalink)  
Old 02-16-2004, 01:07 AM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2
Default

There are several on-line tools but I keep this little freeware utility from Softnik handy - IP Lookup, which supplied the following information:

(On checking, I learned that IP Lookup has been superseded by an updated but apparently still freeware program called WhoisView.)

That IP originates with Verio:

OrgName: Verio, Inc.
OrgID: VRIO
Address: 8005 South Chester Street
Address: Suite 200
City: Englewood
StateProv: CO
PostalCode: 80112
Country: US

ReferralServer: rwhois://rwhois.verio.net:4321/

NetRange: 198.63.0.0 - 198.66.255.255
CIDR: 198.63.0.0/16, 198.64.0.0/15, 198.66.0.0/16
NetName: VRIO-198-063
NetHandle: NET-198-63-0-0-1
Parent: NET-198-0-0-0-0
NetType: Direct Allocation
NameServer: NS0.VERIO.NET
NameServer: NS1.VERIO.NET
NameServer: NS2.VERIO.NET
NameServer: NS3.VERIO.NET
NameServer: NS4.VERIO.NET
Comment: *Rwhois information on assignments from this block available
Comment: at rwhois.verio.net port 4321
RegDate: 2000-07-26
Updated: 2003-08-27

TechHandle: VIA4-ORG-ARIN
TechName: Verio, Inc.
TechPhone: +1-303-645-1900
TechEmail: vipar@verio.net

OrgAbuseHandle: VAC5-ARIN
OrgAbuseName: Verio Abuse Contact
OrgAbusePhone: +1-800-551-1630
OrgAbuseEmail: abuse@verio.net

OrgNOCHandle: VSC-ARIN
OrgNOCName: Verio Support Contact
OrgNOCPhone: +1-800-551-1630
OrgNOCEmail: support@verio.net

OrgTechHandle: VIA4-ORG-ARIN
OrgTechName: Verio, Inc.
OrgTechPhone: +1-303-645-1900
OrgTechEmail: vipar@verio.net

---------------
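
For anyone who wants to double-check the WHOIS output above, here is a minimal Python sketch (not part of the original post) that confirms which of the listed CIDR blocks contains 198.64.149.243, and that the CIDR list matches the NetRange line. It uses only the standard-library ipaddress module; the helper name containing_blocks is made up for illustration.

```python
import ipaddress

# Address from the original post and the CIDR blocks from the WHOIS output.
ip = ipaddress.ip_address("198.64.149.243")
arin_blocks = ["198.63.0.0/16", "198.64.0.0/15", "198.66.0.0/16"]


def containing_blocks(address, cidrs):
    """Return the CIDR strings whose network contains the address."""
    return [c for c in cidrs if address in ipaddress.ip_network(c)]


matches = containing_blocks(ip, arin_blocks)
print(matches)  # ['198.64.0.0/15']

# Collapse the NetRange (first and last address) into CIDR blocks and
# compare the result with the CIDR line from the same WHOIS record.
summarized = [
    str(net)
    for net in ipaddress.summarize_address_range(
        ipaddress.ip_address("198.63.0.0"),
        ipaddress.ip_address("198.66.255.255"),
    )
]
print(summarized == arin_blocks)
```

The same membership test is handy when deciding whether several busy addresses in a log belong to one allocation.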

If that isn't your host, have a look at these:

Robots in the Henhouse
Legal Decision

Quote:
The Recorder
July 24, 2001

Verio, a competitor of an Internet domain name registration Web site, Register.com, deployed software robots to ferret out data on Register's customers. Then it used the information to peddle Verio's Internet services to those customers.
One of the suggestions about Verio's behavior was that it ignored the robots.txt file, in essence making it a "rogue robot"... see Robot Cop: robots.txt: It's the Law

more here in this Google search
  #3 (permalink)  
Old 02-16-2004, 01:11 AM
Jurgen's Avatar
WebProWorld Member
 
Join Date: Sep 2003
Location: Castle Rock, Colorado
Posts: 83
Jurgen RepRank 0
Default Re: Identifying IP Addresses

Quote:
Originally Posted by WildSeeker
I have noticed that a particular IP address (198.64.149.243) is crawling my site in a way that is much more significant than what I would normally expect. What I am wondering is ... are there any tools available to help identify the source of the robot/spider? I am very interested in finding out who/what is looking at my site in such detail.
Hello Wildseeker,

Here is the information for the IP address you referenced:

Quote:
OrgName: Verio, Inc.
OrgID: VRIO
Address: 8005 South Chester Street
Address: Suite 200
City: Englewood
StateProv: CO
PostalCode: 80112
Country: US

ReferralServer: rwhois://rwhois.verio.net:4321/

NetRange: 198.63.0.0 - 198.66.255.255
CIDR: 198.63.0.0/16, 198.64.0.0/15, 198.66.0.0/16
NetName: VRIO-198-063
NetHandle: NET-198-63-0-0-1
Parent: NET-198-0-0-0-0
NetType: Direct Allocation
NameServer: NS0.VERIO.NET
NameServer: NS1.VERIO.NET
NameServer: NS2.VERIO.NET
NameServer: NS3.VERIO.NET
NameServer: NS4.VERIO.NET
Comment: *Rwhois information on assignments from this block available
Comment: at rwhois.verio.net port 4321
RegDate: 2000-07-26
Updated: 2003-08-27

TechHandle: VIA4-ORG-ARIN
TechName: Verio, Inc.
TechPhone: +1-303-645-1900
TechEmail: vipar@verio.net

OrgAbuseHandle: VAC5-ARIN
OrgAbuseName: Verio Abuse Contact
OrgAbusePhone: +1-800-551-1630
OrgAbuseEmail: abuse@verio.net

OrgNOCHandle: VSC-ARIN
OrgNOCName: Verio Support Contact
OrgNOCPhone: +1-800-551-1630
OrgNOCEmail: support@verio.net

OrgTechHandle: VIA4-ORG-ARIN
OrgTechName: Verio, Inc.
OrgTechPhone: +1-303-645-1900
OrgTechEmail: vipar@verio.net

# ARIN WHOIS database, last updated 2004-02-15 19:15
# Enter ? for additional hints on searching ARIN's WHOIS database.

Rwhois server data:

%rwhois V-1.5:0078b6:00 rwhois.verio.net (Vipar 0.1a. Comments to vipar@verio.net)
network:Class-Name:network
network:Auth-Area:198.64.128.0/19
network:ID:NETBLK-W061-198-064-128.127.0.0.1/32
network:Handle:NETBLK-W061-198-064-128
network:Network-Name:W061-198-064-128
network:IP-Network:198.64.128.0/19
network:In-Addr-Server;I:NS931-HST.127.0.0.1/32
network:In-Addr-Server;I:NS1829-HST.127.0.0.1/32
network:In-Addr-Server;I:NS4208-HST.127.0.0.1/32
network:IP-Network-Block:198.64.128.0 - 198.64.159.255
network:Org-Name:Verio Advanced Hosting - Dulles
network:Street-Address:22451 Shaw Rd
network:City:Sterling
network:State:VA
network:Postal-Code:20166
network:Country-Code:US
network:Tech-Contact;I:IA17312-VRIO.127.0.0.1/32
network:Created:2002-03-13 17:07:41+00
network:Updated:2002-03-13 17:07:41+00
You can find information with the 'whois' tools. One would be: http://www.geektools.com/whois.php
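
If you would rather script the lookup than use a web form, the following is a rough Python sketch of a plain WHOIS query over TCP port 43, plus a small parser for the "Key: value" lines shown in the output above. It is an illustration only: the function names whois_query and parse_whois are made up here, whois.arin.net is assumed as the default server, and the query syntax a given WHOIS server accepts can vary. The network call is defined but not executed in the example; the parser is run on a few lines copied from the record above.

```python
import socket


def whois_query(query, server="whois.arin.net", port=43, timeout=10.0):
    """Send one WHOIS query and return the raw text reply.

    Assumption: the server accepts a bare IP address as the query.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: value' lines into a dict of lists.

    Blank lines and comment lines starting with '#' or '%' are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "%")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# A few lines copied from the WHOIS record quoted in this thread.
sample = """OrgName: Verio, Inc.
NetRange: 198.63.0.0 - 198.66.255.255
NameServer: NS0.VERIO.NET
NameServer: NS1.VERIO.NET
# ARIN WHOIS database, last updated 2004-02-15 19:15
"""

info = parse_whois(sample)
print(info["OrgName"], info["NetRange"])
```

To run a live lookup you would call something like parse_whois(whois_query("198.64.149.243")), subject to the server's usage limits.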

There is nothing wrong with spiders crawling your site. The only restrictions are the portions of the site or the files you DON'T want them to see. You 'disallow' those parts in your robots.txt file.

You can also 'disallow' certain spiders from crawling your site in your robots.txt file.

This one bans googlebot from all files on the server:

User-agent: googlebot
Disallow: /
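
Before uploading a robots.txt file, you can check how a well-behaved parser would read it. The sketch below is only an illustration: it feeds the two lines above to Python's standard urllib.robotparser module. The example.com URLs and the agent name "otherbot" are placeholders, not anything from this thread.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the post above.
rules = """\
User-agent: googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# googlebot is shut out of every path; an agent with no matching
# record (and no "*" record present) is allowed by default.
googlebot_allowed = parser.can_fetch("googlebot", "http://example.com/index.html")
otherbot_allowed = parser.can_fetch("otherbot", "http://example.com/index.html")
print(googlebot_allowed)  # False
print(otherbot_allowed)   # True
```

Keep in mind that this only shows what a compliant robot would do; robots.txt is a request, and nothing forces a spider to honor it.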

Read more about this subject at: http://www.searchengineworld.com/rob...s_tutorial.htm

Jurgen
www.absolutelyfabulousflowers.com
  #4 (permalink)  
Old 02-16-2004, 01:13 AM
Jurgen's Avatar
WebProWorld Member
 
Join Date: Sep 2003
Location: Castle Rock, Colorado
Posts: 83
Jurgen RepRank 0
Default

Sorry Minstrel, you were quicker... :-)

Jurgen
  #5 (permalink)  
Old 02-16-2004, 01:21 AM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2
Default

Not by much! :-)

What bothered me about the Verio source and my subsequent brief search into what robots might be coming from Verio was the stuff that popped up about previous wrongdoing by Verio's 'bots - and the suggestion in one story that their bots were specifically ignoring the robots.txt file.

Now that case dates back to 2000-2001 but it wasn't THAT long ago...
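
One way to test whether a given robot is ignoring robots.txt is to compare your access log against your own rules. The following Python sketch is purely illustrative: the robots.txt content, the log lines, the /private/ path, and the agent name "anybot" are all invented for the example, and the log is assumed to be in Common Log Format. It counts, per IP address, the requests that a compliant robot would not have made.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being checked.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical log lines (Common Log Format assumed).
log_lines = [
    '198.64.149.243 - - [16/Feb/2004:00:01:02 -0500] "GET /robots.txt HTTP/1.0" 200 45',
    '198.64.149.243 - - [16/Feb/2004:00:01:03 -0500] "GET /private/list.html HTTP/1.0" 200 5120',
    '198.64.149.243 - - [16/Feb/2004:00:01:04 -0500] "GET /index.html HTTP/1.0" 200 2048',
    '192.0.2.10 - - [16/Feb/2004:00:02:00 -0500] "GET /index.html HTTP/1.0" 200 2048',
]

# Capture the client address and the requested path.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)')

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def disallowed_hits(lines, agent="anybot"):
    """Count requests per IP for paths the rules disallow for this agent."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and not parser.can_fetch(agent, match.group(2)):
            hits[match.group(1)] += 1
    return hits


hits = disallowed_hits(log_lines)
print(hits)
```

A nonzero count is only a hint, since ordinary browsers also reach disallowed pages; it is more telling when the same address fetched robots.txt first.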
  #6 (permalink)  
Old 02-16-2004, 01:32 AM
Jurgen's Avatar
WebProWorld Member
 
Join Date: Sep 2003
Location: Castle Rock, Colorado
Posts: 83
Jurgen RepRank 0
Default

Quote:
Originally Posted by minstrel
Not by much! :-)

What bothered me about the Verio source and my subsequent brief search into what robots might be coming from Verio was the stuff that popped up about previous wrongdoing by Verio's 'bots - and the suggestion in one story that their bots were specifically ignoring the robots.txt file.
I am reading your links as we speak.... It sure makes me wonder what is going on with Verio. To be honest, I had never heard of this before, but I will certainly keep an eye on that one.

Thanks David,

Jurgen,
www.absolutelyfabulousflowers.com