  #1 (permalink)  
Old 01-24-2004, 02:30 AM
WebProWorld Member
 

Join Date: Jan 2004
Location: Alberta
Posts: 82
kimbecker1 RepRank 0
Default Robot Meta Tag

Should I have this tag on my site?

<META NAME="ROBOTS" content="index,follow"></META>

I only have a robots meta tag on one page, and it is the noindex,nofollow version.

What are the benefits of adding this tag? Are there any downsides to adding it?

Thanks.

Kim
  #2 (permalink)  
Old 01-24-2004, 04:31 AM
WebProWorld Member
 

Join Date: Jan 2004
Location: Out There Somewhere
Posts: 37
OleTom RepRank 0
Default

Hi Kim,

<HTML><HEAD>
<TITLE>Your Title</TITLE>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
<META name="resource-type" content="document">
<META name="audience" content="all">
<META name="distribution" content="global">
<META name="robots" content="INDEX, FOLLOW">
<META name="revisit-after" content="15 days">
<META name="description" content="A nice long description of your site">
<META name="keywords" content="as, many, key, words, as, you, can, think, of, with, commas, between, them, or short phrases will work too">
<meta name="copyright" content="Copyright ©2003 You">
</HEAD>

This is a good one. I use it, and Yahoo, Google, and AOL spider my site dozens of times every day, so I guess it works OK :D
__________________
Life is Good, OleTom
http://ConstructionWorkers.US
  #3 (permalink)  
Old 01-24-2004, 04:43 AM
WebProWorld 1,000+ Club
 

Join Date: Jul 2003
Location: Toronto, Canada
Posts: 2,193
cyanide RepRank 0
Default

Well, having that tag really shouldn't make any difference. It's largely being ignored. Bots are configured to follow links, plain and simple.

The only time you may need something like that is if you don't want a spider to crawl a certain directory; then you would put a robots.txt file in the root of your site.

Of course, if it makes you feel better to put that tag in, then it certainly won't hurt.
__________________
|
Web Hosting Guru
| Need Help For Your Forum?
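
A quick way to see cyanide's point is to work out what a crawler would conclude from each version of the tag. The sketch below is Python, uses only the standard library, and assumes that a crawler which honours the robots meta tag treats index and follow as the defaults when the tag is absent. `RobotsMetaParser` and `effective_robots` are names made up for this illustration, not part of any crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def effective_robots(html):
    """Return the policy a tag-honouring crawler would apply to this page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumption: with no tag at all, the crawler indexes and follows.
    policy = {"index": True, "follow": True}
    for directive in parser.directives:
        if directive in ("noindex", "none"):
            policy["index"] = False
        if directive in ("nofollow", "none"):
            policy["follow"] = False
    return policy


if __name__ == "__main__":
    print(effective_robots('<META NAME="ROBOTS" content="index,follow">'))
    print(effective_robots("<html><head></head></html>"))
    print(effective_robots('<meta name="robots" content="noindex,nofollow">'))
```

Under that assumption, a page with `index,follow` and a page with no robots tag at all come out the same; only the `noindex,nofollow` page is treated differently.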
  #4 (permalink)  
Old 01-24-2004, 08:53 AM
WebProWorld Pro
 

Join Date: Sep 2003
Location: CA
Posts: 174
ranjan RepRank 0
Default A tutorial

Here is a tutorial on the subject

http://www.total-impact.com/reviews....wcontent&id=56
__________________
ranjan | Macromedia Certified Dreamweaver MX Developer
http://www.dreamlettes.net - a dreamweaver resource
http://www.ranjan.ws - got blog?
http://www.total-impact.com - a web design community
  #5 (permalink)  
Old 01-24-2004, 01:47 PM
WebProWorld Member
 

Join Date: Jan 2004
Location: Alberta
Posts: 82
kimbecker1 RepRank 0
Default Thanks

Thanks to you all. I appreciate your time :)

I am going to do the reading.
  #6 (permalink)  
Old 01-26-2004, 01:33 PM
WebProWorld Veteran
 

Join Date: Dec 2003
Location: Malaysia
Posts: 814
EJRS.COM RepRank 0
Default tags you don't need

<META name="audience" content="all">
<META name="distribution" content="global">
<META name="revisit-after" content="15 days">

Remove these. You don't need them; they only use up your bandwidth, and revisit-after tells the bots to skip checking on your site until 15 days later. Can't have that, can we?
  #7 (permalink)  
Old 01-26-2004, 07:05 PM
WebProWorld 1,000+ Club
 

Join Date: Aug 2003
Location: Edmonton, AB, Canada
Posts: 3,406
mikmik RepRank 1
Default

Hi, everyone, good advice, good link from ranjan.
I have heard that it is much better to have a robots.txt file in the root directory, i.e. beside the index.html file in your site folder.
Is this right? It seems that they are supposed to be more 'spider friendly' and give you more control.

Here is my file. It is set to have the search spiders index all my HTML pages, but stop them (and some hackers) from 'seeing' my other folders. It is also set to stop the known e-mail harvesters, although they may ignore this, so other measures should be taken.
Just copy this to a text file, as is, making sure that you have the right names for the folders (I have assets, Temp, cgi-bin blocked), name it 'robots.txt', and upload!


User-agent: *
Disallow:
Disallow: /assets/
Disallow: /cgi-bin/
Disallow: /protect/
Disallow: /temp/

User-agent: scooter
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: looksmart
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: Googlebot-Image
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Mozilla
Disallow: /

User-agent: mozilla
Disallow: /

User-agent: mozilla/3
Disallow: /

User-agent: mozilla/4
Disallow: /

User-agent: mozilla/5
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
Disallow: /

User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: WebmasterWorldForumBot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: WebmasterWorld Extractor
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: sootle
Disallow: /

User-agent: es
Disallow: /

User-agent: Enterprise_Search/1.0
Disallow: /

User-agent: Enterprise_Search
Disallow: /
__________________
What I am is what I am, are you what you are, or what.
Eddie Brickel
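
Before uploading a file like this, it can be checked offline. The sketch below feeds a short excerpt of the file to Python's standard `urllib.robotparser` and asks which requests a well-behaved crawler would be allowed to make. The excerpt leaves out the empty `Disallow:` line, since parsers may differ in how they treat an empty rule followed by real ones. The `is_allowed` helper and the example.com URLs are assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the robots.txt above (assumption: the empty
# "Disallow:" line is dropped so only the real rules are tested).
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/
Disallow: /cgi-bin/
Disallow: /protect/
Disallow: /temp/

User-agent: Wget
Disallow: /

User-agent: EmailSiphon
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/assets/logo.gif"),
        ("Wget/1.6", "http://www.example.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

Keep in mind that this only models crawlers that choose to obey robots.txt; as mentioned in the post, e-mail harvesters may simply ignore it.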
  #8 (permalink)  
Old 01-27-2004, 12:50 AM
WebProWorld Pro
 

Join Date: Oct 2003
Location: Alberta, Canada
Posts: 233
weegillis RepRank 1
Default

Quote:
User-agent: *
Disallow:
Disallow: /assets/
Disallow: /cgi-bin/
Disallow: /protect/
Disallow: /temp/
Mike,

Since you're not disallowing any, wouldn't the above be enough to include in the robots.txt file with the same effect?
__________________
Volunteer for something in your community today!
  #9 (permalink)  
Old 01-27-2004, 01:21 AM
WebProWorld Member
 

Join Date: Jan 2004
Location: Colorado
Posts: 59
disciple RepRank 0
Default

Mik,

If you are on a Linux machine (running Apache), you could do this in your .htaccess file.

<Limit GET PUT POST>
Order Allow,Deny
Allow from all
</Limit>

RewriteEngine on

#The next lines check for Email Spammers Robots and redirect them to a fake page
RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} ^BotALot [OR]
RewriteCond %{HTTP_USER_AGENT} ^BuiltBotTough [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye [OR]
RewriteCond %{HTTP_USER_AGENT} ^BunnySlippers [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cegbfeieh [OR]
RewriteCond %{HTTP_USER_AGENT} ^CheeseBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CopyRightCheck [OR]
RewriteCond %{HTTP_USER_AGENT} ^cosmos [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EroCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Foobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Harvest [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^httplib [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^humanlinks [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoNaviRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JennyBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kenjin.Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Keyword.Density [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^libWeb/clsHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkextractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkScan/8.1a.Unix [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mata.Hari [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister.PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^moget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline.Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProPowerBot/2.14 [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^QueryN.Metasearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [OR]
RewriteCond %{HTTP_USER_AGENT} ^RMA [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpankBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^spanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^suzuran [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz/1.4 [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^The.Intraformant [OR]
RewriteCond %{HTTP_USER_AGENT} ^TheNomad [OR]
RewriteCond %{HTTP_USER_AGENT} ^TightTwatBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^True_Robot [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot/1.5 [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLy.Warning [OR]
RewriteCond %{HTTP_USER_AGENT} ^VCI [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEnhancer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.Image.Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebmasterWorldForumBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website.Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster.Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Collector-E [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu's [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ emailsforyou.php [L]


Here is the PHP file for the rule:

Code:
<?php
# $Author: zx $
# $Date: 2003/08/01 20:24:46 $
/*
    This file generates a list of 100 fake email addresses,
    useful for causing some trouble for stupid spam bots!
*/
echo "This server does not accept known web page mining tools.  If you set your user-agent to one of those types, please remove it so you may access the
web sites again. 

--Webmaster

";

srand ((double) microtime() * 1000);
$indirizzi= array('ping', 'pong', 'fsck', 'fling', 'sbum', 'pang', 'dumdedum', 'homer', 'simpson', 'bart', 'curvaceous', 'anointed','kooyanisqatsi', 'quitter', 'elmerfoodbeat', 'oingoboing', 'garmalina', 'osperitizia', 'formenterol', 'lamp', 'tmp', 'dump', 'newbie', 'n00b', 'email', 'overload', 'chat', 'calf', 'high', 'trickster');
for($i=0; $i<100; $i++)
{
    $mail=$indirizzi[rand(0,count($indirizzi)-1)].'@'.$indirizzi[rand(0,count($indirizzi)-1)].$indirizzi[rand(0,count($indirizzi)-1)].$indirizzi[rand(0,count($indirizzi)-1)].$indirizzi[rand(0,count($indirizzi)-1)].'.com';
    echo "<a href=\"mailto:$mail\">$mail</a> ";
}
# $Log: emailsforyou.php,v $
# Revision 1.5  2003/08/01 20:24:46  zx
# CVS Keyword Repair
#
?>
Courtesy of Nuke Cops and phpNuke.
__________________
Christian Web Hosting
Digitals
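
The RewriteCond lines in the post above are regular expressions anchored at the start of the User-Agent header and joined with OR, with a few marked [NC] for case-insensitive matching. The Python sketch below mimics that logic for a handful of the patterns, so a user-agent string can be tried out without touching the server. It is only an approximation of what mod_rewrite does, and `BLOCK_PATTERNS` and `is_blocked` are names invented for this example.

```python
import re

# (pattern, case_insensitive) pairs copied from a few of the RewriteCond
# lines; the True flag stands in for the [NC] marker.
BLOCK_PATTERNS = [
    (r"^EmailSiphon", False),
    (r"^HTTrack", True),
    (r"^Offline\ Explorer", False),
    (r"^Wget", False),
    (r"^Xenu's", False),
]

_COMPILED = [
    re.compile(pattern, re.IGNORECASE if nocase else 0)
    for pattern, nocase in BLOCK_PATTERNS
]


def is_blocked(user_agent):
    """Return True if any pattern matches, like a chain of [OR] conditions."""
    return any(rx.search(user_agent) for rx in _COMPILED)


if __name__ == "__main__":
    for ua in ("Wget/1.6", "httrack 3.0", "wget/1.6",
               "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"):
        print(ua, "->", "blocked" if is_blocked(ua) else "allowed")
```

Because every pattern is anchored with ^ and most are case-sensitive, a tool only needs a slightly different user-agent string to slip past, which is one reason a list like this needs regular upkeep.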
  #10 (permalink)  
Old 01-27-2004, 03:49 AM
WebProWorld 1,000+ Club
 

Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 3,619
minstrel RepRank 0
Default

Mik:

That's gotta be the longest robots.txt exclusion list I've ever seen - many of them I don't even recognize, but a couple caught my eye:

For example --

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

Xenu is a popular freeware website link-checker, one I've been using for at least 3-4 years now, typically on a monthly basis when I have time.

If you disallow Xenu and I have a link to your website, it would seem to me that Xenu will report your site as a dead link, or one with a problem. Depending on how busy I am that day/week, or how important I think that link is for my site, I might investigate it manually -- or, I might just delete the link. This may not be desirable for your website in terms of Google PR...

Some of the others that I didn't recognize made me wonder if they don't fall into that category...

It might be worth going through your list and double-checking each entry against its corresponding service.
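
One way to do that double-checking is to pull out every user agent the file shuts out completely and review the names one by one. The sketch below is a small hand-rolled Python parser, written only for this illustration (`fully_blocked_agents` is a made-up name). It groups lines into records separated by blank lines and reports the agents whose record contains `Disallow: /`.

```python
def fully_blocked_agents(robots_txt):
    """List the User-agent names whose record contains 'Disallow: /'."""
    blocked, agents, rules = [], [], []
    # The trailing "" acts as a final blank line so the last record is flushed.
    for raw in robots_txt.splitlines() + [""]:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            # A blank line ends the current record.
            if "/" in rules:
                blocked.extend(agents)
            agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            rules.append(value)
    return blocked


if __name__ == "__main__":
    sample = """\
User-agent: *
Disallow: /temp/

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Wget
Disallow: /
"""
    for name in fully_blocked_agents(sample):
        print(name)
```

Each name it prints can then be looked up to decide whether it is a harvester worth blocking or a legitimate tool such as a link checker.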
  #11 (permalink)  
Old 01-27-2004, 04:27 AM
WebProWorld 1,000+ Club
 

Join Date: Aug 2003
Location: Edmonton, AB, Canada
Posts: 3,406
mikmik RepRank 1
Default

Quote:
Quote:
User-agent: *
Disallow:
Disallow: /assets/
Disallow: /cgi-bin/
Disallow: /protect/
Disallow: /temp/


Mike,

Since you're not disallowing any, wouldn't the above be enough to include in the robots.txt file with the same effect?
That is exactly how I had it until I read that tutorial!

Minstrel wrote:
Quote:
Xenu is a popular freeware website link-checker, one I've been using for at least 3-4 years now, typically on a monthly basis when I have time.
It's you!
Thanks, I got this list and posted it right away. I don't even have it uploaded yet; my server went dead when I tried!

disciple wrote :
Quote:
If you are using a Linux machine you could do this in your .htaccess file.

<Limit GET PUT POST>
Order Allow,Deny
Allow from all
</Limit>
(I was going to put it all in, but I don't want to burn out THIS WPW server also ;o>)

I am on 2K IIS, so I have to go through the 'ticket' system to do this simple thing - .htaccess.

Could I just do the same with a 'user agent, allow:(index?) /gotchasucker.php, ?

Oh, are those Mozilla guys all spiders? I was wondering about more than a few of those, but with GPL stuff and robot scripts everywhere, I am not sure what is what!

Mozilla/4.0 (compatible; MSIE 5.0; Windows NT) = Lycos!
__________________
What I am is what I am, are you what you are, or what.
Eddie Brickel
  #12 (permalink)  
Old 01-27-2004, 11:31 AM
WebProWorld Member
 

Join Date: Jan 2004
Location: Colorado
Posts: 59
disciple RepRank 0
Default

Quote:
Xenu is a popular freeware website link-checker, one I've been using for at least 3-4 years now, typically on a monthly basis when I have time.

If you disallow Xenu and I have a link to your website, it would seem to me that Xenu will report your site as a dead link, or one with a problem. Depending on how busy I am that day/week, or how important I think that link is for my site, I might investigate it manually -- or, I might just delete the link. This may not be desirable for your website in terms of Google PR...
Are you using Xenu to spider other people's sites? There are many link validation programs available to check reciprocal links. I use Xenu on occasion, but only to help me create my own site map pages.

When Xenu comes along, like any other spider or bot, it eats your bandwidth. How many of these spiders or bots are worth your bandwidth? I think this is the real question.

You can always add to or delete from the list as you see fit.

I just supplied these as information for the masses. If you wish to use it, fine; if you don't, fine. In our business, as in any other, the more knowledge you have or have access to, the better off you will be.
__________________
Christian Web Hosting
Digitals
  #13 (permalink)  
Old 01-27-2004, 12:12 PM
WebProWorld 1,000+ Club
 

Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 3,619
minstrel RepRank 0
Default

Quote:
Originally Posted by disciple
Are you using Xenu to spider other peoples sites?
No... I use it to check that links already on my site have not gone dead... I'm not spidering anybody.

Quote:
You can always add to or delete from the list as you see fit. I just supplied these as information for the masses. If you wish to use it, fine; if you don't, fine. In our business, as in any other, the more knowledge you have or have access to, the better off you will be.
disciple, I wasn't criticizing you or anybody else - in fact, my post was a reply to MikMik and it was simply to ask the question about why he was trying to prevent a link-checker from looking at his site? Given that links to your site are helpful, at least in ranking well in Google, I was suggesting that might have a negative impact on one's website and giving an example of my own practices in using the link-checker for my website.
  #14 (permalink)  
Old 01-27-2004, 12:41 PM
WebProWorld Member
 

Join Date: Jan 2004
Location: Colorado
Posts: 59
disciple RepRank 0
Default

Quote:
disciple, I wasn't criticizing you or anybody else
I did not take your post as criticism.
__________________
Christian Web Hosting
Digitals