11-24-2006, 06:52 PM
kgun
WebProWorld 1,000+ Club
 

Join Date: May 2005
Location: Norway
Posts: 5,066

Quote:
Originally Posted by Webnauts
Kjell, I have not followed the links yet, but are you saying that Google, MSN and Yahoo do not obey these commands?

User-agent: Googlebot
Disallow: /

User-agent: MSNBot
Disallow: /

User-agent: Slurp
Disallow: /
Look at the banned-ip.xml file in Gary Keith's Browser Capabilities Project.
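For reference, here is a minimal PHP sketch of how a well-behaved crawler might evaluate robots.txt rules like the ones quoted above. It is deliberately simplified: only User-agent and Disallow lines are handled, user-agent names are matched case-insensitively as substrings, and the function names `robots_groups()` and `is_allowed()` are my own. The point it illustrates is that the check runs on the crawler's side, so obeying robots.txt is voluntary; a bot that skips the check is not stopped by the file.

```php
<?php
// Simplified robots.txt evaluation as a *compliant* crawler would do it.
// Assumptions: only User-agent/Disallow are parsed, agents match as
// case-insensitive substrings, and "*" applies only if no named group matches.

function robots_groups(string $robotsTxt): array
{
    $groups = [];     // user-agent token => list of Disallow prefixes
    $current = [];    // agents named in the group being read
    $inRules = false; // true once the current group has seen a rule line
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if ($inRules) {       // a User-agent after rules starts a new group
                $current = [];
                $inRules = false;
            }
            $agent = strtolower($value);
            $current[] = $agent;
            $groups[$agent] = $groups[$agent] ?? [];
        } elseif ($field === 'disallow') {
            $inRules = true;
            foreach ($current as $agent) {
                if ($value !== '') { // an empty Disallow allows everything
                    $groups[$agent][] = $value;
                }
            }
        }
    }
    return $groups;
}

function is_allowed(string $robotsTxt, string $userAgent, string $path): bool
{
    $groups = robots_groups($robotsTxt);
    $ua = strtolower($userAgent);
    $rules = null;
    foreach ($groups as $agent => $prefixes) {
        if ($agent !== '*' && $agent !== '' && strpos($ua, (string) $agent) !== false) {
            $rules = $prefixes; // first named group that matches wins
            break;
        }
    }
    $rules = $rules ?? ($groups['*'] ?? []);
    foreach ($rules as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false; // path starts with a disallowed prefix
        }
    }
    return true;
}
```

With the three groups quoted above, `is_allowed($robots, 'msnbot/1.0', '/')` returns false, while an agent that is not named, such as `SomeOtherBot/1.0`, is allowed, because that file has no `User-agent: *` group.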

Some other resources:
IP Addresses of Search Engine Spiders

CrawlWall, the firewall for webpages.
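Since robots.txt cannot force anything, the server has to refuse unwanted visitors itself. Below is a minimal sketch of an IP-range check in PHP. The ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks, not entries from banned-ip.xml (whose format is not reproduced here); only IPv4 is handled, and 64-bit PHP is assumed.

```php
<?php
// Refuse requests from banned IPv4 ranges (CIDR notation).
// The ranges are hypothetical placeholders; assumes 64-bit PHP integers.

$bannedRanges = ['192.0.2.0/24', '198.51.100.0/24', '203.0.113.7/32'];

function ip_in_cidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = explode('/', $cidr);
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false) {
        return false; // not a valid IPv4 address
    }
    $bits = (int) $bits;
    // Build a 32-bit network mask, e.g. /24 => 0xFFFFFF00.
    $mask = $bits === 0 ? 0 : (-1 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

function is_banned(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// On a web request (not the command line), stop banned clients early.
if (PHP_SAPI !== 'cli' && is_banned($_SERVER['REMOTE_ADDR'] ?? '', $bannedRanges)) {
    http_response_code(403);
    exit('Forbidden');
}
```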

Now, when I start a new website, I begin by making:
  • A robots.txt file.
  • A .htaccess file that blocks a lot of bad bots.
  • PHP configuration settings in the .htaccess file, so the site is portable, e.g.:

    php_value include_path "Path string"

    A useful little script:

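    <?php
    // Print the web server's document root and PHP's current include_path,
    // e.g. to check that a php_value include_path line in .htaccess took effect.
    echo '<pre>';
    echo 'DOCUMENT_ROOT = ' . $_SERVER['DOCUMENT_ROOT'] . "\n";
    echo 'Include_path = ' . ini_get('include_path') . "\n";
    echo '</pre>';
    ?>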
  • So before I write any content or markup, I take control of my part of the server where my sites are hosted.
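On the include_path item: `php_value` lines in .htaccess are only honoured when PHP runs as an Apache module, so here is a sketch of the same setting done at runtime with `set_include_path()`. The `lib` directory one level above the script and the `add_include_dir()` helper are hypothetical.

```php
<?php
// Append a directory to PHP's include_path at runtime, once only.
// add_include_dir() and the "lib" location are hypothetical examples.
function add_include_dir(string $dir): string
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

// Building the path from __DIR__ keeps it relative to this file, so the
// site can move between servers without editing absolute paths.
add_include_dir(dirname(__DIR__) . DIRECTORY_SEPARATOR . 'lib');
```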

That saves me time and bandwidth, cuts down on referrer spam, and, last but not least, gives me cleaner, easier-to-read logs.
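The same kind of filtering can be applied to the User-Agent and Referer headers. Below is a minimal PHP sketch for hosts where mod_rewrite rules in .htaccess are not available; the pattern lists are hypothetical examples, not a vetted blacklist. Blocking in .htaccess is cheaper where it is possible, since the request is refused before PHP starts.

```php
<?php
// Header-based filter: block known bad user agents and spam referrers.
// Both pattern lists are hypothetical examples (lowercase substrings).
const BAD_AGENT_PATTERNS = ['libwww-perl', 'emailcollector', 'webzip'];
const SPAM_REFERRER_PATTERNS = ['casino-example.test', 'pills-example.test'];

function matches_any(string $haystack, array $needles): bool
{
    $haystack = strtolower($haystack);
    foreach ($needles as $needle) {
        if ($needle !== '' && strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

function should_block(string $userAgent, string $referrer): bool
{
    return matches_any($userAgent, BAD_AGENT_PATTERNS)
        || matches_any($referrer, SPAM_REFERRER_PATTERNS);
}

// On a web request, answer 403 before any page content is generated.
if (PHP_SAPI !== 'cli'
    && should_block($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['HTTP_REFERER'] ?? '')) {
    http_response_code(403);
    exit;
}
```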

Recommendation: When you start a new website, begin by building a firewall around it.