
Page 2 of 2
Results 11 to 18 of 18

Thread: Support for creating robots txt against bad bots

  1. #11
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    9,028
    Quote Originally Posted by edhan
    As said by Andilinks, using the RewriteEngine will be better.
    I am already doing that. Thanks everybody.

  2. #12
    Senior Member
    Join Date
    Sep 2005
    Posts
    254
    Still, this ain't gonna keep 'em all away, because a really mean bot might shed its skin and fake the User-Agent anyway...
    I've never really looked into blocking bad bots, as it's not that much of a problem on the sites I've worked on, but I'm sure you can also block by IP range. So you could do the primary check by user agent, and any bots that get through could then be blocked by IP address. Still not 100% foolproof, though...
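
    A rough sketch of that layered idea for an .htaccess file, assuming the Apache 2.2-style access directives used elsewhere in this thread. The name BadBotExample and the range 192.0.2.0/24 are placeholders only (that range is reserved for documentation), not real offenders:

    ```apache
    # Layer 1: flag requests by User-Agent (mod_setenvif).
    SetEnvIfNoCase User-Agent "^BadBotExample" bad_bot=1

    # Layer 2: also refuse a whole IP range, whatever User-Agent it sends.
    # With "Order Allow,Deny", a request matching both Allow and Deny is denied.
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    Deny from 192.0.2.0/24
    ```

    The user-agent rule catches bots that identify themselves honestly; the IP rule is the fallback for ones that fake the header but keep coming from the same addresses.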

  3. #13
    Junior Member solecist's Avatar
    Join Date
    Sep 2003
    Posts
    23

    had to block by IP

    I had to put an IP blocking scheme in my blog's comment folder to block some spambots. It's a PITA, but you have to have several layers to catch them.

  4. #14
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    9,028

    Re: had to block by IP

    Quote Originally Posted by solecist
    I had to put an IP blocking scheme in my blog's comment folder to block some spambots. It's a PITA, but you have to have several layers to catch them.
    I use "Spam Karma 2" for my blog, and I am very happy with it. :)

  5. #15
    Member
    Join Date
    Mar 2006
    Posts
    39

    htaccess

    hi there,

    just to be on the safe side: Andilinks said that adding this piece of code to .htaccess will help keep out a 'bad bot'

    RewriteEngine on
    SetEnvIf User-Agent ^FunWebProducts bad_bot=1
    deny from env=bad_bot

    by substituting FunWebProducts with the user-agent name? Do I understand correctly?

    So, if I want to avoid '8484 Boston Project v 1.0 1836' then I would write

    RewriteEngine on
    SetEnvIf User-Agent ^8484 Boston Project v 1.0 1836 bad_bot=1
    deny from env=bad_bot

    ??
    thanks

  6. #16
    Senior Member Andilinks's Avatar
    Join Date
    Feb 2004
    Posts
    752
    The "^" symbol means "begins with," so "^f" would block all user-agents beginning with "f," "^fun" would block all user-agents that begin with "fun," and so on, getting increasingly selective with additional characters. I'm not sure what a space character would do here; maybe someone who is more familiar with Apache configuration directives can jump in with the answer to that.

    Like I said above, the server is very unforgiving of errors, so it pays to be careful. I have successfully used this code without any spaces in the user-agent name.
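
    On the question of spaces: Apache splits directive arguments on whitespace, so an unquoted pattern containing spaces would most likely be cut off at the first space, with the remaining words read as extra variable names to set. Wrapping the pattern in double quotes keeps it as a single regex. A sketch, assuming the bot's User-Agent really does start with the string quoted in post #15:

    ```apache
    # SetEnvIf belongs to mod_setenvif, so "RewriteEngine on" is not needed for it.
    # The quotes keep the spaces inside the pattern; "\." matches a literal dot.
    SetEnvIf User-Agent "^8484 Boston Project v 1\.0" bad_bot=1

    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    ```

    Because of the leading "^", the pattern only has to cover the start of the header; the trailing "1836" can be left off if that part of the string varies.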

    Adding lines to the .htaccess file does add processing time for every page served, so it would not be wise to block a wholesale list of bots as suggested in the original post. It is better to watch for specific bots that misbehave and block them individually. I have successfully run with an .htaccess file as large as 24K, but in my experience it not only slows down delivery; too many lines can also destabilize and crash the server, so it pays to be very careful with it. I currently try to keep my .htaccess file under 8K.
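
    One way to keep the file short while still blocking several specific bots is to fold their names into a single pattern with regex alternation. This is only a sketch; the two names are simply the ones mentioned in this thread:

    ```apache
    # One directive covers several user agents via (a|b) alternation.
    SetEnvIfNoCase User-Agent "^(FunWebProducts|8484 Boston Project)" bad_bot=1
    Deny from env=bad_bot
    ```

    Each newly misbehaving bot then becomes one more alternative inside the parentheses rather than another line in the file.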

    Again, maybe a sysadmin type guy could add some more authoritative opinion here, I'm just a site owner. :)
    ...the Rockies may tumble, Gibralter may crumble... G & I Gershwin, 1937

  7. #17
    Member
    Join Date
    Mar 2006
    Posts
    39

    well...

    thank you...

  8. #18
    Junior Member
    Join Date
    Jan 2007
    Posts
    1
    Quote Originally Posted by Andilinks
    Code:
    RewriteEngine on 
    SetEnvIf User-Agent ^FunWebProducts bad_bot=1
    deny from env=bad_bot
    Use SetEnvIfNoCase instead of SetEnvIf, so that it also matches ^funwebproducts.
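
    Put together, the case-insensitive variant might look like the sketch below. The access-control part is written in the Apache 2.4 form, which is an assumption about the server version and not something shown in the thread; on older servers the original "deny from env=bad_bot" line stays as it is.

    ```apache
    # Case-insensitive match: also catches "funwebproducts" and "FUNWEBPRODUCTS".
    SetEnvIfNoCase User-Agent "^FunWebProducts" bad_bot=1

    # Apache 2.4 (mod_authz_core) counterpart of "deny from env=bad_bot".
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```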


