03-01-2005, 11:54 AM
WebProWorld Veteran
Join Date: Nov 2003
Location: mid south USA
Posts: 385
What is a grabber?
I'm trying very hard to get things set up on my site so the bad bots can't harvest things and so people can't steal my work.
In checking my stats under "browser types" I found four listings that were designated as "grabbers". They are:
Wget, Curl, Acrobat, and WebCopier
Do I want to ban these from my site, and if so how do I do it? If it is through .htaccess please give me the exact code to add to my existing .htaccess pages. I have one in each directory, so can put the code in all of them.
I don't think it should go in robots.txt because these were listed under browsers, but if it should go in there also please let me know, and how to code it.
And speaking of robots.txt -- Google has garnered all of my images (thousands of them) and people are clicking on them and then using them -- sometimes hotlinking to them. Do I want to ban the Google image robot, and if so will this hurt my rankings? I certainly do NOT want to ban the regular Google bot!!!!
PLEASE do not tell me to use a code to prevent hotlinking. I tried four versions and none of them would work. My CP has a link to click to stop hotlinking also, but when I activate it no images will load. At all. On my own pages. I deal with hotlinking by moving my graphics often. It's a lot of work, but it does work for me when nothing else will.

03-01-2005, 12:24 PM
WebProWorld 1,000+ Club
Join Date: Jul 2003
Location: UK
Posts: 2,803
Banning unwanted crawls
Hi there Weedy Lady,
There are several questions in your post. I'm not going to attempt to answer them all, but I'll give you a few quick replies to get the ball rolling.
A grabber is an automated crawler/spider, also known as a web scraper or screen scraper. Grabbers can be used legitimately, for example to index links from news sites and construct RSS feeds. However, there is always a flip side to everything! The worst examples of 'grabbers' simply scour a site's content with the purpose of extracting email addresses and contact information.
I don't have a list of malicious grabbers / scrapers, but I'm sure other members will be able to supply a few names. I would imagine it's unlikely that any such program has been written to obey the robots exclusion rules, so as you say, adding them to your site's .htaccess file could well be your best bet.
You may find the following page helpful:
How to block spambots, ban spybots, and tell unwanted robots to go to hell
Regarding Google's spidering of images: the bot Google uses to index images is called Googlebot-Image. The robots.txt file can be modified to control the bot's activity. You can choose to ban the bot from a specific directory as below:
Code:
User-Agent: Googlebot-Image
Disallow: /images/
Or alternatively, you could ban the bot from your site altogether.
Code:
User-Agent: Googlebot-Image
Disallow: /
I can't say for sure whether or not this would affect your rankings, but I would very much doubt it!
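One way to check that rules like the ones above shut out only the image bot is to run them through Python's standard `urllib.robotparser`. This is a minimal sketch, assuming a placeholder domain (`example.com`) and placeholder paths; it shows how a standards-following parser reads the rules, not how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The site-wide rule from this post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: Googlebot-Image
Disallow: /
"""

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# example.com and the paths below are placeholders, not the poster's real site.
image_bot_ok = allowed(ROBOTS_TXT, "Googlebot-Image", "http://example.com/images/rose.jpg")
web_bot_ok = allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.html")

print("Googlebot-Image allowed:", image_bot_ok)
print("Googlebot allowed:", web_bot_ok)
```

Because the record names only Googlebot-Image, a parser of this kind leaves the regular Googlebot unaffected.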

03-02-2005, 05:38 AM
WebProWorld Veteran
Join Date: Feb 2005
Location: Forchheim, Germany
Posts: 947
Hello Weedy Lady,
there's a page over at Google with information on that:
"Remove an image from Google's Image Search"
http://www.google.com/remove.html#images
hth,
Alex

03-02-2005, 08:53 AM
WebProWorld Veteran
Join Date: Nov 2003
Location: mid south USA
Posts: 385
thanks
Thanks to both for the information about the google images.
I've found some baddies in my stats that I would like to ban in .htaccess as well as in my robots.txt file, but I do that through my CP and it puts them into my .htaccess file. In order to do that I need a URL or full domain name rather than just the name of the robot.
When looking at my stats I find either an IP or a name for visitors, but can't seem to find an IP for the robots. I tried looking them up on Who Is, but don't know if the information I got was for the bot companies or for other companies who are unfortunate enough to have a domain name that matches the bot name. Can someone give me the IP of the following?
wget (all versions)
WebCopier
Web Image Collector
Curl
Would also like to ban MS FrontPage and Acrobat, but don't know how to do those either. I have looked and looked online and all the instruction pages say you have to modify code on the server. I don't have my own server so can't do this.
Can someone provide IPs for the above 4?
Thanks

03-02-2005, 09:09 AM
WebProWorld Veteran
Join Date: Feb 2005
Location: Forchheim, Germany
Posts: 947
Re: thanks
Quote:
|
Originally Posted by Weedy Lady
wget (all versions)
WebCopier
Web Image Collector
Curl
[...]
Can someone provide IPs for the above 4?
Thanks
|
wget is a retrieval program
http://directory.fsf.org/wget.html
--> no robot, no fixed IP here
WebCopier is a retrieval program
http://www.maximumsoft.com/
--> no robot, no fixed IP here
pretty much the same with the other two.
Just do a Google search and you'll find them.
To cut it short: you can't ban them by IP; you have to ban them by agent ID (the User-Agent string the program sends). You don't need a dedicated server for this, as it may be possible to modify your .htaccess accordingly.
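Banning by agent ID means comparing the User-Agent header each request sends against a list of unwanted names, which is the test an .htaccess rule would perform. Below is a minimal Python sketch of that matching logic only (it is not .htaccess code); the `is_blocked` helper and the sample header strings are illustrative assumptions, and real headers vary by version.

```python
import re

# Names taken from the list in the quoted post; matching is case-insensitive.
BLOCKED_AGENTS = ["wget", "curl", "WebCopier", "Web Image Collector"]

# One pattern that matches if any blocked name appears anywhere in the header.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENTS),
    re.IGNORECASE,
)

def is_blocked(user_agent: str) -> bool:
    """Hypothetical helper: True if the User-Agent header names a blocked tool."""
    return bool(_BLOCKED_RE.search(user_agent or ""))

# Sample headers are made up for illustration.
samples = [
    "Wget/1.9.1",
    "curl/7.12.1",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]
for ua in samples:
    print(ua, "->", "blocked" if is_blocked(ua) else "allowed")
```

Because the header is supplied by the client, a tool set to send a browser-like string will pass this check, so it only stops programs that identify themselves honestly.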
As for the robots.txt, I suggest putting all your pictures in a subdirectory and excluding all robots from that directory.
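Excluding every robot from a directory takes one short robots.txt record. Here is a minimal Python sketch that writes such a record for a list of directories, which may help when directory names change often; the directory names and the `build_robots_txt` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(directories):
    """Hypothetical helper: one record that keeps all robots out of each directory."""
    lines = ["User-agent: *"]
    for name in directories:
        # Leading and trailing slashes limit the rule to that directory's contents.
        lines.append("Disallow: /" + name.strip("/") + "/")
    return "\n".join(lines) + "\n"

# Placeholder directory names; substitute the current ones.
robots_txt = build_robots_txt(["pictures", "music"])
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("AnyBot", "/pictures/rose.jpg"))
print(parser.can_fetch("AnyBot", "/index.html"))
```

This only affects robots that choose to honour robots.txt; programs that ignore the file are not stopped by it.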
Alex

03-02-2005, 09:21 AM
WebProWorld Veteran
Join Date: Nov 2003
Location: mid south USA
Posts: 385
ban to sub directories
I have 5 sub directories with graphics, and one with music files. I change the names of these directories quite often to keep hotlinkers away and take the images off if they have been hotlinked. I would have to change my robots.txt file each time I did this. I can do it........the only problem will be to remember to do it.
I have tried all variations of the hotlinking ban scripts and they just do not work for me. I copied and pasted so did not do them wrong. They just won't work. One of them worked for one check that I did, and then stopped working.
If that is the best way to keep the robots out of my graphics I guess I'll have to do it.
None of this is easy, is it?????

03-02-2005, 08:58 PM
WebProWorld New Member
Join Date: Feb 2005
Posts: 2
Re: ban to sub directories
You might like to look at the Copysentry services provided at http://www.copyscape.com/
But at the end of the day if they want to knock off your stuff badly enough, they will.
To a certain extent it's a bit like steganography and digital watermarking: a nice idea, but probably not worth the effort. Or, to put it another way, don't let the effort of protecting your work distract you from the main game.
BTW, which CP are you using? If it works the way you describe, it is an install issue, and your hosting company should be able to fix it or at least post a bug report with the CP developer.
Good luck

03-02-2005, 10:10 PM
WebProWorld Veteran
Join Date: Nov 2003
Location: mid south USA
Posts: 385
to magic2147
Thanks for the link. I'll definitely look at it.
It isn't a problem with the CP. My hosting company says the hotlinking function works fine with other sites. I think it's the way I have mine set up.....or rather the way my coding has to be because of that. I have to put the FULL url path to any files not in the same folder (all graphics, all music, and several other things), and that makes them all look like links from outside sites.
I set it up with too many folders, and so many of my pages have really good rankings on the SEs that if I changed the whole structure now, I would have to do hundreds of redirects.
If ONLY I HAD the WPW forum and had read all this good stuff six years ago before I started my site, I definitely would have followed all the great web site design advice in the first place. Now I fear I have a monster that I just have to deal with. You don't even want to know what my directory structure is!
I truly appreciate everyone's help, because it has enabled me to make a lot more sense out of things, and I am now much more organized than before.

03-02-2005, 10:18 PM
WebProWorld Veteran
Join Date: Nov 2003
Location: mid south USA
Posts: 385
re the link to copy sentry
Copy Sentry sounds like a great service, but it would cost me $200 per month. I don't make that much. In fact, I don't really make enough to cover all my expenses. Nice at income tax time.
Obviously, I'm not doing this for the money. I just treat it as if I were.