 |

02-03-2008, 07:58 PM
|
|
WebProWorld Member
|
|
Join Date: Apr 2004
Location: N.E.
Posts: 39
|
|
Excluding the "good" BOTS from /images folder
Hi again all! I had an issue way back with MSN trying to index my site from my /images/ folder due to the presence of an index.htm file.
Since then I have excluded the MSN bot from the /images/ and VIOLA! my site is now being indexed properly!
Today I discovered that Google has my /images/index.htm page in a result for a keyword of mine. Mind you it's a pretty obscure keyword but nonetheless, it's been indexed. I see no evidence of Google trying to find my content within the /images/ folder as the MSN bot kept trying.
Question: Should I exclude the /images/ directory from all bots? If I do that then won't the cached pages not show the images? I tried to disable hotlinking once and the cached pages looked awful! Would you exclude the /images/ folder or is there another way that I can stop the contents of said folder from being displayed when some snarky user wants to see the images in the directory?
Many thanks!
Maggie
|

02-04-2008, 10:23 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
|
|
Re: Excluding the "good" BOTS from /images folder
Hi Maggie,
Unless you have a reason to keep your images out of the index, letting them be indexed might help you gain some qualified and relevant traffic. This is especially true if you're running a shopping cart.
What does it mean to opt-in to enhanced image search?
If you choose to opt in to enhanced image search, Google may use tools such as Google Image Labeler to associate the images included in your site with labels that will improve indexing and search quality of those images.
To opt in to enhanced image search:
- Sign into Google webmaster tools with your Google Account.
- Click the URL for the site you want.
- Click Tools, and then click Enable enhanced image search.
Once you have opted in to enhanced image search, you can opt out at any time by returning to this page and clearing the checkbox.
Source: What does it mean to opt-in to enhanced image search?
|

02-04-2008, 10:26 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: May 2005
Location: Norway
Posts: 4,948
|
|
Re: Excluding the "good" BOTS from /images folder
If you know the bots you want to exclude and you are on an Apache server, you can put a .htaccess file in that folder with the following content:
# Allow is evaluated first, then Deny, so the deny line below takes effect
order allow,deny
allow from all
deny from
.............
You can also use robots.txt, but I would have to look up the directives since I do not remember them. Also, .htaccess is enforced by the server itself, whereas robots.txt is only a request that a bot may ignore, so .htaccess is the more secure option.
|

02-04-2008, 03:09 PM
|
|
WebProWorld Member
|
|
Join Date: Apr 2004
Location: N.E.
Posts: 39
|
|
Re: Excluding the "good" BOTS from /images folder
Thanks all! I have used robots.txt to exclude the MSNBot from the /images folder.
I'd hate to exclude GoogleBot and have it impact my results. I'd also hate to get zinged for duplicate content, because I have an index.htm file in all my subdirectories. I do this so people can't bring up the contents of a directory, especially the images directory.
Thanks again!
Maggie
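A per-bot rule like the one Maggie describes can be sanity-checked offline. The following is only a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt text and the helper name `is_allowed` are assumptions for illustration, since the actual file was not posted.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: only MSNBot is kept out of /images/.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /images/
"""

def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# MSNBot is blocked from the folder; Googlebot has no matching record,
# so it is still allowed to fetch the same URL.
msn_blocked = not is_allowed(ROBOTS_TXT, "msnbot", "/images/index.htm")
google_allowed = is_allowed(ROBOTS_TXT, "Googlebot", "/images/index.htm")
print(msn_blocked, google_allowed)
```

This shows why Google could still pick up /images/index.htm: a record addressed to one bot says nothing to the others.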
|

02-04-2008, 04:04 PM
|
 |
WebProWorld Pro
|
|
Join Date: May 2004
Location: Biarritz, France
Posts: 153
|
|
Re: Excluding the "good" BOTS from /images folder
Quote:
Originally Posted by memaggiem
Hi again all!
|
Hi Maggie!
Quote:
Originally Posted by memaggiem
Since then I have excluded the MSN bot from the /images/ and VIOLA! my site is now being indexed properly!
|
I suppose you meant "VOILA". What you wrote is a completely different thing (verb)...
|

02-04-2008, 04:16 PM
|
 |
WebProWorld Pro
|
|
Join Date: Jul 2003
Location: Baltimore, MD
Posts: 121
|
|
Re: Excluding the "good" BOTS from /images folder
I have excluded the /images folders on all my web sites in a robots.txt file. First of all, I don't want my images to be indexed by the search engines and then reused by other web folks. Second of all, I have found no problem with it since I restricted them about 4 years ago.
I'd be glad to find out what the other folks are thinking about the impact of restricting...
|

02-04-2008, 04:21 PM
|
|
WebProWorld Member
|
|
Join Date: Sep 2007
Posts: 47
|
|
Re: Excluding the "good" BOTS from /images folder
I usually separate out product images from the site images into two different folders and restrict site images via robots.txt. That way you get a bit of search action going from the product images, and your site images are at least a little less out there, though anyone who knows how can still snag them.
|

02-04-2008, 04:54 PM
|
 |
WebProWorld New Member
|
|
Join Date: Jan 2008
Posts: 11
|
|
Re: Excluding the "good" BOTS from /images folder
Hi memaggiem,
This is the robots.txt file I'm using on my website:
# robots.txt for Example Web Page
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
Disallow: /images
I hope this works for you, keep us posted!
Thanks,
GizGaz -=*L*=-
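One detail worth noticing in the file above is that `Disallow: /images` has no trailing slash, so it is treated as a path prefix. Here is a rough Python sketch of what that means, using the standard `urllib.robotparser`; the three test paths are made-up examples, not paths from anyone's site.

```python
from urllib.robotparser import RobotFileParser

# GizGaz's rules as posted, including the inline comments.
GIZGAZ_RULES = """\
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
Disallow: /images
"""

parser = RobotFileParser()
parser.parse(GIZGAZ_RULES.splitlines())

# Hypothetical paths, chosen to show the prefix behaviour.
paths = ["/images/logo.gif", "/images-archive/old.gif", "/index.htm"]
results = {path: parser.can_fetch("Googlebot", path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Under this parser, anything whose path starts with `/images` is blocked, including a sibling folder such as `/images-archive/`. Writing `Disallow: /images/` limits the rule to that one folder.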
|

02-04-2008, 05:12 PM
|
|
WebProWorld Member
|
|
Join Date: Apr 2004
Location: N.E.
Posts: 39
|
|
Re: Excluding the "good" BOTS from /images folder
Quote:
Originally Posted by effisk
|
Hmmm.....that's not really helpful but thanks for the spelling and grammar lesson
I think that I may not have communicated exactly what I'm trying to do. Sorry, I'll give it another try! I'm still learning.
I don't want the Googlebot to index the index.htm page that is within any of my subdirectories. I have found that Google has indexed my /images/index.htm file.
When the MSNBot did this, it ended up trying to find ALL of my content within the /images/ folder - one can only guess as to why. I took a big nosedive from MSN due to this and had to exclude the /images folder from MSNBot within my robots.txt file.
So without excluding the /images folder from Google, how might I stop people from viewing the contents of the /images folder if I don't have the index.htm file there? I don't sell anything so I cannot separate images into folders as someone mentioned (that would be a great idea tho!) I don't want to take a bad hit by excluding the Googlebot from the /images folder either!
Gosh, I'm not sure that makes any more sense than what I originally wrote
Thanks for your patience and generosity in replying!!!
Maggie
|

02-04-2008, 05:38 PM
|
|
WebProWorld New Member
|
|
Join Date: Dec 2004
Posts: 13
|
|
Re: Excluding the "good" BOTS from /images folder
memaggiem, this should do what you're asking and not cause any problems with any of the bots.
User-agent: *
Disallow: /subdirectory/index.htm
or
User-agent: *
Disallow: /index.htm
But that might stop your home page from being indexed depending on site design.
As to the bigger question of allowing images to be indexed, we do not allow any of our images to be indexed. We do our own photography of products and want to protect them as much as possible, so we completely disallow our images folder in our robots.txt file. This does not prevent the images from showing in the cache on the search engines, as they use links to our images on our site instead of caching the images themselves.
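The difference between the two rules above can be tried out locally. This is just a sketch with Python's standard `urllib.robotparser`; the helper `blocked`, the folder name `/images/`, and the file `photo.jpg` are assumptions standing in for "subdirectory" and a real image.

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, path, agent="Googlebot"):
    """Return True if the rules keep the agent away from path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)

# First variant: block only the placeholder page inside one folder.
SUBDIR_RULE = "User-agent: *\nDisallow: /images/index.htm\n"
# Second variant: block index.htm at the site root.
ROOT_RULE = "User-agent: *\nDisallow: /index.htm\n"

checks = {
    "subdir rule, /images/index.htm": blocked(SUBDIR_RULE, "/images/index.htm"),
    "subdir rule, /images/photo.jpg": blocked(SUBDIR_RULE, "/images/photo.jpg"),
    "root rule, /index.htm": blocked(ROOT_RULE, "/index.htm"),
    "root rule, /images/index.htm": blocked(ROOT_RULE, "/images/index.htm"),
}
for label, is_blocked in checks.items():
    print(label, "->", "blocked" if is_blocked else "allowed")
```

In this sketch the first rule hides only the placeholder page and leaves the images crawlable, while the second rule matches only the root-level index.htm and does nothing for pages inside subfolders.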
|

02-04-2008, 07:47 PM
|
|
WebProWorld Member
|
|
Join Date: Feb 2004
Location: Stupid question. At my PC.
Posts: 99
|
|
Re: Excluding the "good" BOTS from /images folder
No spider should ever find the index.htm file in a subfolder unless there is a link going to that folder, perhaps a broken link like http://www.mydomain.com/images/
Although the robots.txt file is useful, it is not reliable. I've seen Google and Yahoo index pages that were excluded properly in the robots.txt.
The method I use, which so far has been 100% reliable, is the robots meta tag with a noindex value. Whenever I use an index.htm file to protect a directory, it includes the following:
<meta name="robots" content="noindex,follow">
I also include a link back to the home page.
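To confirm that a placeholder page really carries the tag, something like the following Python sketch could be used. It relies only on the standard `html.parser` module; the class name `RobotsMetaFinder`, the function `robots_directives`, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def robots_directives(html):
    """Return the set of robots meta directives present in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives

# Hypothetical placeholder page with the tag and a link back home.
PLACEHOLDER = """<html><head>
<meta name="robots" content="noindex,follow">
<title>Images</title></head>
<body><a href="/">Home</a></body></html>"""

directives = robots_directives(PLACEHOLDER)
print(sorted(directives))
```

Keep in mind that a crawler has to fetch the page to see this tag, so the tag has no effect on a URL that the same bot is disallowed from fetching in robots.txt.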
|

02-04-2008, 09:14 PM
|
|
WebProWorld Member
|
|
Join Date: Apr 2004
Location: N.E.
Posts: 39
|
|
Re: Excluding the "good" BOTS from /images folder
Quote:
Originally Posted by Penman
we completely disallow our images folder in our robots.txt file. This does not prevent the images from showing in the cache on the search engines as they use links to our images on our site instead of caching the images themselves.
|
GREAT! This is exactly what I was hoping for!
and TechEvangelist - I will do the META tag within the index.htm in the /images/ directory just for good measure too!
Thanks all very very much!
Maggie
Now, if anyone wants to take a gander at why MSN kept looking for my content within the /images/ folder when it found the index.htm file - not that I really care; it's just something I find very odd!
|

02-05-2008, 01:32 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
|
|
Re: Excluding the "good" BOTS from /images folder
I wouldn't even bother with excluding the index.htm files located in your subdirectories via the robots.txt file.
Just use the following Meta tags in each document and you'll be good to go.
Code:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META NAME="ROBOTS" CONTENT="NONE">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
|

02-05-2008, 07:19 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Oct 2003
Location: Encinitas, CA
Posts: 1,908
|
|
Re: Excluding the "good" BOTS from /images folder
Why don't you simply change the permissions (chmod) of the image directory from whatever it is (probably 644 or 755) to 711 and remove the index file? That will prevent browsing the contents of a directory.
__________________
DrTandem's San Diego Web Page Design, drtandem.com
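For those without shell access, the same permission change could be scripted. This is a minimal Python sketch, assuming a Unix-like host where the web server reads files as a different user than the directory owner; the function name `make_unlistable` is hypothetical, and a temporary directory stands in for the real images folder.

```python
import os
import stat
import tempfile

def make_unlistable(directory):
    """Set mode 711 on a directory and return the resulting mode bits.

    711 = owner may read, write and traverse; group and others may only
    traverse, so they can open a file by name but cannot list the folder.
    """
    os.chmod(directory, 0o711)
    return stat.S_IMODE(os.stat(directory).st_mode)

# A throwaway directory is used here instead of a real /images folder.
with tempfile.TemporaryDirectory() as images_dir:
    mode = make_unlistable(images_dir)
    print(oct(mode))
```

If the server runs scripts and serves files as the account owner, the owner bits still allow listing, so this approach would not help on such a setup.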
|

02-05-2008, 07:48 AM
|
 |
WebProWorld Member
|
|
Join Date: Jan 2007
Posts: 45
|
|
Re: Excluding the "good" BOTS from /images folder
You can add the following in a .htaccess in the image directory:
RewriteEngine on
RewriteRule ^$ http://your-site/ [R,NC]
The above directive redirects any visitor who lands in the image folder to your site. It does not affect viewing of other files located in this directory, as long as they are requested by name.
All of the above presupposes that you are on a *nix/Apache server.
You don't need to keep an index.htm in this case.
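The reason the rule leaves individual files alone is the pattern `^$`: in a .htaccess file, mod_rewrite matches against the path relative to that directory, so a request for the bare folder is an empty string. A small Python sketch of that matching logic follows; it only imitates the pattern test, not Apache itself, and the function name `would_redirect` and the file name are invented for illustration.

```python
import re

# Same pattern as the RewriteRule; NC corresponds to case-insensitive matching.
RULE_PATTERN = re.compile(r"^$", re.IGNORECASE)

def would_redirect(relative_path):
    """Return True if the path (relative to the images folder) matches ^$."""
    return RULE_PATTERN.search(relative_path) is not None

# "" stands for a request to the folder itself, e.g. /images/
folder_request = would_redirect("")
# A file requested by name does not match, so it is served as usual.
file_request = would_redirect("logo.gif")
print(folder_request, file_request)
```

Note that a plain `[R]` flag issues a temporary (302) redirect.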
|

02-05-2008, 11:20 AM
|
 |
WebProWorld 1,000+ Club
|
|
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
|
|
Re: Excluding the "good" BOTS from /images folder
Quote:
Originally Posted by DrTandem1
Why don't you simply change the permissions (chmod) of the image directory from whatever it is (probably 644 or 755) to 711 and remove the index file? That will prevent browsing the contents of a directory.
|
Yeah, I thought of that too but it's often easier said than done for some folks.
Good suggestions though!
|

05-20-2008, 09:19 AM
|
|
WebProWorld New Member
|
|
Join Date: May 2008
Posts: 1
|
|
Re: Excluding the "good" BOTS from /images folder
Hi there,
I have been using a robots.txt file like GizGaz's on my website, and it has worked fine for me.
|