WebProWorld Part of WebProNews.com
  #1 (permalink)  
Old 02-03-2008, 07:58 PM
WebProWorld Member
 

Join Date: Apr 2004
Location: N.E.
Posts: 39
memaggiem RepRank 0
Default Excluding the "good" BOTS from /images folder

Hi again all! I had an issue way back with MSN trying to index my site from my /images/ folder due to the presence of an index.htm file.

Since then I have excluded the MSN bot from the /images/ and VIOLA! my site is now being indexed properly!

Today I discovered that Google has my /images/index.htm page in a result for a keyword of mine. Mind you it's a pretty obscure keyword but nonetheless, it's been indexed. I see no evidence of Google trying to find my content within the /images/ folder as the MSN bot kept trying.

Question: Should I exclude the /images/ directory from all bots? If I do that then won't the cached pages not show the images? I tried to disable hotlinking once and the cached pages looked awful! Would you exclude the /images/ folder or is there another way that I can stop the contents of said folder from being displayed when some snarky user wants to see the images in the directory?

Many thanks!
Maggie
  #2 (permalink)  
Old 02-04-2008, 10:23 AM
Dubbya's Avatar
WebProWorld 1,000+ Club
 

Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
Dubbya RepRank 3
Default Re: Excluding the "good" BOTS from /images folder

Hi Maggie,

Unless there's some reason for excluding your images from being indexed, it might just help you gain some qualified and relevant traffic. This is especially true if you're running a shopping cart.

What does it mean to opt-in to enhanced image search?

If you choose to opt in to enhanced image search, Google may use tools such as Google Image Labeler to associate the images included in your site with labels that will improve indexing and search quality of those images.

To opt in to enhanced image search:
  1. Sign into Google webmaster tools with your Google Account.
  2. Click the URL for the site you want.
  3. Click Tools, and then click Enable enhanced image search.
Once you have opted in to enhanced image search, you can opt out at any time by returning to this page and clearing the checkbox.

Source: What does it mean to opt-in to enhanced image search?
__________________
Printer ink, inkjet & toner cartridges in Canada
"Price-wise printing supplies"
inkjetOasis.ca
  #3 (permalink)  
Old 02-04-2008, 10:26 AM
kgun's Avatar
WebProWorld 1,000+ Club
 

Join Date: May 2005
Location: Norway
Posts: 4,948
kgun RepRank 3
Default Re: Excluding the "good" BOTS from /images folder

If you know the bots you want to exclude and you are on an Apache server, you can put a

.htaccess

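file in that folder with the following content, listing each bot's IP address or hostname after "deny from". Note the order: with "order allow,deny" the deny lines are evaluated last and win, whereas with "order deny,allow" the "allow from all" line would override every deny.

order allow,deny
allow from all
deny from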

.............

You can also use robots.txt, but I have to look up the commands since I do not remember them. And .htaccess is on a lower level and as such more secure.
  #4 (permalink)  
Old 02-04-2008, 03:09 PM
WebProWorld Member
 

Join Date: Apr 2004
Location: N.E.
Posts: 39
memaggiem RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Thanks all! I have used the robots.txt to exclude the MSNBot from the /images folder.
I'd hate to exclude GoogleBot and have it impact my results. I'd also hate to get zinged for duplicate content, because I have an index.htm file in all my subdirectories. I do this so people can't bring up the contents of the directory, esp. the images directory.
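
For anyone who wants to check a per-bot rule like this before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content is an assumption (the actual file isn't shown in this thread), and "msnbot" and "Googlebot" are used as illustrative user-agent tokens. Real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only MSNBot is kept out of /images/.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /images/
"""


def can_crawl(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Compare how the two bots are treated for the same URL path.
    for bot in ("msnbot", "Googlebot"):
        print(bot, can_crawl(ROBOTS_TXT, bot, "/images/index.htm"))
```

Because there is no "User-agent: *" group in this sketch, any bot other than msnbot falls through to the default, which is to allow crawling.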

Thanks again!
Maggie
  #5 (permalink)  
Old 02-04-2008, 04:04 PM
effisk's Avatar
WebProWorld Pro
 

Join Date: May 2004
Location: Biarritz, France
Posts: 153
effisk RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Quote:
Originally Posted by memaggiem View Post
Hi again all!
Hi Maggie!
Quote:
Originally Posted by memaggiem View Post
Since then I have excluded the MSN bot from the /images/ and VIOLA! my site is now being indexed properly!
I suppose you meant "VOILA". What you wrote is a completely different thing (verb)...
  #6 (permalink)  
Old 02-04-2008, 04:16 PM
zbatia's Avatar
WebProWorld Pro
 

Join Date: Jul 2003
Location: Baltimore, MD
Posts: 121
zbatia RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

I have excluded the /images folders on all my web sites in the robots.txt file. First of all, I don't want my images to be indexed by the search engines and then reused by other web folks. Second of all, I have found no problem with it since I restricted them about 4 years ago.
I'd be glad to find out what the other folks are thinking about the impact of restricting...
__________________
The Cyber Teacher
http://www.rtek2000.com
http://www.800-webdesign.com/web-master-links.html -Free Web Master's Resources
_________________
  #7 (permalink)  
Old 02-04-2008, 04:21 PM
WebProWorld Member
 

Join Date: Sep 2007
Posts: 47
DoneInStyle RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

I usually separate out product images from the site images into two different folders and restrict site images via robots.txt. That way you get a bit of search action going from the product images, and your site images are at least a little less out there, though anyone who knows how can still snag them.
  #8 (permalink)  
Old 02-04-2008, 04:54 PM
GizGaz's Avatar
WebProWorld New Member
 

Join Date: Jan 2008
Posts: 11
GizGaz RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Hi memaggiem,

This is the robots.txt file I'm using on my website:

# robots.txt for Example Web Page

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
Disallow: /images

I hope this works for you, keep us posted!
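
One thing worth knowing about "Disallow: /images" without a trailing slash is that it is a prefix match. The sketch below feeds the rules above to Python's standard urllib.robotparser and lists which paths end up blocked. The sample paths are made up for illustration, and other crawlers may not behave exactly like this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, copied as-is (inline comments included).
ROBOTS_TXT = """\
# robots.txt for Example Web Page

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
Disallow: /images
"""

# Hypothetical paths, chosen only to show the prefix behaviour.
SAMPLE_PATHS = [
    "/index.htm",
    "/images/logo.gif",
    "/images-archive/old.gif",
    "/tmp/x.htm",
]


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the subset of paths that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    print(blocked_paths(ROBOTS_TXT, SAMPLE_PATHS))
```

With this parser, "/images-archive/old.gif" is blocked too, because it starts with "/images". Writing "Disallow: /images/" limits the rule to that one folder.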

Thanks,

GizGaz -=*L*=-
__________________
Sirius Money
  #9 (permalink)  
Old 02-04-2008, 05:12 PM
WebProWorld Member
 

Join Date: Apr 2004
Location: N.E.
Posts: 39
memaggiem RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Quote:
Originally Posted by effisk View Post
Hi Maggie!
I suppose you meant "VOILA". What you wrote is a completely different thing (verb)...
Hmmm.....that's not really helpful but thanks for the spelling and grammar lesson


I think that I may not have communicated exactly what I'm trying to do. Sorry, I'll give it another try! I'm still learning.

I don't want the Googlebot to index the index.htm page that is within any of my subdirectories. I have found that Google has indexed my /images/index.htm file.

When the MSNBot did this, it ended up trying to find ALL of my content within the /images/ folder - one can only guess as to why. I took a big nosedive from MSN due to this and had to exclude the /images folder from MSNBot within my robots.txt file.

So without excluding the /images folder from Google, how might I stop people from viewing the contents of the /images folder if I don't have the index.htm file there? I don't sell anything so I cannot separate images into folders as someone mentioned (that would be a great idea tho!) I don't want to take a bad hit by excluding the Googlebot from the /images folder either!

Gosh, I'm not sure that makes any more sense than what I originally wrote

Thanks for your patience and generosity in replying!!!
Maggie
  #10 (permalink)  
Old 02-04-2008, 05:38 PM
WebProWorld New Member
 

Join Date: Dec 2004
Posts: 13
Penman RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

memaggiem, this should do what you're asking and not cause any problems with any of the bots.

User-agent: *
Disallow: /subdirectory/index.htm

or

User-agent: *
Disallow: /index.htm

But that might stop your home page from being indexed depending on site design.
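
To see which URLs each of those two rules actually covers, here is a rough check with Python's standard urllib.robotparser. The "/images/" paths stand in for "/subdirectory/" and are assumptions for illustration. Search engines may treat "/" and "/index.htm" as the same page even though this parser sees them as different paths.

```python
from urllib.robotparser import RobotFileParser

# First variant, with /images/ substituted for /subdirectory/ (assumption).
SUBDIR_RULE = "User-agent: *\nDisallow: /images/index.htm\n"
# Second variant, exactly as posted.
ROOT_RULE = "User-agent: *\nDisallow: /index.htm\n"


def can_crawl(rules, path, user_agent="*"):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for label, rules in (("subdir", SUBDIR_RULE), ("root", ROOT_RULE)):
        for path in ("/", "/index.htm", "/images/", "/images/index.htm"):
            print(label, path, can_crawl(rules, path))
```

Under the first rule the bare directory URL "/images/" is still crawlable, so if the server answers that URL with the same index.htm, the page could still be picked up that way.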

As to the bigger question of allowing images to be indexed, we do not allow any of our images to be indexed. We do our own photography of products and want to protect them as much as possible so we completely disallow our images folder in our robots.txt file. This does not prevent the images from showing in the cache on the search engines as they use links to our images on our site instead of caching the images themselves.
__________________
http://www.coloradopen.com
  #11 (permalink)  
Old 02-04-2008, 07:47 PM
WebProWorld Member
 

Join Date: Feb 2004
Location: Stupid question. At my PC.
Posts: 99
TechEvangelist RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

No spider should ever find the index.htm file in a subfolder unless there is a link going to that folder, perhaps a broken link like http://www.mydomain.com/images/

Although the robots.txt file is useful, it is not reliable. I've seen Google and Yahoo index pages that were excluded properly in the robots.txt.

The method that I use that thus far has been 100% reliable is to use the robots meta tag with a noindex attribute. Whenever I use an index.htm file to protect a directory, it includes the following:

<meta name="robots" content="noindex,follow">

I also include a link back to the home page.
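
If you have several of these placeholder index.htm files, a small script can confirm that each one really carries the tag. This is only a sketch using Python's standard html.parser, and the sample page below is a made-up stand-in for a real placeholder file.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical placeholder page, for illustration only.
PLACEHOLDER_PAGE = """<html><head>
<title>Nothing to see here</title>
<meta name="robots" content="noindex,follow">
</head><body><a href="/">Home</a></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(PLACEHOLDER_PAGE)))
```

Keep in mind that a bot can only see this tag if it is allowed to fetch the page, so a page blocked in robots.txt never gets its meta tag read.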
__________________
Facts are meaningless. They can be used to prove anything. - Homer Simpson
MySQL Cheatsheet :: Arizona SEO training
  #12 (permalink)  
Old 02-04-2008, 09:14 PM
WebProWorld Member
 

Join Date: Apr 2004
Location: N.E.
Posts: 39
memaggiem RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Quote:
Originally Posted by Penman View Post
we completely disallow our images folder in our robots.txt file. This does not prevent the images from showing in the cache on the search engines as they use links to our images on our site instead of caching the images themselves.
GREAT! This is exactly what I was hoping for!

and TechEvangelist - I will do the META tag within the index.htm in the /images/ directory just for good measure too!

Thanks all very very much!
Maggie


Now, if anyone wants to take a guess at why MSN kept looking for my content within the /images/ folder when it found the index.htm file, I'd love to hear it. Not that I really care; it's just something I find very odd!
  #13 (permalink)  
Old 02-05-2008, 01:32 AM
Dubbya's Avatar
WebProWorld 1,000+ Club
 

Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
Dubbya RepRank 3
Default Re: Excluding the "good" BOTS from /images folder

I wouldn't even bother with excluding the index.htm files located in your subdirectories via the robots.txt file.

Just use the following Meta tags in each document and you'll be good to go.

Code:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META NAME="ROBOTS" CONTENT="NONE">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
  #14 (permalink)  
Old 02-05-2008, 07:19 AM
DrTandem1's Avatar
WebProWorld 1,000+ Club
 

Join Date: Oct 2003
Location: Encinitas, CA
Posts: 1,908
DrTandem1 RepRank 2
Default Re: Excluding the "good" BOTS from /images folder

Why don't you simply change the permissions (chmod) of the image directory from whatever it is (probably 644 or 755) to 711 and remove the index file? That will prevent browsing the contents of a directory.
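
For those who would rather script it than use an FTP client, here is a minimal sketch of the same permission change in Python. It assumes a Unix-like host and that the web server reads the folder as a user other than the owner; the function name is made up for this example. Mode 711 leaves the owner full access and gives everyone else only the execute (traverse) bit, so files can still be opened by name but the folder cannot be listed.

```python
import os
import stat


def restrict_listing(directory):
    """Set a directory to mode 711 and return the resulting permission bits."""
    # Owner: read/write/execute. Group and others: execute (traverse) only.
    os.chmod(directory, 0o711)
    return stat.S_IMODE(os.stat(directory).st_mode)


if __name__ == "__main__":
    import tempfile

    # Try it on a throwaway directory rather than a live images folder.
    demo_dir = tempfile.mkdtemp()
    print(oct(restrict_listing(demo_dir)))
```

If the web server runs as the same user that owns the files, this change alone will not stop it from listing the folder.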
__________________
DrTandem's San Diego Web Page Design, drtandem.com
  #15 (permalink)  
Old 02-05-2008, 07:48 AM
NetProwler's Avatar
WebProWorld Member
 

Join Date: Jan 2007
Posts: 45
NetProwler RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

You can add the following in a .htaccess in the image directory:

RewriteEngine on

RewriteRule ^$ http://your-site/ [R,NC]

What the above directive does is send any visitor who lands in the image folder to your site via a redirect. But it does not affect viewing of other files located in this directory if they are called by name.

All the above presupposes that you are on a *nix/Apache server.

You don't need to keep an index.htm in this case.
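
The reason only the bare folder URL is redirected is the pattern ^$: in a .htaccess file the rule is matched against the path relative to that folder, so a request for the folder itself is the empty string. The sketch below imitates that match with Python's re module. It is only an illustration of the pattern, not of Apache itself, and the file names are made up.

```python
import re

# Same pattern as the RewriteRule; IGNORECASE mirrors the [NC] flag.
RULE = re.compile(r"^$", re.IGNORECASE)


def would_redirect(relative_path):
    """Return True if the pattern matches the folder-relative path."""
    return RULE.search(relative_path) is not None


if __name__ == "__main__":
    # "" stands for a request to the folder itself, e.g. /images/
    for path in ("", "logo.gif", "index.htm"):
        print(repr(path), would_redirect(path))
```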
  #16 (permalink)  
Old 02-05-2008, 11:20 AM
Dubbya's Avatar
WebProWorld 1,000+ Club
 

Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,194
Dubbya RepRank 3
Default Re: Excluding the "good" BOTS from /images folder

Quote:
Originally Posted by DrTandem1 View Post
Why don't you simply change the permissions (chmod) of the image directory from whatever it is (probably 644 or 755) to 711 and remove the index file? That will prevent browsing the contents of a directory.

Yeah, I thought of that too but it's often easier said than done for some folks.

Good suggestions though!
  #17 (permalink)  
Old 05-20-2008, 09:19 AM
WebProWorld New Member
 

Join Date: May 2008
Posts: 1
LilacStar RepRank 0
Default Re: Excluding the "good" BOTS from /images folder

Hi there,

I have been using a robots.txt file like GizGaz's for my website, and it has worked fine for me.
__________________
| Label Printing | - LilacStar