We solve this with a robots.txt file in our site's root directory. It tells the bots not to crawl certain folders, such as cgi-bin and images.
Because we are a large tourism site, we do not really want our images listed with Google and the rest, so that image traffic does not eat into our bandwidth.
Here's an example of the robots.txt file:
User-agent: *
Sitemap: http://www.xxx.com/ror.xml
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
Disallow: /images/
Disallow: /css/
Disallow: /eurohotels/
Disallow: /hotels/
Disallow: /photos/
Disallow: /discounts/
Disallow: /js/
Disallow: /rsscb/
Disallow: /logos/
Disallow: /eco/
Disallow: /calhotels/
Disallow: /tours/
Disallow: /sandiego/images
Disallow: /sandiego/hotels
Disallow: /sandiego/profiles
Disallow: /sandiego/css
Disallow: /sandiego/newsphotos
Disallow: /sandiego/logos
Disallow: /sandiego/js
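If you want to check that rules like these block what you expect before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. It only uses a few of the rules from the file above, and the helper name is_allowed and the sample paths are made up for illustration. The standard-library parser does plain prefix matching, so real crawlers may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above (assumption: a subset is enough to illustrate).
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.xxx.com/ror.xml
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /sandiego/images
"""

def is_allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.xxx.com" + path)

if __name__ == "__main__":
    # Sample paths are invented for the demonstration.
    for path in ("/index.html", "/images/beach.jpg", "/sandiego/images/pier.jpg"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Note that a rule without a trailing slash, such as /sandiego/images, blocks every path that starts with that text, not just the folder itself.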
I'm told the spiders would rather not waste their time crawling around inside these folders, and will like you better for keeping them out.
It's a win/win in my opinion.
Hope this helps.