LIVE bot looking for my pages in the images folder(?)
memaggiem
12-06-2007, 09:04 PM
Just noticed that an IP that resolves to an MSN Live bot is going through my site and looking for pages in the /images/ folder! Anyone know the whats or whys of this? Looking through the day's log, I see this has been going on all day!
Thanks!
incrediblehelp
12-07-2007, 01:50 PM
Why wouldn't MSN look for images?
memaggiem
12-07-2007, 01:56 PM
Looking for images within the images folder is the norm, yes I am aware of that :-)
However, the logs showed the bot looking for ALL my html pages within the /images/ directory and NOT in the root directory.
Therefore, my logs were filled with "file not found" errors for images/images/index.htm and so forth.
The access logs did not show the bot accessing or indexing the pages at all, only the /images/gif or images/jpg files.........
Odd?
SemAdvance
12-11-2007, 04:31 PM
Without knowing the site involved, at best you will receive speculation.
Your robots.txt could be causing errors, or your site's pages (i.e. the HTML coding) may be causing them; it's hard to say. Or, as is often the case, MSN is still full of bugs, as are most Microsoft products...
memaggiem
12-17-2007, 09:30 PM
Thanks for your reply! I didn't know you had replied!
My site has just vanished from MSN but I was able to find it with a search term. The result is this:
MAKE HOMEMADE BABY FOOD RECIPES, HOMEMADE BABY FOOD RECIPES, EASY ...
Baby Food Recipes - Make Homemade Baby Food - Recipes for Healthy Homemade Baby Food with Tips for Making and Feeding Baby Homemade Baby Food. Step-By-Step instructions ... ...
MAKE HOMEMADE BABY FOOD RECIPES, HOMEMADE BABY FOOD RECIPES, EASY SOLID BABY FOOD TIPS, BABY NUTRITION and MORE at wholesomebabyfood.com | Making Baby Food with Wholesome Goodness & Love (http://www.wholesomebabyfood.com/images) · 12/15/2007 · Cached page
**Notice that it has my images folder in the result!** WTH is going on with that?
I have not changed the robots.txt in over 1 year, nor have I changed any meta-tag info.
thanks !!!!
Jean-Luc
12-18-2007, 12:24 AM
The problem is that your site does not process non-existing pages correctly.
http://wholesomebabyfood.com/images should return an HTTP error code. It presently returns a "200 OK" (after redirecting to http://wholesomebabyfood.com/images/).
Another invalid situation appears when I try to go to http://wholesomebabyfood.com/does-not-exist. It redirects to http://www.wholesomebabyfood.com/error.htm that returns a "200 OK" code.
Search engines get confused because your server answers that invalid URLs are "200 OK".
This second address should return a "404 File not found". The first address should return a "404 File not found" or a "403 Forbidden" code.
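If you want to check this yourself, here is a rough Python sketch (not something from the server logs, just an illustration). It follows redirects by hand so you can see every status code along the way, then flags the "soft 404" case where a URL that should not exist ends in a 2xx answer. The function names `fetch_chain` and `is_soft_404` are made up for this example, and the redirect status used in the usage note is an assumption, since only the final "200 OK" was observed.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_chain(url, max_hops=5):
    """Request url with HEAD, following redirects manually.

    Returns a list of (url, status) tuples, one per hop.
    """
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("HEAD", target)
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append((url, status))
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain


def is_soft_404(chain):
    """True when a URL that should not exist ends in a 2xx response."""
    final_status = chain[-1][1]
    return 200 <= final_status < 300
```

Calling `is_soft_404(fetch_chain("http://wholesomebabyfood.com/does-not-exist"))` would report the problem described above as long as the error page answers "200 OK". A chain such as `[(url, 302), (error_page, 200)]` counts as a soft 404, while a chain ending in 404 or 403 does not.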
Jean-Luc
memaggiem
12-18-2007, 08:21 AM
Hmmm.............so why, after 4 years on the web, would this all of a sudden be an issue? Maybe the MSNBOT changed its crawling practices.....
Also, why did it NOT find the pages where they actually are....within the main directory and not a sub such as images? Very odd! Some of the pages are actually indexed correctly...
I have the index.htm page in the images dir. to stop the files from being displayed. Should I remove the error.htm page (which is used to display all the links and content in my site so people might stay and find what they were looking for)? I'm not sure I want to do that! I'll have a look-see at a solution.
ETA - I just did a disallow MSNBot /images/
Maybe this will fix it!
Thanks for your reply! This is really driving me nuts!!!!
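For anyone wanting to sanity-check a disallow rule like that before waiting on the bot, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt text below is an assumed reconstruction of "disallow MSNBot /images/" (the actual file is not shown in this thread), and `allowed` is just a helper name for the example. Note that a rule like this only keeps the bot out of the folder; it does not change the status codes the server sends.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, reconstructed from the description above.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /images/
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With these rules, `allowed("msnbot", "/images/logo.gif")` is false, while `allowed("msnbot", "/index.htm")` and `allowed("googlebot", "/images/logo.gif")` are both true, since the rule names only msnbot and only the /images/ prefix.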
Jean-Luc
12-18-2007, 09:19 AM
Add this in your .htaccess file:
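# FollowSymLinks must be enabled for mod_rewrite rules to work in .htaccess
Options +FollowSymLinks
RewriteEngine on
# Redirect /images and /images/ (the bare folder only, not the files in it) to the home page
RewriteRule ^images\/?$ http://wholesomebabyfood.com/ [L,R=301]
# Use a local path, not a full URL, so the error page is served with a real 404 status
ErrorDocument 404 /error.htm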
It will correctly redirect visitors of /images to the home page and it will send a valid error code for non-existing pages.
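To make it clearer which requests the rewrite rule touches, here is a small Python sketch that mimics the pattern matching only (it is not how Apache is implemented, and `rewrite` is a made-up helper name). In an .htaccess file at the site root, mod_rewrite tests the pattern against the path without its leading slash, which the sketch imitates.

```python
import re

# Same pattern as in the RewriteRule above.
PATTERN = re.compile(r"^images\/?$")
TARGET = "http://wholesomebabyfood.com/"


def rewrite(request_path):
    """Return (target, 301) if the rule would redirect, else None."""
    # Per-directory rules see the path relative to the directory,
    # so drop the single leading slash before matching.
    relative = request_path[1:] if request_path.startswith("/") else request_path
    if PATTERN.match(relative):
        return (TARGET, 301)
    return None
```

So `rewrite("/images")` and `rewrite("/images/")` both give the 301 to the home page, while `rewrite("/images/logo.gif")` returns `None`, meaning the image files themselves are still served normally.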
Jean-Luc