Search Engine Optimization Forum: SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non-engine-specific SEO or SEM topics should go here.
Another thread (MSN Live Indexes Google Ads) is discussing the appearance of Google AdSense ads in the MSN/Live index. In theory, this should not be possible, as the URLs in question are blocked by robots.txt. (Line 15 of http://www.google.com/robots.txt specifically disallows the ads.) I can think of a few possible reasons why these pages would be indexed legitimately:
Has anyone encountered this issue, where you have blocked content through robots.txt or a robots meta tag and had it indexed? Do you know of any reason why properly blocked content would still be indexed?
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Microsoft Live Search Fixes Problem with Google AdWords Ads
Quote:
|
|
|||
|
I just had a thought, though: if the page is linked to from another site, what happens? Does Google get to that page before it reads robots.txt?
__________________
Post as-it-happens crime stories of criminal behaviour at crimedigg.com |
|
||||
|
If they find enough links, they will index the URL, but not the content. In these cases they generally use the DMOZ title and description (if available, of course).
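To make that behaviour concrete, here is a toy model in Python of what I described. It is only a sketch of the idea, not how any engine actually works: the `IndexEntry` type, the `build_entry` function, and the `min_links` threshold are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    url: str
    title: Optional[str]
    snippet: Optional[str]


def build_entry(url: str,
                allowed_by_robots: bool,
                inbound_links: int,
                page_title: Optional[str] = None,
                page_text: Optional[str] = None,
                directory_title: Optional[str] = None,
                directory_description: Optional[str] = None,
                min_links: int = 3) -> Optional[IndexEntry]:
    """Toy decision: full entry, URL-only entry, or nothing.

    min_links is a hypothetical threshold standing in for "enough links".
    """
    if allowed_by_robots:
        # The page may be fetched, so the entry is built from its content.
        return IndexEntry(url, page_title, page_text)
    if inbound_links >= min_links:
        # The page is blocked, so its content is never read. Only the URL
        # is listed, with directory data (e.g. DMOZ) if any is available.
        return IndexEntry(url, directory_title, directory_description)
    # Blocked and not linked to often enough: no entry at all.
    return None


if __name__ == "__main__":
    print(build_entry("http://example.com/ads/1", False, 10))
```

In this model a blocked URL with plenty of inbound links still gets an entry, but the entry never contains anything taken from the page itself.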
__________________
FREE SEO ! Really? YES! All you have to do is implement it! Follow me on Twitter PeterIMC Last edited by Peter (IMC); 12-20-2007 at 04:58 PM. |
|
||||
|
Quote:
The discussion is interesting, though. Should they index the URL or not? It's like being a celebrity. You want your privacy and don't let anybody into your house, but does that forbid magazines from printing your name in their articles?
__________________
FREE SEO ! Really? YES! All you have to do is implement it! Follow me on Twitter PeterIMC |
|
||||
|
The perfect crawler should act like a human internet surfer.
It's the same as specifying keywords for the pages. Most engines will index what they consider popular and searchable information, and they will show results for keywords based on their algorithms, not on the website owner's instructions.
|
||||
|
Just an FYI: using a robots.txt file does not guarantee that a page will or will not be indexed. It is simply the designer's preference/request, and certain robots may ignore it.
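To illustrate why it is only a request: the check happens on the crawler's side. Below is a minimal sketch of what a well-behaved robot might do, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /ads/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/admin/login",
                "http://example.com/index.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Nothing on the server enforces the result. A robot that never calls something like `is_allowed` will simply fetch the page anyway.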
__________________
DrTandem's San Diego Web Page Design, drtandem.com |
|
||||
|
Quote:
Quote:
Quote:
In the same way, you do not want that URL to be indexed, because the clicks are coming from users who have no idea what they are clicking on, and could taint the results of a survey - or analytics or campaign tracking, etc.
In addition, robots.txt is used to keep spiders from indexing login pages for CMS systems and other critical apps to prevent Googlehacking. If I know a certain CMS is vulnerable to attack, and Google indexes a login page for that CMS, I could potentially search for all the sites that use that CMS and attack them.
__________________
The best way to learn anything, is to question everything. |
|
|||
|
I have seen files that were blocked in robots.txt AFTER they had already been indexed continue to show up in Google's index for almost a year. The issue may be a matter of whether they were indexed before or after they were blocked. I have always found the robots meta tag to be more effective at eliminating URLs from search engine indexes.
MSN seems to be much better at using the robots.txt file to eliminate URLs. Yahoo appears to ignore it most of the time. I've seen blocked URLs show up in Yahoo for years after they were blocked.
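For anyone who wants to check which robots meta directives a page actually serves, here is a minimal sketch using Python's standard `html.parser`. The `RobotsMetaParser` class, the `robots_directives` helper, and the sample page are made up for illustration. Keep in mind that a crawler has to be able to fetch the page to see the tag at all, so a URL that is also blocked in robots.txt may never have its meta tag read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only to demonstrate the helper.
PAGE = ('<html><head><meta name="robots" content="noindex, nofollow">'
        '</head><body>Private page</body></html>')

if __name__ == "__main__":
    print("noindex" in robots_directives(PAGE))
```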
__________________
Facts are meaningless. They can be used to prove anything. - Homer Simpson MySQL Cheatsheet :: Arizona SEO training :: Phoenix Managed Services Last edited by TechEvangelist; 01-01-2008 at 05:26 PM. |
WebProWorld is an iEntry, Inc. ® site - © 2010 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |