#4
05-22-2009, 10:44 AM
wige
Moderator
WebProWorld Moderator
 
Join Date: Jun 2006
Location: United States
Posts: 2,657
Re: The Power of the WebProWorld SEO Community

Well, in principle I agree that if you don't want spiders to access certain content, the best way to do so is to use authentication. (To jump ahead to another point, I believe this is fairly easy to do in IIS as well; it is just done with file-system permissions rather than through an .htaccess file.) The noindex directive is unofficial, and support seems to vary. A page that is password protected is generally safe against all bots, even spambots that ignore robots.txt directives. This is particularly the case for some of the prime examples used to justify the creation of the robots.txt system: voting systems, and other pages where non-user access would cause problems.

That being said, I am not sure that the IBM patent would have any effect. There is no evidence that this patent has been picked up by any of the major search engines. It seems to me that the patent is a proposed modification to Google's existing pagerank model, and I am not sure that Google would modify that model, which so far has served them quite well. But, if anything, it is the currently known Google model for distributing pagerank that would be more adversely affected by links to protected content. Right now, pagerank is distributed among all outgoing links, regardless of destination, unless the link is marked as nofollow. This means that even if the link points to somewhere that Google can't crawl or index, the link still gets its share of pagerank, so a 403 destination could become a pagerank black hole.

In IBM's patent, pagerank would be distributed only if the destination is crawlable and meets certain other requirements. If you want to think of it in Google terms, the search engine could decide to apply a rel=nofollow attribute to a link on its own, removing links from the pagerank calculations based on information about the crawlability, doc type, robots.txt status, etc. of the destination.
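
To make the difference between the two models concrete, here is a rough Python sketch. It is only an illustration of the description above, not Google's or IBM's actual algorithm: the even split, the missing damping factor, the function name link_shares, the skip_uncrawlable flag, and the example URLs are all my own assumptions.

```python
def link_shares(page_rank, links, skip_uncrawlable=False):
    """Split a page's rank evenly across its counted outgoing links.

    Simplified on purpose: even split, no damping factor (assumption).
    links: list of dicts with keys 'url', 'nofollow', 'crawlable'.
    skip_uncrawlable=False mimics the current model as described above;
    skip_uncrawlable=True mimics the patent-style filtering.
    Returns a dict mapping each counted URL to its share.
    """
    counted = [
        link for link in links
        if not link["nofollow"]
        and (link["crawlable"] or not skip_uncrawlable)
    ]
    if not counted:
        return {}
    share = page_rank / len(counted)
    return {link["url"]: share for link in counted}


# Hypothetical page with three outgoing links.
links = [
    {"url": "/public", "nofollow": False, "crawlable": True},
    {"url": "/members", "nofollow": False, "crawlable": False},  # returns 403
    {"url": "/ads", "nofollow": True, "crawlable": True},
]

current_model = link_shares(1.0, links)
patent_style = link_shares(1.0, links, skip_uncrawlable=True)
```

In this toy case the current model hands half of the page's rank to /members, where it goes nowhere, while the patent-style filter gives all of it to /public.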

As a result, it is under the current model, not IBM's, that a plain link to the authenticated content would be a potential problem. And, as mentioned before, I am not convinced that the IBM model is actually in use anywhere right now.
__________________
The best way to learn anything, is to question everything.

Last edited by wige; 05-22-2009 at 10:51 AM.