05-22-2009, 11:15 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,170
Re: The Power of the WebProWorld SEO Community

Quote:
Originally Posted by wige
Well, in principle I would agree that if you don't want spiders to access certain content, the best way to do so is to use authentication. (To jump ahead to another point, I believe this is fairly easy to do in IIS also; it is just done with file-system permissions rather than through the .htaccess file itself.) The noindex directive is unofficial, and support seems to vary. A page that is password protected is generally safe against all bots, even spambots that ignore robots.txt directives. This is particularly the case in some of the prime examples used to justify the creation of the robots.txt system: voting systems, and other pages where non-user access would cause problems.

That being said, I am not sure that the IBM patent would have any effect. There is no evidence that this patent has been picked up by any of the major search engines. It seems to me that the patent is a proposed modification to Google's existing pagerank model, and I am not sure that Google would modify that model, which so far has served them quite well. But, if anything, the current known Google model for distributing pagerank would be more adversely affected. Right now, pagerank is distributed among all outgoing links, regardless of destination, unless the link is marked as nofollow. This means that even if the link points to somewhere that Google can't crawl/index, the link still gets pagerank, which means that a 403 destination could become a pagerank black hole.

In IBM's patent, pagerank would be distributed only if the destination is crawlable and meets certain other requirements. If you want to think of it in Google terms, the search engine could decide to apply a rel=nofollow attribute to a link on its own, removing links from the pagerank calculations based on information about the crawlability, doc type, robots.txt status, etc. of the destination.

As a result of this, it is under the current model that a plain link to the authenticated content would be a potential problem. And, as mentioned before, I am not convinced that the IBM model is actually in use anywhere right now.
Excellent post! About the 403 destination, that is what I meant in my response to Kgun's post above.

As for how Google handles the noindex directive, I am not waiting for Google to tell me how they handle it, since, as I said above, I am using it on many sites and it is very effective. It really does the same job as the noindex meta robots tag, only at the server level.

As for IBM's patent, whether it is accurate or not, and whether Google uses it or not, I do not really care, because I implement what is necessary in case it is in use now or will be in the future.
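
To make the difference between the two models wige describes more concrete, here is a rough Python sketch. It is only a toy illustration of the reasoning in the quote, not how Google or the IBM patent actually computes anything: real PageRank also involves a damping factor and repeated iteration over the whole link graph. The `Link` class, the `share_per_link` function, and the example URLs are all made up for this post.

```python
from dataclasses import dataclass


@dataclass
class Link:
    target: str
    nofollow: bool = False
    crawlable: bool = True  # assumption: False stands for a 403 / password-protected target


def share_per_link(links, skip_uncrawlable):
    """Split one unit of a page's rank evenly among the links that count.

    skip_uncrawlable=False mimics the "current" model from the quote:
    every link without nofollow gets a share, even if the target
    cannot be crawled.
    skip_uncrawlable=True mimics the patent idea from the quote:
    uncrawlable targets are treated as if the link were nofollow.
    """
    counted = [
        link for link in links
        if not link.nofollow and (link.crawlable or not skip_uncrawlable)
    ]
    if not counted:
        return {}
    share = 1.0 / len(counted)
    return {link.target: share for link in counted}


# Hypothetical page with four outgoing links.
links = [
    Link("/about"),
    Link("/products"),
    Link("/members", crawlable=False),  # behind authentication
    Link("/login", nofollow=True),
]

current = share_per_link(links, skip_uncrawlable=False)
patent = share_per_link(links, skip_uncrawlable=True)

for name, result in (("current", current), ("patent", patent)):
    print(name, {target: round(value, 3) for target, value in result.items()})
```

Under the first setting, a third of the page's share goes to /members, where it cannot be passed on (the "black hole" case). Under the second, /members is skipped and the two crawlable links split the share between them. Marking the /members link as nofollow by hand gives the same split under either setting, which is why I take that precaution regardless.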
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO