

Thread: The Power of the WebProWorld SEO Community

  1. #1
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    8,934

    The Power of the WebProWorld SEO Community

    You should already know that here at WPW there are highly competent SEO professionals, as well as a lot of members at a beginner or intermediate knowledge level.

    What makes WPW a powerful SEO community? If all members unite, no matter their knowledge level, I can't imagine there is an SEO professional out there who could compete with us.

    What am I after? Well... optimize WPW? Just kidding. But seriously, if they need help, we can surely give a hand.

    Before I come to my point, first I would like to make something clear:

    I do not want any "thank you for the great information" posts, or any of those filthy and annoying posts from members who are trying to draw the attention of the thread subscribers to their signature, selling laptop batteries or whatever else...

    If this condition is violated and the mods do not take action, I will leave the thread immediately.

    Now I want to bring up a very interesting topic, one whose outcome everyone will be thankful for. So please don't go off-topic.
    Please take some minutes and watch the video below:

    YouTube - Matt Cutts Discusses Webmaster Tools

    Notice: I will add a link to this thread as a comment on YouTube, inviting Matt to come over if he has time.

    So I have two points I would love to ask Matt about.

    1. He said (with a smiley) that one of the best solutions is to protect areas via an .htaccess password-protected area.

    My concerns:

    The IBM patent explores dangling nodes in depth and provides a list of page types that search engines may treat as dangling nodes. One of them is:

    - Pages that require authentication.

    What happens when a link without a nofollow attribute, or one that is not masked client- or server-side, points to the protected area directory?

    2. He mentioned the use of the noindex meta tag, but that is not the best solution. I am afraid I have to disagree there; I will explain why later on.

    My concerns:

    a. Why didn't he mention the noindex robots.txt directive? Because it is only unofficially supported?

    b. If I disallow in my robots.txt an area that is protected with htpasswd via .htaccess, do I have a dangling node issue, or not?

    Wouldn't noindex via robots.txt be a better alternative? I am still thinking about it... A minimal sketch of both setups follows below.
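    To make both points concrete, here is a minimal sketch of the two setups being compared. The /private/ directory and the .htpasswd path are only placeholders I made up for illustration, and Apache-style syntax is assumed:

        # .htaccess inside /private/ - password protection, as Matt suggested
        AuthType Basic
        AuthName "Members only"
        AuthUserFile /full/path/to/.htpasswd
        Require valid-user

        # robots.txt - the unofficial Noindex directive as an alternative
        User-agent: Googlebot
        Noindex: /private/

    With the .htaccess setup, any request to /private/ is answered with a 401 until valid credentials are supplied, which is exactly the "requires authentication" case the IBM patent lists among potential dangling nodes. The robots.txt line only asks Google not to index those URLs, and its support has never been official.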

    What's next?

    Shoot your thoughts!
    John S. Britsios, Forensic SEO & Social Semantic Web Consultant | My personal blog Algohunters

  2. #2
    WebProWorld MVP inertia's Avatar
    Join Date
    Apr 2006
    Posts
    1,189

    Re: The Power of the WebProWorld SEO Community

    I guess what we really need to know is Matt's/Google's official stance on the nofollow robots.txt directive! If Google could confirm how they handle this, all this debate would be resolved!
    My LinkedIn Profile -- Lancaster Builder

    Twitter: @mattbennettseo

  3. #3
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,697

    Re: The Power of the WebProWorld SEO Community

    John, I have other concerns:

    What about webmasters using Microsoft's IIS web server? It had about a 30% market share as of April 2009.

    Links:

    Easiest way to get .htaccess like access control with IIS 6? | Ask Metafilter

    HOW TO: Migrate .Htaccess Data in a UNIX-to-Windows Migration

    Personally, I combine robots.txt (disallowing file types and folders) and .htaccess (denying access to a folder and its subfolders).
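    For anyone who wants to see what that combination looks like, here is a minimal sketch. The /private/ folder and the .pdf rule are just placeholders, and Apache 2.2-style directives are assumed:

        # robots.txt - keep compliant crawlers away from a folder and a file type
        # (the wildcard/$ syntax is a Googlebot/Bingbot extension, not the original standard)
        User-agent: *
        Disallow: /private/
        Disallow: /*.pdf$

        # .htaccess inside /private/ - deny access to the folder and its subfolders
        Order allow,deny
        Deny from all

    Note that "Deny from all" makes the server answer such requests with a 403, which is the dead-end situation discussed in the posts below.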

    This discussion started here (post #159)

    http://www.webproworld.com/search-engine-optimization-forum/78553-new-canonical-tag-big-3-a-4.html#post439869

  4. #4
    WebProWorld MVP wige's Avatar
    Join Date
    Jun 2006
    Posts
    2,981

    Re: The Power of the WebProWorld SEO Community

    Well, in principle I would agree that if you don't want spiders to access certain content, the best way to do so is to use authentication (to jump ahead to another point, I believe this is fairly easy to do in IIS also, it is just done with file-system permissions rather than through the .htaccess file itself). The noindex directive is unofficial, and support seems to vary. A page that is password protected is generally safe against all bots, even spambots that ignore robots.txt directives. This is particularly the case in some of the prime examples used to justify the creation of the robots.txt system - voting systems, and other pages where non-user access would cause problems.

    That being said, I am not sure that the IBM patent would have any effect. There is no evidence that this patent has been picked up by any of the major search engines. It seems to me that the patent is a proposed modification to Google's existing pagerank model, and I am not sure that Google would modify that model, which so far has served them quite well. But, if anything, the current known Google model for distributing pagerank would be more adversely affected. Right now, pagerank is distributed among all outgoing links, regardless of destination, unless the link is marked as nofollow. This means that even if the link points to somewhere that Google can't crawl/index, the link still gets pagerank, which means that a 403 destination could become a pagerank black hole.

    In IBM's patent, pagerank would be distributed only if the destination is crawlable, and meets certain other requirements. If you want to think of it in Google terms, the search engine could decide to apply a rel=nofollow attribute to a link on its own, removing links from the pagerank calculations based on information about the crawlability, doc type, robots.txt status, etc. of the destination.

    As a result of this, it is under the current model that a plain link to the authenticated content would be a potential problem. And, as mentioned before, I am not convinced that the IBM model is actually in use anywhere right now.
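    To put some toy numbers behind the "black hole" idea, here is a small sketch in Python. It is not Google's or IBM's actual algorithm, and the page names are invented; it only shows that, under a naive calculation where a dangling destination simply swallows the rank sent to it, nofollowing the link keeps more rank among the crawlable pages:

        # Toy PageRank-style sketch (illustration only, not any search engine's real model).
        DAMPING = 0.85

        def pagerank(links, iterations=50, damping=DAMPING):
            """Naive power iteration; rank sent to a dangling node (no outlinks)
            is simply dropped each round -- the 'black hole' behaviour."""
            pages = list(links)
            rank = {p: 1.0 / len(pages) for p in pages}
            for _ in range(iterations):
                new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
                for page, outlinks in links.items():
                    if not outlinks:
                        continue  # dangling node: its share of rank goes nowhere
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
                rank = new_rank
            return rank

        # "protected" returns 403, so the crawler sees no outlinks on it.
        plain_link = {
            "home": ["article1", "article2", "protected"],
            "article1": ["home"],
            "article2": ["home"],
            "protected": [],
        }

        # Same site, but the link to the protected area is rel="nofollow",
        # i.e. removed from the rank calculation entirely.
        nofollowed = {
            "home": ["article1", "article2"],
            "article1": ["home"],
            "article2": ["home"],
            "protected": [],
        }

        print(pagerank(plain_link))
        print(pagerank(nofollowed))

    Running it shows home, article1 and article2 all end up with higher scores in the nofollowed variant, because nothing is thrown away at the protected page.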
    The best way to learn anything, is to question everything.
    WigeDev - Freelance web and software development

  5. #5
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    8,934

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by inertia View Post
    I guess what we really need to know is Matt's/Google's official stance on the nofollow robots.txt directive! If Google could confirm how they handle this, all this debate would be resolved!
    I personally do not need to ask for Matt's/Google's official stance on the noindex robots.txt directive. By the way, a nofollow robots.txt directive does not exist.

    I do not need to ask anybody, since I have already implemented it on all my sites and on all my customers' sites, and it works as it should. I hope that is fair enough, no?
    John S. Britsios, Forensic SEO & Social Semantic Web Consultant | My personal blog Algohunters

  6. #6
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    8,934

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by kgun View Post

    Personally, I combine robots.txt (disallowing file types and folders) and .htaccess (denying access to a folder and its subfolders).
    Using .htaccess to deny access to a folder and its subfolders is a bad thing to do, at least if you are returning a 403. That is because you are creating dangling nodes, otherwise called dead-end pages/folders.
    John S. Britsios, Forensic SEO & Social Semantic Web Consultant | My personal blog Algohunters

  7. #7
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    8,934

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by wige View Post
    Well, in principle I would agree that if you don't want spiders to access certain content, the best way to do so is to use authentication (to jump ahead to another point, I believe this is fairly easy to do in IIS also, it is just done with file-system permissions rather than through the .htaccess file itself). The noindex directive is unofficial, and support seems to vary. A page that is password protected is generally safe against all bots, even spambots that ignore robots.txt directives. This is particularly the case in some of the prime examples used to justify the creation of the robots.txt system - voting systems, and other pages where non-user access would cause problems.

    That being said, I am not sure that the IBM patent would have any effect. There is no evidence that this patent has been picked up by any of the major search engines. It seems to me that the patent is a proposed modification to Google's existing pagerank model, and I am not sure that Google would modify that model, which so far has served them quite well. But, if anything, the current known Google model for distributing pagerank would be more adversely affected. Right now, pagerank is distributed among all outgoing links, regardless of destination, unless the link is marked as nofollow. This means that even if the link points to somewhere that Google can't crawl/index, the link still gets pagerank, which means that a 403 destination could become a pagerank black hole.

    In IBM's patent, pagerank would be distributed only if the destination is crawlable, and meets certain other requirements. If you want to think of it in Google terms, the search engine could decide to apply a rel=nofollow attribute to a link on its own, removing links from the pagerank calculations based on information about the crawlability, doc type, robots.txt status, etc. of the destination.

    As a result of this, it is under the current model that a plain link to the authenticated content would be a potential problem. And, as mentioned before, I am not convinced that the IBM model is actually in use anywhere right now.
    Excellent post! About the 403 destination, that is what I meant in my response to Kgun's post above.

    About how Google handles the noindex directive, I am not waiting for Google to tell me how they handle it, since, as I said above, I am using it on many sites and it is very effective. It really does the same job as the noindex meta robots tag on a server level.

    As for whether the IBM patent is accurate or not, or whether Google uses it or not, I do not really care, because I implement the necessary measures in case it does exist or comes to exist in the future.
    John S. Britsios, Forensic SEO & Social Semantic Web Consultant | My personal blog Algohunters

  8. #8
    WebProWorld MVP inertia's Avatar
    Join Date
    Apr 2006
    Posts
    1,189

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by Webnauts View Post
    I personally do not need to ask for Matt's/Google's official stance on the noindex robots.txt directive. By the way, a nofollow robots.txt directive does not exist.

    I do not need to ask anybody, since I have already implemented it on all my sites and on all my customers' sites, and it works as it should. I hope that is fair enough, no?
    I meant the noindex robots.txt directive. It just seems odd that Google and Matt C never mention it. Why would that be the case if it were such a useful tool that would benefit everyone, including Google?

    Quote Originally Posted by Webnauts View Post
    It really does the same job as the noindex meta robots tag on a server level.
    Can I just clarify... If I block a page in robots.txt with the NOINDEX directive, will that stop the page from being crawled and indexed, and also stop it from building or leaking PageRank?
    My LinkedIn Profile -- Lancaster Builder

    Twitter: @mattbennettseo

  9. #9
    WebProWorld MVP Webnauts's Avatar
    Join Date
    Aug 2003
    Location
    European Community
    Posts
    8,934

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by inertia View Post
    I meant noindex robots.txt directive. It just seems odd that Google or Matt C never mention it? Why would that be the case if it was such a useful tool that would benefit everyone including Google?
    I guess that Matt does not praise it because it is still only unofficially supported. If I recall correctly, Adam Lasnik talked about it a while ago. If I find the link I will post it here.

    Quote Originally Posted by inertia View Post
    Can I just clarify... If I block a page in robots.txt with the NOINDEX directive, will that stop the page from being crawled and indexed, and also stop it from building or leaking PageRank?
    If you block a page in robots.txt with the noindex directive, it will not stop Googlebot from crawling the page, but Google will not index it.

    What is the difference from the disallow directive?

    If someone links to a page you block with the disallow directive in robots.txt, Google will still return the reference in their index, but without a snippet. And there you have a PR leak, since PR will be assigned to that page too.

    If you block that page with the noindex directive, the reference will not show up at all. And PR will not be assigned to that page, but it will still pass to other pages you link to from there. For that to happen, though, that page must have at least one outbound (internal or external) link without a nofollow, so the PR can move ahead.

    If not, then you will create dangling nodes (dead-end or hanging pages), in other words, as Wige said above, a PR black hole.
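    To put the difference into file form, here is a minimal sketch (the /offers/ paths are placeholders, and Noindex is the unofficial directive discussed above):

        User-agent: Googlebot
        # Disallowed: not crawled, but a bare URL reference can still show up in the index
        Disallow: /offers/old/
        # Noindexed (unofficial): the reference should not show up in the index at all
        Noindex: /offers/expired/

    And, as explained above, the noindexed page still needs at least one followable outbound link, otherwise it becomes a dangling node.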
    John S. Britsios, Forensic SEO & Social Semantic Web Consultant | My personal blog Algohunters

  10. #10
    WebProWorld MVP inertia's Avatar
    Join Date
    Apr 2006
    Posts
    1,189

    Re: The Power of the WebProWorld SEO Community

    Quote Originally Posted by Webnauts View Post
    If you block that page with the noindex directive, the reference will not show up at all. And PR will not be assigned to that page,
    OK, I understand all the theory, but one thing I don't understand regards the internal flow of PageRank... Do pages blocked with the robots noindex directive still build PR from internal links? If they don't, then I can stop using nofollow tags! But as you've just started using them, I guess they don't?
    My LinkedIn Profile -- Lancaster Builder

    Twitter: @mattbennettseo

