To restate what John said above, and to make sure I properly understand how PageRank works with current technologies, let's imagine two pages: one page contains a link (we will call this the source page) and the other is the page the link points to (the destination page). The flow of PageRank is calculated when the source page is crawled, and it is affected only by factors on the source page. For example, the flow of PageRank can be cut off by a nofollow attribute on the link itself, or by a robots.txt disallow directive blocking the source page (because then Google can't see the links). Google does not consider anything about the destination page when calculating the PageRank the destination page receives.
EDIT to add: A page that has a link pointing to it always receives PageRank. A link marked as nofollow (either via a meta tag or a rel attribute) is the only exception. A page marked noindex still receives PageRank; that PageRank simply can't be seen. Since the page never gets put into the index, you have no way of asking Google what its PageRank is, but Google still knows that the URL was linked to, and PageRank is still allocated for that URL, because the fact that the page is set to noindex is unknown to Google at the moment the source page's PageRank is being divided amongst the outgoing links.
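The division described above can be sketched in a few lines of Python. This is a toy model of the mechanism as described in this thread, not Google's actual algorithm; the page names and nofollow flags are invented for illustration.

```python
# Toy model: a source page's PageRank is split equally among its
# followed outgoing links.  The source page can't "see" that a
# destination is noindexed, so a noindex page still gets its share.

def distribute_pagerank(source_pr, links):
    """links: list of (destination, is_nofollow) tuples."""
    followed = [dest for dest, nofollow in links if not nofollow]
    if not followed:
        return {}  # no followed links at all: the page is a dead end
    share = source_pr / len(followed)  # equal share per followed link
    return {dest: share for dest in followed}

flows = distribute_pagerank(4.0, [
    ("/page-b", False),
    ("/noindexed-page", False),  # noindex destination still receives PR
    ("/sponsored", True),        # nofollow: receives nothing
])
print(flows)  # {'/page-b': 2.0, '/noindexed-page': 2.0}
```

Note that the nofollowed link simply drops out of the calculation in this sketch; whether the nofollowed share is redistributed or evaporates is exactly the kind of detail Google has changed over time.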
There are a few different ways you can encounter a "dead end". The original "dead ends" were pages that had incoming links but no outgoing links. Pages whose outgoing links are all marked nofollow fall into the same category (via a nofollow meta tag, or a nofollow attribute on every link), as do pages that can't be crawled due to a robots.txt disallow directive (since Google can't see the links on the page), and pages that go through a 302 redirect (Google does this intentionally, so that 302 redirects can't pass PageRank).
What Google does with a dead end is take the PageRank from all the known dead ends in the index and distribute it amongst all the other pages in the entire index. As a result, a dead end provides little to no benefit to the website itself. In fact, it can harm the site by draining PageRank that could be better spent pointing to more important pages, so you want to keep PageRank flowing anywhere but to a dead end.
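That pooling-and-spreading step can be sketched as follows. Again this is only a toy model of the behaviour described above (page names and PR values are made up), following this thread's reading that the pooled PR goes to every *other* page in the index:

```python
# Toy model: PR flowing into dead ends (pages with no followed
# outgoing links) is pooled, then spread evenly across all the
# OTHER pages in the entire index - which is why a dead end gives
# almost nothing back to the site it sits on.

def redistribute_dead_end_flow(pr, dead_ends):
    """pr: page -> PageRank it would pass along.
    dead_ends: set of pages with no followed outgoing links.
    Returns the extra PR each non-dead-end page receives."""
    pool = sum(pr[p] for p in dead_ends)
    others = [p for p in pr if p not in dead_ends]
    bonus = pool / len(others)  # spread over the whole index, not just this site
    return {p: bonus for p in others}

index = {"/home": 3.0, "/blog": 2.0, "/pdf-dead-end": 1.0}
extra = redistribute_dead_end_flow(index, {"/pdf-dead-end"})
print(extra)  # {'/home': 0.5, '/blog': 0.5}
```

In a real index "others" would be billions of pages, so the bonus any single page gets back is effectively zero - which is the draining effect described above.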
However, there is no way to automatically stop PageRank from going to dead ends - you can only change the flow of PageRank on the source page, never on the destination page. The only option is to manually add a nofollow attribute to every link that points to a dead end - not an ideal option.
So, what I figured out while typing this overly long summary is that 403 pages provide a way to recover PageRank that would otherwise go to a dead end. Basically, the trick is in the error message. You give users a way (through authentication) to see the content that you don't want the spiders to crawl, while giving the spiders an error page whose links recover the PageRank and send it to the important areas of your site.
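As a rough sketch of that trick, here is a tiny stdlib-only WSGI app. The paths and link targets are invented for illustration, and in practice you would more likely do this in the web server's config; the point is just that the 403 response itself carries links back to important pages.

```python
# Sketch of the 403 trick: unauthenticated requests (including
# crawlers) to the protected area get a 403 error page whose links
# point back at important pages, so the PR that lands on these URLs
# is not dead-ended.  Paths and link targets are hypothetical.

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path.startswith("/books/") and "HTTP_AUTHORIZATION" not in environ:
        body = (b"<h1>403 Forbidden</h1>"
                b'<p>Please <a href="/login">log in</a>, or visit the '
                b'<a href="/">home page</a> or <a href="/sitemap">sitemap</a>.</p>')
        start_response("403 Forbidden", [("Content-Type", "text/html")])
        return [body]
    # Authenticated users (or any non-protected path) see the content.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>Protected book content.</p>"]
```

Whether a 403 response actually keeps its links in the PR calculation is the open question this thread is circling; the sketch only shows the serving side of the idea.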
Page A has PR 4. On page A there is a link to page B, without a nofollow attribute or anything of the sort. Right?
Now, on page B there is a link to page C. OK?
What happens? Googlebot will follow the link on page A to page B and see that the page should not be taken into account, but it will still crawl the page looking for a link to pass the PR to. If there is no link, or if the link(s) carry a nofollow attribute or are otherwise blocked, you have a dead end (a dangling node).
deny from all
except whitelisted IPs in the /Books/ folder?
- Will the whitelist create problems?
- More precisely: can bots get a URI reference or a dangling node (link) via the whitelist?
Actually, should be rather easy to test.
Build and orphan a few pages. Four should be plenty... A,B,C,D
Link A to B... B to C... C to D
Noindex A, B, and C
Throw a single link from a page with a high toolbar PR at A. If the PR "jumps" all the way to D, then D will display a high toolbar PR as well. If it passes through all the pages, it won't. Also, remove the noindex and see if there is any change.
Whether or not a page is noindex really doesn't matter. If there is an external link pointing to it, then there is a "probability" that a random surfer can reach it. Ergo, it "has" PR once Google finds the link.
The more links a random surfer has to follow to ultimately reach the "destination" page, the lower the probability it will be reached - less PR.
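That decay can be made concrete with the damping factor from the classic PageRank formulation, d = 0.85: at every hop the random surfer keeps following links with probability d, so the contribution from the external link shrinks geometrically with each extra page in the chain. The starting PR of 4.0 and the page names reuse the example from earlier in this thread; this is back-of-the-envelope arithmetic, not a full PageRank computation.

```python
# How much of an external page's PR reaches each page in the
# chain A -> B -> C -> D: each hop multiplies the contribution
# by the damping factor d.

d = 0.85          # probability the surfer follows a link instead of jumping away
source_pr = 4.0   # PR of the external page linking into A

contrib = {}
flow = source_pr
for page in ["A", "B", "C", "D"]:
    flow *= d                  # one more hop: contribution decays by d
    contrib[page] = round(flow, 4)

print(contrib)  # contribution shrinks at every hop down the chain
```

So in the A/B/C/D experiment above, even if PR flows through every page, D should end up with visibly less of the injected PR than A - which is what the toolbar comparison is meant to reveal.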
What was discussed in these two threads: