WebProWorld
Google Discussion Forum
The Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

#51
08-31-2009, 12:41 PM
TheWebDoctor(tm)
WebProWorld Pro
Join Date: Jun 2003
Location: USA
Posts: 205
RepRank 1
Re: Robots.txt can pass PageRank?

Both patents include "...the anchor text of backlinks to the document..." Obviously Google thinks that a backlink must have anchor text. Anchor text indicates hypertext documents.

Although patents are supposed to be written for a normal practitioner to understand, they can be confusing. Over the past two years I've had the opportunity to file for three patents. I had to ask my patent attorney to explain why he used certain phrases and words because of their broadness. Patents need to be broad enough to be enforceable but specific enough to cover the concept. And we wonder why we have attorneys.

"Because citations, or links, are ways of directing attention, the important documents correspond to those documents to which the most attention is directed. Thus, a high rank indicates that a document is considered valuable by many people or by important people. Most likely, these are the pages to which someone performing a search would like to direct his or her attention. Looked at another way, the importance of a page is directly related to the steady-state probability that a random web surfer ends up at the page after following a large number of links. Because there is a larger probability that a surfer will end up at an important page than at an unimportant page, this method of ranking pages assigns higher ranks to the more important pages."

Additionally, one can read the last paragraph of the patent. "The search engine is used to locate documents that match the specified search criteria, either by searching full text, or by searching titles only. In addition, the search can include the anchor text associated with backlinks to the page. This approach has several advantages in this context. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for images, programs, and other objects that cannot be indexed by a text-based search engine. This also makes it possible to return web pages which have not actually been crawled. In addition, the engine can compare the search terms with a list of its backlink document titles. Thus, even though the text of the document itself may not match the search terms, if the document is cited by documents whose titles or backlink anchor text match the search terms, the document will be considered a match. In addition to or instead of the anchor text, the text in the immediate vicinity of the backlink anchor text can also be compared to the search terms in order to improve the search."

What part of "or links" or "backlink anchor text" led you to the conclusion that a textual or non-hypertext reference is a citation and used to determine PR? The concept of "clicking around" by a random web surfer also indicates that a hypertext link must exist. One cannot "click" a non-hypertext word or phrase to get to another page.

I never said a robots.txt file was a protocol. I said the robots.txt file is a file that contains computer instructions for the crawlers. What the search engines determine they are going to do with the instructions is clearly up to them.

Stanford and Google have two patents covering PageRank. The last was approved in 2006. So, Terry, you're right. Google did improve their opinion on how they determine PR for a page. I'm sure they are fine-tuning it again and will have another patent application filed soon, if they haven't already.

Links in Gmail:
“Someone recently suggested that a link sent to a Gmail account equals one link on one page. Also not true in any way.” — Matt Cutts, November 14, 2006.
Myth busting: Links in Gmail

"...to the best of my knowledge we don’t use any urls from Gmail either." -- Matt Cutts, July 19, 2008
Generic Toolbar Indexing Debunk Post

Apparently, Google doesn't use Gmail to follow links.
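The last patent paragraph quoted in this post describes matching search terms against the anchor text of backlinks, so that a page (or an image, or a page that was never crawled) can match a query its own text does not contain. A toy Python sketch of that idea follows. The data structures, function names and sample pages are hypothetical, and it ignores ranking, the "nearby text" variant and everything else a real engine does.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text word to the set of pages linked with that word.

    links is an iterable of (source_page, target_page, anchor_text) triples.
    """
    index = defaultdict(set)
    for _source, target, anchor in links:
        for word in anchor.lower().split():
            index[word].add(target)
    return index


def search(query, page_text, anchor_index):
    """Return pages where every query term appears in the page's own text
    or in the anchor text of a backlink pointing to it."""
    terms = query.lower().split()
    candidates = set(page_text) | {p for targets in anchor_index.values() for p in targets}
    results = set()
    for page in candidates:
        own_words = set(page_text.get(page, "").lower().split())
        if all(t in own_words or page in anchor_index.get(t, set()) for t in terms):
            results.add(page)
    return results


# Hypothetical mini-web: an image, a text page, and a page never crawled.
page_text = {"/logo.png": "", "/about.html": "company history"}
backlinks = [
    ("/home.html", "/logo.png", "Acme company logo"),
    ("/home.html", "/about.html", "about us"),
    ("/home.html", "/unseen.html", "pricing sheet"),
]
index = build_anchor_index(backlinks)
print(sorted(search("acme logo", page_text, index)))
print(sorted(search("pricing", page_text, index)))
```

The image matches "acme logo" only through its backlink's anchor text, and `/unseen.html` matches "pricing" without having any indexed text of its own.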
#52
08-31-2009, 02:02 PM
crankydave
Moderator
WebProWorld Moderator
Join Date: Aug 2004
Location: Playing with fire!
Posts: 4,243
RepRank 9

Both patents use exactly the same reference. Precisely what is said is this...

Quote:
One aspect of the present invention is directed to taking advantage of the linked structure of a database to assign a rank to each document in the database, where the document rank is a measure of the importance of a document. Rather than determining relevance only from the intrinsic content of a document, or from the anchor text of backlinks to the document, a method consistent with the invention determines importance from the extrinsic relationships between documents. Intuitively, a document should be important (regardless of its content) if it is highly cited by other documents. Not all citations, however, are necessarily of equal significance. A citation from an important document is more important than a citation from a relatively unimportant document. Thus, the importance of a page, and hence the rank assigned to it, should depend not just on the number of citations it has, but on the importance of the citing documents as well. This implies a recursive definition of rank: the rank of a document is a function of the ranks of the documents which cite it. The ranks of documents may be calculated by an iterative procedure on a linked database.
This does not say or suggest (IMO) that a "link" must have anchor text. We already know that hyperlinked URLs without anchor text are considered and given value. Nor does it say or suggest that a citation must be in the form of a hyperlink for it to be considered a citation. Based upon the patents, I see no reason to rule out, out of hand, that one document citing another in a text-only fashion counts as a citation.

Granted, the random surfer model does refer to "clicking" and "links", but that is the only reference I can recall seeing that suggests that. I can find far, far more references to such things as "follow", "visit", "request", etc., all of which can easily be accomplished without a "click". Again, I see no reason to discount the possibility out of hand.

Just to be clear, too: I've not said that textual or non-hyperlinked text "is" used to determine PR. I just haven't seen anything I consider definitive to conclude that it isn't, or can't be. Again, I've not tested it. I have tested whether or not textual or non-hyperlinked URLs are used for discovery, and from those tests I suggest that yes, they can be.

Dave
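The recursive definition in the patent passage quoted above ("the rank of a document is a function of the ranks of the documents which cite it", calculated "by an iterative procedure") can be illustrated with a short power-iteration sketch. This is classic, textbook PageRank under stated assumptions, not Google's current algorithm: the damping factor `d = 0.85`, the function name `pagerank` and the three-page graph are illustrative choices, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=100, tol=1e-12):
    """Classic PageRank by power iteration (a sketch, not Google's algorithm).

    links maps every page to the list of pages it links to. Each page is
    assumed to have at least one outgoing link, and every link target is
    assumed to be a key of links. The scores are normalized to sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(iterations):
        # Random-surfer reading: with probability 1 - d the surfer jumps to a
        # random page; otherwise it follows one of the current page's links.
        new = {p: (1.0 - d) / n for p in pages}
        for page, outlinks in links.items():
            share = d * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share  # rank flows from citing to cited page
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical three-page web: A cites B and C, B cites C, C cites A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

C ends up highest because it is cited by both A and B; A comes next because its single citation comes from C, the highest-ranked page, which is the "importance of the citing documents" effect the patent describes.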
#53
08-31-2009, 02:34 PM
kgun
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2005
Location: Norway
Posts: 5,684
RepRank 9

As some of you may know, I collect links. Personally, I can find links from a lot of different sources:
  1. In traditional books. I find many good links / pages / sites in books. They are authoritative.
  2. In PDF documents.
  3. On videos.
  4. In any text document, program etc.
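kgun's list and crankydave's discovery tests both concern URLs that appear as plain text rather than as hyperlinks. Purely to illustrate what "discovery" from non-hyperlinked text can mean, here is a minimal Python sketch that pulls bare http(s) URLs out of arbitrary text. The regular expression and the function name `find_plaintext_urls` are assumptions for this example; nothing in the thread says how any search engine actually does this, and finding a URL this way says nothing about whether it passes PageRank.

```python
import re

# Deliberately simple, hypothetical pattern: "http://" or "https://" followed
# by everything up to whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_plaintext_urls(text):
    """Return the distinct bare URLs in text, in order of first appearance."""
    found = []
    for match in URL_PATTERN.finditer(text):
        # Sentence punctuation right after a URL is usually not part of it.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in found:
            found.append(url)
    return found


sample = (
    "The draft RFC is at http://www.robotstxt.org/norobots-rfc.txt. "
    "See also http://www.example.com/page, and again http://www.example.com/page."
)
print(find_plaintext_urls(sample))
```

A real crawler would also normalize the URLs and decide separately whether, and when, to fetch them.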
"Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy".
Source: PageRank sculpting

Related WPW link: Matt Cutts June 15, 2009 about PageRank sculpting

That does not imply that I am saying other citations, aside from hyperlinks, can contribute to the (August 2009) PageRank formula.

Last edited by kgun; 08-31-2009 at 02:39 PM.
#54
08-31-2009, 06:18 PM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4

Quote:
Originally Posted by TheWebDoctor(tm) View Post
Both patents include "...the anchor text of backlinks to the document..." Obviously Google thinks that a backlink must have anchor text. Anchor text indicates hypertext documents.

Added
Webdoctor, understand that I'm not disagreeing entirely, and neither crankydave nor I have claimed these are anything more than ways links are discovered... At no time has anyone indicated that a link in an email counted for anything; however, your responses continually imply that we are. No one is claiming that text citations pass anything, only that they raise visibility and quite possibly help a page get discovered.

Quote:
Links in Gmail:
“Someone recently suggested that a link sent to a Gmail account equals one link on one page. Also not true in any way.” — Matt Cutts, November 14, 2006.
Myth busting: Links in Gmail

"...to the best of my knowledge we don’t use any urls from Gmail either." -- Matt Cutts, July 19, 2008
Generic Toolbar Indexing Debunk Post

Apparently, Google doesn't use Gmail to follow links.
Sir, English as a second language comes to mind.

There are also Google patents that discuss using apps like Gmail, so... I choose to make my own decisions. Cite Matt Cutts all you want; this is how I choose to do things. Matt Cutts works for Google, and they may have reasons for taking that stance, perhaps because privacy is a touchy issue, especially with email. So, for someone who isn't a Googler... I've seen plenty to indicate that may not be the case. I'm not disagreeing; you may very well be correct, but I lose nothing by keeping an open mind and discovering these things on my own. Have you looked at the links in my sig? I think I understand patents and the technology. You're preaching to the choir here, only they ain't buying the religion just because the preacher says it's so.
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas.

Last edited by Terry Van Horne; 08-31-2009 at 06:26 PM.
#55
08-31-2009, 11:07 PM
BoBoMisiu
WebProWorld New Member
Join Date: Jul 2009
Location: Cleveland, Ohio
Posts: 13
RepRank 2

robots.txt is a de facto protocol. There is no RFC for it, but...

A quote from google:
"You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one)."

/support/webmasters/bin/answer.py?hl=en&answer=156449
(I have less than ten posts)

Last edited by BoBoMisiu; 08-31-2009 at 11:12 PM. Reason: typo
#56
08-31-2009, 11:53 PM
deepsand
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2004
Location: Philadelphia, PA
Posts: 3,226
RepRank 9

Quote:
Originally Posted by inertia View Post
What I'm still trying to get my head around is the "dangling nodes" issue as John says. I'm also now thinking of the ranking issues that this throws up...
Those "dangling nodes" are simply resources which could pass PR to others, but do not.

In the end, they're of no consequence, as the final step is to normalize PR so that the individual PRs of all indexed resources sum to "1." Thus any "lost" PR is redistributed among all of the indexed resources, and the effect is trivial.
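The dangling-node behaviour described here can be checked numerically. The Python sketch below is an illustration under stated assumptions, not Google's method: it runs the classic iteration while letting a page with no outgoing links (`C` in the hypothetical graph) pass nothing on, so the raw scores sum to less than 1, and then rescales them to sum to 1 as described in the post. Note that this final rescaling redistributes the lost PR in proportion to each page's existing score; many textbook formulations instead treat a dangling page as linking to every page.

```python
def pagerank_with_leak(links, d=0.85, iterations=100):
    """Classic iteration in which dangling pages (no outlinks) pass nothing.

    links maps every page to the list of pages it links to (possibly empty).
    Returns (raw, normalized): raw scores leak rank at dangling pages and sum
    to less than 1; normalized scores are the raw ones rescaled to sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:  # an empty list means nothing is passed on
                new[target] += d * rank[page] / len(outlinks)
        rank = new
    total = sum(rank.values())
    normalized = {p: r / total for p, r in rank.items()}
    return rank, normalized


# Hypothetical graph: A links to B, B links to A and C, C links nowhere.
raw, normalized = pagerank_with_leak({"A": ["B"], "B": ["A", "C"], "C": []})
print(round(sum(raw.values()), 4), round(sum(normalized.values()), 4))
```

In this three-page toy graph the raw scores sum to well under 1, so the rescaling step does a lot of work; whether the effect is trivial on a real web graph is not something a toy example can show. A and C end with equal normalized scores because each receives exactly half of B's outgoing share.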
#57
09-01-2009, 12:02 AM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4

Quote:
Originally Posted by BoBoMisiu View Post
robots.txt is a de facto protocol. There is no RFC for it, but...
I'm the de facto King of Canada, got a crown and the whole deal! It's no more a protocol than I'm.... An RFC and a governing body are what make it a protocol; otherwise anything can be called a protocol. That's kinda what protocols are all about; otherwise you get people making up the "protocol" as they go, which is why robots.txt is a BS protocol and Google can decide how you use it. That's kinda like the way M$ interprets the HTTP protocol in IIS 5, which is why we get people coming in here wondering why their 301 isn't working correctly... so take de facto protocols and put them in the de facto rubbish bin... Protocols are protocols; there is no such thing as a de facto protocol. It either is or isn't... there is no room in between unless you just follow blindly. I tend to like to keep my eyes open and question the agenda of those doing the leading. Just the way I am... kinda funny like that.
#58
09-01-2009, 02:48 AM
Clint1
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Jun 2005
Location: Louisiana, USA
Posts: 1,306
RepRank 9

Quote:
Originally Posted by TheWebDoctor(tm) View Post
Google has never brainwashed me and never will.
But if you think, and you did say: "PageRank is a determination of link popularity", that is a misconception as I pointed out. G has brainwashed people into believing that, which slanders and ruins useful, legit, toiled-over websites with no or low PR simply because they are in a rarer or less popular niche.


Quote:
Whether domain.com, example.com or any other website, in one's opinion, has an over inflated PR value is not the question here. Domain.com is a domain registry website with only a PR 7. MyDomain.com is another domain registry site with only a PR 6.
"In one's opinion"??? So you think those inflated PRs are deserved??? C'mon. Don't pretend you don't know why they are that high, Lee; the reasons are as I gave. It kind of was the question, since you brought up "PageRank is a determination of link popularity", and I was just pointing out that's not the case. Example.com doesn't even exist as a legit website! The point I'm trying to make is that those sites have high PR not because they are "popular", "useful", "helpful", etc., but simply because they reap the G benefits of particular domain names that countless millions of people link to in emails sent all over the world, forums, blogs, message boards, and even website-related websites, only as examples of a URL. You can look at any website related to HTML code, htaccess file help, etc., and find OneOfThoseExampleDomainsHere.com all over their webpages, (inadvertently?) inflating the target domains' PR for the wrong reasons. You can look all over this forum for only one of many examples of that, especially in www vs. non-www and 301 posts. In recent posts of mine related to lines in the htaccess file, I used "MyDomain.com" to show examples of some redirects for a member to use in their htaccess file. Instead of "mydomain.com", others, and sometimes I myself, use the other examples I gave: domain.com, yourdomain.com, example.com, and on and on. That is why the domain examples I gave have such high PR. If it were not for these events, their PR would be much lower. This is one of the reasons why PR is seriously flawed and erroneous.

Quote:
Doing a search for domain.com on Google returns 86+ million results. CNet with only 27 million references has a PR 9.
You can't really go by that, because a search on G for domain.com doesn't return results for "domain.com" but for "domain" and some for "com". You have to go to Y and do an IBL check there. And as you can see from just the first 10 results alone, that proves my point: you see all those websites essentially linking to "domain.com" just by using it as a generic example on their webpages (Matt Cutts, WordPress.org, SEO sites, SE info sites, etc.). The only page legitimately linking to domain.com with intent is Webopedia.

And as you pointed out with the White House's robots.txt file, it's not how many IBLs but the PR of those IBLs. (That's why I commented "with a PR that high sites like example.com must be linking to it".)
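The point that the PR of the linking pages matters more than their number can be illustrated with the non-normalized formula from Brin and Page's original paper, PR(p) = (1 - d) + d * sum(PR(q) / C(q)), where the sum runs over the pages q linking to p and C(q) is the number of links on q. The Python sketch and graph below are hypothetical: page A has a single inbound link from a hub that six other pages link to, while page B has three inbound links from pages that nothing links to. Google's real link analysis has moved well beyond this classic form.

```python
def classic_pagerank(links, d=0.85, iterations=50):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p.

    links maps each page to the pages it links to, and C(q) = len(links[q]).
    Pages that only appear as link targets are included with no outlinks.
    This non-normalized form does not make the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += d * rank[page] / len(targets)
        rank = new
    return rank


# Hypothetical graph: six pages (S1..S6) link to hub H, and H links only to A.
web = {f"S{i}": ["H"] for i in range(1, 7)}
web["H"] = ["A"]
# Three pages with no inbound links of their own (L1..L3) all link to B.
web.update({f"L{i}": ["B"] for i in range(1, 4)})

pr = classic_pagerank(web)
print(f"A: 1 inbound link,  PR = {pr['A']:.4f}")
print(f"B: 3 inbound links, PR = {pr['B']:.4f}")
```

A ends up well ahead of B (roughly 0.93 versus 0.53), even though B has three times as many inbound links, because A's one link comes from a page that is itself heavily linked.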
__________________
God Bless,
-Clint
(Join Date: 2003)

Last edited by Clint1; 09-01-2009 at 02:52 AM.
#59
09-01-2009, 05:23 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
RepRank 9

Quote:
Originally Posted by BoBoMisiu View Post
robots.txt is a de facto protocol. There is no RFC for it, but...

A quote from google:
"You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one)."

/support/webmasters/bin/answer.py?hl=en&answer=156449
(I have less than ten posts)
My apologies for my ignorance, but have you realized that this is an advanced SEO discussion? And you are posting tips for beginners here? So please....
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO
#60
09-01-2009, 10:40 AM
wige
Moderator
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
RepRank 9

BoBoMisiu, a few comments in your post are, I believe, inaccurate...
Quote:
Originally Posted by BoBoMisiu View Post
robots.txt is a de facto protocol.
robots.txt is a standard, not a protocol. A protocol establishes a methodology for the transfer of data between systems - HTTP and HTTPS are protocols. robots.txt and sitemaps would be standards.

Quote:
Originally Posted by BoBoMisiu View Post
There is no RFC for it, but...
There really doesn't need to be an RFC for something to become a standard. What matters is whether the technology is accepted and implemented by the target users. In this case, the search engines have adopted it, so it is an acceptable standard.

And for the record, there was an RFC for it: http://www.robotstxt.org/norobots-rfc.txt

Quote:
Originally Posted by BoBoMisiu View Post
A quote from google:
"You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one)."

/support/webmasters/bin/answer.py?hl=en&answer=156449
(I have less than ten posts)
Although it is not required, a blank robots.txt is still typically recommended, even if you have nothing to block, to avoid extraneous entries in your error logs.
__________________
The best way to learn anything, is to question everything.
#61
09-01-2009, 11:03 AM
Clint1
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Jun 2005
Location: Louisiana, USA
Posts: 1,306
RepRank 9

Quote:
Originally Posted by wige View Post
Although it is not required, a blank robots.txt is still typically recommended, even if you have nothing to block, to avoid extraneous entries in your error logs.
Yeah, I created a couple of new sites several days ago and had forgotten to add robots.txt files to them, and within a couple of days MSN, Slurp and some others were getting 404s hitting the non-existent file. I added the file, not so much because I needed it but because I got tired of seeing the 404s. Over time they can really populate a lot of lines in the logs.
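To see how many of those 404s a missing robots.txt is generating, a short Python sketch like the one below can tally them per crawler. It assumes the Apache/NCSA "combined" log format; the regular expression, the function name and the sample lines (which use documentation IP addresses) are illustrative, so the pattern would need adjusting for a different log format.

```python
import re
from collections import Counter

# Assumed Apache/NCSA "combined" format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_404s_by_agent(lines):
    """Count requests for /robots.txt that got a 404, grouped by user agent."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path") == "/robots.txt" and m.group("status") == "404":
            counts[m.group("agent")] += 1
    return counts


sample_log = [
    '192.0.2.10 - - [01/Sep/2009:10:12:01 -0500] "GET /robots.txt HTTP/1.1" 404 209 "-" "msnbot/2.0b"',
    '192.0.2.10 - - [01/Sep/2009:11:02:10 -0500] "GET /robots.txt HTTP/1.1" 404 209 "-" "msnbot/2.0b"',
    '192.0.2.20 - - [01/Sep/2009:11:04:33 -0500] "GET /robots.txt HTTP/1.0" 404 209 "-" "Yahoo! Slurp"',
    '192.0.2.30 - - [01/Sep/2009:11:05:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(robots_404s_by_agent(sample_log).most_common())
```

Once a robots.txt file exists (even an empty one), those requests are answered with 200 and drop out of this count.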
#62
09-01-2009, 11:49 AM
BoothWizard
WebProWorld Member
Join Date: Aug 2009
Posts: 28
RepRank 2

wow... what a thread!
In my humble opinion, I would add this:
1. Probably not. Google would probably only want to pass PR on links that are visible to humans.
2. Most people think that PR doesn't matter that much anyway, and in this case there would be no anchor text passing with it.
3. It seems like it would be a very simple thing to test, though in the end I think this is an interesting but academic question.
#63
09-01-2009, 01:00 PM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4

Wige, not to be argumentative, but even as a standard it's not working as expected. Every SE has a different "take" on what should be in it and how it is used; what you really have is a file that is ignored by more bots than follow it. Like I said, even standards have to be uniform to be "useful"; otherwise you end up with what we had in the late 90's, when each browser interpreted CSS and HTML its own way. So let's not pretend this decree from above (from the SEs) is anything but a guide that may or may not work as expected, with directives that may or may not be followed at the will of the crawler/bot using your bandwidth.... Shall I go on about how calling this anything but a suggestion gives it more value than it deserves? As long as we "settle"... that's as good as it gets? Something this "supposedly" important is left to hit and miss and finger crossing?

By the way, I knew about that site and the RFC but wouldn't tell anyone because it's BS. I've also read Alan Perkins' articles (who actually holds SE patents) on the subject of SEs interpreting the "standard" inappropriately (AFAIK now unavailable), and that is why I take the stand I have. As to the one 404 error, if you're worried about that then.... you've got much bigger issues than a 404 on a file that is not necessary for the routine functioning of the server.
#64
09-01-2009, 01:52 PM
wige
Moderator
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
RepRank 9

I do agree that the implementation of robots.txt by the search engines varies widely, and many spiders do not obey the standard. However, that does not make it any less of a standard. Is the speed limit any less of a rule because only a few drivers actually obey it? Or, as a perhaps better example, is HTML 4.01 any less of a standard because there is currently no browser that fully adheres to it? Compliance with a standard is, by definition, voluntary. Bots don't have to obey, and webmasters don't have to create one.

The differences in implementation of robots.txt are also additive. All major search engines follow the basic rules of Disallow /whatever. Most add support for the elements covered in the RFC. And then finally, many add support for a few other things, such as sitemaps. Again, this doesn't mean that the standard is no good - just because many browsers implement <canvas>, HTML 4.01 isn't invalidated.

As for the RFC being useless, I would disagree. Google, Bing and Yahoo have all acknowledged that they follow the rules defined in that RFC. If you are using robots.txt for the purposes of controlling the spiders from these engines, don't you want to read the instruction manual first?
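To make the "read the instruction manual" point concrete, here is a minimal Python sketch of how a crawler might evaluate robots.txt under the rules of the 1996 draft RFC linked earlier in the thread: records start with `User-agent` lines, `Allow`/`Disallow` values are path prefixes, the first matching line wins, an empty `Disallow` value restricts nothing, and a URL that matches no rule (including every URL when the file is empty) is allowed. `Sitemap` lines are collected as an example of the extensions mentioned above. The function names are hypothetical, and the sketch ignores wildcards, percent-encoding normalization and other engine-specific extensions. Google's current documentation applies the most specific (longest) matching rule rather than the first, which is exactly the kind of implementation difference being discussed.

```python
def parse_robots(text):
    """Parse robots.txt text into ({agent: [(allowed, prefix), ...]}, sitemaps)."""
    groups = {}
    sitemaps = []
    agents = []          # user agents of the record currently being read
    seen_rule = False    # has the current record had an Allow/Disallow yet?
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            sitemaps.append(value)
        elif field == "user-agent":
            if seen_rule:  # a User-agent after rules starts a new record
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty value places no restriction
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups, sitemaps


def is_allowed(groups, user_agent, path):
    """First record naming the robot (else "*") applies; first matching rule wins."""
    ua = user_agent.lower()
    rules = next(
        (r for agent, r in groups.items() if agent != "*" and agent in ua),
        groups.get("*", []),
    )
    for allowed, prefix in rules:
        if path.startswith(prefix):
            return allowed
    return True  # no matching rule: allowed


robots_txt = """
User-agent: *
Allow: /private/press/
Disallow: /private/

User-agent: BadBot
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
"""
groups, sitemaps = parse_robots(robots_txt)
print(is_allowed(groups, "Googlebot/2.1", "/private/press/release.html"))
print(is_allowed(groups, "Googlebot/2.1", "/private/notes.html"))
print(is_allowed(groups, "BadBot/1.0", "/index.html"))
print(sitemaps)
```

Because the `Allow` line comes first, the press folder stays crawlable under first-match rules. With the two lines swapped, a first-match parser would block it while a longest-match parser would not.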
#65
09-01-2009, 02:07 PM
pervezalam_mzn
WebProWorld Member
Join Date: Mar 2008
Location: Delhi, India
Posts: 33
RepRank 1

Quote:
Originally Posted by TheWebDoctor(tm) View Post
To answer the initial question: Can Robots.txt files pass PageRank. The answer is unfortunately, NO! Robots.txt files are a set of computer instructions designed to instruct the crawlers to avoid or pay attention to specific files, file types, directories/folders and possibly the entire site. It cannot pass PageRank because it is not a document that can contain active links.
I also very much agree with "TheWebDoctor" that a robots.txt file will not pass PageRank to any link in the file, because plain text doesn't pass any PageRank to a plain-text URL. If it did, why would we need to create internal links with the anchor text of web pages, or any other type of linking through link exchanges, articles, news submissions, etc.?

Any resource that is linked from other pages can get PR... it can be an image, a PDF, an XML file, or a .txt file...
http://www.google.com/robots.txt
#66
09-02-2009, 02:55 AM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4

Quote:
Originally Posted by wige View Post
As for the RFC being useless, I would disagree. Google, Bing and Yahoo have all acknowledged that they follow the rules defined in that RFC. If you are using robots.txt for the purposes of controlling the spiders from these engines, don't you want to read the instruction manual first?
Hmmm, and if I don't drive a car, why would I read the instruction manual?... Like, what is the purpose of that?

I use authentication if I don't want something seen, indexed or whatever, because I know that does stop ALL crawlers from going where I don't want them (unless you're Google, who decides to start executing script). You choose your method, I'll choose mine. We should just agree to disagree, but if it makes you feel better.... you were right... Can we drop this now and get back to the topic, which is the PR 5 and linking stuff?
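Putting private content behind authentication works for every crawler because the server refuses the request instead of asking the robot to stay away. A minimal sketch using Python's standard-library `http.server` with HTTP Basic authentication is below. The credentials, realm name and handler name are hypothetical, and Basic credentials are only base64-encoded, so a real deployment would sit behind HTTPS and would more likely use the web server's own authentication modules.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler

# Hypothetical credentials, for illustration only.
USERNAME = "editor"
PASSWORD = "s3cret"


def credentials_ok(auth_header):
    """Return True if an HTTP Basic Authorization header has the right credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # bad base64 (binascii.Error) and bad UTF-8 are both ValueErrors
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest compares in constant time to avoid timing leaks.
    user_ok = hmac.compare_digest(user.encode("utf-8"), USERNAME.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), PASSWORD.encode("utf-8"))
    return user_ok and pass_ok


class ProtectedHandler(BaseHTTPRequestHandler):
    """Answers every GET with 401 unless valid credentials are supplied."""

    def do_GET(self):
        if not credentials_ok(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.end_headers()
            return
        body = b"Private content: no crawler gets here without the password.\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8000), ProtectedHandler).serve_forever()
```

Unlike a robots.txt `Disallow` line, which a bot may or may not honour, the 401 response is enforced by the server itself.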
#67
09-17-2009, 07:29 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
RepRank 9

I was just reading this and thought I'd share it: Links Vs. Web References As Relevance Signals
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO
#68
09-17-2009, 07:42 AM
inertia
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Apr 2006
Location: Lancaster, UK
Posts: 1,021
RepRank 6
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by Me
Maybe "brand rank" has some relation to how many times and in what context your company name appears on other sites -- regardless of linking. Obviously the possibilities for spamming on that are big but what if the algo looked for genuine mentions in related content? For example, radioshack gained some top rankings in the recent "brand rank" adjustment so maybe it was because their brand name appeared on millions of other pages.

The levels of brand-name repetition would have to be really high, though, to solve the spam problem, and all the other ranking factors are still in play, so it's not as if a brand-name site with no content and few links would just appear.
Google Giving a Boost To Branded Sites



__________________
Latest Blog Post: Google Consultant - Should this Job Title be Allowed? - Matt Inertia's SEO Blog - SEOers.org

"Carpe diem, seize the day boys, make your lives extraordinary"
- Dead Poets Society
#69
09-17-2009, 07:43 AM
kgun
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2005
Location: Norway
Posts: 5,684
RepRank 9
Re: Robots.txt can pass PageRank?

"A web reference refers to a mention of a brand, product or web site, which is not referenced in a link (or perhaps referenced as a no-followed link)".

Is that on topic, John?
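The definition quoted above can be made concrete with a toy classifier that counts a brand name in three buckets: inside a normal link, inside a `rel="nofollow"` link, and in plain text (a "web reference" in the sense of that definition). This is a hypothetical sketch: the brand "Acme", the class name and the whole-word matching rule are assumptions, and nothing here reflects how any search engine actually measures mentions.

```python
import re
from html.parser import HTMLParser


class MentionCounter(HTMLParser):
    """Hypothetical sketch: count brand mentions inside vs. outside <a> elements."""

    def __init__(self, brand: str):
        super().__init__()
        self.pattern = re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        self.linked = 0        # mentions inside an ordinary <a href>
        self.nofollowed = 0    # mentions inside <a rel="nofollow">
        self.unlinked = 0      # plain-text mentions ("web references")
        self._anchors = []     # stack: True if the open <a> is nofollow

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            rel = (dict(attrs).get("rel") or "").lower().split()
            self._anchors.append("nofollow" in rel)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchors:
            self._anchors.pop()

    def handle_data(self, data):
        hits = len(self.pattern.findall(data))
        if not self._anchors:
            self.unlinked += hits
        elif self._anchors[-1]:
            self.nofollowed += hits
        else:
            self.linked += hits


html = (
    "<p>I bought a radio at Acme yesterday.</p>"
    '<p><a href="https://www.example.com/">Acme</a> and '
    '<a rel="nofollow" href="https://www.example.com/deals">Acme deals</a></p>'
)
counter = MentionCounter("Acme")
counter.feed(html)
counter.close()
print(counter.linked, counter.nofollowed, counter.unlinked)  # 1 1 1
```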
#70
09-17-2009, 08:20 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by kgun
"A web reference refers to a mention of a brand, product or web site, which is not referenced in a link (or perhaps referenced as a no-followed link)".

Is that on topic, John?
If you read some of Terry's posts about citations and Eric Enge's article, I think you'll see I did not necessarily go off topic. If I did, I apologize.

I do not have time for forums anymore, but I thought I'd pop in for a few minutes to share what I read.

Anyway, I have to get back to work.

Take care,

John
#71
09-17-2009, 11:55 AM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4
Re: Robots.txt can pass PageRank?

Kgun... is an unlinked URL in a robots.txt a citation? F'in right it is! That is what the discussion is about. Are we now also the "on-topic police"? If so... you've got lots of work around here! I wouldn't be wastin' my time on John, who obviously knows what a citation is... do you? Seems not. Now... pass me the Red Bull and be done with it, and this time leave a meaningful comment with it, please.
#72
09-17-2009, 05:46 PM
kgun
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2005
Location: Norway
Posts: 5,684
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by Terry Van Horne
Kgun... is an unlinked URL in a robots.txt a citation? F'in right it is! That is what the discussion is about.
I thought the topic was:

Robots.txt can pass PageRank?

I know what traditional citations are. PageRank is related to "link voting", or am I wrong? I am talking about direct effects, not indirect ones.

Quote:
Originally Posted by Terry Van Horne
Are we now also the "on-topic police"? If so... you've got lots of work around here! I wouldn't be wastin' my time on John, who obviously knows what a citation is... do you? Seems not. Now... pass me the Red Bull and be done with it, and this time leave a meaningful comment with it, please.
I think there is a separate subforum about such topics. As far as I remember, you are a member. I am not.

More about PageRank here: Matt Cutts, June 15, 2009, on PageRank sculpting:

"Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy".

My bolding.
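Since the "classic PageRank" framing keeps coming up, here is a minimal power-iteration sketch of it, in the random-surfer form described in the patent text quoted earlier in the thread. The damping factor of 0.85 is the value used in the original PageRank paper; spreading the score of pages with no outlinks evenly over all pages is one common convention, assumed here. As the Cutts disclaimer says, this is not what Google computes today, and the toy graph is made up.

```python
def pagerank(links, d=0.85, iterations=100):
    """Classic PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sums to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outlinks ("dangling") share their score with every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)  # each outlink carries an equal share
                for t in targets:
                    new[t] += share
        rank = new
    return rank


# Toy graph: a small link cycle plus a robots.txt file that nothing links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "robots.txt": []}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page:12s} {score:.3f}")
```

In this toy model the unlinked robots.txt node ends up with only the baseline share, the lowest score in the graph, because no page casts a vote for it.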
P.S.
I am not participating in this Getting high PR without links? thread.

Last edited by kgun; 09-17-2009 at 06:01 PM.
#73
10-07-2009, 03:27 PM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by wige
As for the RFC being useless, I would disagree. Google, Bing and Yahoo have all acknowledged that they follow the rules defined in that RFC. If you are using robots.txt for the purposes of controlling the spiders from these engines, don't you want to read the instruction manual first?
Now it's even useless as a reference... this is what sheeple are putting their trust in
#74
10-07-2009, 04:05 PM
SemAdvance
WebProWorld Veteran
Join Date: Dec 2005
Location: In Your Mind
Posts: 788
RepRank 3
Re: Robots.txt can pass PageRank?

Thinking about this more, I have concluded that the answer is no: the robots.txt file does not pass PageRank.

URLs do not pass PageRank either, whether they are hyperlinks or links in text. The URL itself does not pass PageRank; all a URL does is allow a spider to find the page linked to.

If URLs passed PageRank, then all anyone would need to do is create page after page of links to build their PageRank and then post them all over the web.
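Under the classic model, the arithmetic behind the "page after page of links" objection is easy to write down. A sketch, assuming the normalized random-surfer form with $N$ pages and damping factor $d$ (not Google's current algorithm), where $B(p)$ is the set of pages linking to $p$ and $L(q)$ is the number of outlinks on page $q$:

```latex
PR(p) = \frac{1-d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}

\text{e.g. } d = 0.85,\; PR(q) = 0.01,\; L(q) = 100:\quad
\frac{d \, PR(q)}{L(q)} = \frac{0.85 \times 0.01}{100} = 8.5 \times 10^{-5}
```

Each link passes only a share of the score of the page it sits on, and a freshly created link page with no inbound links holds only the baseline $(1-d)/N$ (ignoring dangling-page redistribution), so it has almost nothing to pass on.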

From what I understand, Google looks at the following to determine PageRank; this is where citation or merit would come into play.

Page casting the vote.

Site casting the vote.

Editorial content value of where the link is placed.

I am sure there is more to it than this, but as far as I can tell, it is the pages and the trust associated with each that cast the PageRank, not the URLs themselves.

Hence a robots.txt file would be filtered out of the algorithm, as it has no value or merit as far as citation is concerned.

Just my 0.0001 cents, adjusted for economic turmoil...
#75
10-07-2009, 04:29 PM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by Terry Van Horne
Now it's even useless as a reference... this is what sheeple are putting their trust in
OMG!!! 503 - Service Not Available

How the heck can we still rely on our robots.txt?
Last edited by Webnauts; 10-07-2009 at 04:36 PM.
#76
10-07-2009, 04:29 PM
deepsand
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2004
Location: Philadelphia, PA
Posts: 3,226
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by Terry Van Horne
Now it's even useless as a reference... this is what sheeple are putting their trust in
Yields

503 - Service Not Available
#77
10-07-2009, 04:38 PM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by deepsand
Yields

503 - Service Not Available
I think I know what is going on. They are updating the robots.txt directives, adding the new directive "noindex". Maybe also "nofollow". Don't you think?

If I am right, that will be wicked, won't it?
#78
10-07-2009, 04:46 PM
SemAdvance
WebProWorld Veteran
Join Date: Dec 2005
Location: In Your Mind
Posts: 788
RepRank 3
Re: Robots.txt can pass PageRank?

Who forgot to pay the hosting bill???????
#79
10-07-2009, 04:46 PM
deepsand
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2004
Location: Philadelphia, PA
Posts: 3,226
RepRank 9
Re: Robots.txt can pass PageRank?

Or perhaps it's now a cloaked site, with only human visitors seeing "503 - Service Not Available"!
#80
10-07-2009, 06:12 PM
kgun
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: May 2005
Location: Norway
Posts: 5,684
RepRank 9
Re: Robots.txt can pass PageRank?

http://www.w3.org/robots.txt

http://www.google.com/robots.txt

Or bought by

http://www.yahoo.com/robots.txt Sorry, the page you requested was not found.

That redirects to Yahoo's home page.

http://www.bing.com/robots.txt
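How a crawler reacts to a missing or redirected robots.txt, as with the Yahoo URL above, depends on the HTTP status it gets back. The sketch below maps status codes to a crawl policy, loosely following RFC 9309 (the robots.txt standard published in 2022): 4xx means the file is "unavailable" and the crawler may access anything, while 5xx means "unreachable" and the crawler must assume everything is disallowed (the RFC adds caveats, for example for long outages). The function name and policy strings are hypothetical. Implementations differ: Python's `urllib.robotparser`, for instance, treats 401 and 403 as disallow-all. And if a redirect ends on an ordinary HTML home page, a parser typically finds no valid rules in it, which in practice also means everything is allowed.

```python
def robots_policy(status: int) -> str:
    """Hypothetical mapping from a robots.txt fetch status to a crawl policy.

    Loosely follows RFC 9309; real crawlers may differ.
    """
    if 200 <= status < 300:
        return "parse"            # use the rules in the response body
    if 300 <= status < 400:
        return "follow_redirect"  # RFC 9309 recommends following at least five redirects
    if 400 <= status < 500:
        return "allow_all"        # "unavailable": crawler may access any resource
    if 500 <= status < 600:
        return "disallow_all"     # "unreachable": assume complete disallow
    raise ValueError(f"unexpected HTTP status: {status}")


for code in (200, 301, 404, 503):
    print(code, robots_policy(code))
```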
#81
10-08-2009, 05:56 AM
Clint1
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Jun 2005
Location: Louisiana, USA
Posts: 1,306
RepRank 9
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by SemAdvance
Who forgot to pay the hosting bill???????
That's odd. All of their links are broken like that now when clicked on the cached page: The Web Robots Pages. BTW:

503 Service Unavailable
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.


Judging by the times of Terry's and Deepsand's posts, it looks like that's been happening for at least 13 hours so far.
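The quoted 503 definition mentions the `Retry-After` header. As a sketch of how a client might honor it, assuming Python and a hypothetical `retry_after_seconds()` helper: the header can carry either a number of seconds or an HTTP-date, and the one-hour fallback used when it is missing or unparseable is an arbitrary assumption, not something the spec prescribes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None,
                        default: float = 3600.0) -> float:
    """Seconds to wait before retrying, from a Retry-After header value."""
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:                                      # HTTP-date form
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):           # unparseable (exception type varies by Python version)
        return default
    if when.tzinfo is None:                   # "-0000" dates come back naive; treat as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


now = datetime(2009, 10, 7, 20, 0, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120", now))                            # 120.0
print(retry_after_seconds("Wed, 07 Oct 2009 20:05:00 GMT", now))  # 300.0
print(retry_after_seconds(None, now))                             # 3600.0
```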
__________________
God Bless,
-Clint
(Join Date: 2003)
#82
10-08-2009, 04:19 PM
Terry Van Horne
WebProWorld Veteran
Join Date: Apr 2008
Location: Toronto On., Ca.
Posts: 471
RepRank 4
Re: Robots.txt can pass PageRank?

Quote:
Originally Posted by kgun
http://www.w3.org/robots.txt

http://www.google.com/robots.txt

Or bought by

http://www.yahoo.com/robots.txt Sorry, the page you requested was not found.

That redirects to Yahoo's home page.

http://www.bing.com/robots.txt
Geez, a SE with a robots.txt, who'd'a thunk it!
Nobody bought nothing, nor will they buy it, because nobody owns it to sell it! I think they figured that if the SEs are the beneficiaries, the SEs should pay for it! Looks good; I'm glad someone finally showed it to be the sham standard it is!

Not only that... it is far from temporary... I have on reasonably good authority it's been like that for weeks!
Last edited by Terry Van Horne; 10-08-2009 at 04:29 PM.
All times are GMT -4.