| Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects. |
Both patents include "...the anchor text of backlinks to the document..." Obviously Google thinks that a backlink must have anchor text. Anchor text indicates hypertext documents.
Although patents are supposed to be written for a normal practitioner to understand, they can be confusing. Over the past two years I've had the opportunity to file for three patents. I had to ask my patent attorney to explain why he used certain phrases and words because of their broadness. Patents need to be broad enough to be enforceable but specific enough to cover the concept. And we wonder why we have attorneys.

"Because citations, or links, are ways of directing attention, the important documents correspond to those documents to which the most attention is directed. Thus, a high rank indicates that a document is considered valuable by many people or by important people. Most likely, these are the pages to which someone performing a search would like to direct his or her attention. Looked at another way, the importance of a page is directly related to the steady-state probability that a random web surfer ends up at the page after following a large number of links. Because there is a larger probability that a surfer will end up at an important page than at an unimportant page, this method of ranking pages assigns higher ranks to the more important pages."

Additionally, one can read the last paragraph of the patent.

"The search engine is used to locate documents that match the specified search criteria, either by searching full text, or by searching titles only. In addition, the search can include the anchor text associated with backlinks to the page. This approach has several advantages in this context. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for images, programs, and other objects that cannot be indexed by a text-based search engine. This also makes it possible to return web pages which have not actually been crawled. In addition, the engine can compare the search terms with a list of its backlink document titles. Thus, even though the text of the document itself may not match the search terms, if the document is cited by documents whose titles or backlink anchor text match the search terms, the document will be considered a match. In addition to or instead of the anchor text, the text in the immediate vicinity of the backlink anchor text can also be compared to the search terms in order to improve the search."

What part of "or links" or "backlink anchor text" led you to the conclusion that a textual or non-hypertext reference is a citation and used to determine PR? The concept of "clicking around" by a random web surfer also indicates that a hypertext link must exist. One cannot "click" a non-hypertext word or phrase to get to another page.

I never said a robots.txt file was a protocol. I said the robots.txt file is a file that contains computer instructions for the crawlers. What the search engines determine they are going to do with the instructions is clearly up to them.

Stanford and Google have two patents covering PageRank. The last was approved in 2006. So, Terry, you're right. Google did improve their opinion on how they determine PR for a page. I'm sure they are fine-tuning it again and will have another patent application filed soon, if not already.

Links in Gmail:
“Someone recently suggested that a link sent to a Gmail account equals one link on one page. Also not true in any way.” -- Matt Cutts, November 14, 2006. Myth busting: Links in Gmail
"...to the best of my knowledge we don’t use any urls from Gmail either." -- Matt Cutts, July 19, 2008. Generic Toolbar Indexing Debunk Post

Apparently, Google doesn't use Gmail to follow links.
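For anyone who wants to see the "random web surfer" idea from the patent quote in concrete terms, here is a minimal Python sketch of the classic textbook calculation. It is not Google's production method. The damping factor of 0.85, the handling of pages with no outgoing links, and the tiny link graph (including a `robots.txt` node) are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate the steady-state probability that a random surfer is on each page.

    `links` maps a page name to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the surfer equally likely to be anywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links sends the
                # surfer to a random page, so nothing is lost.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site; "robots.txt" is linked to but links nowhere.
graph = {
    "home": ["about", "robots.txt"],
    "about": ["home"],
    "blog": ["home", "about"],
    "robots.txt": [],
}
ranks = pagerank(graph)
```

In this toy graph a page only accumulates rank through hyperlinks pointing at it, which is the point being argued above: "blog" has no inbound links and ends up with the lowest score, while the linked-to "robots.txt" scores higher than "blog".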
__________________
Lee Roberts Ecommerce Software | SEO Friendly Directory | Oklahoma Business Directory |
|
||||
|
As some of you may know, I collect links. Personally, I can find links from a lot of different sources:
Source: PageRank sculpting
Related WPW link: Matt Cutts June 15, 2009 about PageRank sculpting
That does not imply that I am saying that other citations, aside from hyperlinks, can contribute to the (August 2009) PageRank formula.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 08-31-2009 at 02:39 PM. |
|
||||
|
Quote:
There are also Google patents that discuss using apps like Gmail, so... I choose to make my own decisions. Cite Matt Cutts all you want; that is how I choose to do things. Matt Cutts works for Google, and they may have reasons for taking that stance, perhaps because privacy is a touchy issue, especially email, so... for someone who isn't a Google... I've seen plenty to indicate that may not be the case. I'm not disagreeing; you may very well be correct, but I lose nothing by keeping an open mind and discovering these things on my own. Have you looked at the links in my sig? I think I understand patents and the technology. You're preaching to the choir here, only they ain't buying the religion just because the preacher says it's so.
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. Last edited by Terry Van Horne; 08-31-2009 at 06:26 PM. |
|
|||
|
robots.txt is a de facto protocol. There is no RFC for it, but...
A quote from Google: "You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one)."
/support/webmasters/bin/answer.py?hl=en&answer=156449 (I have fewer than ten posts)
Last edited by BoBoMisiu; 08-31-2009 at 11:12 PM. Reason: typo
|
||||
|
Quote:
In the end, they're of no consequence, as the final step is to normalize PR so that the sum of the individual PRs of all indexed resources equals "1." Thus, any "lost" PR is redistributed among all of the indexed resources, and the effect on any one of them is trivial.
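To make the normalization step concrete, here is a small Python sketch. The page names and raw scores are invented, and dividing every score by the total is only one simple way to normalize (it hands the "lost" PR back in proportion to each page's existing score). It is an illustration under those assumptions, not a description of what Google actually does.

```python
def normalize(scores):
    """Rescale scores so that they sum to 1, keeping their relative sizes."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("cannot normalize scores that sum to zero")
    return {page: value / total for page, value in scores.items()}


# Hypothetical raw scores that sum to 0.70, as if 0.30 of the PR had
# "leaked" through links that lead nowhere.
raw = {"a": 0.30, "b": 0.25, "c": 0.15}
normalized = normalize(raw)
```

After the rescaling the scores sum to 1 again and the ordering of the pages is unchanged. With only three pages the correction is visible; spread across an index of billions of resources, each page's share of the leaked amount would be tiny.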
__________________
The Penn State Ticket Man http://www.pennstateticketman.com http://www.happyvalleytickets.com http://www.hounddogtours.com |
|
||||
|
Quote:
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. |
|
||||
|
But if you think, and you did say: "PageRank is a determination of link popularity", that is a misconception as I pointed out. G has brainwashed people into believing that, which slanders and ruins useful, legit, toiled-over websites with no or low PR simply because they are in a rarer or less popular niche.
Quote:
Quote:
And as you pointed out with the White House's robots.txt file, it's not how many IBLs but the PR of those IBLs. (That's why I commented "with a PR that high sites like example.com must be linking to it".)
__________________
God Bless, -Clint (Join Date: 2003) Last edited by Clint1; 09-01-2009 at 02:52 AM. |
|
||||
|
Quote:
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
BoBoMisiu, a few comments in your post are, I believe, inaccurate...
robots.txt is a standard, not a protocol. A protocol establishes a methodology for the transfer of data between systems; HTTP and HTTPS are protocols. robots.txt and sitemaps would be standards. There really doesn't need to be an RFC for something to become a standard. What matters is whether the technology is accepted and implemented by the target users. In this case, the search engines have adopted it, thus it is an acceptable standard. And for the record, there was an RFC for it: http://www.robotstxt.org/norobots-rfc.txt
Quote:
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Yeah, I created a couple of new sites several days ago and had forgotten to add robots.txt files to them. Within a couple of days MSN, Slurp and some others were getting 404s when they requested the nonexistent file. I added the file not so much because I needed it but because I got tired of seeing the 404s. Over time they can really fill up a lot of lines in the logs.
__________________
God Bless, -Clint (Join Date: 2003) |
|
|||
|
wow... what a thread!
In my humble opinion I would add this.
1. Probably not. Google would probably only want to pass PR on links that are visible to humans.
2. Most people think that PR doesn't matter that much anyway, and in this case, there would be no anchor text passing with it.
3. It seems like it would be a very simple thing to test, though in the end, I think this is an interesting but academic question.
|
||||
|
Wige, not to be argumentative, but even as a standard it's not working as expected. Every SE has a different "take" on what should be in it and how that is used, so what you really have is a file that is ignored by more bots than follow it. Like I said, even standards have to be uniform to be "useful"; otherwise you end up with what we had in the late 90s, when browsers each interpreted CSS and HTML their own way. So let's not pretend this decree from above (the SEs) is anything but a guide that may or may not work as expected, with directives that may or may not be followed at the will of the crawler/bot using your bandwidth. Shall I go on about how calling this anything but a suggestion is giving it more value than it deserves? As long as we "settle"... is that as good as it gets? Something this "supposedly" important is left to hit-and-miss and finger crossing?
By the way, I knew about that site and the RFC but wouldn't tell anyone because it's BS. I've also read Alan Perkins' articles (he actually holds SE patents) on the subject of SEs interpreting the "standard" inappropriately (AFAIK now unavailable), and that is why I take the stand I have. As to the one 404 error: if you're worried about that, then you've got much bigger issues than a 404 on a file that is not necessary for the routine functioning of the server.
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. |
|
||||
|
I do agree that the implementation of robots.txt by the search engines varies widely, and many spiders do not obey the standard. However, that does not make it any less of a standard. Is the speed limit any less of a rule because only a few drivers actually obey it? Or, as a perhaps better example, is HTML 4.01 any less of a standard because there is currently no browser that fully adheres to it? Compliance with a standard is, by definition, voluntary. Bots don't have to obey, and webmasters don't have to create one.
The differences in implementation of robots.txt are also additive. All major search engines follow the basic rules of "Disallow: /whatever". Most add support for the elements covered in the RFC. And then finally, many add support for a few other things, such as sitemaps. Again, this doesn't mean that the standard is no good: HTML 4.01 isn't invalidated just because many browsers implement <canvas>. As for the RFC being useless, I would disagree. Google, Bing and Yahoo have all acknowledged that they follow the rules defined in that RFC. If you are using robots.txt for the purpose of controlling the spiders from these engines, don't you want to read the instruction manual first?
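As a rough illustration of the basic "Disallow" rules that compliant spiders agree on, here is a short Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the crawler names (`SomeCrawler`, `ExampleBot`) and the example.com URLs are made up for the example. It only shows how a well-behaved bot might interpret the file; nothing forces any particular bot to behave this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and a bot calling itself "ExampleBot" is kept out of everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, url):
    """Return True if a compliant crawler with this name may fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("SomeCrawler", "http://www.example.com/index.html"))
print(is_allowed("SomeCrawler", "http://www.example.com/private/notes.html"))
print(is_allowed("ExampleBot", "http://www.example.com/index.html"))
```

The check happens entirely on the crawler's side, which is why compliance is voluntary: a bot that never asks the question is not stopped by the file.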
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Quote:
Any web resource that is linked from other pages can get PR. It can be an image, a PDF, XML, or a .txt file, for example http://www.google.com/robots.txt
|
||||
|
Quote:
I use authentication if I don't want something seen, indexed, or whatever, because I know that does stop ALL crawlers (unless you're Google, who decides to start executing script) from going where I don't want them. You choose your method, I'll choose mine. We should just agree to disagree, but if it makes you feel better... you were right. Can we drop this now and get back to the topic, which is the PR 5 and linking stuff?
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. |
|
||||
|
I was just reading and thought of sharing: Links Vs. Web References As Relevance Signals
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Quote:
__________________
Latest Blog Post: Google Consultant - Should this Job Title be Allowed? - Matt Inertia's SEO Blog - SEOers.org "Carpe diem, seize the day boys, make your lives extraordinary" - Dead Poets Society |
|
||||
|
"A web reference refers to a mention of a brand, product or web site, which is not referenced in a link (or perhaps referenced as a no-followed link)".
Is that on topic John? |
|
||||
|
Quote:
I do not have time for forums anymore, but I thought of popping in for a few minutes to share what I read. Anyway, I have to get back to work. Take care, John
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Kgun... is an unlinked URL in a robots.txt a citation? F'in right it is! That is what the discussion is about. Are we now also the "on topic police"? If so... you've got lots of work around here! I wouldn't be wastin' my time on John, who obviously knows what a citation is... do you? Seems not. Now... pass me the Red Bull and be done with it, and this time leave a meaningful comment with it, please.
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. |
|
||||
|
Quote:
Robots.txt can pass PageRank? I know what traditional citations are. PageRank is related to "link voting", or am I wrong? I am not talking about indirect effects, but direct ones.
Quote:
More about PageRank here: Matt Cutts June 15, 2009 about PageRank sculpting "Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy". My bolding. P.S. I am not participating in this Getting high PR without links? thread.
__________________
Mini Network:: Financial information at your fingertips Learn object oriented programming where it started Last edited by kgun; 09-17-2009 at 06:01 PM. |
|
||||
|
Quote:
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. |
|
||||
|
Thinking about this more, I have determined the answer to be no: the robots.txt file does not pass PageRank.
URLs do not pass PageRank either, whether they are hyperlinks or links in text. The URL itself does not pass PageRank. All a URL does is allow a spider to find the page linked to. If URLs passed PageRank, then all anyone would need to do is create page after page of links to build their PageRank and then post them all over the web. From what I understand, Google looks at the following to determine PageRank, and this is where citation or merit would come into play:
Page casting the vote.
Site casting the vote.
Editorial content value of where the link is placed.
I am sure there is more to things than this, but as far as I can tell, it is the various pages and the trust associated with each that cast the PageRank, and not the URLs themselves. Hence a robots.txt file would be filtered from the algorithm, as it has no value / merit as far as citation is concerned. Just my 0.0001 cent, adjusted due to economic turmoil......
|
||||
|
Quote:
How the heck can we still rely on our robots.txt?
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO Last edited by Webnauts; 10-07-2009 at 04:36 PM. |
|
||||
|
Quote:
503 - Service Not Available
__________________
The Penn State Ticket Man http://www.pennstateticketman.com http://www.happyvalleytickets.com http://www.hounddogtours.com |
|
||||
|
I think I know what is going on. They are updating the robots.txt directives, adding the new directive "noindex". Maybe also "nofollow". Don't you think?
If I am right, that will be wicked, or?
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Or, perhaps it's now a cloaked site, with only human visitors seeing "503 - Service Not Available"!
__________________
The Penn State Ticket Man http://www.pennstateticketman.com http://www.happyvalleytickets.com http://www.hounddogtours.com |
|
||||
|
http://www.w3.org/robots.txt
http://www.google.com/robots.txt
Or bought by http://www.yahoo.com/robots.txt
"Sorry, the page you requested was not found." That redirects to Yahoo's home page.
http://www.bing.com/robots.txt
|
||||
|
That's odd. All of their links are bad like that now when clicked on the cached page.
The Web Robots Pages.
BTW: 503 Service Unavailable: "The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response."
Looking at the time of the posts from Terry and Deepsand, it looks like that's been happening for at least 13 hours so far.
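For what it's worth, the Retry-After behaviour described in that 503 definition could be handled by a client roughly like the Python sketch below. The function name `retry_delay`, the plain-dict `headers` argument with an exact "Retry-After" key, and the sample date are assumptions for illustration. It says nothing about how any search engine's crawler actually reacts to a 503.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status, headers, now=None):
    """Return how many seconds to wait before retrying, or None.

    None means either the response was not a 503 or the server gave no
    usable Retry-After hint, so the caller should treat it like a 500.
    """
    if status != 503:
        return None
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    # Retry-After may be a plain number of seconds...
    if value.isdigit():
        return int(value)
    # ...or an HTTP date saying when the service should be back.
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))


# Hypothetical responses for a request to /robots.txt.
fixed_now = datetime(2009, 10, 8, 16, 0, 0, tzinfo=timezone.utc)
print(retry_delay(503, {"Retry-After": "120"}))
print(retry_delay(503, {"Retry-After": "Thu, 08 Oct 2009 17:00:00 GMT"}, now=fixed_now))
print(retry_delay(503, {}))
```

A 503 that persists for hours or weeks with no Retry-After header gives a client nothing to plan around, which is why it starts to look less like a "temporary condition".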
__________________
God Bless, -Clint (Join Date: 2003) |
|
||||
|
Quote:
Nobody bought nothing, nor will they buy it, because nobody owns it to sell it! I think they thought that if the SEs are the beneficiaries, the SEs should pay for it! Looks good; I'm glad someone finally showed it to be the sham standard it is! Not only that... it is far from temporary. I have it on reasonably good authority that it's been like that for weeks!
__________________
Follow me on Twitter! On the Trail with SOSG How I became a Social Media Convert and Twitter and Agents of Influence and now regular poster at Cloudmixer where We're Mixing New Media Ideas. Last edited by Terry Van Horne; 10-08-2009 at 04:29 PM. |
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |