| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
I assume I now know how the whole thing works.
First you need to clean up those URLs with mod_rewrite, and then you should write a script that automatically sets a unique link relationship for every single page. Is that solution easier to implement than just writing an appropriate robots.txt? For me it is not. Am I correct?
The canonical link relationship tells search engines that the preferred location of this URL (the "canonical" location, in search engine speak) is http://example.com/page.html instead of http://www.example.com/page.html?sid=asdf314159265. But the canonical link relationship tag is only a "hint" to the search engine. While they'll probably use it 99% of the time, they reserve the right to handle things any way they want, in case of errors etc.
If I use robots.txt and, in addition, the "noindex" directive, I feel that will be a far better solution. Or am I still missing something?
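For anyone following along, here is a minimal Python sketch of the "script that sets the link relationship" idea: strip the session parameter and emit the canonical link element. The parameter name `sid` comes from the example URL above; the `SESSION_PARAMS` set and both function names are made up for illustration, and the sketch leaves the host name alone (choosing www vs. non-www is a separate decision).

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of session parameter names; adjust for your own site.
SESSION_PARAMS = {"sid"}


def canonical_url(url):
    """Return the URL without session parameters and without a fragment."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


print(canonical_link_tag("http://www.example.com/page.html?sid=asdf314159265"))
```

Whether the engines honor the element is still up to them; this only shows that generating it per page is not much work.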
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
Quote:
Quote:
Any answer? -- Quote:
__________________
My LinkedIn Profile - Matt Inertia's SEO Blog - SEOers.org - My Twitter "Carpe diem, seize the day boys, make your lives extraordinary" - Dead Poets Society |
Quote:
Quote:
Quote:
2. The duplicate pages will still be in the index and continue to get indexed but they should only appear when we hit the "repeat the search with the omitted results included." option. Can you explain which pages show up hitting the link "repeat the search with the omitted results included"? Aren't they pages of the supplemental index? If yes, what happened to PageRank Sculpting (Siloing)? Is that dead now? Quote:
Can you clarify?
__________________
Last edited by Webnauts; 03-03-2009 at 07:43 PM.
Quote:
Quote:
Quote:
This is pagerank sculpting on a lower level; it's sending pagerank to the canonical. I think you're thinking about it way too much. The tag does a simple thing, as I've repeated many times in this thread...
Quote:
Quote:
But my other point was: what's the need to block a page if it's reassigning its pagerank to the canonical? Is there any harm in having a duplicate in the supplemental index if all its pagerank is going to the canonical?
__________________
Quote:
Quote:
I can only say to that, that Google tells a lot when the day is long. If you believe everything they say, I would like to ask a very simple question: how is it possible that a monster search engine like Google (the biggest on the Internet) does not even understand 300 Multiple Choices HTTP responses and shows them as web site pages in its index? Yet they still provide us with the list of HTTP status codes: http://www.google.com/support/webmas...n&answer=40132 How kind of them. So what the hell are we talking about? Are they kidding?
I posted a screenshot earlier in the thread, taken on the 27th of February 2008. But I am not the only one who is suffering. I was reading that others have the same problem, which is also very harmful: http://www.webmasterworld.com/google/3210610.htm So come on, brother...
Quote:
Quote:
1. The quote from Matt Cutts, head of the Google Webspam Team, speaks for itself:
Quote:
2. Thanks to Andy Beal, you can hear Matt Cutts say:
Quote:
So my question to you: does this canonical tag help me to boost my PageRank, as I can with PageRank sculpting? After all I have read in the thread so far, it does not! Don't the existing PageRank sculpting methods already do what the "canonical" attribute does, PLUS boost the web site's PageRank? That said, by simply using or relying on that attribute we are going backwards instead of forwards. Do you get my point now?
Quote:
If I am spoiling the thread, or getting boring, I will drop the subject right here, man. No problem.
Quote:
But we took this to another level some hours ago, which my CTO and SEO Technician will explain in a few. It is about misspelled IBLs.
Quote:
P.S. Please have a look at this wonderful info from Google too: http://www.google.com/support/webmas...y?answer=66359 (Please read before responding to this post.)
And for the last time: "The canonical link relationship tag is only a "hint" to the search engine. While they'll probably use it 99% of the time, they reserve the right to handle things any way they want, in case of errors etc." Personally I cannot find peace with that! I will go on researching and developing new methods to have as much control as possible over my own sites, and my customers over theirs. We own our web sites, not the search engines. The search engines should knock on our door and ask: which pages may I enter and share with the public? And not the other way around. Is that just me? If yes, then I am very proud of that.
__________________
Last edited by Webnauts; 03-03-2009 at 11:04 PM.
Quote:
Quote:
That said, it still consumes and can pass PR. Please study this very carefully: http://www.seobook.com/robots-txt-vs...obots-nofollow
__________________
Last edited by Webnauts; 03-03-2009 at 11:37 PM.
As John (Webnauts) previously suggested, I came in to post the present official position of SEO Workers about the use of the canonical tag.
The shiny new canonical link tag may have its uses, but I'm not sure that pagerank sculpting is one of those uses. As Webnauts indicated in his quotes from Matt Cutts and others, the presence of the tag on a page that is not the canonical location will not remove that page from the index, and from what I can see, will not result in a recovery of any pagerank lost from the non-canonical page. As Matt Cutts indicated, the canonical link relationship is a "hint", not a directive/mandate/requirement. The search engines are not required to follow the instruction given them in the link tag.
Matt Cutts also says, "If you're a power user, exhaust alternatives first." Personally, I think of SEO professionals as "power users" in this case. If that is the case, I prefer to primarily use existing methods to eliminate canonicalization issues and perform pagerank sculpting before relying on the canonical attribute.
This certainly does not mean the canonical link relationship tag is without its uses. In addition to implementing the tag on the bulk of the pages at SEO Workers (reinforcing our existing methods), we have used it to assist with another problem which Google has yet to tackle. Google's indexing routines, or perhaps Googlebot on its own, do not make a distinction between an HTTP server response code of 200 ("OK", meaning that the requested document was found) and a response code of 300 ("Multiple Choices", meaning that the requested URI does not exist, but one or more close matches are available).
I'm sure that many people are not familiar with this response code. It is a response given when the Apache "mod_speling" module has been enabled, and it finds one or more partial matches to the requested URI, but not a perfect match. In some cases, such as simple misspelling, letter transposition, or capitalization, the module will generate a 301 (permanent redirect) to the correct URI. You can see this in action by going to the link http://www.seoworkers.com/SITEMAP.HTML which will automatically redirect you to the URI with proper capitalization.
Sometimes "mod_speling" finds a situation in which it cannot be certain about the correct URI. This can occur when the given URI matches the filename, but not the extension, or when more than one filename closely matches the given URI. When this is the case, the server generates a "300 Multiple Choices" page, presenting links to the close matches. This page, as generated by the server, is just as stylish as a default "404 Not Found" page.
The problem with Google is that it will index the "300 Multiple Choices" pages. This is definitely not what you want to happen. This is akin to indexing 404 error pages on your site. Google is now indexing a URI that does not actually exist on your server.
Most sites now present customized 404 response pages when a file is not found. Sometimes a comical picture is shown, or a pithy quote, or even a sitemap to assist users in finding what they were looking for. How do you make a custom 300 response page?
Custom "error" pages are set up on an Apache webserver by using an .htaccess file to specify the location of the HTML file to present in the event of such an error. With a 404 error code, this is fairly straightforward. A static HTML file can suffice quite nicely. With a 300 response, a static HTML file will not show the list of links that mod_speling generates. This can be solved by using a PHP file instead. When the 300 response is given, the server also generates a variable (REDIRECT_VARIANTS) which contains a list of the URIs which closely match the requested URI, and the reasons why mod_speling believes they match. Using PHP, we parse that list into an array which is then presented as a list within the body of the response page.
So what about the canonical relationship tag? In the event that the list of URIs contains only one element, we implement the canonical tag using that single element as the canonical URI. In all cases, the response page also uses a robots meta tag with noindex, follow, noarchive, and nosnippet directives. By using the follow directive, all links on the page will be followed by the search engines. However, the links within the list are the only ones which we wish to use to pass any possible advantage to the correct page. It may not be desirable to have search engines index some URIs which may appear in the list. Such files should be excluded using the robots.txt file.
As the response page also contains our site's primary navigation elements, we wish to exclude those links from being followed. Because we strictly do not use the "nofollow" attribute, we follow the alternative advice of Google, redirecting such links with a 301 to an intermediate page that is blocked from search engines with a robots.txt file (see Paid links - Webmasters/Site owners Help). This ensures that only the links to the suspected matches will be followed by the search engines.
For an example of a 300 response page with a single match, see http://www.seoworkers.com/contact.php. For an example of a 300 response page with multiple matches, see http://www.seoworkers.com/sitemap.php.
Though this may not be what Google, et al., had in mind when implementing support for the canonical link relationship tag, we think that such a use will help us in guiding the search engines through our sites in a way that best reflects our wishes.
--Dan Johnson, CTO, SEO Workers
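The decision logic described above (robots meta tag always, canonical tag only for a single match) can be sketched roughly like this. It is written in Python rather than the PHP actually used on the site, it assumes the variant list has already been parsed out of REDIRECT_VARIANTS into plain URI strings (the format of that variable is not shown here), and the function name is hypothetical.

```python
from html import escape

# Robots directives named in the post: noindex, follow, noarchive, nosnippet.
ROBOTS_META = '<meta name="robots" content="noindex, follow, noarchive, nosnippet" />'


def head_tags_for_300_page(variants):
    """Return the <head> tags for a custom 300 page.

    `variants` is assumed to be a list of close-match URIs that some
    earlier step has already extracted from REDIRECT_VARIANTS.
    """
    tags = [ROBOTS_META]
    if len(variants) == 1:
        # Exactly one suspected match: point the canonical hint at it.
        href = escape(variants[0], quote=True)
        tags.append('<link rel="canonical" href="%s" />' % href)
    return tags


for tag in head_tags_for_300_page(["/contact.html"]):
    print(tag)
```

With two or more matches no canonical tag is emitted, since there is no single URI to prefer.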
__________________
- Dexterity Unlimited Web Design and SEO in Champaign/Urbana - Create your own special cards and invitations Last edited by Narasinha; 03-04-2009 at 01:24 AM. Reason: Clarification. |
Thank you Dan (Narasinha) for your valuable time and effort in explaining our company's position on the pros and cons of the "canonical tag."
__________________
Quote:
Quote:
Quote:
Quote:
Quote:
I assume that you are posting this link because you believe the article to be 100% correct. If that is the case, then can you explain to me how blocking a duplicate page from indexing with robots.txt and a meta noindex is pagerank sculpting? A noindex page can still build pagerank (according to Aaron Wall)... Quote:
Thanks for the info on 300s and thank you for the official position of the SEO Workers.
__________________
Last edited by inertia; 03-04-2009 at 06:54 AM.
Somewhere earlier in the thread I mentioned the use of the "canonical" tag for misspellings and for nothing else. We did not implement it for the purposes discussed in the thread, e.g. PageRank sculpting or canonical issues. We only added it as an additional feature (reinforcing our existing methods) for possible misspelled IBLs, which our Apache module might not be able to handle.
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Just a couple of examples: http://www.gameshop.gr/robots.txt or http://www.seoworkers.com/robots.txt And about the alternative to the "nofollow" attribute: Dan tried to explain that above, but for more details have a look at my post in our forums: Googlebot will follow and index these links? - Page 3 - SEO Workers Forums Quote:
__________________
Last edited by Webnauts; 03-04-2009 at 08:00 AM.
What are your opinions on this Rand article: SEOmoz | 12 Easy Mistakes that Plague Newcomers to the SEO Field?
Specifically these points... Quote:
Quote:
__________________
Quote:
Quote:
Quote:
Quote:
But it would also have been nice if Rand, you, or someone else could explain how we could deal with that efficiently, for example with an online shop with ugly non-mod-rewritten URLs. Quote:
__________________
Last edited by Webnauts; 03-04-2009 at 10:09 AM.
Quote:
__________________
I know, I know, I'm slow in replying and responding to topics like this. Had to do some reading and research.
Like this new tag, there are a lot of conventions, both new and old, which Google has a "typical" way of handling, but to which there are often exceptions. For example, when there is a 301 redirect, the destination URL is usually shown in the results page; however, sometimes the original URL is shown. I think this new tag needs to be looked at in light of these other conventions, as well as Sitemaps (also called hints by Google) and internal link structures.
Google has a few different goals that they seem to be trying to achieve with this new tag. The first thing is that they want to have a way of creating a cluster of the URLs that are likely to be the same content. This is the overall goal, really. From that, they also want to know what is the preferred URL to display in the search results. When you use this tag, you are explicitly telling Google "for every duplicate URL you find in this cluster, I prefer the URL /blahblah.bla." Now, that doesn't mean Google will always follow your stated preference. If every other site on the web uses one of the other URLs in the cluster, for example, Google is likely to use that externally-preferred variation over the webmaster preference.
The other sub-goal is to keep up with changes to the content. As you know, Google crawls web sites asynchronously. By using this new tag, if Google sees one page in the cluster has changed, it knows what the rest of the pages in the cluster are, and can check all or some of the others to see if they have also changed. This gives Google a few benefits. Pages merged because of duplicate content would no longer bounce in and out of the merge when content is updated but not fully crawled, and Google can crawl fewer pages. Once Google knows that 5 pages are in a cluster, when it detects one page is updated, it could simply pick a random second page to check to verify the change is cluster-wide (as opposed to the updated page no longer being in the cluster) and then assume the change applies to the entire cluster.
Like sitemaps, this system seems to me to be a way for webmasters to provide the search engines with additional information about the structure of their own web sites when the search engines have no other way of getting that information. I don't think the information and preferences from this tag will be considered more authoritative than external links, for example, but I do think it will be weighted equivalently.
At the very least, this gives a much cleaner alternative to the method of pure guesswork that search engines have been using up to now. For example, to handle session IDs, according to the instruction manual for Google's database engine, each element of the query string would be compared to the known session identifier names from a wide range of known server technologies, and removed if it was on the list. Of course, if you named a product id field the same as a known session id, you would risk that portion of the query string being automatically removed.
All this notwithstanding, if you want to ensure that Google indexes your content with the exact URLs you want, and without the possibility of duplication, the only way to do it would be with 301 redirects. The (seemingly) hardest part would probably be preventing duplication in query strings because the parts of the string are out of order. This could probably be handled with a simple PHP script.
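To illustrate the last point about out-of-order query strings, here is a rough sketch of the idea, in Python rather than PHP. It sorts the query parameters into one fixed order and reports the URL a 301 should point to when the requested URL differs. Both function names are invented for this example, and sorting alphabetically is just one possible convention; any fixed order would do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalized_url(url):
    """Return the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


def redirect_target(requested_url):
    """Return the URL to send a 301 to, or None if no redirect is needed."""
    target = normalized_url(requested_url)
    return None if target == requested_url else target


print(redirect_target("http://example.com/shop.php?b=2&a=1"))
print(redirect_target("http://example.com/shop.php?a=1&b=2"))
```

The server-side handler would issue the 301 only when `redirect_target` returns a URL, so a request that is already in the preferred order is served directly.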
__________________
The best way to learn anything, is to question everything. |
Quote:
Quote:
__________________
Last edited by Webnauts; 03-05-2009 at 04:23 AM.
Wige, I see that dangling links and pages came into play, so I was wondering about the following alternative:
Google advises: Quote:
As we mentioned above, we have internal and/or external links we do not want search engines to follow, so we redirect them to an intermediate page which is blocked from search engines with a robots.txt file: the alternative to the "nofollow" attribute. If we were to hide those links from the search engines but not from the users, would the search engines consider that we are trying to deceive them? As far as I understand Google's guideline, it seems to be fine. Or not? If that is allowed and we hide those links, we would not have the dangling issues. Any thoughts?
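A quick way to double-check that an intermediate redirect page really is blocked is to run the rules through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the `/redirect/` path, the example host, and the rules themselves are placeholders, not the actual SEO Workers setup.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules: block a hypothetical directory of intermediate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /redirect/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path, agent="Googlebot"):
    """Return True if the rules above allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(crawlable("/redirect/partner"))
print(crawlable("/contact.html"))
```

This only tells you what a standards-following parser makes of the file; how a given search engine treats links pointing at a blocked page is the open question in this thread.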
__________________
I thought I should clarify this once again. We implemented the tag only on the 300 Multiple Choices page of SEO Workers (reinforcing our methods), to assist with another problem which Google has yet to tackle (it does not understand 300 HTTP responses).
Just to avoid any misunderstandings.
__________________
Wige a question. Matt Cutts posted:
Quote:
Where do you see the difference between the "nofollow" and what we are doing? Can you elaborate? Or did I misunderstand something in your previous posts?
__________________
Quote:
Quote:
Quote:
Quote:
Quote:
__________________
Too sure to be true. Or did I miss any?
Quote:
Quote:
Image replacement? We used to do that, but we found a better alternative to overcome its need.
__________________
Last edited by Webnauts; 03-05-2009 at 07:16 AM.
Quote:
__________________
Quote:
So, let's say you have a page with a pagerank of 4 (using the simple example of TBPR rather than internal PR) which contains only two links. One link is paid, the other is not.
If you have the paid link go through an interstitial page, and the other link is direct, Google sees the page as having two valid outgoing links. The page's pagerank is divided in half, and two points are reserved for each of the links. Because Google can't go to the interstitial page, it can't credit those two points to the link buyer. However, Google now drops those two points of pagerank.
4 points available = 2 points to the natural link + 2 points for the paid link, ignored.
Now let's say you have the same page, but mark the paid link with nofollow. When Google crawls the page, it only sees one valid link on the page. That link receives all of the pagerank available from the page, a full four points.
4 points available = 4 points to the natural link + 0 points reserved for the paid link.
In both scenarios, Google's goal of pagerank not going to paid links is accomplished. However, the second scenario is more beneficial to the webmaster because all of the page's pagerank is being distributed. None gets discarded.
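The arithmetic of the two scenarios can be written out as a small Python sketch. This only models the reasoning in this post (an even split across counted links, with the share of a blocked link discarded); it is not a statement of how Google actually computes anything, and the function name and the link "kinds" are invented for the example.

```python
def distribute(page_points, links):
    """Split a page's points across its links under the model described above.

    `links` is a list of (name, kind) pairs, where kind is one of:
      "normal"   - a crawlable link that receives its share
      "blocked"  - a link through a robots.txt-blocked interstitial;
                   it is counted in the split but its share is discarded
      "nofollow" - a link that is not counted in the split at all
    Returns (points passed per link, points discarded).
    """
    counted = [(name, kind) for name, kind in links if kind != "nofollow"]
    share = page_points / len(counted) if counted else 0
    passed = {name: share for name, kind in counted if kind == "normal"}
    discarded = share * sum(1 for _, kind in counted if kind == "blocked")
    return passed, discarded


# Scenario 1: paid link goes through a blocked interstitial page.
print(distribute(4, [("natural", "normal"), ("paid", "blocked")]))
# Scenario 2: paid link is marked nofollow.
print(distribute(4, [("natural", "normal"), ("paid", "nofollow")]))
```

Under this model the natural link gets 2 points in the first scenario and 4 in the second, which is the whole argument for preferring nofollow if the model is right.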
__________________
Quote:
And what does Google do when it encounters the 300 page? Does it consider it an HTTP error and not index the page, or index it with an error message, or treat it as something else? My big wish is that Google would support 410 (Gone) response codes, but since most webmasters don't know how to implement or use them, I doubt it will ever happen.
__________________
Quote:
Dave
Quote:
But that's not all. Read Narasinha's post: New Canonical Tag from the big 3 Quote:
See the screenshot and facts: New Canonical Tag from the big 3 My big wish is that Google will learn to understand real temporary redirects like 307, instead of just 302.
__________________
Quote:
__________________
Last edited by Webnauts; 03-05-2009 at 05:30 PM.
Wige, about the redirect technique we use for links we do not want to pass PR, Matt Cutts claims:
Quote:
Google showing the URL reference is not possible, since I have additionally implemented the "noindex" directive in the robots.txt. What is Google doing differently when someone uses the nofollow attribute? Can't they see the links to the pages, but they just don't crawl them? Can you please explain?
__________________
Last edited by Webnauts; 03-05-2009 at 05:33 PM.
Quote:
The link should be treated as a dangling link and thus removed from PR calculations. As long as your visitors can get to the page you'd be fine IMO. As long as the SEs can't get to and don't index the page, you should be fine from a PR perspective also. I don't see where there would be any issues.
Dangling links aren't a problem as I see it, John. They are simply removed from PR calculations, and then the available PR is split among the remaining links, much like "nofollow" does.
Dave
Last edited by crankydave; 03-05-2009 at 05:36 PM.
My understanding is that it works in the opposite way. Once Google has decided how much pagerank should go to each link, if it later determines that it can't deliver the pagerank for one of the links, it will never go back and recalculate how the pagerank is distributed. It might be easier to think of it as the pagerank goes to the redirect page, but since the redirect page is not indexed, those two points just "die" there.
__________________
Quote:
Probably saying that you are "hiding" the link is not the best way of phrasing it in my post. What you are doing is telling Google you don't want the link to be considered in certain calculations and processes.
__________________
Last edited by wige; 03-05-2009 at 05:45 PM.
Quote:
Quote:
Dave
Last edited by crankydave; 03-05-2009 at 05:58 PM.
Andy Beard says:
Quote:
So what's next?
__________________
I know, I am going out of order. Some people type too fast...
Quote:
Just a review of the response codes pertinent to this discussion, for those following along. This is in the format Code - Official definition - Translation:
300 - Multiple Choices - "The URL you requested doesn't exist. I found several matches." or "The URL you requested used to exist. The contents have been split into multiple files."
301 - Moved Permanently - "The requested URL used to exist, but the file is now at another URL."
302 - Found - "The URL you requested does not exist. However, I found a match."
410 - Gone - "The URL you requested used to exist. It doesn't anymore. You don't need to ask again."
404 - Not Found - "The URL you requested is not available. I don't know if it ever existed, but I can't find anything similar to what you are looking for. Keep checking though, it might be created/come back."
Now, only two of these codes are supposed to be considered permanent, 301 and 410. These are used when files are deleted or moved. All of the other codes are intended to indicate temporary situations. Well, 404 is considered both: the server is not sure if the condition is temporary or permanent. If it knew, it would respond with one of the other mentioned codes.
So what does this mean in practice? Let's say you have a page, contact.html. Now let's say someone requests contact.php. If you have the right software, your server should respond with a 302 message ("You requested contact.php. That is invalid. I FOUND contact.html. Go there.") or it could respond with a 300 message ("You requested contact.php. That is invalid. I found contact.html and contact.asp. CHOOSE one."). In both situations, the server is guessing what the user is looking for.
So, for the way search engines handle 300, I am not surprised that they index the page. There is no reason not to, since 1) the page doesn't exist, 2) it has links to pages that do exist, 3) it is subject to change, 4) in this case, it requires user action: the user has to decide which page they actually want.
Now, with all that said, in general use, 301, 302 and 404 are used much differently, and 300 and 410 are rarely used at all. The search engines have to cater to both the power users who actually know how this stuff works, and the casual webmaster who just has the basics down and just wants things to work. For 300 pages, even browsers don't support it, and there are no standards for implementation. Ideally, upon seeing the 300 response, a supporting browser would just show the user the list of options and then the user would pick and continue browsing. I believe that was the original intent. However, the code is so rarely used it isn't supported at all.
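If you want to check the registered name of a status code rather than rely on memory, Python's standard library carries the list. A small sketch (the helper name is made up) that prints the codes discussed in this thread; note that "Gone" is registered as 410, while 400 is "Bad Request":

```python
from http import HTTPStatus


def reason_phrases(codes):
    """Map each numeric status code to its registered reason phrase."""
    return {code: HTTPStatus(code).phrase for code in codes}


for code, phrase in reason_phrases([300, 301, 302, 307, 400, 404, 410]).items():
    print(code, phrase)
```

These are only the official names; the "translations" above are how a server or crawler might reasonably interpret them.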
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Quote:
2. Can you pass PR with 302 redirects? Quote:
Quote:
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
I'd read Andy's thoughts on it John but thank you for reminding me.
Here's part of the problem as I see it... Once you start "removing" PR from the system, you start deviating from PR preservation, i.e. the sum of all PR = 1. That deviation would increase with each successive iteration: whenever PR is removed, the remaining PR shrinks further with every pass because of the damping factor. While it's certainly possible that this is how it works, and that the PR a dangling link would have transferred simply gets "lost", it would really surprise me if it did. Additionally, we already know that "nofollow" links do not cause the amount of PR they would ordinarily pass to be lost. It doesn't make sense (to me) to treat a dangling link any differently.
Dave
Last edited by crankydave; 03-05-2009 at 06:27 PM.
|
||||
|
Quote:
First, no, I am not rereading the patent. It's too late in the day, so I am hoping I am fairly accurate in this. I can reread it tomorrow and fact-check this post then.
The first thing is that the quote in question relates to the recursion aspect of applying pagerank. Basically, pagerank constantly changes because after each crawl cycle, the system reapplies the pagerank to take into account the pages discovered in that crawl. During the recursive process, dead end pages represent an anomaly, so they are removed, the recursion is processed, and then they are reinserted. But this is something that happens later, after Google decides how much of the page's rank gets shared.
However, Google no longer does recursive processing when calculating pagerank. They now use a system called everflux. As a result, I think the entire passage is outdated. Also, I believe it was stated that robots.txt excluded pages were one of very few special cases that were not addressed in the original paper, and that they were one of a few sources of "PR leak". Not sure how I would find that document at this point though.
__________________
The best way to learn anything, is to question everything. |
|
||||
|
One point wige... I read it as the links are removed and not the pages.
If Google did indeed remove the PR from the calculations for every "dead link" they had in their system their PR calculation wouldn't even be close. Dave |
|
||||
|
Quote:
Mostly, yes. But you said above that the search engines shouldn't index that page. However, there are a few reasons why it should - the matches discovered could change as pages are added/removed/altered, your server can't figure out which is the best match so the page rank might as well get divided up anyway, and if the indexing of the page means that for some search term that page ends up showing up in the SERPs, why not let users have that option of selecting which page is the best match - at that point, neither your server nor Google could decide. Quote:
Deal with, all of them. However, "supporting" the code would mean simply presenting the user with a list of pages to choose from - no displaying or rendering of a web page, no requesting linked files, etc. I don't think "defaulting to pretending the 300 was a 200" is the same thing as actually fully supporting the code.
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Quote:
I skimmed the patent, and couldn't find it. I seem to remember this as pertaining to another part of the calculation. I thought it was the recursion, but it might have been the normalization. I don't think it was the raw distribution; it had to do with the averaging and recalculating of the pagerank values, something that happened later than the initial calculation I was trying to describe in my post.
Edit: Never mind, I found it. This passage has to do with the random surfer idea. If a page has no links, it becomes a "sink". This is resolved by taking the pagerank that has been received by that page, and dividing it among every page in the index - because each page has an equal chance of being visited by the random surfer. However, this represents an anomaly when recalculating pagerank. At that time, the dead ends are removed, the index is normalized, and the pages are replaced. Bah, I'll read it closer tomorrow.
__________________
The best way to learn anything, is to question everything. Last edited by wige; 03-05-2009 at 06:58 PM. |
|
||||
|
Eric Enge interview with Matt Cutts:
Quote:
Code:
GET /mailing-list/?p=unsubscribe&id=3&bots=nocrawl HTTP/1.1
Host: www.seoworkers.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.seoworkers.com/
Cookie: __utma=215523349.1795271455.1236145108.1236285263.1236290809.17; __utmz=215523349.1236191584.8.6.utmccn=(referral)|utmcsr=seowatchblog.com|utmcct=/|utmcmd=referral; _csuid=49ae09512794c5b5; __utmc=215523349; PHPSESSID=e95s1j11u2ktchf65pcai8t075; __utmb=215523349
X-lori-time-1: 1236292418383

HTTP/1.x 200 OK
Date: Thu, 05 Mar 2009 22:33:47 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
P3P: policyref="/w3c/p3p.xml"
Pics-label: (pics-1.1 "http://www.icra.org/pics/vocabularyv03/" l gen true for "http://seoworkers.com" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 3) gen true for "http://www.seoworkers.com" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 3))
Imagetoolbar: no
X-Robots-Tag: noindex,nofollow,noarchive,nosnippet
Content-Length: 1372
Connection: close
Content-Type: text/html
Content-Language: en-us

So when Google crawls that page and finds the X-Robots-Tag directive "noindex", they won't even return that page. In addition, the links to that page found on all our pages have "bots=nocrawl" in the URL, and in the robots.txt we have:

User-agent: Googlebot
Disallow: *bots=nocrawl$
Noindex: *bots=nocrawl$

So what is happening in this case? Is that link still accumulating PR?
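To see which URLs a rule like `Disallow: *bots=nocrawl$` would actually catch, here is a minimal sketch of wildcard matching, assuming the Googlebot-style reading where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. The function names are made up for illustration, and the sketch only covers path matching, not the `Noindex:` line or the rest of robots.txt parsing.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path_and_query: str, pattern: str) -> bool:
    """True if the URL path (including query string) matches the pattern."""
    # re.match anchors at the start, since rules are matched from the path start.
    return pattern_to_regex(pattern).match(path_and_query) is not None


if __name__ == "__main__":
    url = "/mailing-list/?p=unsubscribe&id=3&bots=nocrawl"
    print(is_blocked(url, "*bots=nocrawl$"))
```

Note that because of the `$`, the rule only catches URLs where `bots=nocrawl` is the very last thing in the query string.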
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO Last edited by rah; 03-06-2009 at 12:43 PM. Reason: Requested by webnauts |
|
||||
|
Does it matter? My thinking is that whether or not the PR gets credited to the blocked page, that page's portion of the linking page's PR is not being distributed to the other pages linked to.
I reread that, and I know it's time for me to go home. See you all tomorrow.
__________________
The best way to learn anything, is to question everything. |
|
||||
|
POST CONTINUED...
Quote:
What's next?
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Quote:
As you see, the page will not be indexed, since we implemented the X-Robots-Tag directives "noindex,nofollow,noarchive,nosnippet" for that page.
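A quick way to check what a response like that tells a crawler is to split the X-Robots-Tag value into its directives. The sketch below is simplified and its helper names are hypothetical: it assumes a single comma-separated header value with no user-agent prefix, and it treats `none` as shorthand for noindex plus nofollow.

```python
def parse_x_robots_tag(value: str) -> set[str]:
    """Split an X-Robots-Tag header value into lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def may_index(headers: dict[str, str]) -> bool:
    """False when the response carries a 'noindex' (or 'none') directive."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    directives = parse_x_robots_tag(lowered.get("x-robots-tag", ""))
    return not ({"noindex", "none"} & directives)


if __name__ == "__main__":
    response_headers = {
        "Content-Type": "text/html",
        "X-Robots-Tag": "noindex,nofollow,noarchive,nosnippet",
    }
    print(sorted(parse_x_robots_tag(response_headers["X-Robots-Tag"])))
    print(may_index(response_headers))
```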
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Hi wige, long time no see.
Quote:
The conditions that mod_speling finds correspond to one of the following: "identical", "miscapitalized", "transposed characters", "character missing", "extra character", "mistyped character", "common basename". When redirection is called for, HTTP_MOVED_PERMANENTLY is the only redirection used, and it is hard-coded in mod_speling.c. From the source code: "Conditions for immediate redirection: a) the first candidate was not found by stripping the suffix AND b) there exists only one candidate OR the best match is not ambiguous then return a redirection right away." Note that the "identical" case does not redirect to the found URL. Why not? This is what it says in comments in the source: "If we end up with a "fixed" URL which is identical to the requested one, we must have found a broken symlink or some such. Do _not_ try to redirect this, it causes a loop!" I agree that a 302 redirect might be more appropriate than 301 for some results. However, the 300 response is just as temporary as a 404: maybe not at all. If someone were to compile their Apache server from the source code, they could change the response to 302. Quote:
(Hey, I just thought of a tool that I want! Maybe someone already makes it. For a given web page, display a list of the response codes given by each link on said page.) Quote:
__________________
- Dexterity Unlimited Web Design and SEO in Champaign/Urbana - Create your own special cards and invitations Last edited by Narasinha; 03-05-2009 at 10:59 PM. Reason: Clarification on 302 |
|
||||
|
Quote:
Some dead end pages have no benefit - 302 redirectors, robots.txt blocked pages, etc. - to the webmaster. In these cases, a nofollow on the linking page is the only way you can use that pagerank in a way that benefits your site. I think the misunderstanding is on this point: if a page accumulates pagerank but has a nofollow tag, thus becoming a dead end, the pagerank does not go back to the pages that linked to it. It goes forward and is divided among every page in the index. I think this is where the quote crankydave posted comes into play.
Let's assume Google's index is composed of three pages, A, B and C. A contains no links, but B and C both link to A. A is thus a dead end (same as if it were not indexable because of robots.txt). When Google calculates PR, at the first iteration, all three pages have a pagerank of .333 since they are each 1/3 of the known web. At the next iteration, Google takes into account the links. The .333 that B and C each have all gets sent to A, since both pages only link there. But how does A's outgoing PR get calculated? A's rank of .333 needs to be given to A, B and C - every page in the index, including itself. But if it gives itself pagerank, it has more pagerank to give, so it should be giving more pagerank... This becomes circular logic, and would make for an endless loop. The solution is that the dead end is removed from the equation as the pagerank is calculated (and page A gets a rank of ~ .900 after damping and other factors). Now, the pagerank of A can be safely distributed to all of the other pages in the index.
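The three-page example above can be tried out with a small power-iteration sketch. It rests on assumptions that are not confirmed anywhere in this thread: a damping factor of 0.85 (the figure usually quoted from the original PageRank paper) and the textbook treatment where rank sitting on a dead end is shared equally among every page on each pass, rather than removing the dead end and reinserting it later. The numbers it prints therefore need not match the ~ .900 figure in the post.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power iteration; rank held by dead ends is shared among all pages."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # everyone starts at 1/N
    for _ in range(iterations):
        # Total rank currently sitting on pages with no outgoing links.
        dangling = sum(ranks[page] for page in pages if not links[page])
        # Base share: random jump plus an equal slice of the dead-end rank.
        new = {page: (1.0 - damping) / n + damping * dangling / n
               for page in pages}
        # Each linking page splits its rank evenly among its targets.
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * ranks[page] / len(targets)
        ranks = new
    return ranks


# The index from the post: B and C link to A, A links nowhere.
WEB = {"A": [], "B": ["A"], "C": ["A"]}

if __name__ == "__main__":
    result = pagerank(WEB)
    for page, rank in result.items():
        print(page, round(rank, 3))
    print("total", round(sum(result.values()), 6))
```

Under these assumptions the total stays at 1 on every pass, A ends up with the largest share, and B and C tie, since the only rank they receive is the random jump plus their slice of A's dead-end rank.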
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Quote:
I realize mod_speling does not have the option to change the way it redirects; my question was more theoretical. Clients are supposed to remember 301 redirects. If they are told to go to a URL they know is permanently redirected somewhere else, they should not even bother requesting the original URL. This is something that search engines strongly adhere to. Is it really appropriate that an internal search function is telling clients never to request that URL? What if you at some point decide to create a landing page to capitalize on the misspelling? It will take a long time to convince the search engines to return and index that new landing page. Personally, I think an automated search function should only be giving temporary responses (300, 302 and 404 describe conditions that may change; 301 and 410 are meant to be remembered). Quote:
Quote:
Quote:
300: Browser only processes the headers. No body content is parsed or displayed, and it is only served by the server if the browser doesn't support 300s. The user just sees a dialog box prompting them to select the appropriate page. This allows the server to cut down on the overhead of dynamically generating the disambiguation page.
301: The browser should actually cache the redirection (for how long, who knows) the way search engines already do. Any time a link to a known 301 is followed, or the URL of a 301 page is manually entered, the browser should automatically request the destination page. This would cut down the bandwidth of repeat requests for moved documents. Only if the destination gets a 300, 301, 302, 410 or 404 should the original URL be rechecked.
302: Browsers and search engines handle this fine.
410: Again, this should be cached. Future requests for a URL that is GONE should display either a browser error message, or the cached error page. This would again cut down on repeat requests for removed content.
404: This is handled correctly.
It's nice to dream, you know?
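The 301 and 410 parts of that wish list can be sketched as a tiny client that remembers permanent answers. Everything here is hypothetical: `RememberingClient` and the `fetch` callable (which takes a URL and returns a status code plus an optional Location value) are made-up names for illustration, nothing is ever expired, and the "recheck the original URL if the destination fails" rule is left out to keep it short.

```python
from http import HTTPStatus
from typing import Callable, Optional

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], tuple[int, Optional[str]]]


class RememberingClient:
    """Hypothetical client that remembers permanent answers (301 and 410)."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._moved: dict[str, str] = {}  # old URL -> new URL, learned from 301s
        self._gone: set[str] = set()      # URLs that answered 410

    def get(self, url: str, max_hops: int = 5) -> tuple[int, str]:
        """Return (status, final URL), skipping requests already answered."""
        for _ in range(max_hops):
            if url in self._gone:
                return int(HTTPStatus.GONE), url  # answered from memory
            if url in self._moved:
                url = self._moved[url]            # skip the known 301
                continue
            status, location = self._fetch(url)
            if status == HTTPStatus.MOVED_PERMANENTLY and location:
                self._moved[url] = location       # remember the move
                url = location
                continue
            if status == HTTPStatus.GONE:
                self._gone.add(url)               # never ask again
            return status, url
        raise RuntimeError("too many redirects")
```

With this in place, a second request for a moved URL goes straight to the destination, and a second request for a removed URL sends nothing at all, which is exactly why a 301 from an automated spelling fixer is hard to take back later.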
__________________
The best way to learn anything, is to question everything. |
|
||||
|
Wige, let's take this scenario:
I block the bots from crawling a link with the above-mentioned method, i.e. bots=nocrawl. Then I add the canonical element to the targeted page. For example, I have this link:
Code:
http://www.seoworkers.com/mailing-list/?p=unsubscribe&id=3&bots=nocrawl
Code:
<link rel="Canonical" href="http://www.seoworkers.com" />
Or what if I send the bots to a different page than the one I send the users to, a page that has the robots meta tag directives "noindex,follow,noarchive,nosnippet" and carries the links of my sitemap? Would that be an alternative?
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
||||
|
Quote:
If a page is marked as nofollow, or noindex, I don't see any reason why Google would care if it has the canonical tag or not - it doesn't pass pagerank or get displayed in the regular index, so why would Google merge it with an existing page that doesn't have these limitations?
__________________
The best way to learn anything, is to question everything. |
WebProWorld is an iEntry, Inc. ® site - © 2010 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |