

Thread: Canonicalization Prevention Guide

  1. #31

    Re: Canonicalization Prevention Guide

    Is it smart to expect more, for $43.05 a year?

  2. #32

  3. #33

    Re: Canonicalization Prevention Guide

    So I just called them, and they said that yes, their $43.05 shared hosting supports .htaccess files. They checked a sample of the .htaccess file we had written and said it needed fixing. So the mistake is ours, not theirs... (It's against their policy to rewrite code supplied by the client; otherwise the guy would have fixed it for me.)
    For the sake of Wige's compilation of solutions to canonicalization problems, if we ever get that file working, we'll post it here.
    Thanks to all for your help.

  4. #34

    Re: Canonicalization Prevention Guide

    Hello Wige, it's me again, back from Google hell with total forgiveness of all my sins...

    We put this into an .htaccess file outside the secure folder:
    RewriteEngine On
    RewriteCond %{SERVER_PORT} !80
    RewriteRule ^(.*)$ http://www.spauno.com/$1 [R,L]

    We put this into another .htaccess file inside the secure folder:
    RewriteEngine On
    RewriteCond %{SERVER_PORT} 80
    RewriteCond %{REQUEST_URI} secure
    RewriteRule ^(.*)$ https://www.spauno.com/secure/$1 [R,L]

    You can now navigate back and forth between secure and non-secure content without creating duplicate content, and Google has reindexed the pages. So all we need now is Wige's blessing to say that this works for sites on shared hosting, specifically GoDaddy Linux hosting.
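    The combined effect of the two files can be modelled, purely as a sketch, by a small Python function. `redirect_target` is a hypothetical name; the sketch assumes the site is only reached on ports 80 (HTTP) and 443 (HTTPS) and treats the unanchored `!80` and `80` port conditions as plain equality checks. It picks one rule set per path because, by default, mod_rewrite rules in a subdirectory's .htaccess replace the parent's rules rather than adding to them (RewriteOptions Inherit changes that).

```python
from typing import Optional

HOST = "www.spauno.com"   # host name taken from the post
SECURE_DIR = "/secure/"   # the folder that has its own .htaccess


def redirect_target(port: int, path: str) -> Optional[str]:
    """Return the Location a request would be redirected to, or None.

    Assumption: only ports 80 (HTTP) and 443 (HTTPS) are in use.
    """
    if path.startswith(SECURE_DIR):
        # Rules from the .htaccess inside /secure/. In a per-directory rule,
        # $1 is the path relative to that directory.
        if port == 80 and "secure" in path:  # mirrors RewriteCond ... secure
            return f"https://{HOST}{SECURE_DIR}{path[len(SECURE_DIR):]}"
        return None
    # Rules from the .htaccess outside /secure/: anything not on port 80 is
    # sent back to plain HTTP. Here $1 is the path without its leading slash.
    if port != 80:
        return f"http://{HOST}/{path.lstrip('/')}"
    return None


if __name__ == "__main__":
    for port, path in [(80, "/secure/cart.php"), (443, "/secure/cart.php"),
                       (443, "/about.html"), (80, "/about.html")]:
        print(port, path, "->", redirect_target(port, path))
```

    Every page ends up on exactly one scheme after at most one redirect, which is why no HTTP and HTTPS copies of the same page remain for Google to index.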

  5. #35
    wige, WebProWorld MVP (joined Jun 2006; 3,138 posts)

    Re: Canonicalization Prevention Guide

    Looks good to me, and it should work in most Linux setups. The only thing I would change is from
    RewriteCond %{REQUEST_URI} secure
    to
    RewriteCond %{REQUEST_URI} ^/secure/

    That should prevent another page on the site whose filename includes the word "secure" from triggering an endless loop of redirects.
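    The difference between the two conditions can be checked with Python's re module, which handles these simple patterns the same way as the PCRE engine mod_rewrite uses. RewriteCond patterns are unanchored searches, so re.search is the matching call. The example URIs are hypothetical.

```python
import re

LOOSE = re.compile(r"secure")        # RewriteCond %{REQUEST_URI} secure
ANCHORED = re.compile(r"^/secure/")  # RewriteCond %{REQUEST_URI} ^/secure/

# Hypothetical request URIs: one real secure page, two that merely contain the word.
for uri in ["/secure/checkout.php", "/insecure-passwords.html", "/blog/secure-hosting.html"]:
    print(uri, bool(LOOSE.search(uri)), bool(ANCHORED.search(uri)))
```

    All three URIs match the loose pattern, but only /secure/checkout.php matches the anchored one.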
    The best way to learn anything, is to question everything.
    WigeDev - Freelance web and software development

  6. #36

    Re: Canonicalization Prevention Guide

    Does Flash content create canonicalization issues?

  7. #37
    Webnauts, WebProWorld MVP (location: European Community; joined Aug 2003; 9,028 posts)

    Re: Canonicalization Prevention Guide

    Quote (originally posted by wige):
    I am glad people have already found the information useful. To address some of the comments....

    edhan, I suspect the issue you are experiencing arises from a problem in the .php script itself, possibly because the script can't handle the filename not being in the REQUEST_URI. If you post (or PM me) a URL where I can test it and replicate the issue, I may be able to confirm. The order of mod_rewrite directives relative to other directives in your .htaccess file should not have any impact on functionality. An Error 500, though, indicates an error in the script itself.

    Webnauts, I have a question for clarification about one of the snippets you posted, specifically,
    Code:
     ########## Required to add trailing slash if not present to avoid canonicalization issues ###
    RewriteCond %{HTTP_HOST}   !^www\.yoursite\.com [NC]
    RewriteCond %{HTTP_HOST}   !^$
    RewriteRule ^/(.*)         http://www.yoursite.com/$1 [L,R]
    The comment indicates that this will add a trailing slash if it is missing, but the conditions only test the HTTP_HOST field, which never contains a trailing slash: it holds only the domain name. The request path (the portion after GET or POST in the request line) is contained in REQUEST_URI. Did I misinterpret this?

    Please note: when doing redirects, a bare R flag issues a 302 (temporary) redirect, at least in Apache 2.0-2.2. Using R=301 forces the server to respond with a permanent redirect, which is generally preferable so that search engines process the redirect properly.
    Wige, my apologies for the late reply. It looks like I lost track of the thread.

    Well, I added that option with the trailing slash because it is technically different from the version without it.
    See: SEO advice: url canonicalization (Matt Cutts)

    Also, many people try to get IBLs (inbound links) with the trailing slash in their URLs, like http://www.justanexample.com/, aiming to flow PageRank only to their homepage. As you have probably noticed, many web directories already forbid that. Besides, if you have such IBLs, can you rule out the possibility that Google or other search engines will see that as a canonicalization issue?

  8. #38
    wige, WebProWorld MVP (joined Jun 2006; 3,138 posts)

    Re: Canonicalization Prevention Guide

    Quote (originally posted by Webnauts):
    Also, many people try to get IBLs (inbound links) with the trailing slash in their URLs, like http://www.justanexample.com/, aiming to flow PageRank only to their homepage. As you have probably noticed, many web directories already forbid that. Besides, if you have such IBLs, can you rule out the possibility that Google or other search engines will see that as a canonicalization issue?
    I do rule out that possibility, because that is not how spiders work. In order for a spider (or a web browser, or any other client) to request a file, it breaks the link, after the protocol, into two parts: the hostname (either a domain name or an IP address) and a request URI, which always starts with a "/". Here are a few examples of how a spider looks at different links:

    http://domain.tld/somepage.html
    Spider sees
    Protocol: http
    Hostname: domain.tld
    Request URI: /somepage.html

    http://domain.tld/
    Spider sees
    Protocol: http
    Hostname: domain.tld
    Request URI: /

    http://domain.tld
    Spider sees
    Protocol: http
    Hostname: domain.tld
    Request URI: /
    (The request URI can NEVER be blank, and MUST ALWAYS start with a /, so if no URI is included in the URL, a slash is used by default.)
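    The same split can be reproduced with Python's standard urllib.parse.urlsplit. The helper name `spider_view` is hypothetical, and it is only a sketch of the behaviour described above, not how any particular search engine is implemented: urlsplit returns an empty path for http://domain.tld, and the helper substitutes "/" just as a client must.

```python
from urllib.parse import urlsplit


def spider_view(url: str) -> tuple:
    """Split a URL into (protocol, hostname, request URI)."""
    parts = urlsplit(url)
    # The request URI can never be blank: an empty path becomes "/".
    request_uri = parts.path or "/"
    if parts.query:
        request_uri += "?" + parts.query
    return parts.scheme, parts.hostname, request_uri


for url in ["http://domain.tld/somepage.html", "http://domain.tld/", "http://domain.tld"]:
    print(spider_view(url))
```

    The last two URLs produce the identical tuple ("http", "domain.tld", "/").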

    Even when the spider stores information about the retrieved page in its index (think giant relational database), that slash is always added, simply because the field containing the request URI can't be blank. The same happens when it stores a list of links: the slash is added if not already present.

    The server also expects the request URI to begin with a slash. If a request reaches your server without that leading slash, the server may respond with a Bad Request error or simply ignore the request, depending on how it is set up.

    I have also gotten indications from Google, Yahoo and MSN that their systems always add a leading slash if it is not already present, as does every malbot and spider system I have ever worked with. Even wget, which was the foundation for many spiders, automatically adds the slash. It is simply a default part of the HTTP protocol.

    Regarding the link you mentioned, I take it you are referring to the following:
    Quote (originally posted by Matt Cutts):
    • www.example.com
    • example.com/
    I have seen this cited in numerous later articles on canonicalization as the basis for arguing that steps should be taken to handle missing slashes. However, what he was highlighting was the absence of the www subdomain in the second version, not the presence of the slash. In the rest of the article he makes no mention of the slash at all, which leads me to believe it was only a typo. It was raised in the comments, where Matt suggested choosing a preferred format for links, but it was not addressed beyond that.

    Above all, it is important to remember that to a server, the requests for www.example.com and www.example.com/ look identical (GET / with Host: www.example.com), so anything you do on the server to redirect from one to the other is pointless anyway.
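    A minimal sketch of what actually reaches the server makes this concrete. The function `request_head` is hypothetical and builds only the request line and Host header (a real browser or spider sends more headers, but these are the parts that identify the resource). Both URLs produce exactly the same bytes, so the server has nothing to tell them apart.

```python
from urllib.parse import urlsplit


def request_head(url: str) -> str:
    """Build the minimal HTTP/1.1 request head a client sends for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"   # an empty path is always sent as "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


a = request_head("http://www.example.com")
b = request_head("http://www.example.com/")
print(repr(a))
print(a == b)
```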

  9. #39
    Senior Member (joined Jul 2003; 398 posts)

    Re: Canonicalization Prevention Guide

    Quote (originally posted by wige):
    If you want to add a trailing slash when no file name is specified (domain.com/file becomes domain.com/file/), use the following:

    Code:
    RedirectMatch 301 ^/([a-zA-Z0-9/]*)$ http://domain.com/$1/
    "domain.com/file/" makes little sense to me, because a trailing slash in a URL indicates the index of a directory (see the Apache DirectoryIndex directive).

    The simplest use of a web server is to point it at a directory tree and let it serve the files there. One typical convention is to specify a default file extension, for example .html, so that domain.com/abc serves domain.com/abc.html. The second common convention is that domain.com/edf/ shows a list of the available files, unless the DirectoryIndex directive (or its non-Apache equivalent) is set and the specified file is present, in which case domain.com/edf/ actually serves domain.com/edf/index.html.

    To clarify, when I say serve, I mean the server returns the specified content, not a redirect. You can tell because the URL in the browser's address bar does not change.

    In the context of this thread, this can lead to duplicate content: for example, domain.com/edf/ returns the same content as domain.com/edf/index.html. (However, if no one ever links to .../index.html, a search engine would never discover that URL.)

    By the way, your redirect script would also change domain.com/abc.html into domain.com/abc.html/
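    The two conventions can be modelled as a toy resolver to show where the duplicate comes from. Everything here is hypothetical: `resolve`, `DEFAULT_EXTENSION`, `DIRECTORY_INDEX` and the file set are illustrative names, and real servers implement the default-extension behaviour through features such as Apache's MultiViews rather than a single setting.

```python
from typing import Optional, Set

DEFAULT_EXTENSION = ".html"     # hypothetical default-extension setting
DIRECTORY_INDEX = "index.html"  # as with Apache's "DirectoryIndex index.html"


def resolve(path: str, files: Set[str]) -> Optional[str]:
    """Return what the server would serve for `path`, or None for a 404."""
    if path.endswith("/"):
        index = path + DIRECTORY_INDEX
        if index in files:
            return index                      # the DirectoryIndex file is served
        if any(f.startswith(path) for f in files):
            return f"<listing of {path}>"     # no index file: directory listing
        return None
    if path in files:
        return path
    if path + DEFAULT_EXTENSION in files:
        return path + DEFAULT_EXTENSION       # /abc serves /abc.html
    return None


site = {"/abc.html", "/edf/index.html", "/ghi/notes.txt"}
for p in ["/abc", "/edf/", "/edf/index.html", "/ghi/"]:
    print(p, "->", resolve(p, site))
```

    Two different URLs, /edf/ and /edf/index.html, resolve to the same file without any redirect, which is the duplicate-content case described above.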

    K<o>

  10. #40
    wige, WebProWorld MVP (joined Jun 2006; 3,138 posts)

    Re: Canonicalization Prevention Guide

    Quote (originally posted by Conficio):
    By the way, your redirect script would also change domain.com/abc.html into domain.com/abc.html/
    It shouldn't, but I will take another look. The pattern I am using:
    RedirectMatch 301 ^/([a-zA-Z0-9/]*)$ http://domain.com/$1/

    contains the character class [a-zA-Z0-9/], and the entire request URI must consist only of those characters (letters, digits and slashes) for the pattern to match. A path containing a period therefore does not match, so if a file extension is specified, the user should not be redirected.

    This is really intended to counteract a server setting that lets the server respond to requests without the trailing slash as though the slash were there. It also removes a possible error condition that would otherwise have to be handled with a 404, by redirecting the user to what is most likely the desired location.
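    The behaviour of this pattern can be checked by running the same regular expression through Python's re module, which treats this simple expression the same way Apache's PCRE engine does. `trailing_slash_redirect` is a hypothetical helper. It confirms that /abc.html is not redirected. Because the character class also contains "/", the raw pattern matches paths that already end in a slash (including "/" itself), and the substitution would append a second slash; the sketch therefore skips such paths, a guard added here rather than part of the RedirectMatch line.

```python
import re
from typing import Optional

# Same regular expression as the RedirectMatch line above.
PATTERN = re.compile(r"^/([a-zA-Z0-9/]*)$")


def trailing_slash_redirect(path: str) -> Optional[str]:
    """Return the redirect target for `path`, or None if no redirect applies."""
    if path.endswith("/"):
        # Guard added in this sketch, not in the original directive: the raw
        # pattern also matches "/" and "/file/" and would add another slash.
        return None
    match = PATTERN.match(path)
    if match is None:
        return None  # e.g. "/abc.html": "." is not in the character class
    return "http://domain.com/" + match.group(1) + "/"


for p in ["/file", "/abc.html", "/file/", "/my-page"]:
    print(p, "->", trailing_slash_redirect(p))
```

    The last case shows that paths containing a hyphen, or any other character outside the class, are also left alone.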
