Thread: Canonicalization Prevention Guide

  1. #1
    wige (WebProWorld MVP)
    Join Date
    Jun 2006
    Posts
    3,138

    Canonicalization Prevention Guide

    Canonicalization, or more specifically the creation of duplicate content due to the way web server software handles variations on URLs, has become a much discussed topic here. Specific code snippets to resolve the problem in different situations have been widely discussed, but there is no single place that lists the different methods. I created this thread to list several of the more common ways of eliminating the two main canonicalization problems: www vs non-www duplication, and / vs /index.html duplication.

    All of the examples I post will handle subdomain issues (www vs non-www), directory root issues (/ vs /index.html) and secure server (http vs https) issues if possible.
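
    To make the goal concrete, here is a minimal sketch in Python of the decision every method in this thread has to make. It is an illustration only, not one of the posted methods: the function name canonical_redirect and the constants are hypothetical, and www.yourdomain.com is the same placeholder used in the examples.

```python
import re
from typing import Optional

CANONICAL_HOST = "www.yourdomain.com"  # placeholder host, as in the thread's examples
INDEX_FILE = re.compile(r"/index\.(php|html?)$", re.IGNORECASE)


def canonical_redirect(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    new_path = INDEX_FILE.sub("/", path)  # /dir/index.html -> /dir/
    if host.lower() == CANONICAL_HOST and new_path == path:
        return None  # host and path are both canonical, serve the page
    url = "http://" + CANONICAL_HOST + new_path
    return url + "?" + query if query else url


print(canonical_redirect("yourdomain.com", "/index.html"))     # http://www.yourdomain.com/
print(canonical_redirect("www.yourdomain.com", "/products/"))  # None
```

    Handling the host and the index file in one step means a visitor is redirected at most once, instead of once per problem.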

    If you have any suggestions for other methods, please let me know and add them to the list.
    The best way to learn anything, is to question everything.
    WigeDev - Freelance web and software development

  3. #2
    wige (WebProWorld MVP)
    Join Date
    Jun 2006
    Posts
    3,138

    Apache Server Specific

    Apache Server Specific

    Requirements:
    Apache Server with mod_rewrite enabled
    The ability to modify server settings using either .htaccess or access to the server configuration files.

    www vs non-www
    Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
    Code:
    RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
    RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
    Remove /index.html or index.php from requests for the root of a folder
    Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
    Code:
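    # Only redirect when the client itself asked for the index file; otherwise the
    # internal DirectoryIndex lookup would trigger a redirect loop. No space is allowed inside [R=301,L].
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.(php|html)[?\s]
    RewriteRule ^/?(.*/)?index\.(php|html)$ http://www.yourdomain.com/$1 [R=301,L]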
    Handling secure content, securely
    To prevent duplication of content due to the use of a secure connection (https), the content that should be available as secure should be in a separate folder. Unfortunately, this is generally not possible. For the sake of completeness, however, here is how to do it. Generally, you would put the secure content in a subfolder of the root folder of your web site. For simplicity, you could name this folder "secure". You would then add the following directives to the configuration of your HTTP (port 80) virtual host. Note that <Directory> blocks are not allowed in .htaccess files and take a filesystem path, so this requires access to the server configuration.
    Code:
    # Apache 2.2 syntax; on Apache 2.4 use "Require all denied" instead of the two access lines
    <Directory "/absolute/path/to/secure">
    Order Deny,Allow
    Deny from All
    </Directory>
    This will prevent anything from crawling your secure content over an http connection. However, if you want some files to be available on both connections (/style.css, /favicon.ico and all the images in /img/, for this example), you will need to add the following directives to the server configuration that serves the secure folder, typically the HTTPS virtual host (Alias is not permitted in .htaccess files):
    Code:
    Alias /favicon.ico /absolute/path/to/favicon.ico
    Alias /style.css /absolute/path/to/style.css
    Alias /img/ /absolute/path/to/img/

  5. #3
    wige (WebProWorld MVP)
    Join Date
    Jun 2006
    Posts
    3,138

    PHP Pages

    PHP Code

    If you do not have the ability to create or modify your server settings, and use PHP to generate your pages, you can accomplish the same thing by adding a code snippet to the beginning of your scripts. The following code must be the first thing in the script, before any output is sent to the browser. Note that this code should work even if there is an internal mod_rewrite or other URL mapping or aliasing in place.

    PHP Code:
    <?php
    // First correct the domain issue
    if ($_SERVER['HTTP_HOST'] != 'www.yourdomain.com') {
        // The status code is the third argument of header(); the second is the "replace" flag
        header('Location: http://www.yourdomain.com' . $_SERVER['REQUEST_URI'], true, 301);
        exit();
    }
    // Then correct the directory root issue (eregi() was removed in PHP 7, so preg_match() is used)
    if (preg_match('#/index\.(html|htm|php)$#i', $_SERVER['REQUEST_URI'])) {
        $redirect = 'http://www.yourdomain.com' . preg_replace('#/index\.(html|htm|php)$#i', '/', $_SERVER['REQUEST_URI']);
        header('Location: ' . $redirect, true, 301);
        exit();
    }
    ?>
    For simplicity, this could be added to a shared library script and simply included by each PHP page on your site.

    Handling HTTP and HTTPS
    If you have HTTPS on your server, and do not want all of your content mirrored on both the HTTP and HTTPS versions, you can add the following lines to the top of every script, below the code I lay out above:
    PHP Code:
    <?php
    $SHOULD_BE_SECURE = true; // This should be true if the file should be available over HTTPS, false otherwise.
    require_once('/path/to/file/below.php');
    ?>
    Elsewhere, create a PHP file with the following lines. This is the file that the require_once will point to.
    PHP Code:
    <?php
    // $_SERVER['HTTPS'] is unset on some servers and 'off' on others for plain HTTP requests
    $is_https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
    if (!$is_https && $SHOULD_BE_SECURE) {
        header('Location: https://www.yourdomain.com' . $_SERVER['REQUEST_URI'], true, 301);
        exit();
    }
    if ($is_https && !$SHOULD_BE_SECURE) {
        header('Location: http://www.yourdomain.com' . $_SERVER['REQUEST_URI'], true, 301);
        exit();
    }
    ?>
    You could also add the code at the top of this post to this file, and have every file on your site call this script to check the URL and do the appropriate redirections.
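
    The HTTP/HTTPS check above boils down to one comparison, which can be sketched in Python as follows. This is only an illustration under stated assumptions: scheme_redirect and its parameters are hypothetical names, and https_flag stands in for whatever the server reports (missing, empty, "off" or "on").

```python
from typing import Optional

CANONICAL_HOST = "www.yourdomain.com"  # placeholder host, as in the thread's examples


def scheme_redirect(https_flag: Optional[str], should_be_secure: bool,
                    request_uri: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the page is on the right scheme."""
    # A missing, empty or "off" flag is treated as a plain HTTP request
    is_https = bool(https_flag) and https_flag.lower() != "off"
    if is_https == should_be_secure:
        return None
    scheme = "https" if should_be_secure else "http"
    return f"{scheme}://{CANONICAL_HOST}{request_uri}"
```

    A page marked as secure but requested over HTTP is sent to the https URL, and the reverse for a non-secure page requested over HTTPS. Anything else is served as is.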

  7. #4
    Dubbya (WebProWorld MVP)
    Join Date
    Nov 2006
    Location
    Steinbach, Manitoba
    Posts
    1,323

    Re: Canonicalization Prevention Guide

    Here's a thorough guide to setting up 301 redirects under IIS.

    IIS 301 Redirect for SEO - McAnerin International Inc.

  9. #5
    jboeckman

    Re: Canonicalization Prevention Guide

    There's a pretty good article here: http://www.webconfs.com/how-to-redirect-a-webpage.php

  11. #6
    mjtaylor (WebProWorld MVP)
    Join Date
    Dec 2003
    Posts
    6,237

    Re: Canonicalization Prevention Guide

    Great idea, wige. Someone give him some more rep points!

    Quote: Originally Posted by jboeckman
    That is a good article, thanks.

    And Jaan's page has lots of resources for 301 redirects. I found this one most useful: How to Create Redirects.

    How about making this a Sticky?

    Cheers,

    MJ
    SEO Friendly Premium Web Directory - Submit Now | Need to write a love letter to Google? I'm an SEO Copywriter who knows Search Smart Design®. | Travel Gypsy in Key West.

  13. #7
    wige (WebProWorld MVP)
    Join Date
    Jun 2006
    Posts
    3,138

    Re: Canonicalization Prevention Guide

    Some great suggestions and links already, thanks!

    Anyone have any ideas on how to accomplish this type of redirect on ASP or ASP.NET sites? Generally, if your site is on a shared Windows host, you won't be able to access the IIS control panel, so this would need to be done programmatically. I have seen instructions on doing the redirect:

    Code:
    <%@ Language=VBScript %>
    <%
    ' VBScript statements take no semicolons, and Sub calls take no parentheses
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", "http://www.new-url.com/"
    Response.End
    %>
    But I am not sure what the variables are that contain the host name, requested file name and query string so that these items can be tested. Any thoughts?

  15. #8
    wige (WebProWorld MVP)
    Join Date
    Jun 2006
    Posts
    3,138

    Perl CGI

    PERL CGI Method

    This is most likely to be used with older scripts and shopping cart systems if the server does not support .htaccess. The following code should work with most Perl interpreters. This code must precede any other output to the browser.

    Code:
    # HTTP_HOST holds only the host name (no "http://"); "ne" is Perl's string inequality operator
    if ($ENV{"HTTP_HOST"} ne "www.yourdomain.com") {
       print "Status: 301 Moved Permanently\nLocation: http://www.yourdomain.com".$ENV{"REQUEST_URI"}."\n\n";
       exit();
    }
    Unfortunately, it has been quite some time since I have done regular expressions in Perl. Any suggestions for the syntax to check if a request ends with "/index.php", "/index.cgi" or "/index.html"?
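
    One possible shape for that check, sketched in Python rather than Perl. The regular expression itself uses only basic syntax, so it should carry over to a Perl match more or less unchanged, but that is an assumption to test on your own server. The names INDEX_SUFFIX and strip_index are hypothetical.

```python
import re

# Matches a request path ending in /index.php, /index.cgi or /index.html
INDEX_SUFFIX = re.compile(r"/index\.(php|cgi|html)$")


def strip_index(request_uri: str) -> str:
    """Drop a trailing index file name, keeping the directory's trailing slash."""
    # Split off the query string so the pattern only looks at the path
    path, sep, query = request_uri.partition("?")
    return INDEX_SUFFIX.sub("/", path) + sep + query
```

    If strip_index returns something different from the requested URI, the request should be redirected to the returned value.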

  17. #9
    Dubbya (WebProWorld MVP)
    Join Date
    Nov 2006
    Location
    Steinbach, Manitoba
    Posts
    1,323

    Re: Canonicalization Prevention Guide

    This ASP redirect method allows you to pass along querystrings if you need to.
    It returns valid HTTP status codes: a 301 for the redirect itself, then "HTTP/1.1 200 OK" from the page it redirects to.

    *Caveat: To prevent hacking, it's a good idea to parse querystrings for illegal characters.

    Check your headers: Check Server Headers Tool - HTTP Status Codes Checker

    Code:
    <%@LANGUAGE="VBSCRIPT"%>
    <%
    '*****************************************
    ' 301 Redirect for non-www domain entries
    '*****************************************
     host = Request.ServerVariables("HTTP_HOST")
     page = Request.Servervariables("URL")
     pageData = Request.ServerVariables("QUERY_STRING")
    
    'allow passing of querystrings to the redirected URL
     if pageData <> "" Then
    	pageData = "?" & pageData
     end if
    
    'substitute yoursite.com for your own URL.
     if host = "yoursite.com" then 
    	host = "http://www.yoursite.com"
    	newUrl = host & page & pageData
    	Response.Status = "301 Moved Permanently"
    	Response.AddHeader "Location", newUrl
    	Response.End
     end if
    '*******************
    ' End 301 Redirect 
    %>
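
    The caveat about parsing querystrings can be sketched as follows, in Python for illustration only. The allowed character set is an assumption and should be adjusted to what your own pages actually use. The names SAFE_QUERY and build_redirect are hypothetical, and yoursite.com is the placeholder from the code above.

```python
import re
from typing import Optional

# Assumption: only characters commonly found in query strings are accepted.
# Anything else, including CR/LF that could inject extra headers, is rejected.
SAFE_QUERY = re.compile(r"[A-Za-z0-9\-._~%&=+]*")


def build_redirect(page: str, query_string: str) -> Optional[str]:
    """Return the redirect target, or None if the query string looks unsafe."""
    if not SAFE_QUERY.fullmatch(query_string):
        return None
    suffix = "?" + query_string if query_string else ""
    return "http://www.yoursite.com" + page + suffix
```

    When the function returns None, the safer choice is to redirect without the query string, or to answer with an error, rather than echoing the input into the Location header.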

  19. #10
    lukkyjay (Member)
    Join Date
    Mar 2006
    Posts
    79

    Re: Apache Server Specific

    Quote: Originally Posted by wige
    Apache Server Specific
    www vs non-www
    Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
    Code:
    RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
    RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
    I've tried it a few times, but it hasn't worked yet. Maybe it has something to do with what is already in my .htaccess:
    Code:
    ErrorDocument 500 http://www.insuranceshoppers.net/500.html
    Should I put it before the "ErrorDocument" line? Including the RewriteEngine on directive, how should it look?