01-17-2008, 10:23 AM
Moderator
Join Date: Jun 2006
Location: United States
Posts: 1,783
Canonicalization Prevention Guide
Canonicalization, or more specifically the creation of duplicate content due to the way web server software handles variations on URLs, has become a much discussed topic here. Specific code snippets to resolve the problem in different situations have been widely discussed, but there is no single place that lists the different methods. I created this thread to list several of the more common ways of eliminating the two main types of canonicalization problem: www vs non-www duplication, and / vs /index.html duplication.
All of the examples I post will handle subdomain issues (www vs non-www), directory root issues (/ vs /index.html) and secure server (http vs https) issues if possible.
If you have any suggestions for other methods, please let me know and add them to the list.
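As a rough illustration of what all of the methods in this thread are trying to achieve, here is a minimal Python sketch of the mapping from a requested URL to its canonical form. It assumes the whole site is served over plain http at the placeholder host www.yourdomain.com. The function name canonical_url and the list of index file names are made up for the example and are not part of any server's API.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.yourdomain.com"  # placeholder host used throughout this thread
INDEX_FILES = ("index.html", "index.htm", "index.php")  # assumed index file names


def canonical_url(url):
    """Return the canonical form of url, or None if it is already canonical."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Directory root issue: /folder/index.html -> /folder/
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    # Subdomain and scheme issues: always answer on http://www.yourdomain.com
    target = urlunsplit(("http", CANONICAL_HOST, path, parts.query, ""))
    return None if target == url else target
```

A server-side implementation would send a 301 redirect to the returned URL whenever the result is not None.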
__________________
The best way to learn anything, is to question everything.
Last edited by wige : 01-17-2008 at 02:45 PM.

01-17-2008, 10:24 AM
Moderator
Join Date: Jun 2006
Location: United States
Posts: 1,783
Apache Server Specific
Requirements:
Apache Server with mod_rewrite enabled
The ability to modify server settings using either .htaccess or access to the server configuration files.
www vs non-www
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
Remove /index.html or index.php from requests for the root of a folder
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
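# No space is allowed inside the flag brackets; "[R=301, L]" causes a 500 error.
# The leading slash is optional so the rules match in .htaccess and in the server configuration.
RewriteRule ^/?index\.(php|html)$ http://www.yourdomain.com/ [R=301,L]
RewriteRule ^/?(.*)/index\.(php|html)$ http://www.yourdomain.com/$1/ [R=301,L]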
Handling secure content, securely
To prevent duplication of content due to the use of a secure connection (https), the content that should be available as secure should be in a separate folder. Unfortunately, this is generally not possible. For the sake of completeness, however, here is how to do it. Generally, you would put the secure content in a subfolder of the root folder of your web site. For simplicity, you could name this folder "secure". You would then add the following directives to the http (port 80) virtual host in your server configuration. Note that <Directory> sections are not allowed in .htaccess files, and that the path must be the absolute filesystem path to the folder.
Code:
<Directory "/absolute/path/to/secure">
Order Deny,Allow
Deny from All
</Directory>
This will prevent anything from crawling your secure content over an http connection. However, if you want some files to be available on both connections (/style.css, /favicon.ico and all the images in /img/, for this example) you will need to add the following directives to the https virtual host in your server configuration (Alias cannot be used in .htaccess files):
Code:
Alias /favicon.ico /absolute/path/to/favicon.ico
Alias /style.css /absolute/path/to/style.css
Alias /img/ /absolute/path/to/img/
Last edited by wige : 01-17-2008 at 02:57 PM.

01-17-2008, 10:25 AM
Moderator
Join Date: Jun 2006
Location: United States
Posts: 1,783
PHP Pages
PHP Code
If you do not have the ability to create or modify your server settings, and use PHP to generate your pages, you can accomplish the same thing by adding a code snippet to the beginning of your scripts. The following code must be the first thing in the script, before any output is sent to the browser. Note that this code should work even if there is an internal mod_rewrite or other URL mapping or aliasing in place.
PHP Code:
<?php
// First correct the domain issue
if ($_SERVER['HTTP_HOST'] != 'www.yourdomain.com') {
    // The status code is the third argument of header(); the second is the "replace" flag
    header('Location: http://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
    exit();
}
// Then correct the directory root issue (eregi() was removed in PHP 7, so use preg_*)
$index_pattern = '#/index\.(html|htm|php)(?=\?|$)#i';
if (preg_match($index_pattern, $_SERVER['REQUEST_URI'])) {
    $redirect = 'http://www.yourdomain.com'.preg_replace($index_pattern, '/', $_SERVER['REQUEST_URI'], 1);
    header('Location: '.$redirect, true, 301);
    exit();
}
?>
For simplicity, this could be placed in a shared library script and simply included by each PHP page on your site.
Handling HTTP and HTTPS
If you have HTTPS on your server, and do not want all of your content mirrored on both the HTTP and HTTPS versions, you can add the following lines to the top of every script, below the code I lay out above:
PHP Code:
<?php
// This should be true if the file should be available over HTTPS, false otherwise.
$SHOULD_BE_SECURE = true;
require_once('/path/to/file/below.php');
?>
Elsewhere, create a PHP file with the following lines. This is the file that the require_once will point to.
PHP Code:
<?php
// $_SERVER['HTTPS'] may be unset, empty or 'off' on a plain HTTP request
$is_https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
if (!$is_https && $SHOULD_BE_SECURE) {
    header('Location: https://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
    exit();
}
if ($is_https && !$SHOULD_BE_SECURE) {
    header('Location: http://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
    exit();
}
?>
You could also add the code at the top of this post to this file, and have every file on your site call this script to check the URL and do the appropriate redirections.
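For readers who do not use PHP, the decision made by the HTTP/HTTPS check can be sketched in a few lines of Python. This is only an illustration of the logic, assuming a per-page flag like $SHOULD_BE_SECURE; the function name scheme_redirect and its parameters are hypothetical.

```python
def scheme_redirect(is_https, should_be_secure, request_uri,
                    host="www.yourdomain.com"):
    """Return the URL to 301-redirect to, or None if the scheme is already right."""
    if should_be_secure and not is_https:
        # Secure page requested over plain http: send the visitor to https
        return "https://" + host + request_uri
    if is_https and not should_be_secure:
        # Ordinary page requested over https: send the visitor back to http
        return "http://" + host + request_uri
    return None
```

Each page then exists under exactly one scheme, so the http and https versions never duplicate each other.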
Last edited by wige : 01-17-2008 at 03:11 PM.

01-17-2008, 10:40 AM
WebProWorld 1,000+ Club
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,261
Re: Canonicalization Prevention Guide
Here's a thorough guide to setting up 301 redirects under IIS.
IIS 301 Redirect for SEO - McAnerin International Inc.

01-17-2008, 12:05 PM
WebProWorld Member
Join Date: Jan 2008
Location: Oklahoma City, OK
Posts: 42
Re: Canonicalization Prevention Guide

01-17-2008, 12:20 PM
Moderator
Join Date: Dec 2003
Location: Florida Keys/Western NC
Posts: 1,789
Re: Canonicalization Prevention Guide
Great idea, wige. Someone give him some more rep points!
Quote:
Originally Posted by jboeckman
|
That is a good article, thanks.
And Jaan's page has lots of resources for 301 redirects. I found this one most useful: How to Create Redirects.
How about making this a Sticky?
Cheers,
MJ

01-17-2008, 01:00 PM
Moderator
Join Date: Jun 2006
Location: United States
Posts: 1,783
Re: Canonicalization Prevention Guide
Some great suggestions and links already, thanks!
Does anyone have any ideas on how to accomplish this type of redirect on ASP or ASP.NET sites? Generally, if your site is on a shared Windows host you won't be able to access the IIS control panel, so this would need to be done programmatically. I have seen instructions on doing the redirect:
Code:
<%@ Language=VBScript %>
<%
' VBScript statements take no trailing semicolon, and Sub calls take no parentheses
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", "http://www.new-url.com/"
%>
But I am not sure what the variables are that contain the host name, requested file name and query string so that these items can be tested. Any thoughts?
Last edited by wige : 01-17-2008 at 03:12 PM.

01-17-2008, 02:29 PM
Moderator
Join Date: Jun 2006
Location: United States
Posts: 1,783
Perl CGI
Perl CGI Method
This is most likely to be needed with older scripts and shopping cart systems when the server does not support .htaccess. The following code should work with most Perl interpreters. This code must precede any other output to the browser.
Code:
# HTTP_HOST holds only the host name (no "http://"); "ne" is Perl's string inequality operator
if ($ENV{"HTTP_HOST"} ne "www.yourdomain.com") {
print "Status: 301 Moved Permanently\nLocation: http://www.yourdomain.com".$ENV{"REQUEST_URI"}."\n\n";
exit();
}
Unfortunately, it has been quite some time since I have done regular expressions in Perl. Any suggestions for the syntax to check if a request ends with "/index.php", "/index.cgi" or "/index.html"?
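One possible pattern for that check, sketched here in Python rather than Perl so that it can be tested easily. The regular expression uses only constructs that Perl also supports (a non-capturing group, a lookahead and case-insensitive matching), so it should carry over to Perl's =~ operator with little change, though that port is untested here. The names INDEX_RE and strip_index are made up for the example.

```python
import re

# Matches /index.php, /index.cgi, /index.html or /index.htm at the end of the
# path, i.e. followed either by a query string or by the end of the request.
INDEX_RE = re.compile(r"/index\.(?:php|cgi|html?)(?=\?|$)", re.IGNORECASE)


def strip_index(request_uri):
    """Return request_uri without the index file name, or None if it has none."""
    if not INDEX_RE.search(request_uri):
        return None
    return INDEX_RE.sub("/", request_uri, count=1)
```

When strip_index returns a value, the script would redirect to http://www.yourdomain.com followed by that value.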
Last edited by wige : 01-17-2008 at 02:34 PM.

01-17-2008, 04:47 PM
WebProWorld 1,000+ Club
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,261
Re: Canonicalization Prevention Guide
This ASP redirect method allows you to pass along querystrings if you need to.
It fires back valid HTTP status codes, ending with "HTTP/1.1 200 OK" for the destination page.
*Caveat: To prevent hacking, it's a good idea to check querystrings for illegal characters.
Check your headers: Check Server Headers Tool - HTTP Status Codes Checker
Code:
<%@LANGUAGE="VBSCRIPT"%>
<%
'*****************************************
' 301 Redirect for non-www domain entries
'*****************************************
host = Request.ServerVariables("HTTP_HOST")
page = Request.Servervariables("URL")
pageData = Request.ServerVariables("QUERY_STRING")
'allow passing of querystrings to the redirected URL
if pageData <> "" Then
pageData = "?" & pageData
end if
'substitute yoursite.com for your own URL.
if host = "yoursite.com" then
host = "http://www.yoursite.com"
newUrl = host & page & pageData
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", newUrl
Response.End
end if
'*******************
' End 301 Redirect
%>
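The caveat about illegal characters can be illustrated with a short Python sketch. The idea is to percent-encode anything unexpected before it goes into the Location header, so that characters such as line breaks cannot be used to inject extra headers. The function name safe_location and the choice of characters left unencoded are assumptions made for this example; an ASP page would need the equivalent check written in VBScript.

```python
from urllib.parse import quote


def safe_location(host, page, query):
    """Build a redirect target, percent-encoding anything unexpected."""
    # Letters, digits and "_.-~" are always left alone by quote(); "/" and
    # existing "%" escapes are also kept so a normal path is not altered.
    target = "http://" + host + quote(page, safe="/%")
    if query:
        # Keep the usual query string separators, encode everything else
        target += "?" + quote(query, safe="=&%+")
    return target
```

For example, a query string of "id=7&x=a b" comes out as "id=7&x=a%20b", and carriage returns or line feeds are turned into %0D and %0A instead of being passed through.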

01-17-2008, 06:29 PM
WebProWorld Member
Join Date: Mar 2006
Location: Colorado
Posts: 91
Re: Apache Server Specific
Quote:
Originally Posted by wige
Apache Server Specific
www vs non-www
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
|
I've tried it a few times, but it hasn't worked yet. Maybe it has something to do with what is in my .htaccess already:
Code:
ErrorDocument 500 http://www.insuranceshoppers.net/500.html
Should I put it before the "ErrorDocument" line? Including the RewriteEngine on directive, how should it look?

01-17-2008, 06:31 PM
WebProWorld Member
Join Date: Sep 2007
Posts: 47
Re: Canonicalization Prevention Guide
Very helpful, and I learned something useful. Thank you.
I will mention, though, that it's considered more forward thinking to redirect www.domain.com to domain.com rather than the other way around, since even Tim Berners-Lee considers the addition of www to any URL to be "the old web".

01-17-2008, 09:54 PM
WebProWorld Veteran
Join Date: Aug 2003
Location: Singapore
Posts: 553
Re: Canonicalization Prevention Guide
Thank you very much for sharing this as I was looking for how to go without both the index.html and index.php.
Great topic.

01-17-2008, 11:10 PM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399
Re: Canonicalization Prevention Guide
Allow me to add mine here too:
########## Require the www to avoid canonicalization issues ###
RewriteCond %{HTTP_HOST} ^yoursite\.com [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [L,R=301]
########## Redirect any other host name to the www host to avoid canonicalization issues ###
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/?(.*) http://www.yoursite.com/$1 [L,R=301]
########## Redirect index.html to / ##########
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.yoursite.com/$1 [R=301,L]
########## Redirect https to http ###
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) http://www.yoursite.com/$1 [R=301,L]
If you have https pages indexed by mistake, you can create an additional robots.txt, calling it for example robots-secure.txt, that disallows the indexed files, and add this to your .htaccess file:
########## To get rid of https files and canonicalization issues ###
#RewriteCond %{SERVER_PORT} ^443$
#RewriteRule ^robots\.txt$ robots-secure.txt

01-17-2008, 11:19 PM
WebProWorld Veteran
Join Date: Aug 2003
Location: Singapore
Posts: 553
Re: Canonicalization Prevention Guide
Hi wige
After I add:
RewriteRule ^/index\.(php|html)$ http://www.yourdomain.com/ [R=301, L]
RewriteRule ^(.*)/index\.(php|html)$ http://www.yourdomain.com/$1/ [R=301, L]
It gave me Error 500.
I have no problem when I use this:
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
Any idea?
Last edited by edhan : 01-17-2008 at 11:22 PM.

01-17-2008, 11:58 PM
WebProWorld New Member
Join Date: Mar 2007
Location: Mauritius
Posts: 16
Re: Canonicalization Prevention Guide
Hi,
Since I am not a programmer, I cannot add my pinch to this forum, which nonetheless speaks to me, since I have a new version of my website under development and rewriting / redirection / duplicate content issues are being considered at the moment.
All I can say is THANKS for sharing your knowledge, and I look forward to getting to a topic where I will be able to bring savvy advice (SEO).
Nice day to all

01-18-2008, 12:02 AM
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399
Re: Canonicalization Prevention Guide
Quote:
Originally Posted by Bookmauritius
Hi,
Since I am not a programmer, I cannot add my pinch to this forum, which nonetheless speaks to me, since I have a new version of my website under development and rewriting / redirection / duplicate content issues are being considered at the moment.
All I can say is THANKS for sharing your knowledge, and I look forward to getting to a topic where I will be able to bring savvy advice (SEO).
Nice day to all
|
You do not need to be a programmer to copy and paste the above mods. I am not a programmer either.

01-18-2008, 02:13 AM
WebProWorld Member
Join Date: Aug 2006
Posts: 84
Re: Canonicalization Prevention Guide
nice.

01-18-2008, 09:18 AM
WebProWorld New Member
Join Date: Nov 2007
Posts: 19
Re: Canonicalization Prevention Guide
Very helpful, and I learned something useful. Thank you.