WebProWorld Part of WebProNews.com
#1 (permalink)
01-17-2008, 10:23 AM
wige (WebProWorld Moderator)
Join Date: Jun 2006
Location: United States
Posts: 1,783
Canonicalization Prevention Guide

Canonicalization, or more specifically the creation of duplicate content due to the way web server software handles variations on URLs, has become a much discussed topic here. Specific code snippets to resolve the problem in different situations have been widely discussed, but there is no single place that contains a list of different methods. I created this thread to list several of the more common ways of eliminating the two main types of canonicalization, www vs non-www duplication, and / vs /index.html canonicalization.

All of the examples I post will handle subdomain issues (www vs non-www), directory root issues (/ vs /index.html) and secure server (http vs https) issues if possible.
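
To make the goal concrete, here is a rough Python sketch of the mapping that all of these redirects try to enforce: every variant of a URL collapses to one canonical form. It is not one of the server-specific methods in this thread, just an illustration. The function name `canonical_url`, the `CANONICAL_HOST` value and the list of index file names are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.yourdomain.com"  # assumed preferred host name
INDEX_FILES = ("index.html", "index.htm", "index.php")  # assumed directory index names


def canonical_url(url, secure=False):
    """Return the single URL that every variant of `url` should redirect to."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Directory root issue: /index.html and / are the same page.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    # Secure server issue: serve each page over exactly one scheme.
    scheme = "https" if secure else "http"
    # Subdomain issue: always use the preferred host name.
    return urlunsplit((scheme, CANONICAL_HOST, path, parts.query, ""))
```

A redirect is needed whenever the requested URL differs from what this function returns for it.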

If you have any suggestions for other methods, please let me know and add them to the list.
__________________
The best way to learn anything, is to question everything.

Last edited by wige : 01-17-2008 at 02:45 PM.
#2 (permalink)
01-17-2008, 10:24 AM
wige (WebProWorld Moderator)
Apache Server Specific

Requirements:
Apache Server with mod_rewrite enabled
The ability to modify server settings using either .htaccess or access to the server configuration files.

www vs non-www
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
Remove /index.html or index.php from requests for the root of a folder
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
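# Only redirect when the browser itself asked for index.php or index.html (avoids a loop with DirectoryIndex); no spaces inside the [flags]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.*/)?index\.(php|html)[?\ ]
RewriteRule ^/?(.*/)?index\.(php|html)$ http://www.yourdomain.com/$1 [R=301,L]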
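
If you want to see which requests an index-stripping rule would catch before touching a live server, the pattern can be tried out offline. This is only a sketch in Python. The pattern is a variant written for this demo, and the helper name `strip_index` and the sample paths are made up. Python's `re` module should treat a simple pattern like this the same way Apache does, but that is an assumption worth checking on your own server. In .htaccess the rule sees paths without the leading slash, so the pattern treats that slash as optional.

```python
import re

# Demo pattern: optional leading slash, optional folder prefix, then index.php or index.html.
INDEX_RULE = re.compile(r"^/?(.*/)?index\.(php|html)$")


def strip_index(path):
    """Return the redirect target for `path`, or None if the rule would not fire."""
    match = INDEX_RULE.match(path)
    if match is None:
        return None
    # group(1) is the folder prefix (with trailing slash), or None at the site root.
    return "http://www.yourdomain.com/" + (match.group(1) or "")
```

Feeding it a handful of real paths from your logs is a quick way to spot a pattern that is too greedy or never matches.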
Handling secure content, securely
To prevent duplication of content due to the use of a secure connection (https), the content that should be available as secure should be in a separate folder. Unfortunately, this is generally not possible. For the sake of completeness, however, here is how to do it. Generally, you would put the secure content in a subfolder of the root folder of your web site. For simplicity, you could name this folder "secure". You would then add the following directives to the HTTP (port 80) virtual host in your server configuration. Note that <Directory> blocks are not allowed in .htaccess files, and that the path must be the absolute filesystem path to the folder.
Code:
<Directory /absolute/path/to/secure/>
Order Deny,Allow
Deny from All
</Directory>
This will prevent anything from crawling your secure content over an http connection. However, if you want some files to be available on both connections (/style.css, /favicon.ico and all the images in /img/, for this example) you will need to add the following directives to the HTTPS virtual host. This assumes the HTTPS site uses the secure folder as its document root. Alias is also not allowed in .htaccess files.
Code:
Alias /favicon.ico /absolute/path/to/favicon.ico
Alias /style.css /absolute/path/to/style.css
Alias /img/ /absolute/path/to/img/

Last edited by wige : 01-17-2008 at 02:57 PM.
#3 (permalink)
01-17-2008, 10:25 AM
wige (WebProWorld Moderator)
PHP Pages

PHP Code

If you do not have the ability to create or modify your server settings, and use PHP to generate your pages, you can accomplish the same thing by adding a code snippet to the beginning of your scripts. The following code must be the first thing in the script, before any output is sent to the browser. Note that this code should work even if there is an internal mod_rewrite or other URL mapping or aliasing in place.

PHP Code:
<?php
if ($_SERVER['HTTP_HOST'] != 'www.yourdomain.com') { // First correct the domain issue
   // header() takes the response code as its third argument
   header('Location: http://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
   exit();
}
// Case-insensitive match on a request ending in /index.html, /index.htm or /index.php
if (preg_match('#/index\.(html|htm|php)$#i', $_SERVER['REQUEST_URI'])) { // Then correct the directory root issue
   $redirect = 'http://www.yourdomain.com'.preg_replace('#/index\.(html|htm|php)$#i', '/', $_SERVER['REQUEST_URI']);
   header('Location: '.$redirect, true, 301);
   exit();
}
?>
For simplicity, this could be placed in a shared include file and simply called by each PHP page on your site.

Handling HTTP and HTTPS
If you have HTTPS on your server, and do not want all of your content mirrored on both the HTTP and HTTPS versions, you can add the following lines to the top of every script, below the code I lay out above:
PHP Code:
<?php
$SHOULD_BE_SECURE = true; // This should be true if the file should be available over HTTPS, false otherwise.
require_once('/path/to/file/below.php');
?>
Elsewhere, create a PHP file with the following lines. This is the file that the require_once will point to.
PHP Code:
<?php
// $_SERVER['HTTPS'] is unset or 'off' on a plain HTTP request, depending on the server
$is_https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] != 'off';
if (!$is_https && $SHOULD_BE_SECURE) {
   header('Location: https://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
   exit();
}
if ($is_https && !$SHOULD_BE_SECURE) {
   header('Location: http://www.yourdomain.com'.$_SERVER['REQUEST_URI'], true, 301);
   exit();
}
?>
You could also add the code at the top of this post to this file, and have every file on your site call this script to check the URL and do the appropriate redirections.
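
The HTTP/HTTPS check boils down to a small decision table, which can be easier to reason about away from the server variables. Below is a minimal sketch of that logic in Python, for illustration only. The function name `scheme_redirect` and its parameters are hypothetical, and `is_https` stands in for whatever your server reports about the current connection.

```python
def scheme_redirect(is_https, should_be_secure, request_uri, host="www.yourdomain.com"):
    """Return the URL to 301 to, or None when the request already uses the right scheme."""
    if should_be_secure and not is_https:
        # Secure page requested over plain HTTP: send the visitor to HTTPS.
        return "https://" + host + request_uri
    if is_https and not should_be_secure:
        # Ordinary page requested over HTTPS: send the visitor back to HTTP.
        return "http://" + host + request_uri
    # Scheme already matches what the page expects, so no redirect.
    return None
```

Only two of the four combinations produce a redirect, so each page ends up reachable over exactly one scheme.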

Last edited by wige : 01-17-2008 at 03:11 PM.
#4 (permalink)
01-17-2008, 10:40 AM
Dubbya (WebProWorld 1,000+ Club)
Join Date: Nov 2006
Location: Steinbach, Manitoba, Canada
Posts: 1,261
Re: Canonicalization Prevention Guide

Here's a thorough guide to setting up 301 redirects under IIS.

IIS 301 Redirect for SEO - McAnerin International Inc.
__________________
Printer ink, inkjet & toner cartridges in Canada
"Price-wise printing supplies"
inkjetOasis.ca
#5 (permalink)
01-17-2008, 12:05 PM
jboeckman (WebProWorld Member)
Join Date: Jan 2008
Location: Oklahoma City, OK
Posts: 42
Re: Canonicalization Prevention Guide

There's a pretty good article here http://www.webconfs.com/how-to-redirect-a-webpage.php
#6 (permalink)
01-17-2008, 12:20 PM
mjtaylor (WebProWorld Moderator)
Join Date: Dec 2003
Location: Florida Keys/Western NC
Posts: 1,789
Re: Canonicalization Prevention Guide

Great idea, wige. Someone give him some more rep points!

Quote:
Originally Posted by jboeckman
That is a good article, thanks.

And Jaan's page has lots of resources for 301 redirects. I found this one most useful: How to Create Redirects.

How about making this a Sticky?

Cheers,

MJ
__________________
M.-J. Taylor
SEO Web Design by Cyber Key Search Smart Design® SEO Copywriter & Traveling Vacation Gypsy
#7 (permalink)
01-17-2008, 01:00 PM
wige (WebProWorld Moderator)
Re: Canonicalization Prevention Guide

Some great suggestions and links already, thanks!

Anyone have any ideas on how to accomplish this type of redirect on ASP or ASP.NET sites? Generally, if your site is on a shared Windows host, you won't be able to access the IIS control panel, so this would need to be done programmatically. I have seen instructions on doing the redirect:

Code:
<%@ Language=VBScript %>
<%
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", "http://www.new-url.com/"
%>
But I am not sure what the variables are that contain the host name, requested file name and query string so that these items can be tested. Any thoughts?

Last edited by wige : 01-17-2008 at 03:12 PM.
#8 (permalink)
01-17-2008, 02:29 PM
wige (WebProWorld Moderator)
Perl CGI

PERL CGI Method

This is more commonly going to be used with older scripts and shopping cart systems if the server does not support htaccess. The following code should work with most PERL processors. This code must precede any other output to the browser.

Code:
# HTTP_HOST holds only the host name, and strings are compared with ne
if ($ENV{"HTTP_HOST"} ne "www.yourdomain.com") {
   print "Status: 301 Moved Permanently\nLocation: http://www.yourdomain.com".$ENV{"REQUEST_URI"}."\n\n";
   exit();
}
Unfortunately, it has been quite some time since I have done regular expressions in PERL. Any suggestions for the syntax to check if a request ends with "/index.php", "/index.cgi" or "/index.html"?

Last edited by wige : 01-17-2008 at 02:34 PM.
#9 (permalink)
01-17-2008, 04:47 PM
Dubbya (WebProWorld 1,000+ Club)
Re: Canonicalization Prevention Guide

This ASP redirect method allows you to pass along querystrings if you need to.
It fires back valid HTTP status codes, ending with "HTTP/1.1 200 OK".

*Caveat: To prevent hacking, it's a good idea to parse querystrings for illegal characters.

Check your headers: Check Server Headers Tool - HTTP Status Codes Checker

Code:
<%@LANGUAGE="VBSCRIPT"%>
<%
'*****************************************
' 301 Redirect for non-www domain entries
'*****************************************
 host = Request.ServerVariables("HTTP_HOST")
 page = Request.Servervariables("URL")
 pageData = Request.ServerVariables("QUERY_STRING")

'allow passing of querystrings to the redirected URL
 if pageData <> "" Then
	pageData = "?" & pageData
 end if

'substitute yoursite.com for your own URL.
 if host = "yoursite.com" then 
	host = "http://www.yoursite.com"
	newUrl = host & page & pageData
	Response.Status = "301 Moved Permanently"
	Response.AddHeader "Location", newUrl
	Response.End
 end if
'*******************
' End 301 Redirect 
%>
#10 (permalink)
01-17-2008, 06:29 PM
lukkyjay (WebProWorld Member)
Join Date: Mar 2006
Location: Colorado
Posts: 91
Re: Apache Server Specific

Quote:
Originally Posted by wige
Apache Server Specific
www vs non-www
Add the following code to the .htaccess file in the root folder of your web content, or to the appropriate area of your server configuration, after the RewriteEngine on directive:
Code:
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]
I've tried it a few times, but it hasn't worked yet. Maybe it has something to do with what is in my .htaccess already:
Code:
ErrorDocument 500 http://www.insuranceshoppers.net/500.html
Should I put it before the "ErrorDocument" line? Including the RewriteEngine on directive, how should it look?
#11 (permalink)
01-17-2008, 06:31 PM
DoneInStyle (WebProWorld Member)
Join Date: Sep 2007
Posts: 47
Re: Canonicalization Prevention Guide

Very helpful, and I learned something useful. Thank you.

I will mention, though, that it's considered more forward-thinking to redirect www.domain.com to domain.com rather than the other way around, since even Tim Berners-Lee considers the addition of www to any URL to be "the old web".
#12 (permalink)
01-17-2008, 09:54 PM
edhan (WebProWorld Veteran)
Join Date: Aug 2003
Location: Singapore
Posts: 553
Re: Canonicalization Prevention Guide

Thank you very much for sharing this as I was looking for how to go without both the index.html and index.php.

Great topic.
#13 (permalink)
01-17-2008, 11:10 PM
Webnauts (WebProWorld 1,000+ Club)
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399
Re: Canonicalization Prevention Guide

Allow me to add mine here too:

########## Require the www to avoid canonicalization issues ###
RewriteCond %{HTTP_HOST} ^yoursite\.com [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [L,R=301]

########## Redirect any other host name to the www version to avoid canonicalization issues ###
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/?(.*) http://www.yoursite.com/$1 [L,R=301]

########## Redirect index.html to / ##########
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ http://www.yoursite.com/$1 [R=301,L]

########## Redirect https to http ###
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) http://www.yoursite.com/$1 [R=301,L]

If you have https pages indexed by mistake, you can create an additional robots.txt, calling it for example robots-secure.txt, that disallows the indexed files, and add this to your .htaccess file:

########## To get rid of https files and canonicalization issues ###
#RewriteCond %{SERVER_PORT} ^443$
#RewriteRule ^robots\.txt$ robots-secure.txt
#14 (permalink)
01-17-2008, 11:19 PM
edhan (WebProWorld Veteran)
Re: Canonicalization Prevention Guide

Hi wige

After I add:

RewriteRule ^/index\.(php|html)$ http://www.yourdomain.com/ [R=301, L]
RewriteRule ^(.*)/index\.(php|html)$ http://www.yourdomain.com/$1/ [R=301, L]

It gave me Error 500.

I have no problem when I use this:

RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]

Any idea?

Last edited by edhan : 01-17-2008 at 11:22 PM.
#15 (permalink)
01-17-2008, 11:58 PM
Bookmauritius (WebProWorld New Member)
Join Date: Mar 2007
Location: Mauritius
Posts: 16
Re: Canonicalization Prevention Guide

Hi,

Since I am not a programmer, I cannot add my pinch to this forum, which nonetheless speaks to me, since I have a new version of my website under development and rewriting / redirection / duplicate content issues are being considered at the moment.

All I can say is THANKS for sharing your knowledge, and I look forward to getting to a topic where I will be able to bring savvy advice (SEO).

Nice day to all
#16 (permalink)
01-18-2008, 12:02 AM
Webnauts (WebProWorld 1,000+ Club)
Re: Canonicalization Prevention Guide

Quote:
Originally Posted by Bookmauritius
Hi,

Since I am not a programmer, I cannot add my pinch to this forum, which nonetheless speaks to me, since I have a new version of my website under development and rewriting / redirection / duplicate content issues are being considered at the moment.

All I can say is THANKS for sharing your knowledge, and I look forward to getting to a topic where I will be able to bring savvy advice (SEO).

Nice day to all
You do not need to be a programmer to copy and paste the above mods. I am not a programmer either.
#17 (permalink)
01-18-2008, 02:13 AM
imsickofwebpro (WebProWorld Member)
Join Date: Aug 2006
Posts: 84
Re: Canonicalization Prevention Guide

nice.
__________________
www.jacksonville-website-design.com
High-end Websites and Branding
#18 (permalink)
01-18-2008, 09:18 AM
ursfehr (WebProWorld New Member)
Join Date: Nov 2007
Posts: 19
Re: Canonicalization Prevention Guide

Very helpful, and I learned something useful. Thank you.