
SEO and Localization (L10N/I18N) for a multi-language PHP based website



bobitza
03-29-2009, 04:57 PM
Hello fellow forumists,

I'm working on a site that will be multi-language (to start with, it will have at least two languages).

The site will be developed with PHP/HTML.

The pages will be structured as index.php, about.php, etc. All languages will share the same page layout. The only difference is that the text of each page will be stored in separate files, so depending on the selected language, index.php, about.php, etc. will display different text within the same layout.

The language selection will be stored in a cookie, so once a user has selected a language, no GET or SESSION parameters are passed; users just browse the pages. For example, the About page will be called about.php for ALL languages. It's the value in the cookie that "tells" about.php which text to load, so users get different results in the browser.

However, there will be links for changing the language that use a GET parameter (e.g. index.php?lang=en), where index.php is actually the current PHP page (the link is generated dynamically).
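A minimal sketch of the lookup order described above, written in Python for brevity (the same logic ports directly to PHP's $_GET and $_COOKIE). It assumes a cookie named "lang", a supported set of {"en", "fr"} and "en" as the default; all three are hypothetical choices, not part of the original design.

```python
from urllib.parse import parse_qs

# Hypothetical configuration: the thread only mentions English and French.
SUPPORTED = {"en", "fr"}
DEFAULT_LANG = "en"


def resolve_language(query_string, cookies):
    """Return (lang, cookie_to_set).

    Priority: ?lang= in the URL, then the (hypothetical) 'lang' cookie,
    then the default. cookie_to_set is the language to store when the
    visitor explicitly switched via ?lang=, otherwise None.
    """
    requested = parse_qs(query_string).get("lang", [None])[0]
    if requested in SUPPORTED:
        # Explicit switch via a language link: use it and remember it.
        return requested, requested
    stored = cookies.get("lang")
    if stored in SUPPORTED:
        # Normal browsing: the cookie decides which text to load.
        return stored, None
    # No valid parameter and no cookie.
    return DEFAULT_LANG, None
```

Note the last case: a client that sends no cookie and no ?lang= parameter always gets the default language, which is exactly how a cookie-less crawler would experience index.php or about.php.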

My question is: how does this website structure (content generated dynamically based on a stored cookie) affect my SEO?

For example, index.php can produce two different text outputs depending on which cookie is stored in the user's browser. If search engine spiders index both index.php?lang=en and index.php?lang=fr, they will see the two different contents ... but will they actually do that? Should I pass a permanent GET variable along with every link instead of using a cookie?

What is the best approach, SEO-wise, for developing multi-language websites?

Thank you

====
Code example:
Let's say index.php "loads" an HTML template containing the following code:

<h1>{TITLE1}</h1>

... and the placeholder is replaced dynamically with text from the selected language (e.g. {TITLE1} becomes 'Hello' for EN and 'Bonjour' for FR).
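The placeholder substitution can be sketched as follows, again in Python for brevity. The STRINGS table stands in for the per-language text files; its layout and the {UPPER_CASE} placeholder syntax are assumptions based on the {TITLE1} example.

```python
import re

# Hypothetical language tables; on the real site these would be separate files.
STRINGS = {
    "en": {"TITLE1": "Hello"},
    "fr": {"TITLE1": "Bonjour"},
}

# Matches placeholders such as {TITLE1}.
PLACEHOLDER = re.compile(r"\{([A-Z0-9_]+)\}")


def render(template, lang):
    """Replace every {KEY} in the template with the text for the given language."""
    table = STRINGS[lang]
    # Unknown keys are left as-is so missing translations are easy to spot.
    return PLACEHOLDER.sub(lambda m: table.get(m.group(1), m.group(0)), template)
```

For example, render("<h1>{TITLE1}</h1>", "fr") returns "<h1>Bonjour</h1>", and the same template with "en" returns "<h1>Hello</h1>".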

Jean-Luc
03-29-2009, 10:08 PM
Search engines will see the site as a visitor who has disabled cookies. If the language code is not visible in the URL of every page, search engines will see only one language version.

I would use either

- www.example.com and www.example.fr
but only French companies and individuals can get a .fr domain, and Canadians or Belgians will see it as a French site rather than as an international site in the French language

- www.example.com and fr.example.com
but www.example.fr would be easier for French visitors to remember, and it would probably rank slightly better in France than fr.example.com
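With either option, one code base can still serve both languages by deriving the language from the host name instead of a cookie. A minimal Python sketch, assuming the hostnames from the post and a hypothetical mapping of the .fr domain and the fr./en. subdomains:

```python
# Hypothetical mappings for the two setups mentioned above.
CCTLD_TO_LANG = {"fr": "fr"}        # www.example.fr -> French
SUBDOMAIN_LANGS = {"en", "fr"}      # fr.example.com -> French


def language_from_host(host, default="en"):
    """Pick the language from the Host header (port, if any, is ignored)."""
    labels = host.lower().split(":")[0].split(".")
    if labels[0] in SUBDOMAIN_LANGS:
        # Subdomain variant, e.g. fr.example.com
        return labels[0]
    # Country-code domain variant, e.g. www.example.fr; otherwise the default.
    return CCTLD_TO_LANG.get(labels[-1], default)
```

Because the language is part of the host, every language version has its own crawlable URLs, and no cookie is needed to decide which text to load.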

Jean-Luc

bobitza
03-30-2009, 09:23 AM
Merci Jean-Luc. I was also considering having two "different" sites, something like example.com/en/ and example.com/fr/ ... but my understanding is that I would need two folders with two different sets of files.
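The /en/ and /fr/ folders do not necessarily have to be physical copies: a URL rewrite rule (for example on Apache) can hand every request to a single script, which reads the language from the first path segment and serves the same template. A Python sketch of that prefix parsing, assuming "en" and "fr" as the only prefixes and "en" as the default (both hypothetical):

```python
# Hypothetical configuration.
SUPPORTED = ("en", "fr")
DEFAULT_LANG = "en"


def split_language_prefix(path):
    """Split '/fr/about.php' into ('fr', '/about.php').

    Paths without a known language prefix fall back to the default language.
    """
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in SUPPORTED:
        rest = parts[1] if len(parts) > 1 else ""
        return parts[0], "/" + rest
    return DEFAULT_LANG, path
```

The second element is the page to render with the single shared template, so /en/about.php and /fr/about.php map to the same about.php logic with different language files.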

However, I like the idea of having just one layout template, with the text pulled dynamically from language files, for a few reasons:

1) Easy to edit and translate. I can just hand the language files to someone with no web programming skills whatsoever and ask them to edit or translate them.

2) Easy future additions. I can add another language if needed ... I just need to create a new set of language files and make minor changes to the code that switches languages.

3) Easy to debug. If I make two front-end template folders, every change I make in one language's templates has to be repeated in the templates for all the other languages.
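Points 1 and 2 can be sketched with language files that are plain key/text maps. This Python example assumes JSON files (a hypothetical format; PHP arrays or .ini files would work the same way) and falls back to the default language for any key a translator has not filled in yet, so a new, incomplete language file never leaves holes in the page.

```python
import json


def load_strings(lang_json, default_json):
    """Merge a translation over the default-language strings.

    Keys missing from the translation keep the default-language text.
    """
    merged = json.loads(default_json)
    merged.update(json.loads(lang_json))
    return merged


def untranslated_keys(lang_json, default_json):
    """List keys present in the default language but missing from the translation."""
    return sorted(set(json.loads(default_json)) - set(json.loads(lang_json)))
```

Usage, with hypothetical file contents: if the English file is {"TITLE1": "Hello", "ABOUT_TITLE": "About us"} and the French file is {"TITLE1": "Bonjour"}, load_strings gives {"TITLE1": "Bonjour", "ABOUT_TITLE": "About us"} and untranslated_keys gives ["ABOUT_TITLE"], which is the list to send back to the translator.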

These are my arguments for keeping the current structure. So far it seems the easiest option is to carry the lang parameter in a GET variable as users move around the site ... but I would like to avoid that if possible because the URLs are not "user-friendly" :)

The thing is that links like index.php?lang=fr, about.php?lang=en, etc. will still appear on each page (the links that let the user change the language) and can be crawled by search engine bots. But because the default language is en, I'm worried that search engines will see index.php and index.php?lang=en as duplicate content. Perhaps the canonical tag can help here?
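If the canonical tag is used, one possible approach is to strip ?lang= only when it names the default language, so index.php?lang=en points to index.php while index.php?lang=fr stays its own page. This Python sketch assumes "en" is the default and that index.php without a parameter serves English to a cookie-less client; the French URL must not be canonicalized to the English one.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_LANG = "en"  # hypothetical default, as in the thread


def canonical_url(url):
    """Drop lang=<default> from the query string (and any #fragment)."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query)
              if not (k == "lang" and v == DEFAULT_LANG)]
    return urlunsplit((scheme, netloc, path, urlencode(params), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return '<link rel="canonical" href="%s">' % html.escape(canonical_url(url), quote=True)
```

For example, canonical_link_tag("http://www.example.com/index.php?lang=en") produces a tag pointing at http://www.example.com/index.php, while a ?lang=fr URL keeps its parameter.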