WebProWorld Part of WebProNews.com

#1 pablowerk (WebProWorld Member, Wisco), 06-02-2006, 01:11 PM: Special Characters in URL (Ü, ö, ä)

I am not sure where the problem lies, but I am even having a hard time posting a URL with special characters here.

So far, MSN has indexed these pages:

search.msn.com/results.aspx?q=site%3Awww.optek.com%2Fde%2FApplication_Notezen&Form=MSNH

Notice that the SERP displays the URL correctly, special characters included. But if you click the result, it sends you to a URL in which the special characters have been replaced.

For example (you have to copy and paste these URLs):
http://www.optek.com/de/Application_...Kühlwasser.asp

http://www.optek.com/de/Application_...ühlwasser.asp

Both links send the user to the same page, but with my system you will notice that the first example has the proper title, whereas the second one has my default title.

Somehow my 404 page routes both requests correctly, but the SEO system I built sees them as two different pages, since the URLs are different.

I could easily set up the second one to display the same title information as the first, but what I am wondering is: will these pages be flagged as duplicates?

I am hoping someone can help; I would like to fix this before Google gets to it.

#2 pablowerk, 06-02-2006, 03:24 PM

No one has any ideas?

#3 crankydave (WebProWorld Moderator), 06-02-2006, 03:54 PM

If I'm not mistaken, the source code tells me those are two different pages.

If you don't 301-redirect one page to the other, one will likely be seen as a duplicate and cause problems.
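The redirect can be issued by whatever serves the pages. As a minimal sketch in Python (not the ASP setup this site actually runs), assuming a hypothetical `REDIRECTS` table that maps each duplicate path to its canonical path, a WSGI application could answer like this:

```python
# Minimal WSGI sketch: answer known duplicate URLs with a permanent redirect.
# REDIRECTS and its paths are hypothetical examples, not the site's real URLs.
REDIRECTS = {
    "/de/old-duplicate.asp": "/de/canonical-page.asp",
}


def app(environ, start_response):
    """Return 301 for paths listed in REDIRECTS, otherwise serve the page."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A 301 signals a permanent move, so crawlers should index only the target.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><head><title>Example page</title></head></html>"]
```

For a quick local test, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` will serve it; removing the entry from `REDIRECTS` undoes the redirect.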

Dave

#4 pablowerk, 06-02-2006, 04:05 PM

You are correct, Dave: looking at the source code, they do appear to be different pages.

I developed a system that dynamically creates all the title, keyword, and description tags.

Essentially, when a user requests a page on my site, the system looks up the requested URL in a database; if the lookup finds that URL, it pulls the title, keyword, and description tags from other tables and displays them.
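One way to make such a lookup indifferent to how a client encoded the umlauts is to decode and normalize the requested path before querying. A minimal sketch, assuming the raw path arrives still percent-encoded and that a dictionary stands in for the (hypothetical) database table:

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# Hypothetical stand-in for the database table that is keyed by URL path.
PAGES = {
    "/de/Kühlwasser.asp": {"title": "Spuren von Schmieröl im Kühlwasser"},
}


def normalize_path(raw_path: str) -> str:
    """Turn a (possibly percent-encoded) request path into one canonical string."""
    data = unquote_to_bytes(raw_path)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was sent as Latin-1 (ISO-8859-1).
        text = data.decode("latin-1")
    # NFC makes a precomposed "ü" and "u" + combining diaeresis compare equal.
    return unicodedata.normalize("NFC", text)


def lookup(raw_path: str):
    """Return the page record for a request path, or None if it is unknown."""
    return PAGES.get(normalize_path(raw_path))
```

With this, `lookup("/de/K%C3%BChlwasser.asp")` and `lookup("/de/K%FChlwasser.asp")` return the same record. The Latin-1 fallback is a heuristic: it works for German text but can misread byte sequences that happen to be valid UTF-8.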

I could 301 one of the pages, but I think a better solution would be to figure out why and how MSN (and probably other search engines) displays the special characters in the URL but links to a URL in which those characters are encoded (ö, for example, is replaced by an encoded sequence).

Do you know what I am trying to say?

#5 crankydave, 06-02-2006, 04:19 PM

Quote:
Originally Posted by pablowerk
You are correct, Dave: looking at the source code, they do appear to be different pages.

I developed a system that dynamically creates all the title, keyword, and description tags.

Essentially, when a user requests a page on my site, the system looks up the requested URL in a database; if the lookup finds that URL, it pulls the title, keyword, and description tags from other tables and displays them.

I could 301 one of the pages, but I think a better solution would be to figure out why and how MSN (and probably other search engines) displays the special characters in the URL but links to a URL in which those characters are encoded (ö, for example, is replaced by an encoded sequence).

Do you know what I am trying to say?
Yes, but in the meantime, a 301 redirect would save you potential problems while you're trying to figure out why and how. You could always remove it later.

I'm not being taken to any pages with the special characters replaced. In all 8 examples, I'm taken straight to the page that is listed.

Sorry I can't be of more help. You might want to send Faglork a private message (PM) about this thread. He'll likely be more help than I am.

Dave

#6 pablowerk, 06-02-2006, 04:20 PM

Well, I just took a look at Yahoo. It too encodes these special characters, but it must encode them differently from MSN, because the correct page (with the proper title tags, etc.) is displayed.

In short:

Yahoo sends a user to
http://www.optek.com/de/Application_...BChlwasser.asp

which my server interprets as
http://www.optek.com/de/Application_...Kühlwasser.asp
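The fragment `...BChlwasser` is consistent with `ü` being sent as its UTF-8 bytes `C3 BC`, percent-encoded; that is an inference, since the full URL is elided. The sketch below shows how the same word percent-encodes under UTF-8 versus ISO-8859-1 (Latin-1), and what a decoder that assumes the wrong charset produces. Which charset MSN actually used is not known from this thread.

```python
from urllib.parse import quote, unquote

word = "Kühlwasser"

utf8_form = quote(word)                        # ü -> bytes C3 BC -> %C3%BC
latin1_form = quote(word, encoding="latin-1")  # ü -> byte FC -> %FC

print(utf8_form)    # K%C3%BChlwasser
print(latin1_form)  # K%FChlwasser

# Decoding with the matching charset recovers the original word.
print(unquote(utf8_form))                        # Kühlwasser
print(unquote(latin1_form, encoding="latin-1"))  # Kühlwasser

# A UTF-8 decoder fed the Latin-1 form substitutes U+FFFD for the invalid byte.
print(ascii(unquote(latin1_form)))               # 'K\ufffdhlwasser'
```

Two clients that pick different charsets therefore request byte-for-byte different URLs for what is meant to be the same page.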

#7 pablowerk, 06-02-2006, 04:22 PM

Thanks, Dave. I will try to PM Faglork.

#8 SemAdvance (WebProWorld Veteran), 06-02-2006, 06:55 PM

Possible reasons:

Different encoding based on the country code in your head tags?

Another: MSN runs on Windows and Yahoo on Unix, if I remember correctly. That means different server environments and three completely different crawlers among the top 3 search engines.

Have you looked at Inktomi and Teoma and how they handle the URLs?

#9 Faglork (WebProWorld Veteran, Forchheim, Germany), 06-02-2006, 08:10 PM

No easy answer. I have to look into that. But I need some time ... which I don't have right now. Sorry.

I never have these problems, because I do not use umlauts in my URLs:

ä --> ae
ö --> oe
ü --> ue
ß --> ss
...

Even if you solve the problem with your URLs in search engines and browsers, you never know where it will come back to bite you. You may try to burn a backup CD of the site, and your CD-burning program will freak out. Or your internal server backup will hiccup. Or you mail a URL to a prospective client whose mail program mangles it. And so on ...

As far as URLs are concerned: do not use umlauts. It should be quite easy to modify your CMS to take care of that.
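A minimal sketch of such a CMS change, assuming titles are turned into file names and that underscores separate words (a hypothetical convention suggested by the site's `Application_...` paths). The substitution table is the one given in this thread; `url_slug` is a hypothetical helper name:

```python
import re

# German umlaut and sharp-s substitutions, as listed in this thread.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
})


def url_slug(title: str) -> str:
    """Turn a page title into an ASCII-only file name for the URL."""
    ascii_title = title.translate(UMLAUT_MAP)
    # Collapse every run of characters that are not ASCII letters or digits
    # (spaces, punctuation, any non-German accented letter) into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_title)
    return slug.strip("_")


print(url_slug("Spuren von Schmieröl im Kühlwasser"))
# Spuren_von_Schmieroel_im_Kuehlwasser
```

Only the file name changes; the page title and body text keep their umlauts.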

Cheers,
faglork

BTW: Who did the translation of that site? It is a bit awkward ... it is *understandable*, OK, but it is not good German. Do you know the German term "radebrechen"? It means something like "a foreign visitor with limited knowledge of the language desperately trying to speak German". This makes the website "look" cheap ... better get an experienced native speaker to look over the text.



#10 SemAdvance, 06-02-2006, 09:21 PM

In reply to your first post: yes, Google will see one page as a duplicate of the other.

Second, the title tags on both pages are the same:

<title>Spuren von Schmieröl im Kühlwasser</title>

<title>Spuren von Schmieröl im Kühlwasser</title>

Did you have another title in mind?

=====================================================
Yes, the search engine displays the URL just as the search spider "read" it, the same as the rest of the text and links on your page that it reads and displays.

However, processing the URL is a server-side action, not something the spider reads.

----------------------------------------------------
Next, for what reason are you going to these lengths with your "SEO system"?

A search for the keyword term

Spuren von Schmieröl im Kühlwasse

returns just 164 pages:

Results 1 - 10 of about 164 for Spuren von Schmieröl im Kühlwasse

For the other term,

Spuren von Schmieröl im Kühlwasser

Your search - Spuren von Schmieröl im Kühlwasser - did not match any documents.

There are no competitive pages.

I cannot imagine that the number of searches for these terms would justify the time you have spent building an SEO system.

Also, as Faglork stated, no matter what you do, there will always be issues with special characters: some search spiders can handle them, others cannot.

Lastly, I have to go back to your website and its purpose. You are not selling these items through e-commerce, so the object of your site should be to push visitors toward your offline sales efforts. Perhaps I am mistaken.

And since what you deal in seems not to be standard fare, people who search for you should find you easily if you use simple, basic SEO steps and less of a "system".

My thoughts

#11 incrediblehelp (WebProWorld Moderator), 06-03-2006, 12:16 AM

Just to back up a bit: why are you using special characters in the URL in the first place? Wouldn't it be better to fix that from the beginning?

If you are doing it because that is the way those characters are written in German, then consider not doing it.

Also, I am sure this is not a new issue for the search engines. They probably each handle it (translate those characters) differently, and it should not be a problem, sort of like when people mistakenly leave spaces in the middle of URLs.

#12 TrafficProducer (WebProWorld 1,000+ Club, United Kingdom), 06-03-2006, 04:47 AM: Search Umlauts urls

Google search: umlauts in URLs

Google search: umlauts displayed in the title

ICANN (Internet Corporation for Assigned Names and Numbers)

I believe domain registrars were discussing issues with domain names containing umlauts; I am not sure what has come of this.

e.g.

Code:
http://www.NoUmlauts.com
http://www.UmlaÜts.com
ASCII: American Standard Code for Information Interchange. Character symbols, hexadecimal, binary, octal, etc.

OPERATOR DIFFERENCES: The .. range operator treats certain character ranges with care on EBCDIC machines; for example, a letter range will produce an array of twenty-six elements on either an EBCDIC machine or an ASCII machine. Find out about \cJ, \cI, etc. (you may see these when loading Excel character codes via Perl).

#13 Faglork, 06-03-2006, 05:11 AM: Re: Search Umlauts urls

Quote:
Originally Posted by TrafficProducer
I believe domain registrars were discussing issues with domain names containing umlauts; I am not sure what has come of this.

e.g.

Code:
http://www.NoUmlauts.com
http://www.UmlaÜts.com
You can register domains with umlauts, depending on the TLD. Here is a list, and a very good explanatory article as well:

http://en.wikipedia.org/wiki/Interna...d_domain_names

The problem is that mail servers don't handle this, so it will be confusing: you need a mail domain without an umlaut.
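Under the hood, an internationalized domain name travels through DNS in an ASCII "xn--" (Punycode) form, and software that is not IDN-aware only ever sees that form. A quick illustration with Python's built-in `idna` codec (which implements IDNA 2003); `bücher.de` is an illustrative name, not one from this thread:

```python
# Python's built-in "idna" codec (IDNA 2003) converts between the Unicode
# and the ASCII-compatible ("xn--", Punycode) form of a domain name.
domain = "bücher.de"  # illustrative example

ascii_form = domain.encode("idna")
print(ascii_form)                 # b'xn--bcher-kva.de'
print(ascii_form.decode("idna"))  # bücher.de
```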

I personally do not recommend umlaut domains unless you have a special reason.

Cheers,
faglork

#14 pablowerk, 06-05-2006, 06:29 PM

Quote:
Originally Posted by Faglork
BTW: Who did the translation of that site? It is a bit awkward ... it is *understandable*, OK, but it is not good German. Do you know the German term "radebrechen"? It means something like "a foreign visitor with limited knowledge of the language desperately trying to speak German". This makes the website "look" cheap ... better get an experienced native speaker to look over the text.
It's funny you say that, because the translation was done by one of our employees at our German headquarters (a native speaker!). I am having another person review these pages over here; he also noticed the improper German.

I am also checking with our German office about substituting those characters for the umlauts. Thanks for the tip.

#15 pablowerk, 06-05-2006, 06:59 PM

Sorry for not replying quicker.

Just to give everyone a little more background: I created a CMS that allows both our German and American salespeople to enter these application notes. The URL is generated from the title of the application note along with its language and industry codes.

All these notes were originally written in English and then translated into German. That is when I first learned about the issue with the special characters.

Currently, I feel the best solution would be to replace these special characters with the substitutes Faglork gave, then 301-redirect any incorrectly indexed URLs.
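A sketch of that plan, assuming the old paths are known and that a request for an old page may reach the server either unencoded or percent-encoded as UTF-8 or Latin-1 (both assumptions; which form actually reaches the lookup depends on the server). The helper name `redirect_map` is hypothetical:

```python
from urllib.parse import quote

# German substitutions as listed in this thread.
SUBSTITUTIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
})


def redirect_map(old_paths):
    """Map each old umlaut path, in its common encodings, to its new ASCII path."""
    mapping = {}
    for old in old_paths:
        new = old.translate(SUBSTITUTIONS)
        if new == old:
            continue  # no umlauts in this path, so nothing to redirect
        variants = (
            old,                             # already decoded by the server
            quote(old),                      # percent-encoded UTF-8
            quote(old, encoding="latin-1"),  # percent-encoded Latin-1 (raises on non-Latin-1 chars)
        )
        for variant in variants:
            mapping[variant] = new
    return mapping


for source, target in redirect_map(["/de/Kühlwasser.asp"]).items():
    print(source, "->", target)
```

Each entry can then be answered with a 301 to its target, so whichever encoded form a search engine indexed ends up at the new ASCII URL.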

Quote:
Originally Posted by Faglork
ä --> ae
ö --> oe
ü --> ue
ß --> ss
Is it common for German searchers to search using these substitutions?

Regarding the "SEO system": it handles every page on the website and makes it easy to track changes made to any page. With a simple query I can check who changed the title, description, or keyword tags, and when.
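For readers curious what such an audit query might look like, here is a sketch using Python's built-in `sqlite3`. The `meta_changes` table, its columns, and the sample rows are all hypothetical; the thread does not describe the actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit table: one row per change to a page's meta tags.
conn.execute("""
    CREATE TABLE meta_changes (
        page_url   TEXT NOT NULL,
        field      TEXT NOT NULL,  -- 'title', 'description' or 'keywords'
        new_value  TEXT NOT NULL,
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL   -- ISO 8601 timestamp, so text order = time order
    )""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO meta_changes VALUES (?, ?, ?, ?, ?)",
    [
        ("/de/Kuehlwasser.asp", "title", "Spuren von Schmieröl im Kühlwasser",
         "editor_a", "2006-06-01T09:00:00"),
        ("/de/Kuehlwasser.asp", "description", "example description",
         "editor_b", "2006-06-02T14:30:00"),
    ],
)

# Who changed which tag on one page, and when, newest first.
rows = conn.execute(
    "SELECT field, changed_by, changed_at FROM meta_changes "
    "WHERE page_url = ? ORDER BY changed_at DESC",
    ("/de/Kuehlwasser.asp",),
).fetchall()
for row in rows:
    print(row)
```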

#16 Faglork, 06-06-2006, 05:52 AM

Quote:
Originally Posted by pablowerk
It's funny you say that, because the translation was done by one of our employees at our German headquarters (a native speaker!). I am having another person review these pages over here; he also noticed the improper German.
Perhaps he was trying to stay too close to the original, which often results in a sort of "word by word" translation.

The best solution would be to hire an accredited translator. It is not very expensive, and you will surely get the best results.

Cheers,
faglork

#17 Faglork, 06-06-2006, 05:58 AM

Quote:
Originally Posted by pablowerk

Currently, I feel the best solution would be to replace these special characters with the substitutes Faglork gave, then 301-redirect any incorrectly indexed URLs.

Quote:
Originally Posted by Faglork
ä --> ae
ö --> oe
ü --> ue
ß --> ss
Almost all CMSes do it that way.

Quote:
Originally Posted by pablowerk
Is it common for German searchers to search using these substitutions?
No, they use the umlauts. Don't confuse the two: the substitution is only for the file name, so the umlauts stay in the text. No problem for search engines.


hth,
faglork

#18 pablowerk, 06-06-2006, 09:53 AM

Sounds good, thanks for all the help!

#19 pablowerk, 06-06-2006, 11:28 AM

Faglork, is there a list of these character substitutions somewhere? I have been trying to find one, but I don't know what these substitutions are called.

#20 Faglork, 06-06-2006, 12:15 PM

I don't know of any list, and the "conversion" is just a convention, sort of.

These are all of them, as far as the German language is concerned:

ä --> ae
ö --> oe
ü --> ue
Ä --> AE
Ö --> OE
Ü --> UE
ß --> ss

The last one (the "sz ligature") has no capital form.

On the other hand, in the HTML code of your body text and navigation links, you should use the correct HTML/XHTML entities, as listed, e.g., at
http://www.cookwood.com/html/extras/entities.html
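One way to follow that advice when generating pages is to escape the markup characters and then replace each non-ASCII character with its named entity, falling back to a numeric character reference when HTML 4 has no name for it. A minimal sketch using Python's standard `html` module; `to_entities` is a hypothetical helper:

```python
import html
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Escape <, > and &, then write non-ASCII characters as HTML entities."""
    escaped = html.escape(text, quote=False)
    out = []
    for ch in escaped:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. ü -> &uuml;
        else:
            out.append(f"&#{cp};")                 # numeric character reference
    return "".join(out)


print(to_entities("Spuren von Schmieröl im Kühlwasser"))
# Spuren von Schmier&ouml;l im K&uuml;hlwasser
```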


hth,
faglork