URL Title Foreign Character Conversion

Development and Programming

sigork
155 posts
18 years ago

There are several systems for transliterating characters. Here is one of them:

/* -------------------------------------
/*  Create Foreign Character Conversion JS
/* -------------------------------------*/

$foreign_characters = array(
'223'    =>    "ss", // ß
// Russian lowercase letters
'1072'    =>    "a",
'1073'    =>    "b",
'1074'    =>    "v",
'1075'    =>    "g",
'1076'    =>    "d",
'1077'    =>    "e",
'1105'    =>    "yo",
'1078'    =>    "zh",
'1079'    =>    "z",
'1080'    =>    "i",
'1081'    =>    "j",
'1082'    =>    "k",
'1083'    =>    "l",
'1084'    =>    "m",
'1085'    =>    "n",
'1086'    =>    "o",
'1087'    =>    "p",
'1088'    =>    "r",
'1089'    =>    "s",
'1090'    =>    "t",
'1091'    =>    "u",
'1092'    =>    "f",
'1093'    =>    "h",
'1094'    =>    "c",
'1095'    =>    "ch",
'1096'    =>    "sh",
'1097'    =>    "sch",
'1099'    =>    "y",
'1101'    =>    "e",
'1102'    =>    "yu",
'1103'    =>    "ya",
// Russian uppercase letters, mapped to the same lowercase Latin output
'1040'    =>    "a",
'1041'    =>    "b",
'1042'    =>    "v",
'1043'    =>    "g",
'1044'    =>    "d",
'1045'    =>    "e",
'1025'    =>    "yo",
'1046'    =>    "zh",
'1047'    =>    "z",
'1048'    =>    "i",
'1049'    =>    "j",
'1050'    =>    "k",
'1051'    =>    "l",
'1052'    =>    "m",
'1053'    =>    "n",
'1054'    =>    "o",
'1055'    =>    "p",
'1056'    =>    "r",
'1057'    =>    "s",
'1058'    =>    "t",
'1059'    =>    "u",
'1060'    =>    "f",
'1061'    =>    "h",
'1062'    =>    "c",
'1063'    =>    "ch",
'1064'    =>    "sh",
'1065'    =>    "sch",
'1067'    =>    "y",
'1069'    =>    "e",
'1070'    =>    "yu",
'1071'    =>    "ya",
);

But that is a hack that has to be reapplied after each update.

So the request is for an extension that lets users apply their preferred conversion system without re-editing cp.publish.php each time.
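For readers unfamiliar with how a table like this gets used, here is a minimal sketch. It is not ExpressionEngine's actual code. It assumes the array keys are decimal Unicode code points, and the helpers `code_point()` and `transliterate()` are hypothetical names introduced only for this illustration. The table is a small subset of the one above.

```php
<?php
// Hypothetical helper: decode one UTF-8 character into its Unicode code point.
function code_point(string $char): int
{
    $b = array_values(unpack('C*', $char));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: replace every character that has an entry in the table.
function transliterate(string $title, array $table): string
{
    $out = '';
    // The /u flag makes preg_split break the string into whole UTF-8 characters.
    foreach (preg_split('//u', $title, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $cp = code_point($char);
        $out .= isset($table[$cp]) ? $table[$cp] : $char;
    }
    return $out;
}

// Subset of the conversion table from the post, for illustration only.
$foreign_characters = array(
    '223'  => "ss", // ß
    '1074' => "v",  // в
    '1077' => "e",  // е
    '1080' => "i",  // и
    '1087' => "p",  // п
    '1088' => "r",  // р
    '1090' => "t",  // т
);

echo transliterate('привет', $foreign_characters), "\n"; // prints "privet"
```

Characters without an entry pass through unchanged, so a real URL title routine would still need to lowercase the result and replace spaces and punctuation afterwards.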

Thanks.

       
Caleydon
17 posts
17 years ago

Same here. I am editing these files:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë 
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï 
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë 
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž
       
Ingmar Greil
29,243 posts
17 years ago

You really should make this a feature request. The developers will probably listen, but they are monolingual, so they would not know which letters to transliterate to what.

We have support for most characters in the Latin-1 (ISO-8859-1) charset already. Since we get “ü -> ue” in URLs already, there is no reason why other accented characters shouldn’t work.

       
Caleydon
17 posts
17 years ago

Thanks Ingmar, I’ll make a feature request!

       
Caleydon
17 posts
17 years ago

Regarding this topic, I would like to request a change to part of the code in:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

so that it supports multilingual conversion of the URL Title:

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë 
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï 
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë 
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž
       
Derek Jones
7,561 posts
17 years ago

No need. From the 1.6 changelog:

“Added foreign_character_conversion_array extension hook to allow developers to use a custom foreign character conversion array for URL titles.”
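As a rough illustration of what such a hook makes possible, the sketch below shows a class whose method returns a custom conversion array. Everything in it is an assumption: the class name `Custom_url_chars`, the choice to name the method after the hook, and the idea that the hook simply expects an array back are not confirmed by the changelog quote. Registering the extension with ExpressionEngine is deliberately left out.

```php
<?php
// Hypothetical extension class; registration with ExpressionEngine is not shown.
class Custom_url_chars
{
    // Assumed callback for the foreign_character_conversion_array hook:
    // returns a map of decimal code point => replacement string.
    public function foreign_character_conversion_array(): array
    {
        return array(
            '223' => "ss", // ß
            '228' => "ae", // ä
            '246' => "oe", // ö
            '252' => "ue", // ü
            '269' => "c",  // č
            '382' => "z",  // ž
        );
    }
}

$ext = new Custom_url_chars();
$map = $ext->foreign_character_conversion_array();
echo $map[252], "\n"; // prints "ue"
```

The benefit over the earlier approach is that a table like this lives in an extension, so it should survive updates instead of having to be edited back into the core files.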

       
Gabriel
130 posts
17 years ago

I’ve made an extension based on the “foreign_character_conversion_array” hook.

Title: Strange URL Interpreter
Purpose: Gives the ability to convert foreign-language characters in an entry’s URL Title to proper characters.

Version 1.1.0 supports more than the Slovak, Czech, Hungarian and Russian alphabets. If any characters from your native language are unsupported or converted incorrectly, let me know.

Version: 1.1.0 (25/06/2007)
Version: 1.2.0 (07/07/2007)
- Added: Lira, Degree, Yen, Pound and Cent signs
- Fixed: Update and Disable functions
Version: 1.3.0 (22/09/2007)
- Added: Character set supporting the Polish language (thanks to Gabriel Borkowski).

EDIT: The current version of the extension is here

       
Derek Jones
7,561 posts
17 years ago

Fast work, Gabriel! Say, if some others can check the accuracy of this conversion array, and if you name it something more specific than “Foreign URL Title”, we can probably add this to the repository. Though, being rather ignorant of the alphabets of non-Latin languages, I do not have a good name to suggest that would cover those four alphabets.

       
Ingmar Greil
29,243 posts
17 years ago

Nice work, of course I totally missed that hook in the changelog.

As a native German speaker, I’d like to add that German is fully supported as well (as it has, in fact, been natively by EE for some time), and so is French, as far as I can tell.

       
Gabriel
130 posts
17 years ago

I prepared a conversion table for Strange URL Interpreter v1.3.0.

       
Derek Jones
7,561 posts
17 years ago

lol, that’s not quite what I meant for changing the extension name. Perhaps some others with knowledge of these alphabets will have a better idea.

       
Ingmar Greil
29,243 posts
17 years ago

Also, I don’t quite agree with some of the transliterations. I think “ä” should be “ae”, not “a”. Same for ö (oe), ü (ue) and probably æ (ae).

       
Gabriel
130 posts
17 years ago

Ingmar: It is not possible to make a universal conversion, but why should “ä” be “ae” and not “a”? I think the URL title was developed as a tool for search engines (SEO), not for people. Who reads the titles in URLs, especially entry titles?

For example: take the Slovak word “mäso” (meat). When I search Google for “maso”, I get results related to both “mäso” and “maso”. Nobody will search for “maeso”, because that word doesn’t exist! That means converting “ä” to “ae” simply harms the PageRank of your website.

I am sure you will find a similar example in your language. If not, let me know.

       
Gabriel
130 posts
17 years ago

I’ve got an idea. Maybe it would be useful to create separate conversion sets for different languages. In the extension’s settings you would then have the option to choose whichever conversion set you want.

       
ms
274 posts
17 years ago

Gabriel, I think Ingmar is right regarding the German umlauts ä, ö, ü: it is the common transliteration, and even search engines honor this form. Following your line of thought: nobody will search for “nurnberg”, but someone who doesn’t have a German keyboard might search for “Nuernberg” instead of “Nürnberg”. The built-in URL transliteration in EE replaces ü with ue as well.

Taking our examples into consideration, this suggests that an editable translation table would be preferable: depending on the language, one might want a different transliteration for Slovak than for German.

EDIT: Too late … I see you had the same idea a couple of minutes before my post 😉
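The per-language idea from the last few posts could look roughly like the sketch below. It is only an illustration under stated assumptions: the `$conversion_sets` array, the language codes and the `convert_title()` helper are hypothetical and are not taken from the Strange URL Interpreter extension or from ExpressionEngine. For brevity the sets are keyed by the characters themselves rather than by code points.

```php
<?php
// Hypothetical per-language conversion sets, keyed by a language code.
$conversion_sets = array(
    'de' => array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'),
    'sk' => array('ä' => 'a', 'č' => 'c', 'š' => 's', 'ž' => 'z'),
);

// Hypothetical helper: apply the set chosen in the (assumed) extension settings.
function convert_title(string $title, string $language, array $sets): string
{
    // Unknown languages fall back to an empty set, leaving the title unchanged.
    $set = isset($sets[$language]) ? $sets[$language] : array();
    return strtr($title, $set);
}

echo convert_title('Nürnberg', 'de', $conversion_sets), "\n"; // prints "Nuernberg"
echo convert_title('mäso', 'sk', $conversion_sets), "\n";     // prints "maso"
```

With this layout the same character can map differently per site: “ä” becomes “ae” under the German set and “a” under the Slovak one, which is the disagreement in the thread.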

       