There are several systems for transliterating characters. Here is one of them:
/* -------------------------------------
/* Create Foreign Character Conversion JS
/* -------------------------------------*/
$foreign_characters = array('223' => "ss", // ß
'1072' => "a",
'1073' => "b",
'1074' => "v",
'1075' => "g",
'1076' => "d",
'1077' => "e",
'1105' => "yo",
'1078' => "zh",
'1079' => "z",
'1080' => "i",
'1081' => "j",
'1082' => "k",
'1083' => "l",
'1084' => "m",
'1085' => "n",
'1086' => "o",
'1087' => "p",
'1088' => "r",
'1089' => "s",
'1090' => "t",
'1091' => "u",
'1092' => "f",
'1093' => "h",
'1094' => "c",
'1095' => "ch",
'1096' => "sh",
'1097' => "sch",
'1099' => "y",
'1101' => "e",
'1102' => "yu",
'1103' => "ya",
'1040' => "a",
'1041' => "b",
'1042' => "v",
'1043' => "g",
'1044' => "d",
'1045' => "e",
'1025' => "yo",
'1046' => "zh",
'1047' => "z",
'1048' => "i",
'1049' => "j",
'1050' => "k",
'1051' => "l",
'1052' => "m",
'1053' => "n",
'1054' => "o",
'1055' => "p",
'1056' => "r",
'1057' => "s",
'1058' => "t",
'1059' => "u",
'1060' => "f",
'1061' => "h",
'1062' => "c",
'1063' => "ch",
'1064' => "sh",
'1065' => "sch",
'1067' => "y",
'1069' => "e",
'1070' => "yu",
'1071' => "ya",
);
But that is a hack that has to be reapplied after each update.
So the request is for an extension that lets users apply their preferred conversion system without re-editing cp.publish.php each time.
Thanks.
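For readers unfamiliar with the table format: the keys are decimal Unicode code points (1072 is Cyrillic а, 223 is ß). The following is only a minimal sketch of how such a table could be applied to a UTF-8 string, assuming PHP 7 or later. The function name `transliterate_sketch` is hypothetical and is not part of ExpressionEngine, which has its own conversion code.

```php
<?php
// Hypothetical helper (not ExpressionEngine code): applies a table keyed by
// decimal Unicode code points to a UTF-8 string.
function transliterate_sketch(string $text, array $table): string
{
    $map = array();
    foreach ($table as $code => $replacement) {
        // Turn the decimal code point into its UTF-8 character by decoding
        // the matching numeric HTML entity, e.g. "&#1072;" becomes "а".
        $char = html_entity_decode('&#' . $code . ';', ENT_QUOTES, 'UTF-8');
        $map[$char] = $replacement;
    }
    // strtr() replaces every mapped character and leaves the rest untouched.
    return strtr($text, $map);
}
```

With a table containing the entries for м, я, с and о from the list above, `transliterate_sketch('мясо', $table)` gives `myaso`.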
Same here. I am editing these files:
/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307
'138' => "s", // Š
'142' => "z", // Ž
'154' => "s", // š
'158' => "z", // ž
'159' => "y", // Ÿ
'192' => "a", '193' => "a", '194' => "a", '195' => "a", '196' => "a", '197' => "a", '198' => "a", // À, Á, Â, Ã, Ä, Å, Æ
'200' => "e", '201' => "e", '202' => "e", '203' => "e", // È, É, Ê, Ë
'204' => "i", '205' => "i", '206' => "i", '207' => "i", // Ì, Í, Î, Ï
'210' => "o", '211' => "o", '212' => "o", '213' => "o", '214' => "o", // Ò, Ó, Ô, Õ, Ö
'217' => "u", '218' => "u", '219' => "u", '220' => "u", // Ù, Ú, Û, Ü
'221' => "y", // Ý
'223' => "s", // ß
'224' => "a", '225' => "a", '226' => "a", '229' => "a", // à, á, â, å
'227' => "a", '228' => "a", '230' => "a", // ã, ä, æ
'199' => "c", '231' => "c", // Ç, ç
'232' => "e", '233' => "e", '234' => "e", '235' => "e", // è, é, ê, ë
'236' => "i", '237' => "i", '238' => "i", '239' => "i", // ì, í, î, ï
'241' => "n", // ñ
'242' => "o", '243' => "o", '244' => "o", '245' => "o", '246' => "o", // ò, ó, ô, õ, ö
'249' => "u", '250' => "u", '251' => "u", '252' => "u", // ù, ú, û, ü
'253' => "y", '255' => "y", // ý, ÿ
'256' => "a", '257' => "a", // Ā, ā
'268' => "c", '269' => "c", // Č, č
'270' => "d", '271' => "d", // Ď, ď
'274' => "e", '275' => "e", // Ē, ē
'276' => "e", '277' => "e", // Ĕ, ĕ
'282' => "e", '283' => "e", // Ě, ě
'290' => "g", '291' => "g", // Ģ, ģ
'298' => "i", '299' => "i", // Ī, ī
'310' => "k", '311' => "k", // Ķ, ķ
'313' => "l", '314' => "l", // Ĺ, ĺ
'315' => "l", '316' => "l", // Ļ, ļ
'317' => "l", '318' => "l", // Ľ, ľ
'321' => "l", '322' => "l", // Ł, ł
'325' => "n", '326' => "n", // Ņ, ņ
'327' => "n", '328' => "n", // Ň, ň
'340' => "r", '341' => "r", // Ŕ, ŕ
'344' => "r", '345' => "r", // Ř, ř
'352' => "s", '353' => "s", // Š, š
'356' => "t", '357' => "t", // Ť, ť
'362' => "u", '363' => "u", // Ū, ū
'366' => "u", '367' => "u", // Ů, ů
'381' => "z", '382' => "z", // Ž, ž
You really should make this a feature request. The developers will probably listen, but they are monolingual, so they would not know which letters to transliterate to what.
We have support for most characters in the Latin-1 (ISO-8859-1) charset already. Since we already get “ü -> ue” in URLs, there is no reason why other accented characters shouldn’t work.
Regarding this topic, I would like to request a change to the code in:
/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307
so that URL Title conversion supports multiple languages.
'138' => "s", // Š
'142' => "z", // Ž
'154' => "s", // š
'158' => "z", // ž
'159' => "y", // Ÿ
'192' => "a", '193' => "a", '194' => "a", '195' => "a", '196' => "a", '197' => "a", '198' => "a", // À, Á, Â, Ã, Ä, Å, Æ
'200' => "e", '201' => "e", '202' => "e", '203' => "e", // È, É, Ê, Ë
'204' => "i", '205' => "i", '206' => "i", '207' => "i", // Ì, Í, Î, Ï
'210' => "o", '211' => "o", '212' => "o", '213' => "o", '214' => "o", // Ò, Ó, Ô, Õ, Ö
'217' => "u", '218' => "u", '219' => "u", '220' => "u", // Ù, Ú, Û, Ü
'221' => "y", // Ý
'223' => "ss", // ß
'224' => "a", '225' => "a", '226' => "a", '229' => "a", // à, á, â, å
'227' => "a", '228' => "a", '230' => "a", // ã, ä, æ
'199' => "c", '231' => "c", // Ç, ç
'232' => "e", '233' => "e", '234' => "e", '235' => "e", // è, é, ê, ë
'236' => "i", '237' => "i", '238' => "i", '239' => "i", // ì, í, î, ï
'241' => "n", // ñ
'242' => "o", '243' => "o", '244' => "o", '245' => "o", '246' => "o", // ò, ó, ô, õ, ö
'249' => "u", '250' => "u", '251' => "u", '252' => "u", // ù, ú, û, ü
'253' => "y", '255' => "y", // ý, ÿ
'256' => "a", '257' => "a", // Ā, ā
'268' => "c", '269' => "c", // Č, č
'270' => "d", '271' => "d", // Ď, ď
'274' => "e", '275' => "e", // Ē, ē
'276' => "e", '277' => "e", // Ĕ, ĕ
'282' => "e", '283' => "e", // Ě, ě
'290' => "g", '291' => "g", // Ģ, ģ
'298' => "i", '299' => "i", // Ī, ī
'310' => "k", '311' => "k", // Ķ, ķ
'313' => "l", '314' => "l", // Ĺ, ĺ
'315' => "l", '316' => "l", // Ļ, ļ
'317' => "l", '318' => "l", // Ľ, ľ
'321' => "l", '322' => "l", // Ł, ł
'325' => "n", '326' => "n", // Ņ, ņ
'327' => "n", '328' => "n", // Ň, ň
'340' => "r", '341' => "r", // Ŕ, ŕ
'344' => "r", '345' => "r", // Ř, ř
'352' => "s", '353' => "s", // Š, š
'356' => "t", '357' => "t", // Ť, ť
'362' => "u", '363' => "u", // Ū, ū
'366' => "u", '367' => "u", // Ů, ů
'381' => "z", '382' => "z", // Ž, ž
No need. From the 1.6 changelog:
“Added foreign_character_conversion_array extension hook to allow developers to use a custom foreign character conversion array for URL titles.”
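As a rough illustration of what an extension for this hook might supply, here is a minimal sketch. The class name `Custom_url_chars_sketch` and the method name `conversion_array` are hypothetical, and it is an assumption that the hook expects the complete replacement table as the return value. Registering the extension with ExpressionEngine is not shown.

```php
<?php
// Hypothetical extension class; the names are illustrative only.
class Custom_url_chars_sketch
{
    // Assumption: the method tied to the foreign_character_conversion_array
    // hook returns the whole table, keyed by decimal character codes.
    function conversion_array()
    {
        return array(
            '223'  => 'ss', // ß
            '228'  => 'a',  // ä
            '1078' => 'zh', // ж
        );
    }
}
```

Because the table lives in the extension rather than in the core files, it would survive an update without any re-editing.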
I’ve made an extension based on the “foreign_character_conversion_array” hook.
Title: Strange URL Interpreter
Purpose: Converts foreign-language characters in an entry’s URL Title to plain Latin characters.
Version 1.1.0 supports more than the Slovak, Czech, Hungarian and Russian alphabets. If any characters in your native language are unsupported or converted incorrectly, let me know.
Version: 1.1.0 (25/06/2007)
Version: 1.2.0 (07/07/2007)
- Added: Lira, Degree, Yen, Pound and Cent sign
- Fixed: Update and Disable function
Version: 1.3.0 (22/09/2007)
- Added: Character set supporting the Polish language (thanks to Gabriel Borkowski).
Fast work, Gabriel! Say, if some others can check the accuracy of this conversion array, and if you name it something more specific than “Foreign URL Title”, we can probably add this to the repository. Being rather ignorant of the alphabets of non-Latin languages, though, I do not have a good name to suggest that would cover those four alphabets.
Ingmar: A universal conversion isn’t possible, but why should “ä” become “ae” and not “a”? I think the URL title was designed as a tool for search engines (SEO), not for people. Who reads the titles in URLs, especially entry titles?
For example, take the Slovak word “mäso” (meat). When I search Google for “maso”, I get results for both “mäso” and “maso”. Nobody will search for “maeso”, because that word doesn’t exist! That means converting “ä” to “ae” simply harms the PageRank of your website.
I am sure you will find a similar example in your language. If not, let me know.
Gabriel, I think Ingmar is right regarding the German umlauts ä, ö, ü: it is the common transliteration, and even search engines honor this form. Following your line of thought: nobody will search for “nurnberg”, but someone who doesn’t have a German keyboard might search for “Nuernberg” instead of “Nürnberg”. The built-in URL transliteration in EE replaces ü with ue as well.
Taking both examples into consideration, an editable translation table would perhaps be preferable: depending on the language, one might want a different transliteration for Slovak than for German.
EDIT: Too late … I see you had the same idea a couple of minutes before my post 😉
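The editable, per-language table discussed above could look roughly like the sketch below. Everything in it is hypothetical: the function names, the `de` and `sk` language keys, and the choice to key the tables by the characters themselves rather than by code points. It assumes UTF-8 input and is not how ExpressionEngine implements URL titles.

```php
<?php
// Hypothetical: returns the conversion table for one language.
function conversion_table_sketch($lang)
{
    // Defaults shared by all languages: strip the diacritic.
    $base = array('ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss');

    // Per-language overrides; German prefers the "ae/oe/ue" convention.
    $overrides = array(
        'de' => array('ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue'),
        'sk' => array(), // Slovak keeps the plain-vowel defaults
    );

    $extra = isset($overrides[$lang]) ? $overrides[$lang] : array();

    // With string keys, array_merge() lets later entries win.
    return array_merge($base, $extra);
}

// Hypothetical: builds a lowercase URL title for the given language.
function url_title_sketch($title, $lang)
{
    $ascii = strtr($title, conversion_table_sketch($lang));
    return strtolower($ascii);
}
```

With these tables, `url_title_sketch('Nürnberg', 'de')` yields `nuernberg`, while `url_title_sketch('mäso', 'sk')` yields `maso`, so both preferences from the discussion can coexist.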