  • Marc Stöcker 96 posts 532 karma points c-trib
    Aug 08, 2011 @ 09:54

    Transliterating Cyrillic URLs with umbracoSettings.config

    While modern systems (both server and client side) cope quite well with non-Latin (UTF-8) URLs, such URLs still carry some technical risk. Cyrillic URLs can be transliterated to a Latin representation quite easily with the standard Umbraco URL replacement in "umbracoSettings.config" (in the "/config/" directory).

    A simple replacement table cannot account for some of the finer details, but it will handle 95% or more of typical document URLs. Here's the list to insert below the existing "char" items inside the urlReplacing tag:

    <!-- GOST 7.79 transliteration for cyrillic characters -->
    <char org="А">A</char>
    <char org="а">a</char>
    <char org="Б">B</char>
    <char org="б">b</char>
    <char org="В">V</char>
    <char org="в">v</char>
    <char org="Г">G</char>
    <char org="г">g</char>
    <char org="Ѓ">G`</char>
    <char org="Ґ">G`</char>
    <char org="ѓ">g`</char>
    <char org="ґ">g`</char>
    <char org="Д">D</char>
    <char org="д">d</char>
    <char org="Е">E</char>
    <char org="е">e</char>
    <char org="Ё">Yo</char>
    <char org="ё">yo</char>
    <char org="Є">Ye</char>
    <char org="є">ye</char>
    <char org="Ж">Zh</char>
    <char org="ж">zh</char>
    <char org="З">Z</char>
    <char org="з">z</char>
    <char org="И">I</char>
    <char org="и">i</char>
    <char org="Й">J</char>
    <char org="й">j</char>
    <char org="І">I</char>
    <char org="і">i</char>
    <char org="Ї">Yi</char>
    <char org="ї">yi</char>
    <char org="К">K</char>
    <char org="к">k</char>
    <char org="Ќ">K`</char>
    <char org="ќ">k`</char>
    <char org="Л">L</char>
    <char org="л">l</char>
    <char org="Љ">L`</char>
    <char org="љ">l`</char>
    <char org="М">M</char>
    <char org="м">m</char>
    <char org="Н">N</char>
    <char org="н">n</char>
    <char org="Њ">N`</char>
    <char org="њ">n`</char>
    <char org="О">O</char>
    <char org="о">o</char>
    <char org="П">P</char>
    <char org="п">p</char>
    <char org="Р">R</char>
    <char org="р">r</char>
    <char org="С">S</char>
    <char org="с">s</char>
    <char org="Т">T</char>
    <char org="т">t</char>
    <char org="У">U</char>
    <char org="у">u</char>
    <char org="Ў">U`</char>
    <char org="ў">u`</char>
    <char org="Ф">F</char>
    <char org="ф">f</char>
    <char org="Х">X</char>
    <char org="х">x</char>
    <char org="Ц">Cz</char>
    <char org="ц">cz</char>
    <char org="Ч">Ch</char>
    <char org="ч">ch</char>
    <char org="Џ">Dh</char>
    <char org="џ">dh</char>
    <char org="Ш">Sh</char>
    <char org="ш">sh</char>
    <char org="Щ">Shh</char>
    <char org="щ">sht</char>
    <char org="Ъ">A`</char>
    <char org="ъ">``</char>
    <char org="Ы">Y`</char>
    <char org="ы">y`</char>
    <char org="Ь">`</char>
    <char org="ь">`</char>
    <char org="Э">E`</char>
    <char org="э">e`</char>
    <char org="Ю">Yu</char>
    <char org="ю">yu</char>
    <char org="Я">Ya</char>
    <char org="я">ya</char>
    <char org="’">'</char>
    <char org="Ѣ">Ye</char>
    <char org="ѣ">ye</char>
    <char org="Ѳ">Fh</char>
    <char org="ѳ">fh</char>
    <char org="Ѵ">Yh</char>
    <char org="ѵ">yh</char>
    <char org="Ѫ">O`</char>
    <char org="ѫ">o`</char>
    <char org="№">(No)</char>

    Maybe this is of use to someone. I also hope the forum editor will handle these characters well ... ;)

    Remember to reload the Umbraco config (e.g. by touching the web.config in the root) after applying the changes!
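
    For orientation, here is a rough sketch of where such entries would sit in "umbracoSettings.config". It assumes the layout of a default install, where urlReplacing lives inside requestHandler and already carries a few default entries; the surrounding settings and attributes may differ between Umbraco versions, so treat the fragment as an illustration rather than a file to copy. The two Cyrillic entries stand in for the full list.

    ```xml
    <requestHandler>
      <!-- other requestHandler settings omitted; they vary by version -->
      <urlReplacing removeDoubleDashes="true">
        <!-- existing default entries stay in place, for example: -->
        <char org=" ">-</char>
        <char org="ä">ae</char>
        <!-- new Cyrillic entries are appended below the defaults -->
        <char org="Ж">Zh</char>
        <char org="ж">zh</char>
      </urlReplacing>
    </requestHandler>
    ```

    Each char element maps the character in the org attribute to the element's text, so "Ж" would come out as "Zh" in the generated URL.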

  • Marc Stöcker 96 posts 532 karma points c-trib
    Aug 08, 2011 @ 10:00

    The editor is waving the white flag and editing the post does not work ("XSLT error"), so here's a link: http://hello.mindrevolution.com/umbraco/cyrillic-url-transliteration.txt

  • Marc Stöcker 96 posts 532 karma points c-trib
    Aug 08, 2011 @ 10:16

    Feedback welcome, especially on the apostrophes. Should they be kept or removed?

  • Alexander Bryukhov 19 posts 68 karma points c-trib
    Dec 12, 2011 @ 15:36

    Hello Marc,

    let me try to post the transliteration part of my umbracoSettings.config here as a picture; I hope the ugly code editor will keep it as is... :-D

    As you can see, there are no apostrophes in the ASCII parts of the pairs at all... Your file also contains some characters that we have not added to our config yet ("i" with two dots above, etc.). These symbols may all come from the Cyrillic part of the Unicode set, but they are not part of the Russian alphabet, so we have not needed them for the Russian sites we have built so far :-) If these specific characters can possibly appear in the URLs of your projects, you can keep them all, either with your current transliteration or without apostrophes...

    Hope it will help...

    WBR

  • Marc Stöcker 96 posts 532 karma points c-trib
    Dec 12, 2011 @ 15:47

    Hi Alexander,

    thank you for the feedback. I will remove the apostrophes; I never liked such URLs anyway. I want to keep the non-Russian but nonetheless Cyrillic letters for broader reusability.

    I guess you had the same trouble with the editor I had with my first post ...  ;-)

    Could you upload your translit-config and provide a link? That would be great. Thanks!

    Marc.

  • Alexander Bryukhov 19 posts 68 karma points c-trib
    Dec 12, 2011 @ 16:05

    Marc,

    I made one change to my post, I hope now the evil code editor will not be able to dig in the image :-)

  • Marc Stöcker 96 posts 532 karma points c-trib
    Dec 12, 2011 @ 16:28

    Perfect. Thank you, Alexander!

  • Marc Stöcker 96 posts 532 karma points c-trib
    May 16, 2014 @ 10:30

    Due to some requests, here's the current version we use with several multi-lingual sites: https://gist.github.com/esn303/f181a66bb701be1bd304

  • Vitali 1 post 71 karma points
    5 days ago

    Hi guys,

    I did the same as Marc described, but without success. My version of Umbraco is 7.10.3. I added the chars to umbracoSettings.config, then opened web.config and saved it. Could someone explain what I'm doing wrong?

    Thanks a lot!
