Friday, January 16, 2015

Using the Oracle REPLACE() function to generate Java Unicode escape codes

On my current project the UI is a corporate web app. The user base will be global, and the supported languages at this point will be English and some Western European languages. The localization setup is a standard Java messages.properties, with a per-language file such as messages_fr.properties for French.

The developers tokenize the UI screens and put the English content into messages.properties. An external team delivers the translated tokens, which development then puts into the matching messages_X.properties file to complete the process.

With the Western European languages there are sometimes non-ASCII characters with accents or whatnot from the translators. In the .properties files I wanted to convert these accented and non-ASCII characters to \u Java escape codes.

I have access to Oracle and SQL Developer, so I decided to get the database to do the work. A nice thing about SQL Developer is that accented characters can be pasted straight into the SQL window and they just work, which makes this easy. The Oracle REPLACE function did what I wanted. REPLACE() is handy here because calls can be nested: the result of one REPLACE() becomes the input string of the next, so each additional character only needs one more wrapping call. This makes it really easy to add new characters and build up the list as they appear in the translation content. So here's the example code.


select
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
'Complété avec succès',
'é', '\u00E9'),
'Ú', '\u00DA'),
'í', '\u00ED'),
'ä', '\u00E4'),
'ü', '\u00FC'),
'ß', '\u00DF'),
'á', '\u00E1'),
'¿', '\u00BF'),
'È', '\u00C8'),
'ì', '\u00EC'),
'ó', '\u00F3'),
'´', '\u00B4'),
'ú', '\u00FA'),
'ë', '\u00EB'),
'à', '\u00E0'),
'è', '\u00E8'),
'ç', '\u00E7'),
'ê', '\u00EA')
from dual;


The output for the above will be

Compl\u00E9t\u00E9 avec succ\u00E8s
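For comparison, the same conversion can be sketched in Java without a per-character list, by escaping every character outside 7-bit ASCII. This is a different technique from the nested REPLACE() above, not what I actually used, and the class and method names are made up for illustration. The test string is written with escapes so the source compiles whatever the file encoding is.

```java
public class UnicodeEscaper {

    // Returns the input with every char above 7-bit ASCII rewritten as a
    // six-character Java Unicode escape (backslash, 'u', four hex digits).
    // Works per UTF-16 char, so a supplementary character would come out
    // as two escapes, which is also how .properties files represent it.
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 127) {
                // "\\u" is a literal backslash followed by the letter u.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same French sample as in the SQL above.
        String sample = "Compl\u00E9t\u00E9 avec succ\u00E8s";
        System.out.println(escapeNonAscii(sample));
    }
}
```

Run as-is, this should print the same line as the SQL version. The JDK's native2ascii tool did a similar job at the time, for anyone who prefers the command line.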

Eclipse has a handy feature for checking the result: hover the cursor over \u escaped text in a .properties file and Eclipse will show it in its displayed form, with the accented characters.
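Outside Eclipse, the same check can be done in code, since java.util.Properties decodes \u escapes when it loads a file. Here is a minimal sketch, assuming a made-up key name (status.completed) and made-up class and method names; it loads the escaped line from a string rather than from a real messages_fr.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapedPropertiesCheck {

    // Parses .properties text and returns the decoded value for one key.
    // Properties.load turns Unicode escapes back into real characters.
    public static String decode(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Hypothetical key; the value is the escaped output from the SQL.
        String line = "status.completed=Compl\\u00E9t\\u00E9 avec succ\\u00E8s";
        String decoded = decode(line, "status.completed");

        // Compare against the accented original instead of printing it,
        // so the result does not depend on the console encoding.
        String expected = "Compl\u00E9t\u00E9 avec succ\u00E8s";
        System.out.println(decoded.equals(expected));
    }
}
```

If the escapes are right, the decoded value equals the original accented text and the program prints true.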

I don't doubt there are lots of ways to do this, which makes it a bit of a fun problem in a way. This is what I did, and it worked well for me.