Using native2ascii tool to convert i18n files

instructions from


A misnamed utility for converting files encoded in various ways to ASCII, with awkward characters encoded as \uxxxx. It can also be used to convert such ASCII files back to various national encodings or UTF-8.

native2ascii.exe is included with the JDK in J:\Program Files\java\jdk1.6.0_24\bin. It converts files from any supported encoding to a printable ASCII form, and back. The printable form uses ASCII characters plus escapes like \u95e8 for the exotic characters.
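To make the printable form concrete, here is a minimal Python sketch of the forward transform. It is not the JDK's code. The name to_ascii_escapes is made up for illustration, and the sketch assumes that every character above U+007F gets escaped and that characters outside the Basic Multilingual Plane come out as a surrogate pair, as Java's 16-bit chars would require.

```python
def to_ascii_escapes(text: str) -> str:
    """Return text with every non-ASCII character written as \\uxxxx."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through untouched.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Assumption: supplementary characters become a UTF-16
            # surrogate pair, two \uxxxx escapes in a row.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


print(to_ascii_escapes("caf\u00e9"))  # prints caf\u00e9
```

The output contains only bytes below 0x80, so it survives any ASCII-compatible editor or transport.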

Here is how you would take a file in the old DOS IBM OEM encoding and bring it up to snuff for posting on the web. The second step below writes windows-1252; use -encoding UTF-8 there instead if you want UTF-8.

REM convert IBM OEM file to ASCII with \escapes
native2ascii -encoding Cp437 ibm.txt intermediate.txt

REM convert intermediate ASCII with \escapes to windows-1252
native2ascii -encoding windows-1252 -reverse intermediate.txt web.txt

Here is how you would take a UTF-8 file and convert it back to a native format, in this case windows-1252 for W2K.

REM convert UTF-8 to ASCII with \escapes
native2ascii -encoding UTF-8 utf.txt intermediate.txt

REM convert intermediate ASCII with \escapes to CP1252
native2ascii -encoding Cp1252 -reverse intermediate.txt nt.txt

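You can also export part of the Windows registry, which regedit writes as Unicode, and convert it to readable ASCII.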

REM export part of the registry
regedit /E java.reg "HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft"

REM convert Unicode to readable ASCII with escapes
native2ascii -encoding UnicodeLittle java.reg java.asc

You can convert between any two encodings by going in two steps, via the printable form. Someone should probably write an improved version of this little utility that can convert from anything to anything in one step and that will put the output back on top of the input file by default. It would do this by creating a temporary output file in the same directory as the input file, then renaming it over the original once the conversion has safely finished.
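A rough sketch of such a one-step converter, in Python rather than as a patch to native2ascii, might look like the following. The function name transcode_in_place is hypothetical, and the encoding names are Python codec names (cp437, cp1252, utf-8), which do not always match the JDK's names.

```python
import os
import tempfile


def transcode_in_place(path, src_encoding, dst_encoding):
    """Re-encode the file at path from src_encoding to dst_encoding."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives beside the input so the final rename
    # never has to cross a drive or volume boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings byte for byte.
        with os.fdopen(fd, "w", encoding=dst_encoding, newline="") as dst, \
                open(path, "r", encoding=src_encoding, newline="") as src:
            for chunk in iter(lambda: src.read(64 * 1024), ""):
                dst.write(chunk)
        # Only now, with the output complete, replace the input.
        os.replace(tmp_path, path)
    except BaseException:
        # Conversion failed: discard the partial output, keep the input.
        os.unlink(tmp_path)
        raise
```

A call such as transcode_in_place("ibm.txt", "cp437", "utf-8") would do both steps of the DOS example at once. Because the codecs run in strict mode, a character the target encoding cannot represent raises an error and the original file is left alone.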

You can also use native2ascii to compose arbitrary binary files. Compose using \uxxxx sequences and then convert them to binary.

native2ascii -encoding UnicodeBigUnmarked -reverse will give you a straightforward transform to binary. The result is raw big-endian 16-bit chars with no BOM, not DataInputStream.readUTF() format. Other encodings will let you create a variety of exotic files.
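To see what that transform amounts to, here is a small Python sketch that turns \uxxxx text into raw big-endian 16-bit units. The name escapes_to_utf16be is invented for this example, and the sketch assumes a simple reading of the escape syntax: exactly four hex digits after \u, with no special handling of doubled backslashes.

```python
import re

_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def escapes_to_utf16be(text: str) -> bytes:
    """Turn text with \\uxxxx escapes into big-endian 16-bit units, no BOM."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Literal characters between escapes become 16-bit units too.
        out += text[pos:match.start()].encode("utf-16-be")
        # Each escape is written as its two bytes, high byte first.
        out += int(match.group(1), 16).to_bytes(2, "big")
        pos = match.end()
    out += text[pos:].encode("utf-16-be")
    return bytes(out)


print(escapes_to_utf16be(r"\ucafe\ubabe").hex())  # prints cafebabe
```

Every escape contributes exactly two bytes, so any even-length byte sequence can be composed this way.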

If you have BOMs, use x-UTF-16LE-BOM, X-UTF-32BE-BOM or X-UTF-32LE-BOM, which have a definite BOM. The encodings where the BOM is optional seem to flummox native2ascii.
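If you are not sure whether a file starts with a BOM, you can check its first few bytes before picking an encoding. The Python sketch below is one way to do that. The name sniff_bom and the labels it returns are made up for this example and are not JDK encoding names.

```python
import codecs

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins
# with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return (label, bom_length) for a leading BOM, or (None, 0)."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label, len(bom)
    return None, 0
```

Passing the first four bytes of the file is enough. One caveat: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE, so treat the answer as a hint.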