Unicode Issue - Random �� characters

We have had an issue throughout all versions of Openfire with Unicode characters (such as Chinese) getting corrupted in messages, group names and JIDs.

For instance the JID:

坏脾气是我@ourserver.com

would sometimes (but not always) change to something like this:

坏��气是我@ourserver.com

and generate errors like this:

2010.02.02 04:10:32 [org.jivesoftware.openfire.handler.IQRosterHandler.handleIQ(IQRosterHandler.java:128)] Internal server error
java.lang.IllegalArgumentException: Illegal JID: 坏��气是我@ourserver.com

Obviously this results in ongoing issues and lots of other random error messages.

Also, if we send a message containing Chinese characters, these �� characters are often inserted at random points in the message, and some of the original characters are lost as a result.

It isn’t always �� characters; sometimes it’s other garbled characters.
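A plausible explanation for the "sometimes but not always" pattern, offered as a hedged sketch rather than a diagnosis: if each network read is decoded as its own UTF-8 string, a read boundary that falls inside a multi-byte character turns that character into replacement characters (U+FFFD, displayed as �). The sketch below (class and method names are hypothetical) splits the JID's UTF-8 bytes at every possible offset and prints only the splits that corrupt it. Splits in the ASCII part or between two Chinese characters are harmless; splits inside a 3-byte Chinese character are not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitDecodeDemo {
    /**
     * Decodes the two halves of a split byte array independently, as a reader
     * would if it did not carry an incomplete UTF-8 sequence over to the next read.
     */
    public static String decodeNaively(byte[] data, int splitAt) {
        String first = new String(Arrays.copyOfRange(data, 0, splitAt), StandardCharsets.UTF_8);
        String second = new String(Arrays.copyOfRange(data, splitAt, data.length), StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        String jid = "坏脾气是我@ourserver.com";
        byte[] bytes = jid.getBytes(StandardCharsets.UTF_8); // each Chinese character is 3 bytes
        for (int split = 1; split < bytes.length; split++) {
            String result = decodeNaively(bytes, split);
            if (!result.equals(jid)) { // only mid-character splits end up here
                System.out.println("split after byte " + split + ": " + result);
            }
        }
    }
}
```

Whether a given message is corrupted then depends only on where the reads happen to split it, which would fit the intermittent corruption described above.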

I have tried various different clients and still have the same issues, so I’m pretty sure the problem is in Openfire.


I assume this issue is related to MINA (though it may not be). I have seen other discussions about Unicode issues, but I am not sure whether any of them relate specifically to this one. We tried updating the MINA library to v1.1.7 and it actually seems to help a bit, but we still have the same issues.

Our service is multi-language and it is important that we fully support these character sets.

Thanks for your help with this,

Daniel

Hi,

it looks like you are hitting OF-92.

LG

Thanks, I will try out the patch attached to the issue and see if it resolves this.

Hi,

I couldn’t reproduce this problem, but I did identify a problem in the current code. Your dump in OF-92 contains 226 bytes, while the UTF8Buffer was only ever intended to contain 2-4 bytes (a single UTF-8 character). So I uploaded a new version with the three fixes below.
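For context on that buffer size: a UTF-8 character is 1 to 4 bytes long, and its first (lead) byte says how many. A character that only needs buffering is therefore 2 to 4 bytes, and the bytes carried over from the end of one read can never exceed 3, so a buffer holding 226 bytes means the carry-over logic has gone wrong. A minimal sketch of how such a tail could be measured, assuming the chunk is held in a plain byte[] (the helpers sequenceLength and incompleteTail are hypothetical, not taken from XMLLightweightParser):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Tail {
    /** Length of the UTF-8 sequence starting with this lead byte, or -1 for a continuation/invalid byte. */
    public static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;           // 0xxxxxxx: ASCII
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // 10xxxxxx continuation byte (or invalid)
    }

    /**
     * Number of bytes at the end of data[0..len) that belong to a character
     * not yet fully received (0 if the chunk ends on a character boundary).
     * The result is at most 3, because the longest UTF-8 sequence is 4 bytes.
     */
    public static int incompleteTail(byte[] data, int len) {
        for (int back = 1; back <= Math.min(4, len); back++) {
            int n = sequenceLength(data[len - back]);
            if (n == -1) continue;      // continuation byte: keep walking back
            return n > back ? back : 0; // lead byte found: is its sequence cut off?
        }
        return 0; // no lead byte nearby: malformed input, nothing sensible to carry over
    }

    public static void main(String[] args) {
        byte[] bytes = "坏脾".getBytes(StandardCharsets.UTF_8); // 6 bytes
        for (int len = 1; len <= bytes.length; len++) {
            System.out.println("first " + len + " bytes -> " + incompleteTail(bytes, len) + " pending");
        }
    }
}
```

The number of bytes still missing for the cut-off character is sequenceLength(lead) minus the pending count, which appears to correspond to missingUTF8bytes in the snippet below.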

LG

...
/* if needed: complete previous incomplete UTF8 char */
if (missingUTF8bytes > 0)
{
    // FIX 2010.04.04 missingUTF8bytes_tmp as missingUTF8bytes is modified in loop (missingUTF8bytes--;)
    int missingUTF8bytes_tmp = missingUTF8bytes;
    for (int i = 0; i <= missingUTF8bytes_tmp; i++)
    {
        if (len == i)
        {
            return; /* not enough data to complete char */
        }
        /* fill the buffer */
        UTF8Buffer.put(byteBuffer.get(i));
        incompleteUTF8bytes++;
        missingUTF8bytes--;
        newbyteBufferPosition++;
        // FIX 2010.04.04 break loop after filling buffer completely
        if ( missingUTF8bytes == 0)
        {
            break;
        }
    }
    /* read the buffer */
    UTF8Buffer.flip();
    // FIX 2010.04.04 read the whole UTF8Buffer (should make no difference)
    // -- buffer.append(UTF8Buffer.getString(incompleteUTF8bytes, decoder));
    buffer.append(UTF8Buffer.getString(decoder));
    UTF8Buffer = null;
    ...
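For comparison, the JDK's java.nio.charset.CharsetDecoder can do this carry-over by itself: when decode(in, out, false) is called with endOfInput set to false, an incomplete sequence at the end of the input is left unread in the ByteBuffer and can be prepended to the next chunk. This is a swapped-in standard-library technique, not the OF-92 patch, and CarryOverDecoder and feed are hypothetical names. The main method feeds the JID one byte at a time, the worst possible chunking.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CarryOverDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    // Bytes of a character that was cut off at the end of the previous chunk.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Decodes one network chunk, holding back any incomplete trailing UTF-8 sequence. */
    public String feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk).flip();
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput = false: an incomplete sequence at the end stays unread in 'in'.
        decoder.decode(in, out, false);
        pending = in.slice(); // keep the leftover (at most 3) bytes for the next call
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "坏脾气是我@ourserver.com".getBytes(StandardCharsets.UTF_8);
        CarryOverDecoder d = new CarryOverDecoder();
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) { // worst case: every byte arrives in its own chunk
            text.append(d.feed(new byte[] {b}));
        }
        System.out.println(text);
    }
}
```

Even with one byte per chunk this reproduces the original JID exactly, because no character is decoded until all of its bytes have arrived.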

This is still a major issue for us, so I will try implementing your updated code as soon as possible.

If you send a few paragraphs of Chinese characters through Jabber in a single message, you will see this problem occur. You can just copy a bunch of characters from any website. You may need to send it a couple of times before it happens. Every now and then, the receiving client will show a square or some other pair of incorrect characters.

And as you can see in my dump, it also affects JID nodes, and it also affects group names. Interestingly, it doesn’t seem to affect other things such as nicknames; they never get corrupted. Obviously the JID node corruption causes a bunch of secondary errors in Openfire.

I’ll let you know how it goes after implementing your update. Thanks again.

Daniel

Hi Daniel,

I uploaded another improvement (not a fix).

If this does not improve things, the Openfire code needs to be reviewed. Unless every connection uses its own parser, you will get these errors, because I use the parser to store the trailing bytes of an incomplete UTF-8 character between reads. My tests are single-threaded, so they always use the same parser.
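To illustrate the per-connection point, here is a sketch assuming each connection is identified by a string ID and owns its own decoding state; Utf8Stream, onRead and the IDs "A" and "B" are hypothetical, and the decoding uses the JDK's CharsetDecoder rather than the patched parser. When reads from two connections interleave and one decoder is shared, connection A's half-received character is glued onto connection B's data; with one decoder per connection both texts come out intact.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PerConnectionDecoding {
    /** Decoder state for ONE byte stream: the pending bytes of a cut-off character. */
    public static final class Utf8Stream {
        private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        private ByteBuffer pending = ByteBuffer.allocate(0);

        public String feed(byte[] chunk) {
            ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
            in.put(pending).put(chunk).flip();
            CharBuffer out = CharBuffer.allocate(in.remaining());
            decoder.decode(in, out, false); // leaves an incomplete tail unread in 'in'
            pending = in.slice();
            out.flip();
            return out.toString();
        }
    }

    // Hypothetical registry: one Utf8Stream per connection ID, never shared.
    private final Map<String, Utf8Stream> streams = new HashMap<>();

    public String onRead(String connectionId, byte[] chunk) {
        return streams.computeIfAbsent(connectionId, id -> new Utf8Stream()).feed(chunk);
    }

    public static void main(String[] args) {
        byte[] a = "坏脾气".getBytes(StandardCharsets.UTF_8);
        byte[] b = "我们".getBytes(StandardCharsets.UTF_8);
        byte[] a1 = Arrays.copyOfRange(a, 0, 4); // ends in the middle of 脾
        byte[] a2 = Arrays.copyOfRange(a, 4, a.length);

        // Reads interleave: part of A, then all of B, then the rest of A.
        Utf8Stream shared = new Utf8Stream();
        String sharedA = shared.feed(a1);
        String sharedB = shared.feed(b);
        sharedA += shared.feed(a2);
        System.out.println("shared:   A=" + sharedA + " B=" + sharedB);

        PerConnectionDecoding perConnection = new PerConnectionDecoding();
        String ownA = perConnection.onRead("A", a1);
        String ownB = perConnection.onRead("B", b);
        ownA += perConnection.onRead("A", a2);
        System.out.println("separate: A=" + ownA + " B=" + ownB);
    }
}
```

In the shared case both connections receive replacement characters, while a test that only ever feeds one stream never mixes state, which is consistent with the single-threaded tests not showing the problem.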

LG

I couldn’t post the code to issue OF-92, so I have attached it here.

Please let me know how it goes. This has entirely solved the issue in my test environment.
XMLLightweightParser.java.zip (4150 Bytes)
