XMPPDecoder has a decode problem for UTF-8

hanguokai · May 6, 2011, 4:08am

A UTF-8 character (a char in Java) is usually encoded as 1 to 3 bytes (the original design allowed up to 6 bytes; the current standard allows at most 4), see http://en.wikipedia.org/wiki/UTF-8 .

From now on, I assume a character that takes 3 bytes.

Openfire uses MINA NIO to process the network stream, and implements an XMPPDecoder to decode bytes into a String/stanza.

When decoding a ByteBuffer, the bytes of its last character may be incomplete. For example, the last few bytes of the ByteBuffer may hold one, two, or three bytes of a character; if only 1 or 2 bytes are there, the character is incomplete. This incomplete state happens at random. If the input is a long run of 3-byte characters, the probability increases significantly.
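To make the incomplete state concrete, here is a small sketch (not Openfire code; the class and method names are made up for illustration) that splits the bytes of two 3-byte characters across two chunks and decodes each chunk on its own with the lenient Charset.decode:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class SplitCharDemo {
    // Hypothetical helper: decodes one chunk in isolation.
    // Charset.decode replaces malformed input with U+FFFD.
    static String decodeLeniently(byte[] chunk) {
        return Charset.forName("UTF-8").decode(ByteBuffer.wrap(chunk)).toString();
    }

    public static void main(String[] args) throws Exception {
        // Two 3-byte characters (U+4F60 U+597D): e4 bd a0 e5 a5 bd
        String text = "\u4f60\u597d";
        byte[] all = text.getBytes("UTF-8");
        byte[] first = Arrays.copyOfRange(all, 0, 4);   // ends inside the second character
        byte[] second = Arrays.copyOfRange(all, 4, 6);  // the rest of the second character
        String joined = decodeLeniently(first) + decodeLeniently(second);
        System.out.println(joined.equals(text));            // prints false
        System.out.println(joined.indexOf('\uFFFD') >= 0);  // prints true
    }
}
```

Decoding chunk by chunk without carrying the leftover bytes over loses the second character, which is the situation a network decoder runs into whenever a read boundary falls inside a character.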

Let’s look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):

Charset encoder = Charset.forName(charset);

CharBuffer charBuffer = encoder.decode(byteBuffer.buf());

char[] buf = charBuffer.array();

int readByte = charBuffer.remaining();

char lastChar = buf[readByte-1];

if (lastChar >= 0xfff0) { // you think it’s incomplete, then position-1, readByte-1

byteBuffer.position(byteBuffer.position()-1); //error

readByte--; //error

}

The code above does not properly handle an incomplete UTF-8 character.

If a character is 3 bytes long, either one or two of its bytes may be present at the end of the ByteBuffer.

If one byte of the incomplete character is present, the ByteBuffer’s position should move back by 1. If two bytes are present, the position should move back by 2.

So if the position moves back by only 1 when two bytes are present, the next decode sees just the last two bytes of the 3-byte character, and they are replaced by two U+FFFD (“FD”) characters.

Conversely, if the position moves back by 2 when only one byte is present, the next decode sees 4 bytes instead of 3 (the last byte of the previous character comes first), so the output has one extra U+FFFD before the character.
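One way to find out how far the position has to move back is to inspect the trailing bytes themselves. The following is only a sketch of that idea, not the approach taken by the attached patch; Utf8Tail and incompleteTailLength are hypothetical names.

```java
public class Utf8Tail {
    /**
     * Returns how many bytes at the end of data[0..len) belong to a UTF-8
     * sequence whose remaining bytes have not arrived yet (0 if none).
     */
    static int incompleteTailLength(byte[] data, int len) {
        // Walk back over at most 3 continuation bytes (10xxxxxx) to reach a lead byte.
        for (int back = 1; back <= 4 && back <= len; back++) {
            int b = data[len - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte, keep walking back
            }
            int expected;
            if (b < 0x80) expected = 1;                 // ASCII
            else if ((b & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return 0;                              // invalid lead byte, leave it to the decoder
            return back < expected ? back : 0;
        }
        return 0; // empty input or only continuation bytes, leave it to the decoder
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5, (byte) 0xa5, (byte) 0xbd};
        System.out.println(incompleteTailLength(data, 4)); // prints 1
        System.out.println(incompleteTailLength(data, 5)); // prints 2
        System.out.println(incompleteTailLength(data, 6)); // prints 0
    }
}
```

The return value is the amount by which the position would have to move back, so it is 1 or 2 for a 3-byte character depending on how many of its bytes arrived.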

See also java.nio.charset.CharsetDecoder and its two decode methods.

Notice:

decode(ByteBuffer in, CharBuffer out, boolean endOfInput): the last parameter tells the decoder whether the ByteBuffer is complete or may be followed by more input.

CodingErrorAction has three instances for the decoder: IGNORE, REPLACE, and REPORT.

Charset.decode(bb) is equivalent to Charset.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE).decode(bb);
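As a rough illustration of how the three-argument decode method can be used on a stream, the sketch below passes endOfInput = false and carries any undecoded trailing bytes over to the next call. It is not the attached patch; StreamingUtf8Decoder, feed, and the pending buffer are names invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StreamingUtf8Decoder {
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of a character whose remaining bytes have not arrived yet (at most 3 for UTF-8).
    private final ByteBuffer pending = ByteBuffer.allocate(8);

    /** Decodes one network chunk, keeping an incomplete trailing character for the next call. */
    String feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.allocate(pending.position() + chunk.length);
        pending.flip();
        in.put(pending).put(chunk); // leftover bytes first, then the new chunk
        in.flip();
        pending.clear();

        CharBuffer out = CharBuffer.allocate(in.remaining());
        decoder.decode(in, out, false); // false: more input may follow
        pending.put(in);                // whatever the decoder did not consume
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        StreamingUtf8Decoder d = new StreamingUtf8Decoder();
        // e4 bd a0 e5 | a5 bd : the second character is split across two chunks
        String a = d.feed(new byte[] {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5});
        String b = d.feed(new byte[] {(byte) 0xa5, (byte) 0xbd});
        System.out.println((a + b).equals("\u4f60\u597d")); // prints true
    }
}
```

With endOfInput = false the decoder stops before the incomplete sequence instead of replacing or reporting it, so no guessing about position-1 or position-2 is needed.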

hanguokai · May 6, 2011, 5:12am

My test case for java.nio.charset.CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean):

public static void main(String[] args) throws Exception {

CharsetDecoder replaceDecoder = Charset.forName("UTF-8").newDecoder()

.onMalformedInput(CodingErrorAction.REPLACE)

.onUnmappableCharacter(CodingErrorAction.REPLACE);

CharsetDecoder ignoreDecoder = Charset.forName("UTF-8").newDecoder()

.onMalformedInput(CodingErrorAction.IGNORE)

.onUnmappableCharacter(CodingErrorAction.IGNORE);

String input = "你好";

byte[] fullBytes = input.getBytes("UTF-8");

System.out.println("input : " + input);

System.out.print("input bytes in utf8 : ");

print(fullBytes);

System.out.println();

System.out.println("=========================================");

// decodeAndPrint(decoder, fullBytes);

byte[] bytes0_4 = Arrays.copyOfRange(fullBytes, 0, 4);

decodeAndPrint(replaceDecoder, bytes0_4, true);

decodeAndPrint(replaceDecoder, bytes0_4, false);

decodeAndPrint(ignoreDecoder, bytes0_4, true);

decodeAndPrint(ignoreDecoder, bytes0_4, false);

byte[] bytes0_5 = Arrays.copyOfRange(fullBytes, 0, 5);

decodeAndPrint(replaceDecoder, bytes0_5, true);

decodeAndPrint(replaceDecoder, bytes0_5, false);

decodeAndPrint(ignoreDecoder, bytes0_5, true);

decodeAndPrint(ignoreDecoder, bytes0_5, false);

}

private static void print(byte[] bytes) {

for (byte b : bytes) {

System.out.print(Integer.toString(b & 0xFF, 16) + " ");

// System.out.print(b+" ");

}

}

private static void decodeAndPrint(CharsetDecoder decoder, byte[] bytes,

boolean complete) {

decoder.reset();

CharBuffer charBuffer = CharBuffer.allocate(bytes.length);

ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);

CoderResult coderResult = decoder.decode(byteBuffer, charBuffer,

complete);

System.out.println("CoderResult: " + coderResult);

System.out.println("input " + bytes.length + " bytes");

System.out.println("decode " + byteBuffer.position() + " bytes");

System.out.println("decode " + charBuffer.position() + " characters");

System.out.println(Arrays.copyOf(charBuffer.array(), charBuffer

.position()));

System.out.println("-----------------------------------------------");

}

Console show the results:

input : 你好

input bytes in utf8 : e4 bd a0 e5 a5 bd

=========================================

CoderResult: UNDERFLOW

input 4 bytes

decode 4 bytes

decode 2 characters

你?

CoderResult: UNDERFLOW

input 4 bytes

decode 3 bytes

decode 1 characters

你

CoderResult: UNDERFLOW

input 4 bytes

decode 4 bytes

decode 1 characters

你

CoderResult: UNDERFLOW

input 4 bytes

decode 3 bytes

decode 1 characters

你

CoderResult: UNDERFLOW

input 5 bytes

decode 5 bytes

decode 2 characters

你?

CoderResult: UNDERFLOW

input 5 bytes

decode 3 bytes

decode 1 characters

你

CoderResult: UNDERFLOW

input 5 bytes

decode 5 bytes

decode 1 characters

你

CoderResult: UNDERFLOW

input 5 bytes

decode 3 bytes

decode 1 characters

你

MyTest.java.zip (861 Bytes)

hanguokai · May 6, 2011, 5:29am

This problem sometimes causes garbled text in Openfire 3.6.4 and earlier.

Openfire 3.7.0 uses strict mode:

encoder = Charset.forName(charset).newDecoder()

.onMalformedInput(CodingErrorAction.REPORT)

.onUnmappableCharacter(CodingErrorAction.REPORT);

encoder.decode(byteBuffer.buf());

This throws an exception when the input is incomplete, so the code that follows,

if (lastChar >= 0xfff0)

is never reached.

So connections are dropped at random!
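The strict-mode behaviour can be reproduced outside Openfire. In this sketch (StrictDecodeDemo and strictDecodeFails are illustrative names, not Openfire code) a decoder configured with REPORT is given a chunk that ends in the middle of a character:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictDecodeDemo {
    /** Returns true if a strict (REPORT) decode of the chunk throws. */
    static boolean strictDecodeFails(byte[] chunk) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(chunk)); // treats the chunk as complete input
            return false;
        } catch (CharacterCodingException e) {
            return true; // the truncated character is reported as malformed input
        }
    }

    public static void main(String[] args) {
        byte[] complete = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0};
        byte[] truncated = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5};
        System.out.println(strictDecodeFails(complete));  // prints false
        System.out.println(strictDecodeFails(truncated)); // prints true
    }
}
```

The one-argument decode(ByteBuffer) always treats its input as complete, so a read boundary inside a character is enough to trigger the exception.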

hanguokai · June 2, 2011, 3:26am

I am uploading my patch. It resolves the problem and has worked correctly for a month.
3.6.4_XMLLightweightParser.java.patch.zip (1813 Bytes)
3.7.0_XMLLightweightParser.java.patch.zip (1719 Bytes)

wroot · June 2, 2011, 3:42am

Filed as OF-458

3JIou-TaTaPuH · June 6, 2011, 4:00pm

Upload it where? Sorry for such a newbie question.

Upload it as a plugin?

wroot · June 6, 2011, 5:31pm

I think he meant “I’m uploading my patch”. This patch can only be applied to the source code, and then Openfire has to be recompiled.

3JIou-TaTaPuH · June 6, 2011, 6:52pm

Oh, OK. But when can we expect the new release from you guys?

Maybe someone has recompiled the .exe with this fix? If so, could you upload it somewhere? It would be incredibly appreciated!

Mike_Laberko · June 8, 2011, 1:16pm

I confirm the same problem on Openfire 3.7.0 + Pandion. Please, somebody, upload compiled patched Openfire if that’s possible!

wroot · June 8, 2011, 4:41pm

This patch fails to apply to the current Openfire SVN copy in NetBeans for me.

hanguokai · June 8, 2011, 4:59pm

I applied this patch to the selected source file (org.jivesoftware.openfire.nio.XMLLightweightParser) in Eclipse, and it worked.

So I am uploading the final source code, so you can simply replace the target file.
XMLLightweightParser.java.zip (4034 Bytes)

wroot · June 8, 2011, 5:08pm

OK. Now I was able to apply your patch to the selected source file. Attaching the compiled openfire.jar, which should be copied into the /openfire/lib folder. I haven’t tested this, though, and I’m not sure whether only the recompiled openfire.jar is needed. But here you have it. Make a backup of the original openfire.jar!
openfire.jar (7195618 Bytes)

JackLin · June 9, 2011, 1:12am

Just passing by…

Mike_Laberko · June 9, 2011, 6:15am

Thank you! I’m going to test this jar today =)

Mike_Laberko · June 10, 2011, 5:54am

Well, I’ve been testing Openfire 3.7.1 Alpha for 18 hours. It works fine with no errors or warnings (clients are using Pandion). I simply replaced openfire.jar with the patched one.

Thank you, guys, for this update!

wroot · June 12, 2011, 10:23am

hanguokai, it looks like you are using Arrays.copyOf in your patch, which is not supported by Java 5, and it looks like we are still supporting this obsolete version. Whether we should support this version, which reached end of life a long time ago, has already been questioned; see this poll (and maybe vote): http://community.igniterealtime.org/polls/1025

But maybe you can make this patch with some other function that is still available in Java 5? Meanwhile we have reverted this patch in SVN. But as I said, this can change.

hanguokai · June 12, 2011, 2:58pm

In the last patch I did not notice the Java 5 restriction; I usually work in a Java 6 environment.

This time I am uploading a new patch for Java 5.

The old statement:

char[] buf = Arrays.copyOf(charBuffer.array(), charBuffer.position());

It can be replaced in several other ways.

Method 1:

char[] buf = charBuffer.flip().toString().toCharArray();

Method 2:

char[] buf = new char[charBuffer.position()];

charBuffer.flip();charBuffer.get(buf);

Method 3, using the implementation of Arrays.copyOf:

char[] copy = new char[newLength];

System.arraycopy(original, 0, copy, 0,

Math.min(original.length, newLength));

I use method 2.
XMLLightweightParser.java.patch.zip (1706 Bytes)
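For comparison, here is a small sketch that wraps two of the Java 5 compatible alternatives (flip plus bulk get, and System.arraycopy) in helper methods and checks that they produce the same array. The class, the method names, and the sample buffer are made up for this example and are not taken from the patch.

```java
import java.nio.CharBuffer;
import java.util.Arrays;

public class CopyDecodedChars {
    // Flip the buffer, then bulk-read exactly the written characters.
    static char[] viaFlipAndGet(CharBuffer cb) {
        char[] buf = new char[cb.position()];
        cb.flip();
        cb.get(buf);
        return buf;
    }

    // What Arrays.copyOf does internally, written out by hand.
    // Assumes a heap buffer from CharBuffer.allocate (array offset 0).
    static char[] viaArraycopy(CharBuffer cb) {
        int newLength = cb.position();
        char[] original = cb.array();
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    // Hypothetical stand-in for a decoder's output buffer: capacity 8, two chars written.
    static CharBuffer sample() {
        CharBuffer cb = CharBuffer.allocate(8);
        cb.put("hi");
        return cb;
    }

    public static void main(String[] args) {
        char[] a = viaFlipAndGet(sample());
        char[] b = viaArraycopy(sample());
        System.out.println(Arrays.equals(a, b)); // prints true
        System.out.println(a.length);            // prints 2
    }
}
```

Note that the flip-based variant changes the buffer's position and limit, while the arraycopy variant leaves the buffer untouched.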

akrherz · June 12, 2011, 9:16pm

Thank you! We are always looking for more Openfire SVN committers, so please consider it.

daryl