A problem with corrupted UTF-8 characters

Hello,

I was running version 3_3_2 of Openfire and encountered the XML parser bug (the one that added null characters to UTF-8 encoded messages containing specific letters). I applied the patch suggested by Timur (thank you very much) and eventually downloaded version 3_3_3. Null characters no longer appear, but I have a similar problem:

I send a message (or IQ stanza) containing characters such as the one with code 286 (Polish letters, to be more accurate), I log that message, and occasionally it is corrupted. The longer the message, the more likely it is to arrive corrupted. I’ve had this problem with my custom client as well as with JAJC. Unfortunately I cannot find a pattern: sometimes a message is delivered and logged correctly, and sometimes the same message has a letter corrupted. Frankly speaking, I don’t care much about messages, but a corrupted IQ stanza is much more serious.

In case it isn’t the server’s fault: I use the following code to send messages in my custom client:

moWriter = new BufferedWriter(new OutputStreamWriter(moOutputStream, "UTF-8"));
moReader = new BufferedReader(new InputStreamReader(moInputStream, "UTF-8"));
...
moWriter.write(str);
moWriter.flush();

I’d be very grateful for all the feedback!

Hello again.

I have the same problem while using the Psi 0.10 client and a fresh openfire_3_3_3. I am creating 20 users from ct1 to ct20 and a new Psi account, ct1 for example. Then I add all the users to my roster and change the group name several times and… I get two group names in the reply from Openfire, one correct and one with a letter corrupted. Please take a look at the images linked below:

Message was edited by fnx:

Oh, and one more thing. I cannot reproduce the error while using the Spark client. I wonder whether I was just lucky (or rather unlucky), or whether there is a difference between TCP/IP and HTTP connections when transferring specific UTF-8 characters.

Yes, I have the same problem too with 3.3.2 and 3.3.3 (Psi, JAJC, QIP)…

The changelog for 3.3.3 says that it was fixed, but it is not!

Well, it looks like it’s not only my problem! (And I was starting to feel uneasy.)

The 3.3.3 changelog says that the lightweight parser was inserting null characters into the stanza (which is fixed; I experienced that problem too). This is similar but not the same: no null characters are being added now, but one of the multibyte letters arrives corrupted.

As I said before, the same stanza will sometimes arrive correct and sometimes not. I intercepted one of the corrupt packets, extracted all the characters, and the corrupt letter’s Unicode code point was 65533 (REPLACEMENT CHARACTER). Moreover, it doesn’t matter which letter is corrupted: they all arrive as REPLACEMENT CHARACTERs.
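
For anyone who wants to run the same check on a logged stanza, here is a minimal sketch of how the characters can be extracted. The class and method names are made up for this example and are not part of Openfire. It lists the decimal code point of every character and reports where the first REPLACEMENT CHARACTER (65533, U+FFFD) sits.

```java
public class ReplacementCharScan {

    // Returns the index of the first U+FFFD (decimal 65533) in the text,
    // or -1 if the text contains none.
    static int firstReplacementChar(String text) {
        return text.indexOf('\uFFFD');
    }

    // Lists every code point of the text as a decimal number, separated by spaces.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(cp).append(' '));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'a', the character with code 286, and a replacement character.
        String sample = "a\u011E\uFFFD";
        System.out.println(codePoints(sample));
        System.out.println(firstReplacementChar(sample));
    }
}
```

A code point of 65533 in the output means the original bytes were already lost when the text was decoded, so the damage cannot be undone from the string alone.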

It would be great if someone took a look at this case.

Hi,

does it only happen with IQ packets?

LG

Thank you for your interest …

… and no, an ordinary message will sometimes arrive corrupted too. Tomorrow I’ll try to check whether the org.dom4j classes return correct stanzas, because I don’t know where the error takes place. All I know for now is that packet.asXml() from the PacketInterceptor returns a malformed string. I “fixed” that possible bug by introducing my own “coding”: I replace all Polish letters with specific strings (’?’ = “_##” for example) on the client side before sending them to the server, then my plugin decodes them before processing. It works, but it has a serious impact on server performance.
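
To make the idea of such a workaround concrete, here is a minimal sketch of an ASCII-only escaping scheme. It is not the exact mapping described above: the `_#XXXX` format (four hex digits of the UTF-16 code unit) and the class name `AsciiEscape` are assumptions chosen for the example. The underscore itself is escaped as well, so that decoding stays unambiguous.

```java
public class AsciiEscape {

    // Replaces every non-ASCII char, and the underscore itself, with "_#" plus
    // four hex digits, so the encoded text contains single-byte characters only.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127 || c == '_') {
                sb.append(String.format("_#%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverses encode(). Assumes the input was produced by encode();
    // a malformed escape makes Integer.parseInt throw NumberFormatException.
    static String decode(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("_#", i) && i + 6 <= s.length()) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("\u0142_a");
        System.out.println(encoded);
        System.out.println(decode(encoded).equals("\u0142_a"));
    }
}
```

Because the encoded text is pure ASCII, a buffer boundary can never fall inside a multibyte character, which is presumably why this kind of workaround hides the problem. The cost is an extra pass over every stanza on both sides.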

I’ve taken a closer look at the XMLLightweightParser class.

In the read method there is the following piece of code:

CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
buffer.append(buf, 0, readByte);

where byteBuffer is an instance of ByteBuffer class.

I placed System.out.println("BUFFER: "+buffer.toString()); just after the buffer.append and then tried to reproduce the error. On the corrupted message I got the following output:

BUFFER: <iq type="set"><query xmlns="jabber:iq:roster"><item jid="11@localhost" name="11" ask="subscribe"><group>XXXXXXXXXX?
BUFFER: <iq type="set"><query xmlns="jabber:iq:roster"><item jid="11@localhost" name="11" ask="subscribe"><group>XXXXXXXXXX??XXXXXXXXX

(OOT: is there some xml tag to output xml on this forum???)

Where X are Polish (and so multibyte) characters. The following code

CharBuffer charBuffer = encoder.decode(byteBuffer.buf());

is trying to decode an incomplete buffer that has been split right in the middle of a multibyte character. Or at least it looks as if that’s the problem. I’ll try to do some more research on that.
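
To illustrate the suspected cause outside the server, here is a minimal, self-contained sketch. It is not Openfire’s code, and the class and method names are made up for the example. It decodes the same UTF-8 bytes twice: once chunk by chunk, where the chunk boundary falls inside a two-byte character, and once through a single CharsetDecoder that is told more input may follow, so the incomplete trailing byte is carried over to the next chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SplitUtf8Demo {

    // Decodes every chunk on its own, as if each network read held only
    // complete characters. Charset.decode replaces truncated or malformed
    // input with U+FFFD.
    static String decodeEachChunkAlone(byte[][] chunks) {
        StringBuilder out = new StringBuilder();
        for (byte[] chunk : chunks) {
            out.append(StandardCharsets.UTF_8.decode(ByteBuffer.wrap(chunk)));
        }
        return out.toString();
    }

    // Decodes the chunks as one stream: bytes of an incomplete trailing
    // character stay unread and are prepended to the next chunk.
    static String decodeAsStream(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder out = new StringBuilder();
        ByteBuffer pending = ByteBuffer.allocate(0);
        for (byte[] chunk : chunks) {
            ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
            in.put(pending).put(chunk).flip();
            // UTF-8 never produces more chars than it consumes bytes.
            CharBuffer chars = CharBuffer.allocate(in.remaining());
            decoder.decode(in, chars, false); // false: more input may follow
            chars.flip();
            out.append(chars);
            pending = in; // whatever is still unread is an incomplete character
        }
        // A complete implementation would finish with decode(..., true) and flush().
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a', the character with code 286 (bytes C4 9E), 'b',
        // split between the two bytes of the multibyte character.
        byte[][] chunks = { {0x61, (byte) 0xC4}, {(byte) 0x9E, 0x62} };
        System.out.println(decodeEachChunkAlone(chunks));
        System.out.println(decodeAsStream(chunks));
    }
}
```

With this input the first method produces two REPLACEMENT CHARACTERs in place of the split letter, which would be consistent with the ?? in the second BUFFER line above, while the second method returns the original three characters.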

Openfire 3.4.0 beta has a problem with Unicode characters in messages and statuses too. The problem occurs randomly.

And sometimes messages are not delivered to the recipient. (The message reaches the server, but the recipient does not receive it.)