Invalid character references in XML stream

Hi,

I’m a developer for Digsby, and while debugging a user’s problem connecting to a recently upgraded Openfire server (3.4.3 to 3.5.1) I found the following in a vCard response.

data before being fed into the parser:

'> ’

The problem is that “& #0;” is not valid XML 1.0: it is an invalid character reference, because code point 0 is outside the ranges the spec allows for the Char production.
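For anyone who wants to check a code point against that rule, here is a minimal sketch in Python. The ranges are the ones from the Char production in the XML 1.0 spec; the function name is_xml10_char is made up for illustration and is not part of Openfire, Digsby or libxml2.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # excludes the other C0 control characters
        or 0xE000 <= cp <= 0xFFFD        # skips the surrogate block, 0xFFFE and 0xFFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )
```

Code point 0 fails every branch, so a character reference to it can never be legal, whatever the encoding of the stream.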

This in turn breaks my parser (libxml2), which causes a disconnect.
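The failure is not specific to libxml2. As a rough illustration, the sketch below uses Python's standard-library parser (expat, via xml.etree.ElementTree) as a stand-in; the FN element and the helper name parses are assumptions for the example, not the actual vCard payload from the server.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # A fatal well-formedness error; a stream parser cannot recover from it.
        return False
    return True


ok = parses("<FN>Christopher</FN>")   # ordinary text content
broken = parses("<FN>&#0;</FN>")      # character reference to code point 0
```

Here ok should come out True and broken False. On an XMPP stream the same error ends the whole session, since the stream is one long document.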

I am aware the latest version is now 3.5.2, but looking through the commit history, I didn’t see anything that looked like it would solve the problem.

points of interest: revisions 8618 and 10506 (JM-1388) of XMLLightweightParser

Message was edited by: christopher

please note: added spaces between “&” and “#” so that this would be visible. The original data does not have a space.

Hey Christopher,

The Jira issue you are referring to is about invalid surrogate characters. Is that your case too? If it is then Openfire 3.5.2 will not let a user store in the database invalid surrogate characters. If that is not your case then make sure that your database is storing things in UTF-8 format.

Regards,

– Gato

No, the problem is not with invalid surrogate characters. Unfortunately, the server and database in question are not mine.

I referenced the issue about surrogate characters, and invalid characters in the stream because I believe code for dealing with this will need to be added nearby.

The issue here is that Openfire has sent an invalid character reference, for which I could not find any handling in your source tree. According to the XML 1.0 spec, this violates a well-formedness constraint, which makes it a fatal error for the parser.

When & #0; comes across the wire, Openfire has stopped sending valid XML. It does not matter what is in the database; it is simply wrong for the server to send & #0;.

Hey Christopher,

Are you sure that that content came from a client? I saw that error happen when not using UTF-8 in the database. So the content that came from the client was fine but it got corrupted when storing it in the database. But maybe this is not your case.

Thanks,

– Gato

I do not know the origin of the invalid XML.

I am certain that it was sent from the server.

Corrupted database or not, this should not have been emitted when building the outgoing XML.

Hi,

I did create JM-884 a long time ago. Anyhow this would not help if the database corrupts the message.

One would need validation of all incoming and outgoing messages, which costs some CPU cycles. It would make Openfire a much more XMPP-compliant server.
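To give an idea of what such a check could look like on the outgoing side, here is a rough sketch in Python (not Openfire code). It scans an already serialized stanza for numeric character references that point outside the XML 1.0 Char ranges. The names invalid_char_refs and _is_xml10_char are hypothetical, and the sketch assumes the stanza contains no CDATA sections or comments, where the same text would not be a reference at all.

```python
import re

# Decimal (&#NN;) and hexadecimal (&#xHH;) character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_xml10_char(cp):
    """XML 1.0 Char production, expressed as code point ranges."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def invalid_char_refs(xml_text):
    """Return every character reference in xml_text that XML 1.0 forbids."""
    bad = []
    for match in _CHAR_REF.finditer(xml_text):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not _is_xml10_char(cp):
            bad.append(match.group(0))
    return bad
```

A single regex pass like this is linear in the size of the stanza, so the CPU cost is small compared with a full parse. It only catches character references, though, not raw invalid characters or other well-formedness problems.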

LG

Sorry to revive this topic…but I think I have the same problem.

I have developed a few plugins…and one of those plugins sends an IQ with the following strings as text-value

“≈♦⇔→•”

…this completely kills the client that requested the info…is there any way around this?

Escape the XML text with StringUtils.escapeForXml()

Thanks a lot. I knew something like this must have existed…I just did not know where to look. I’ll give it a try right now.

Still, somehow I would have expected that the “send” method of Plugin would take care of this…maybe I am a lil spoiled…

Edit:

I did as you told me. It did escape the strings mentioned above…however, there is still a control character (Unicode: 0x1c) somewhere…this is quite tricky
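Escaping alone cannot fix a character like 0x1c: it is not allowed in XML 1.0 in any form, escaped or not, so it has to be removed (or replaced) before the text goes into the stanza. The sketch below shows the idea in Python, with xml.sax.saxutils.escape standing in for the escaping step. It says nothing about how StringUtils.escapeForXml() behaves, and clean_and_escape is a made-up name.

```python
import re
from xml.sax.saxutils import escape

# Any character that is NOT allowed by the XML 1.0 Char production.
_INVALID_XML10 = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_and_escape(text):
    """Drop characters XML 1.0 forbids, then escape &, < and >."""
    return escape(_INVALID_XML10.sub("", text))
```

Symbols such as ≈ or → are legal XML characters and pass through unchanged; only the control character is dropped. Whether to drop it silently or replace it with something visible (U+FFFD, for example) is a design choice for the plugin.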