
XML crippled PacketParserUtils by escaping

I am using Smack for communication with Cisco Finesse. Until now I used Smack version 3.4.1 without issues for a long time. Now I had to upgrade to Smack version 4.3.5 and I am facing the following issue:

From the XMPP server I receive XML packets. These packets arrive in the Smack framework at

PacketParserUtils.parseContentDepthWithRoundtrip(XmlPullParser, int, boolean)

Here the following code

            if (event == XmlPullParser.TEXT) {
                text = StringUtils.escapeForXmlText(text);
            }

(because event == XmlPullParser.TEXT) escapes all ‘<’ in my XML and when the packet of type Message arrives at my StanzaListener the XML looks like this:

&lt;Update>
&lt;data>
&lt;user>
&lt;dialogs>/finesse/api/User/xxxxx/Dialogs&lt;/dialogs>
&lt;extension>xxxxx&lt;/extension>
&lt;firstName>xxxxx&lt;/firstName>
&lt;lastName>xxxxx&lt;/lastName>

&lt;uri>/finesse/api/User/xxxxx&lt;/uri>
&lt;/user>
&lt;/data>
&lt;event>PUT&lt;/event>
&lt;requestId>&lt;/requestId>
&lt;source>/finesse/api/User/xxxxx&lt;/source>
&lt;/Update>

Is there a way to prevent PacketParserUtils.parseContentDepthWithRoundtrip(XmlPullParser, int, boolean) from calling StringUtils.escapeForXmlText(text)?

Thanks,
Zoltan

I have looked into the parseContentDepthWithRoundtrip() of Smack’s 4.4 branch and the current master. I fixed an issue, but as far as I can tell it is unrelated to what you are reporting. I also added two test cases for this method. If you could provide some test vectors that fail with the 4.4 branch (and/or the current master branch), then this would help me to come up with a fix.

It is quite possible that what you are experiencing only exists in Smack 4.3. Are you on Android or Java SE?

No. These methods are designed to return the XML, and hence we need to undo the unescaping that is done by the XML parser.
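To illustrate why the re-escaping exists, here is a minimal sketch that uses only the JDK's StAX parser, not Smack's own code. The class name `RoundtripSketch` and the `<payload>` wrapper element are made up for this example. It shows that the parser hands out text with entities already resolved, so a method that wants to return the original XML has to escape that text again.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RoundtripSketch {

    // Collects all character data of the document as the parser reports it,
    // i.e. with entity references such as &lt; already resolved.
    static String textOf(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, true);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Escapes text so it can be written back as XML character data.
    // '&' must be replaced first, otherwise the new entities get escaped twice.
    static String escapeText(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Hypothetical stanza fragment whose payload is escaped XML sent as text.
        String wire = "<payload>&lt;event&gt;PUT&lt;/event&gt;</payload>";
        String parsed = textOf(wire);
        System.out.println(parsed);             // <event>PUT</event>
        System.out.println(escapeText(parsed)); // &lt;event&gt;PUT&lt;/event&gt;
    }
}
```

If the sender puts escaped XML into a text node, the second line is what a faithful XML representation of that node has to look like.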

In the hope that the issue would be gone with version 4.4, I upgraded to 4.4.0-rc2. Now it got even worse. I don’t have any experience with Smack; I just had to upgrade existing code (Java SE 1.8) to the newest Smack version for security reasons. So I think that I am just missing some configuration needed to handle the XML coming from a Cisco Finesse server correctly.
After the upgrade to 4.4.0, now I get XmlPullParserException(“only whitespace content allowed before start tag and not &”) in the Smack framework (MXParserCachingStrings(MXParser).parseProlog() line 1519). Here is the ‘XML’ that is getting parsed:

&lt;Update&gt;
&lt;data&gt;
&lt;user&gt;
&lt;dialogs&gt;/finesse/api/User/xxxxxx/Dialogs&lt;/dialogs&gt;
&lt;extension&gt;xxxxxx&lt;/extension&gt;
&lt;firstName&gt;xxx&lt;/firstName&gt;
&lt;lastName&gt;xxx&lt;/lastName&gt;
&lt;loginId&gt;xxxxxx&lt;/loginId&gt;
&lt;loginName&gt;xxx&lt;/loginName&gt;
&lt;mediaType&gt;1&lt;/mediaType&gt;
&lt;pendingState&gt;&lt;/pendingState&gt;
&lt;reasonCodeId&gt;-1&lt;/reasonCodeId&gt;
&lt;uri&gt;/finesse/api/User/xxxxxx&lt;/uri&gt;
&lt;/user&gt;
&lt;/data&gt;
&lt;event&gt;PUT&lt;/event&gt;
&lt;requestId&gt;&lt;/requestId&gt;
&lt;source&gt;/finesse/api/User/xxxxxx&lt;/source&gt;
&lt;/Update&gt;

As you can see, each ‘<’ is escaped as ‘&lt;’ and each ‘>’ is escaped as ‘&gt;’. In the Smack Debug Window I see that the XML arrives without these escapes, so they are added by the Smack framework (I suppose).
I am pretty sure that this is not a bug but a misconfiguration on my side. Could you please give me a hint on how to solve this, or direct me to a guide that explains how to upgrade from Smack 3.4.1 (I used the libs smack-3.4.1, smackx-3.4.1 and smackx-debug-3.4.1) to Smack 4.4? Since I am not a Smack developer, this is quite a challenge for me.

Thanks,
Zoltan

Then it is strange that parseContentDepthWithRoundtrip() is involved. On Java SE the StAX parser should be used, which does not support roundtrip parsing. Did you include smack-java8?

It appears that the XML element tags are treated as XML text, and hence characters such as < are escaped. This should not happen.

I am unable to reproduce this, that is why I asked for a reproducer. Otherwise I have to think more about how to debug this issue with you.

Now I included smack-java8 and smack-xmlparser-stax. The result is the same as above. The following methods are called that escape the XML:

org.jivesoftware.smack.util.PacketParserUtils.parseContentDepthWithoutRoundtrip(XmlPullParser, int, boolean)

here the relevant lines of code:

            case TEXT_CHARACTERS:
                if (startElementJustSeen) {
                    startElementJustSeen = false;
                    xml.rightAngleBracket();
                }
                xml.escape(parser.getText());
                break;

From xml.escape(…)

org.jivesoftware.smack.util.escapeForXml(CharSequence, XmlEscapeMode.safe)

is called, which does the actual escaping of the characters. As a result, crippled XML arrives in my StanzaListener and I have to revert the escaping. What can I configure to avoid the escaping?

Here the list of libs I have included in my project:

smack-core-4.4.0-rc2.jar,
smack-extensions-4.4.0-rc2.jar,
minidns-core-1.0.0.jar,
jxmpp-jid-1.0.1.jar,
jxmpp-core-1.0.1.jar,
jxmpp-util-cache-1.0.1.jar,
smack-im-4.4.0-rc2.jar,
smack-debug-4.4.0-rc2.jar,
smack-java8-4.4.0-rc2.jar,
smack-tcp-4.4.0-rc2.jar,
smack-resolver-dnsjava-4.4.0-rc2.jar,
dnsjava-3.3.1.jar,
smack-sasl-provided-4.4.0-rc2.jar,
smack-xmlparser-4.4.0-rc2.jar,
smack-streammanagement-4.4.0-rc2.jar,
smack-xmlparser-stax-4.4.0-rc2.jar

There is no knob for this, as this should not be necessary. I am still unsure what happens on your side. As I said before

Also, the unit tests I wrote do not show this behavior. Hence I asked you for, ideally, a unit test that breaks when it should not, or alternatively the raw stanza that Smack parses where this happens.

Where can I catch the raw stanza? Can you please tell me the code location?

See https://github.com/igniterealtime/Smack/wiki/How-to-ask-for-help,-report-an-issue-and-possible-solve-the-problem-yourself

An XMPP trace of the exchanged stream elements between client and server, which can be obtained by setting SmackConfiguration.DEBUG (DEBUG_ENABLED in older Smack versions) to true.

I picked a Message stanza from the Smack Debug Window and compared the data in the “All Packets” and “Raw Received Packets” tabs for Smack versions 3.4.1 and 4.4.0-rc2.
In version 3.4.1 I see valid XML data in the “All Packets” tab, while in the “Raw” tab the XML beginning from the tag <notification> is escaped.
In version 4.4.0-rc2 the XML beginning from the tag <notification> is escaped in both tabs.
I have attached the logs.

Smack3.4.1_ReceivedMessage.txt (2,9 kB)
Smack3.4.1_ReceivedMessageRaw.txt (1,9 kB)

Since I can only add 2 attachments at a time, I will put the logs for version 4.4.0-rc2 in the next post.

Here are the logs for version 4.4.0-rc2:

Smack4.4.0-rc2_ReceivedMessage.txt (2,1 kB)
Smack4.4.0-rc2_ReceivedMessageRaw.txt (2,0 kB)

The <Update/> element is proprietary, right? It lacks a namespace. Not sure if this is the cause of the issue, but that might be a clue.

Edit: Wait, what is that <notification/> element? That’s not from the PubSub/PEP specification, right?

Yes, I suppose these elements are proprietary. I don’t know the specifications; I simply upgraded existing code from Smack 3.4.1 to version 4.4.0-rc2. Since there are quite a lot of API changes, I suppose I made some mistake during the migration. I need some help finding out what’s wrong. The other explanation would be that Smack version 3.4.1 could deal with these proprietary elements while version 4.4.0-rc2 cannot.
We use Smack for receiving XMPP messages from a Cisco Finesse server (a call center solution).

Your problem appears to be that the received PubSub notification message contains an escaped XML document as textual payload. Smack 3.4 potentially handled this differently, but if so, it should not have been done.

You probably simply want to take the raw escaped XML document and unescape it to obtain the actual XML. Or even better, have the originating entity do the right thing by adding the payload “as XML” and not as a String of escaped XML. I do not even think that the <Update/> element not being qualified by an XML namespace is much of an issue here.
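A minimal sketch of that unescaping step, using only JDK classes: the class name `EscapedPayload` and both helper methods are hypothetical, and obtaining the escaped string from the received stanza is assumed to happen elsewhere in your listener. It only reverses the five predefined XML entities (numeric character references are not handled) and then parses the result into a DOM document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedPayload {

    // Reverses the five predefined XML entities. "&amp;" is handled last so
    // that an escaped entity like "&amp;lt;" becomes "&lt;" and not "<".
    static String unescape(String escaped) {
        return escaped
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // Parses the unescaped payload; DOCTYPE declarations are rejected because
    // the payload comes from the network.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse payload as XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the escaped payload shown earlier in this thread.
        String escaped = "&lt;Update&gt;&lt;event&gt;PUT&lt;/event&gt;"
                + "&lt;requestId&gt;&lt;/requestId&gt;&lt;/Update&gt;";
        Document doc = parse(unescape(escaped));
        System.out.println(doc.getDocumentElement().getTagName());                     // Update
        System.out.println(doc.getElementsByTagName("event").item(0).getTextContent()); // PUT
    }
}
```

If the string you get in your listener is escaped twice, as the parser error you saw in 4.4.0-rc2 may suggest, one call to `unescape` removes only one level, so check the intermediate result before parsing.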