powered by Jive Software

Analyze cause of ParseError after upgrading from Smack 4.3.5 to. 4.4.0

After upgrading from Smack 4.3.5 to 4.4.0 when sending an XMPP message to Firebase (fcm-xmpp.googleapis.com) we encounter the following error from the XMPPTCPConnection’s PackageReader while parsing packets in our logs:

org.jivesoftware.smack.xml.XmlPullParserException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1178]
Message: XML document structures must start and end within the same entity.
	at org.jivesoftware.smack.xml.stax.StaxXmlPullParser.next(StaxXmlPullParser.java:194)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets(XMPPTCPConnection.java:1140)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.access$700(XMPPTCPConnection.java:913)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader$1.run(XMPPTCPConnection.java:936)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1178]
Message: XML document structures must start and end within the same entity.
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:652)
	at org.jivesoftware.smack.xml.stax.StaxXmlPullParser.next(StaxXmlPullParser.java:192)
	... 4 more

A have a unit test to trigger this problem and to campare with the same message between the Smack versions.

Next I activated debug output via
setDebuggerFactory(ConsoleDebugger::new)
while configurationg the connection.

For 4.3.5 all runs fine a I get nice output.
Unfortunately, with Smack 4.4.0 using JXMPP 1.0.1 I end up with an error writing the debug output right after ouputting
XMPPConnection authenticated
while processing the actual message:

XMPPConnection closed due to an exception (XMPPTCPConnection[XXX] (0))
java.lang.UnsupportedOperationException
	at org.jxmpp.xml.splitter.XmlSplitter.processChar(XmlSplitter.java:376)
	at org.jxmpp.xml.splitter.XmlSplitter.write(XmlSplitter.java:155)
	at java.base/java.io.Writer.write(Writer.java:290)
	at java.base/java.io.Writer.write(Writer.java:249)
	at java.base/java.io.Writer.append(Writer.java:322)
	at org.jivesoftware.smack.debugger.SmackDebugger.lambda$newConnectionWriter$3(SmackDebugger.java:123)
	at org.jivesoftware.smack.util.ObservableWriter.notifyListeners(ObservableWriter.java:103)
	at org.jivesoftware.smack.util.ObservableWriter.flush(ObservableWriter.java:50)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketWriter.writePackets(XMPPTCPConnection.java:1353)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketWriter.access$4200(XMPPTCPConnection.java:1158)
	at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketWriter$1.run(XMPPTCPConnection.java:1208)
	at java.base/java.lang.Thread.run(Thread.java:834)

This seems to be related to a CDATA section used in a custom ExtensionElement - e.g. build like this
<![CDATA[" + json + "]]>

While debugging I noticed the UnsupportedOperationException is for the case AFTER_COMMENT_BANG, the actual character processed is “[” and the content of the splittedPartBuffer is
<message><gcm xmlns="google:mobile:data"><![

I suspect this to be a bug, but it is likely unrelated to the actual issue I wanted to analyze.

Could you please give me any pointers how to quickly identify the cause of the “main issue” with the ParseError? The actual message sending is successful.

Thanks a lot,
Eric

Well, the XML element that triggers this would be very helpful. Can you share the unit test or at least the XML?

Yes, that is probably a bug in jXMPP. Very possible that CDATA is not (yet) supported.

I usually point users to this Smack wiki page, but it appears you did the two most important things: enable a debugger on the connection and use a Java debugger.

Your main issue appears in Smack 4.4. because in 4.4 Smack uses StAX parsing when run on Java (and no longer XPP3 for XML parsing, which is still used on Android).

Anyway, I think I should first see the input that triggers this before investigating this further.

Hey Florian, thanks for looking into it. I must admit, after having a fresh look at it myself, I realized I overlooked some important aspects and now understand much more about the problem.

Please find my learnings:

  • Smack does not log this problem anywhere on ERROR level, instead there is an INFO log from AbstractXMPPConnection.java:1026 AND a WARN logging from AbstractXMPPConnection.java:1738. The reason for the ERROR log is a registered ConnectionListener in our custom code doing the error logging in connectionClosedOnError() which was added in the past, after stream errors caused by invalid xml only logged at WARN by Smack did not create enough attention.

  • Smack’s INFO logging is closest to the actual issue and provides the best message “Exception while waiting for closing stream element from the server”

  • The same parse error also happens with Smack 4.3.5 but is not handled as an error to be reported (ingored). If the exception were reported, the message would be even better due to the details provided by the underlying xpp3 parser implementation “no more data available - expected end tag </stream:stream> to close start tag stream:stream from line 1, parser stopped on END_TAG seen …</data:gcm>… @1:802”
    The difference in behavior is due to a subtle change in the exception handling in org.jivesoftware.smack.tcp.XMPPTCPConnection.PacketReader#parsePackets:
    In 4.3.5 we have:

            catch (Exception e) {
                closingStreamReceived.reportFailure(e);
                // The exception can be ignored if the the connection is 'done'
                // or if the it was caused because the socket got closed
                if (!(done || packetWriter.queue.isShutdown())) {
                    // Close the connection and notify connection listeners of the
                    // error.
                    notifyConnectionError(e);
                }
            }

which now in 4.4.0 looks like

            catch (Exception e) {
                // The exception can be ignored if the the connection is 'done'
                // or if the it was caused because the socket got closed.
                if (!done) {
                    // Set running to false since this thread will exit here and notifyConnectionError() will wait until
                    // the reader and writer thread's 'running' value is false.
                    running = false;
                    // Close the connection and notify connection listeners of the
                    // error.
                    notifyConnectionError(e);
                }
            }

You notice, the comment is still unchanged and refers to ignore the error if the connection is done OR the socket is closed. In fact, the second part of the condition was removed. Now it is sufficient to report the exception if the connection is done, which is the case. With 4.3.5 we had done = true, packetWriter.queue.isShutdown() = false.

  • The console debugger of Smack 4.3.5 with JXMPP 0.6.4 had no issues with a CDATA section in the message, but as the CDATA is not needed to trigger the issue, it can be removed to gather almost the same output for 4.3.5 and 4.4.0.

What I’m still not a 100% sure is the actual cause of this parse issue, which is very likely due to my lack of XMPP protocol knowledge. My current understanding is a response received from the firebase endpoint representing a NACK (because in my test I’m referring to a no longer registered device) ends premature (incomplete) with a closing message, but without closing the stream. As a result the parser encounters a parse error and smack now no longer ignor this, signals the error and sends itself a stream close.

Here is the abbreviated message flow based on the debug output of Smack 4.3.5 for verification.

SENT (0): <stream:stream xmlns='jabber:client' to='fcm-xmpp.googleapis.com' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' xml:lang='en'>
RECV (0): <stream:stream from="fcm-xmpp.googleapis.com" id="EC570024B895B6A5" version="1.0" xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client">
RECV (0): <stream:features><mechanisms >...</mechanisms></stream:features>
SENT (0): <auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl' mechanism='PLAIN'>XXX</auth>
RECV (0): <success xmlns="urn:ietf:params:xml:ns:xmpp-sasl"/>
SENT (0): <stream:stream .../>
RECV (0): <stream:stream from="fcm-xmpp.googleapis.com" .../>
RECV (0): <stream:features><bind .../><session .../></stream:features>
[...]
User logged (0): XXX
SENT (0): <iq id='2VMih-5' type='get'><query xmlns='jabber:iq:roster'></query></iq>
XMPPConnection authenticated (XMPPTCPConnection[XXX] (0))
SENT (0): <message id='2VMih-6'><gcm xmlns="google:mobile:data"><![CDATA[{"message_id": "1234", "to":"XXX","data":{XXX}}]]></gcm></message>
RECV (0): <message><data:gcm xmlns:data="google:mobile:data">{"message_type":"nack","from":"XXX","message_id":"1234","error":"DEVICE_UNREGISTERED","error_description":""}</data:gcm></message>
SENT (0): <presence id='2VMih-8' type='unavailable'></presence>
SENT (0): 
SENT (0): </stream:stream>

Questions:

  • Is my interpretation correct?
  • What was the reason for the removal of the check for packetWriter.queue.isShutdown()) in the mentioned exception handling condition? I found the relevant commit: https://github.com/igniterealtime/Smack/commit/57961a8cc1f2df6ecc1afa8c4f8460794d8d2dce, but it is quite large and still did not give me the context I was looking for?
  • I likely can’t influence the behavior of the communication partner. Is there any good/clever way to ignore only this particular error, without risking to ignore valid errors?

Thanks a lot,
Eric

There is a lot going on here.

One theory I have about the sequence of events is:

  1. There is an exception in jxmpp because of the CDATA
  2. This exception triggers an unclean connection shutdown
  3. As result of the shutdown, the remove server terminates the connection without sending a closing stream tag
  4. Since the inbound stream is closed, the StAX parser throws a “no more data” exception within Smack’s reader thread
  5. In previous Smack versions this last exception was masked due, but in 4.4 you see it.

That’s my current theory. I do not have the spare time to validate it right now. You mentioned that you “have a unit test to trigger this problem”, where I believe “this” is 1. jxmpp does not CDATA. Proving this would be a good first step, then it can be fixed in jxmpp.

Thanks for getting back. I can surely double check, but the behavior is exactly the same when the wrapping CDATA is removed from the message. This was only introduced to avoid encoding issues within the json payload. Removing the CDATA made no change for my tests, except that this prevented the issue with a confiured console debugger, throwing the UnsupportedOperationException while hitting the CDATA start - so connection debug output can be retrieved quite similar to the previous Smack version.
What I can try to figure out while debugging is when the connection is closed and what is triggering the close to check if there is an unclean shutdown with the consequences you outlined.
It may very well be the case that we never recognized this even with the previous Smack version, just because this exception was simply not logged and message exchange from and end user perspective is working fine.

Here, “this” would be an exceptional connection shutdown. Are you sure you would not have recognized that?