
BUG: org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT to read text

Hi,

It looks like the Smack packet reader fails to read this message:

<message xmlns='jabber:client' from='kenai@muc.kenai.com/Frederic Jean'
id='1258145104804+purplef5b633d8' to='jbecicka@kenai.com/NetBeans'
type='groupchat' xml:lang='en'><body>502 - Bad Gateway
A 502 status code indicates that a server, while acting as a proxy,
received a response from a server further upstream that it judged
invalid.</body><body xmlns='http://www.w3.org/1999/xhtml'><span
style='font-weight: bold;'>502 - Bad Gateway</span>A 502 status code
indicates that a server, while acting as a proxy, received a response
from a server further upstream that it judged invalid.</body><delay
xmlns='urn:xmpp:delay' stamp='2009-11-13T20:45:04.804+00:00'/></message>

I know that this message stanza does not conform to the XMPP spec. But it was sent using Adium, and it would be nice if it did not cause Smack to disconnect due to:

org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT to read text (position: START_TAG seen ...s=\'http://www.w3.org/1999/xhtml\'><span style=\'font-weight: bold;\'>... @143:226)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1071)
        at org.jivesoftware.smack.util.PacketParserUtils.parseMessage(PacketParserUtils.java:89)
        at org.jivesoftware.smack.PacketReader.parsePackets(PacketReader.java:272)
        at org.jivesoftware.smack.PacketReader.access$000(PacketReader.java:44)
        at org.jivesoftware.smack.PacketReader$1.run(PacketReader.java:76)

The exception itself is OK (the message stanza really is malformed), but it is not handled, and that causes other problems: the next few stanzas are not parsed properly.

See suggested patch.

Thanks,
Jan

PacketParserUtils.patch.zip (801 Bytes)

I’d like to ask someone to review and apply this patch, if it is OK.

I’ve been having a similar problem. Mine occurs whenever a Pidgin client joins a multi-user chat room that a Smack client is in. From looking at XmlPull, and specifically the MXP1 implementation, it appears that Smack parses messages in such a way that this exception will always be thrown if the message contains mixed content (which, as you said, is specifically prohibited by the XMPP protocol, but unfortunately that doesn’t seem to stop Pidgin from doing it). I think the best solution at this point would be to modify MXP1 itself so that its nextText() method, which reads the textual content of an XML element and throws an exception if it encounters mixed content while reading, instead ignores any tags nested within the content. This would solve the problem, at the expense of removing all formatting (bold/italic/underline/etc.) present in a Pidgin message.

My use case involves both GMail web clients and Pidgin clients, and since GMail doesn’t support embedded formatting anyway, losing the formatting isn’t a problem for me at all.
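
The idea above of collecting an element's text while ignoring nested tags can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's StAX `XMLStreamReader` as a stand-in for XmlPull so that it runs without extra jars, and `LenientBodyText`, `readTextIgnoringTags` and `bodyTexts` are hypothetical names, not part of Smack or MXP1. With XmlPull the same loop would be written with `next()`, `START_TAG`, `END_TAG`, `TEXT` and `getText()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not Smack or MXP1 code. StAX stands in for XmlPull here.
final class LenientBodyText {

    // The reader must be positioned on a START_ELEMENT. Collects all character
    // data up to the matching END_ELEMENT and steps over nested tags instead
    // of failing on them, so any formatting markup is dropped.
    static String readTextIgnoringTags(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new XMLStreamException("reader must be on START_ELEMENT");
        }
        StringBuilder text = new StringBuilder();
        int depth = 1; // we are inside the element we started on
        while (depth > 0) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    depth--;
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    text.append(reader.getText());
                    break;
                default:
                    break; // comments, processing instructions, etc. are ignored
            }
        }
        return text.toString();
    }

    // Returns the flattened text of every <body> element in the given XML.
    static List<String> bodyTexts(String xml) {
        List<String> bodies = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "body".equals(reader.getLocalName())) {
                    bodies.add(readTextIgnoringTags(reader));
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("XML is not well-formed", e);
        }
        return bodies;
    }
}
```

Because the loop always stops on the end tag that matches the element it started on, the reader stays in sync with the rest of the stanza, which is what the unhandled exception currently breaks.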

It looks like the Smack developers do not read this forum, or they are not interested. We need to patch smack.jar ourselves in our applications.

Well, I suppose that the contributor to the smack svn is niess (http://www.igniterealtime.org/fisheye/changelog/svn-org/smack). He is actively working on Smack. A good way to deal with this is:

  • Open a smack entry in Jira (http://www.igniterealtime.org/issues/secure/Dashboard.jspa). A user can be created by wroot (send him a mail via the forum)

  • Create a patch for the current smack trunk, that deals with your problem

  • Notify niess via a forum mail about the jira bug and the attached patch in Jira.

If he agrees with the patch, he may submit it to the SVN.

Walter

Not knowing what your patch was doing at the time, I did a very similar thing to solve this problem.

It looks like in the XML that Openfire sends back from Jabber rooms, there’s a plain-text body that comes before the XHTML body.

My code checks whether body has already been populated (which it generally has) and, if so, skips the whole thing.

If for some reason it gets a message with just an XHTML body, it silently fails (which is probably stupid ; )

else if (elementName.equals("body") && body.equals("")) {  // added check: skip this if a body was already found
    String xmlLang = getLanguageAttribute(parser);
    try {
        body = parser.nextText();
    } catch (Exception e) {
        // silently discard failures (bad?)
    }
    message.addBody(xmlLang, body);
}

Aaron Propst
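
The approach described above (keep the plain-text body, drop the XHTML one) could also be done without swallowing an exception, by consuming the whole XHTML subtree so that the reader ends on the matching end tag and the rest of the stanza still parses. The following is a minimal sketch under assumptions: it uses the JDK's StAX `XMLStreamReader` rather than XmlPull so that it runs standalone, and `SkipXhtmlBody`, `skipElement` and `parseMessage` are hypothetical names, not Smack code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not Smack code. StAX stands in for XmlPull here.
final class SkipXhtmlBody {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The reader must be on a START_ELEMENT. Consumes that element and
    // everything nested in it, leaving the reader on the matching END_ELEMENT.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Returns { plain-text body, delay stamp } for a single message stanza.
    static String[] parseMessage(String xml) {
        String body = "";
        String stamp = null;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("body".equals(name)) {
                    if (XHTML_NS.equals(reader.getNamespaceURI())) {
                        skipElement(reader);             // drop the formatted copy entirely
                    } else {
                        body = reader.getElementText();  // plain body: text only
                    }
                } else if ("delay".equals(name)) {
                    stamp = reader.getAttributeValue(null, "stamp");
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("XML is not well-formed", e);
        }
        return new String[] { body, stamp };
    }
}
```

With the message from the first post, this kind of loop would still reach the `<delay>` element after the XHTML body, instead of losing its place in the stream. The trade-off is the same one noted above: a message that carries only an XHTML body ends up with an empty body.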

Can someone link the corresponding Adium and Pidgin tickets? I can’t reproduce it; my Pidgin client behaves in an XEP-0071-compliant way.

I have an Openfire Jabber server running on a Mac mini. In it is a MUC room that a bunch of my friends use day to day.

I wrote a bot application using Smack that sits in that room and does various tasks. I encountered this problem repeatedly until I pulled Smack down and patched it with the code I mentioned above.

So, the messages themselves in that MUC channel come from any number of clients (mostly Pidgin and Adium), but in the end they’re actually sent to my bot from the Openfire server, and they contain two body tags: one that has plain text, and one that has XHTML.

The XHTML body reliably crashes this code.

I’m sorry, I have no idea how to “send a mail to a person via the forum”.

I know that there is JIRA bug tracking, but I don’t have permission to create an issue.

Those packets are in my original post. See the very first message.

The patch for this bug is here: http://www.igniterealtime.org/community/message/201984

XML Pull Parser is an interface that defines the parsing functionality provided in the XMLPULL V1 API.

There are the following different kinds of parser, depending on which features are set:

  • behaves like an XML 1.0 compliant non-validating parser if no DOCDECL is present in XML documents when FEATURE_PROCESS_DOCDECL is false (this is the default parser, and internal entities can still be defined with defineEntityReplacementText())
  • non-validating parser as defined in the XML 1.0 spec when FEATURE_PROCESS_DOCDECL is true
  • validating parser as defined in the XML 1.0 spec when FEATURE_VALIDATION is true (and that implies that FEATURE_PROCESS_DOCDECL is true)

There are only two key methods, next() and nextToken(), which provide access to high-level parsing events and to lower-level tokens respectively.

The parser is always in some event state, and the type of the current event can be determined by calling the getEventType() method. Initially the parser is in the START_DOCUMENT state.
