Unexpected XML parsing learnings

Smack uses an XML Pull Parser (XPP) to parse XMPP stanzas and build custom Packet objects. Wildfire represents XMPP stanzas as DOM objects wrapped by Packet objects. Each approach has its own pros and cons. Moreover, in Wildfire we can use different parsers to generate DOM objects.

For Wildfire 3.2 we needed to change the way we parsed XML to work with the new, more scalable networking layer built on MINA. We wanted to keep building DOM objects wrapped by Packet objects so that our existing parsers would remain useful. However, we needed a way to parse stanzas that arrive asynchronously. We ended up reusing a contribution made by a community member. We know it is a custom solution that we might need to replace at some point, since its parsing is not fully efficient and some inputs can break the parser, as was seen in Wildfire 3.2.1.

While doing some heavy load testing on Wildfire and measuring performance, we noticed that the parser we were using to generate DOM objects was becoming a serious bottleneck. At that point we were using a SAX parser (SAXReader), so we decided to try other parsers and compare the results. The other parser we had in Wildfire was XMPPPacketReader, which uses an XML Pull Parser (XPP). To my surprise, the improvement was substantial: parsing was around 30% faster with the XPP parser than with the SAX parser.
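To make the difference between the two parsing styles concrete, here is a minimal sketch that uses only JDK classes: the JAXP SAX parser for the push style and StAX (javax.xml.stream) for the pull style. It is not the dom4j SAXReader or the XPP-based XMPPPacketReader that Wildfire uses, and the class name, method names and sample stanza are made up for illustration. It also treats a single stanza as a standalone document, whereas a real XMPP stream is one long-lived document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Wildfire or Smack.
public class ParserStyles {

    // Push style (SAX): the parser drives and calls back into our handler.
    public static List<String> elementNamesWithSax(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        names.add(localName);
                    }
                });
        return names;
    }

    // Pull style (StAX): our code drives and asks the parser for the next event.
    public static List<String> elementNamesWithPull(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative stanza only.
        String stanza = "<message to='juliet@example.com' type='chat'><body>Hi</body></message>";
        System.out.println(elementNamesWithSax(stanza));
        System.out.println(elementNamesWithPull(stanza));
    }
}
```

Both methods visit the same elements. The difference is who controls the loop: with SAX the parser calls the handler, while with a pull parser the application decides when to read the next event, which tends to make it easier to stop or hand off at a stanza boundary.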

Architecturally it is obvious that an XPP parser will be much faster than a SAX parser, but since both parsers were being used to create DOM objects, I initially thought it wouldn’t make much difference which one we used, because building the DOM objects was the expensive operation. Well, I was flat wrong, and I learned it the hard way. I’m happy, though, that we found this bottleneck during our performance tests before the release. It is yet more proof of how important it is to include QA as part of your development cycle.

A while back I discovered the wonderful world of pull parsers (http://www.xmlpull.org) while writing a JavaME XMPP client using kXML (http://kxml.sourceforge.net).

I was even surprised to find a JSR for the Streaming API for XML (StAX) (http://jcp.org/en/jsr/detail?id=173) that I had not heard of, and I have been doing XML in Java for a while, since 2000.

Just this week, via the Nux project (http://dsd.lbl.gov/nux/), I discovered Woodstox (http://woodstox.codehaus.org), a high-performance, validating, namespace-aware, StAX-compliant (JSR-173) open source XML processor written in Java.

A report from Sun from August 2005 (http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf) showed that Woodstox performed best overall for small documents (5-10K), followed closely by Sun’s SJSXP and XPP3.

So I thought you might like to consider some alternative parsers to improve application performance.

Hey Derek,

Thanks for the information, it is very useful. Ever since we found Nux we have wanted to give it a try. Moreover, for communication between Connection Managers and Openfire we wanted to use binary XML, and Nux was the perfect fit.

Regards,

– Gato