Revisiting org.xmpp.packet.*

I would like to start a discussion on how Messenger’s org.xmpp.packet implementation could be unified (between Smack and Messenger) and made pluggable in the context of this project.

This thread (http://www.jivesoftware.org/community/thread.jspa?threadID=15708&tstart=0&messageID=103277#103277) raises the issue, and this post (http://www.jivesoftware.org/community/message.jspa?messageID=104915) brings up the desire to investigate alternatives to text-based XML.

Dom4j is the current implementation, but it lacks a standard interface that would allow it to be “swapped” out. DOM and SAX are standard interfaces in Java SE 5 (StAX, JSR 173, only becomes part of the platform in Java SE 6), so they provide a plugin point. Nux/bnux doesn’t support these interfaces, but does support XQuery. This raises an interesting point: we could express the XMPP packet stanzas as XQuery paths and let the best implementation be the default provider.

Performance is definitely the deciding factor, so we should also produce benchmarks and a framework/test harness to reproduce the results.

What do you think?

Noah

Just to clarify my understanding:

  • Smack - an XMPP client-side library; uses an XML pull parser and a simple utility class to convert the parsed stanzas into Packet (Message etc.) objects.
  • Whack - an XMPP component library; uses an XML pull parser, dom4j and the org.xmpp.packet classes Packet (Message etc.) to wrap the dom4j elements.
  • Messenger - uses the org.xmpp.packet classes from Whack throughout its code base.

We are now talking about how to achieve pluggability in the org.xmpp.packet classes to swap in other xml processing techniques such as nux etc. Please correct me if I have missed the point.

The common ground would appear to be XPath - dom4j supports it, and as it’s a subset of XQuery it should be usable for Nux as well. The stanzas are well known, so it should be possible to set up a simple trial and benchmark to see what the performance is like. It would also be a nice opportunity to start having a look at the wire formats outside the box.
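To make the XPath idea concrete, here is a minimal sketch that reads stanza fields through path expressions. It uses only the JDK’s javax.xml.xpath API rather than dom4j or Nux, and the class name StanzaXPathDemo, the eval helper and the sample stanza are invented for illustration. It also assumes a stanza with no namespace declared; real stanzas sit in the jabber:client namespace, which would need a NamespaceContext.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of org.xmpp.packet.
class StanzaXPathDemo {

    // Evaluates an XPath expression against a stanza and returns its string value.
    static String eval(String expression, String stanzaXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(stanzaXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or stanza: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: no xmlns on the stanza, so plain element names match.
        String stanza =
            "<message to='juliet@example.com' type='chat'><body>Hello</body></message>";

        // A packet class could define each of its values as a path like these.
        System.out.println(eval("/message/@to", stanza));
        System.out.println(eval("/message/@type", stanza));
        System.out.println(eval("/message/body", stanza));
    }
}
```

Since the paths are plain strings, the same expressions could in principle be handed to another engine (dom4j’s XPath support, or an XQuery processor), which is what would need benchmarking.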

However, it looks like either org.xmpp.packet must become a pure interface/abstract package or a higher-level representation is needed in JM, simply because out of the box we need to support the existing plain text wire format. The alternative is to convert the existing plain text wire format to the new CM<->JM wire format in memory at JM to keep everything working as is. Or?

Agreed that performance is the key. Should the tests use streaming to simulate the real thing, or are you thinking of looking at file reading/writing? I would also like to see memory usage stats; this will be important for the production server too.

Conor.

Yes, yes and yes. We’re on the same page.

The question that sticks out in my mind is whether XPath or DOM/StAX/SAX is the more appropriate interface point. I think a discussion would be useful in terms of manageability…but performance is the trump card.

Interfaces or Abstract classes should also be considered to replace the concrete classes. I tend to lean towards interfaces but we should pick one and run with it.

Noah

There are two assumptions that I’m not sure I agree with:

  1. The assumption that the XML lib we use should be pluggable in the Packet class. Instead, I’d prefer to find the best library and simply use that. Dom4J is actually already “pluggable” in the sense that you can use different XML parsers to get the XML in – standard DOM, various SAX implementations, XPP3, etc. We happen to use XPP3 at the moment because it’s quite fast. We’re really just choosing between high-level XML APIs based on performance, ease-of-use and the various services they provide such as binary XML. We definitely don’t want to create our own abstraction over the other XML APIs (unless we want a BileBlog directed against us in the style of the logger API for all the log packages).

Nux looks very flexible with regards to XML parser implementation and can even support Fast Infoset (in addition to Bnux) according to an email from the Nux author. So, the main thing we need to do is check out the XOM API and ensure that it will do what Dom4J is doing for us now.

  2. The assumption that Smack and Messenger Packet representations must be the same. I agree that it’s a good goal, but anything we come up with for Messenger may simply be too heavy-weight for Smack. Smack has a really small JAR at the moment, which is great for embedding into other apps.

Thanks,

-Matt

Don’t want the bile from Hani…that’s for sure.

Creating another high-level library is silly…agreed, but at some point we need to actually create a class that translates the XML to a Java object and back again. Currently Dom4j is our flex point. What I’d like to discover is what the best flex point is in terms of performance and implementation flexibility…in that order. The least common denominator between XOM and the rest of the XML APIs is XQuery. If the packet classes use XQuery to define the values they represent, then we could use either implementation. To me, this is putting implementation flexibility in front of performance (well, theoretically, since we haven’t compared the two in any micro benchmark). So at this point, it’s about benchmarking what we have and experimenting with other APIs to see what makes the best fit.

One requirement that we’ll have to consider is that in the same runtime, we might have binary and non-binary XML being sent to JM. Since we need to retain the OOTB ease of use, we can’t restrict clients or users from simply pointing to the JM directly, nor would it be wise to require a CM to access JM, because that reduces initial ease of use.

Regarding point number 2, I think you meant to say Smack and Messenger Packet implementation. Their representation is XML. The time that unifying these two APIs would benefit development is when the Smack library is used in Messenger. My use case was to use GoogleConnection in a transport so we can have a gateway until they implement s2s. I can agree that this may be unnecessary and superseded in a relatively short time.

Noah

Re: Smack/Whack, I don’t see the point in making Smack bigger than it is or introducing additional dependencies. It’s a client library, so let’s keep it like that.

Referring back to org.xmpp.packet - I am now confused. This is the current setup.

client <--plain text format--> JM (xmpp/dom4j)

In addition we need to support this.

client <--plain text format--> CM (nux/xom) <--bnux format--> JM (xmpp/dom4j)

I added nux/xom/bnux to have something concrete. Is that closer to what you are thinking, Matt? If so, then the only issue for JM is to have bnux read/write support. This can be added using a specific connection handler for CM connections (as per the s2s-specific connection handlers) and would allow us to focus more on the CM side of things, without going and reworking JM’s existing code base.

If we find that nux/xom is really working for us, we can always rework JM core to use xom/nux instead of dom4j.

Sorry if I am not “getting” it, but it’s a bit difficult without nice pictures!

Conor.

You’ve hit it right on the head.

So it seems that the Smack/XMPP interop is not all that important. ’nuf said.

I think it’s understood, but not stated, that the JM connection handler will need to be reworked.

Noah

Yep, echoing what Noah said – whatever we pick for the CM I’d also like to use in JM core for easy interop. So, if the CM uses bnux, the main Packet class should be switched to use NUX instead of DOM4J.

-Matt

Does anyone have a nice set of XML files for Message/IQ/Presence etc.? We are going to need these in order to build unit tests for org.xmpp.packet. Once the unit tests are complete (functional and load/time added) we can migrate the entire org.xmpp to whatever underlying library we like and simply run the tests to make sure we don’t break JM.

I created some unit tests last night for Packet and started putting in nux in place of dom4j. It’s fairly straightforward. Only QName doesn’t seem to have a direct matching class in xom - I started using Element(localname, namespace) as a replacement.

Conor.

We need some use cases…that’s for sure.

We can invent some use cases, or maybe we can borrow some live chat transcripts and use those. One idea is to create an audit log of all the conversations on a well-known Messenger installation, scrub the text, change the names to protect the innocent and use that as our initial test set. Maybe the Wednesday developer chat could be replayed (since it’s all public anyway) and we can use that as another test bed.

As far as benchmarking goes, please consider the following: create a JMX interface to allow us to see what’s going on. Gato and I have had some discussions on this and feel that it’s the best/standard approach. JMX is somewhat foreign to folks in my experience, but I can assure you that it’s very, very easy. What it does require is that you think about your code in terms of counters and blocks of execution. Let’s say in the Message packet you have a “parse” function. One place to record a count would be the beginning of the “parse” function, to capture how many times it was called. The other element, time, is to record the beginning and end of each call, save the difference and add it to a running tally. Dividing the total by the number of calls gives you an average. Expose this through JMX and a metering project I’m currently close to releasing will be able to capture/store/chart that information.
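A minimal sketch of the counter-and-tally idea, assuming a standard MBean registered with the platform MBean server. The names PacketMetricsDemo, ParseStats and the object name org.example.metrics:type=ParseStats are invented for illustration and are not part of Messenger; parse() here is only a stand-in for a real parse method.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical demo class; not existing Messenger code.
class PacketMetricsDemo {

    /** Management interface. JMX expects it to be public and named <Class>MBean. */
    public interface ParseStatsMBean {
        long getParseCount();
        long getTotalParseNanos();
        double getAverageParseNanos();
    }

    /** A call counter plus a running tally of elapsed time. */
    public static class ParseStats implements ParseStatsMBean {
        private final AtomicLong parseCount = new AtomicLong();
        private final AtomicLong totalParseNanos = new AtomicLong();

        /** Records one completed call and the time it took. */
        public void record(long elapsedNanos) {
            parseCount.incrementAndGet();
            totalParseNanos.addAndGet(elapsedNanos);
        }

        @Override public long getParseCount() { return parseCount.get(); }
        @Override public long getTotalParseNanos() { return totalParseNanos.get(); }

        /** Total divided by number of calls; 0.0 before the first call. */
        @Override public double getAverageParseNanos() {
            long count = parseCount.get();
            return count == 0 ? 0.0 : (double) totalParseNanos.get() / count;
        }
    }

    static final ParseStats STATS = new ParseStats();

    // Stand-in for a real "parse" function: time the block, then record it.
    static String parse(String xml) {
        long start = System.nanoTime();
        try {
            return xml.trim();
        } finally {
            STATS.record(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example.metrics:type=ParseStats");
        server.registerMBean(STATS, name);

        parse("<message/>");
        parse("<presence/>");

        // Any JMX client (jconsole, for example) can read the same attributes.
        System.out.println("calls = " + server.getAttribute(name, "ParseCount"));
        System.out.println("avg ns = " + server.getAttribute(name, "AverageParseNanos"));
    }
}
```

The per-call cost in this sketch is two System.nanoTime() reads and two atomic additions; whether that is acceptable on the hot path is something the benchmarks would have to show.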

Noah

JMX is a nice way to expose those statistics. Rather than go through all the code adding measurement hooks, any thoughts on using AOP to intercept the calls we are interested in and measure it like that?

It would be less intrusive, we could keep the parsing code clean, and it would be easy enough to change what’s being measured. Some things may only be useful during development; others you might want to keep for production as well.

Another thought to the pile.

Conor.

As I was writing my response, I also thought that AOP would be applicable. Monitoring the entry and exit of methods for time metrics. If this were an enterprise application or one where performance was less important than ease of development, then AOP would make sense. However, performance is our main driver and AOP is not a free operation…the dynamic invocation of the method will add additional instructions to be executed. Plus, JVMTI is what’s expected for performance profiling, and JProfiler and various other profilers exist that will give us that information. However, you don’t want to run production code with JVMTI hooked up as it introduces overhead (just like AOP does).

So where is the happy medium? My answer is right where we started…just add the elements to the class and expose the information you’re really after. In this case: start time of parse, end time of parse and the number of times the method’s been called. It’s dead simple, works and has minimal overhead.

Hopefully I have not angered the AOP gods with this one.

Noah

My understanding is that with compile-time weaving (AspectJ) and a standard optimizing compiler, you can introduce aspects for debugging and monitoring, and with no code changes produce binaries that suffer no overhead if they’re not used. JVMTI is a good generic solution to profiling, but truly useful metrics will rely on application-specific metrics that probably can’t be had without introducing something like AOP (or manual proxies) into the mix.

Good application metrics also provide the opportunity for feedback loops to influence and optimize the overall server performance. Knowing how many and which types of packets are flowing to particular destinations, average packet size (and how a particular packet compares to the average), destination and source addresses, etc. can all be useful metrics for monitoring and administration, and could also be used by the router itself in determining routing strategies.

AOP isn’t necessarily the only way to do this, but it seems like a good candidate considering the alternatives (manual coding across the entire class hierarchy).
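For comparison, a rough sketch of the “manual proxies” option using the JDK’s java.lang.reflect.Proxy. Note that this is runtime interception, not AspectJ compile-time weaving, so it does pay the reflective call cost raised earlier in the thread. The PacketParser interface, TimingHandler and wrap names are hypothetical and exist only for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; none of these types exist in Messenger.
class TimingProxyDemo {

    /** Assumed interface for whatever does the parsing. */
    public interface PacketParser {
        String parse(String xml);
    }

    /** Counts and times every call made through the proxy, leaving the target untouched. */
    public static class TimingHandler implements InvocationHandler {
        private final Object target;
        public final AtomicLong calls = new AtomicLong();
        public final AtomicLong totalNanos = new AtomicLong();

        public TimingHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real method threw
            } finally {
                calls.incrementAndGet();
                totalNanos.addAndGet(System.nanoTime() - start);
            }
        }
    }

    /** Wraps the handler's target in a measuring proxy. */
    public static PacketParser wrap(TimingHandler handler) {
        return (PacketParser) Proxy.newProxyInstance(
                PacketParser.class.getClassLoader(),
                new Class<?>[] { PacketParser.class },
                handler);
    }

    public static void main(String[] args) {
        PacketParser real = xml -> xml.trim(); // stand-in for real parsing
        TimingHandler handler = new TimingHandler(real);
        PacketParser measured = wrap(handler);

        measured.parse(" <message/> ");
        measured.parse(" <iq/> ");

        System.out.println("calls = " + handler.calls.get());
        System.out.println("total ns = " + handler.totalNanos.get());
    }
}
```

The parsing code stays clean and the measurement can be switched off by simply not wrapping the target, but it only works where callers go through an interface.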

-iain