Interesting projects

Hey guys,

Some interesting projects:

  1. Nux: XML library that includes a binary XML format. It looks quite good, although we’d need to switch to XOM from dom4j. http://dsd.lbl.gov/nux/

From the latest changelog:

"Numerous bnux Binary XML performance enhancements for serialization and deserialization (UTF-8 character encoding, buffer management, symbol table, pack sorting, cache locality, etc). Overall, bnux is now about twice as fast, and, perhaps more importantly, has a much more uniform performance profile, no matter what kind of document flavour is thrown at it. It routinely delivers 50-100 MB/sec deserialization performance, and 30-70 MB/sec serialization performance (commodity PC 2004). It is roughly 5-10 times faster than xom-1.1 with xerces-2.7.1 (which, in turn, is faster than saxonb-8.5, dom4j-1.6.1 and xerces-2.7.1 DOM). Further, preliminary measurements indicate bnux deserialization and serialization to be consistently 2-3 times faster than Sun’s FastInfoSet implementation, using XOM. Saxon’s PTree could not be tested as it is only available in the commercial version. The only remaining area with substantial potential for performance improvement seems to be complex namespace handling. This might be addressed by slightly restructuring private XOM internals in a future version."

  2. Sea: a SEDA implementation: http://dsd.lbl.gov/sea/

-Matt

Matt,

Very interesting. The bit about zlib compression was enlightening, as that’s the only on-the-wire compression algorithm recognised in the JEP spec (that’s between client/server).

In any case, I really want to make sure that whatever compression technique we do use is pluggable. We could start with standard plain text, then introduce compression and benchmark the different techniques against each other.

The comms architecture seems like a good fit with the previous threads re EmberIO etc. and seems a lot more advanced.

Conor.

As far as benchmarking goes, I’m wrapping up a UI to a metering plugin. It provides a way to record performance metrics from any JMX MBean. I’m still wrestling with the admin interface, but hope to have a beta that’s “good enough” shortly.
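For anyone who hasn't poked at JMX before, here is a minimal sketch of reading one metric from an MBean. It is not the metering plugin's code: the `MBeanSampler` class is hypothetical, and the example assumes the JVM's standard platform MBean server and the built-in `java.lang:type=Threading` MBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: reads a single attribute from any registered MBean.
final class MBeanSampler {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();

    /** Returns the current value of the given attribute on the named MBean. */
    Object sample(String objectName, String attribute) {
        try {
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            // Covers malformed names, unknown MBeans and unknown attributes.
            throw new IllegalArgumentException(
                    "Cannot read " + attribute + " from " + objectName, e);
        }
    }

    public static void main(String[] args) {
        MBeanSampler sampler = new MBeanSampler();
        // ThreadCount is a standard attribute of the platform Threading MBean.
        Object threads = sampler.sample("java.lang:type=Threading", "ThreadCount");
        System.out.println("ThreadCount = " + threads);
    }
}
```

A metering tool would presumably call something like `sample` on a timer and store the values, but that part is left out here.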

At some level we’ve got to commit to a common architecture. NIO is the lowest step on the ladder when it comes to implementation, but the possibilities are wide open. SEA, JCyclone, et al. fall into the SEDA camp, with Quickserver at the high level. Deciding at what level to be pluggable is important, because the “pluggability” of NIO and that of Quickserver are very different.

I for one am in favor of a SEDA implementation, but Matt said there are some naysayers to the glories of SEDA architecture (Matt, please dig deep and see if you can find those references ;). SEDA provides the most flexibility for “pluggability” because it’s built on stages. SEDA resembles the Chain of Responsibility design pattern, and such chains are known for their extensibility.
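To make the "built on stages" point concrete, here is a rough sketch of the idea, not taken from SEA or JCyclone. Each stage owns a thread pool (whose internal work queue stands in for the stage's event queue) and hands its result to the next stage. All names (`EventHandler`, `Stage`, `SedaDemo`) and the two example stages are assumptions made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical per-stage logic: turn an input event into an output event.
interface EventHandler<I, O> {
    O handle(I event);
}

// One stage: a thread pool (with its own work queue) plus a handler.
final class Stage<I, O> {
    private final ExecutorService pool;
    private final EventHandler<I, O> handler;
    private final Stage<O, ?> next; // null for the last stage

    Stage(int threads, EventHandler<I, O> handler, Stage<O, ?> next) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.handler = handler;
        this.next = next;
    }

    /** Queues an event; a pool thread handles it and forwards the result. */
    void enqueue(final I event) {
        pool.execute(() -> {
            O result = handler.handle(event);
            if (next != null) {
                next.enqueue(result);
            }
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}

final class SedaDemo {
    /** Pushes one raw packet through a parse stage and a route stage. */
    static String process(String rawPacket) {
        final BlockingQueue<String> sink = new LinkedBlockingQueue<String>();
        Stage<String, String> route = new Stage<String, String>(1, packet -> {
            sink.add("routed:" + packet);
            return packet;
        }, null);
        Stage<String, String> parse =
                new Stage<String, String>(2, raw -> raw.trim(), route);
        try {
            parse.enqueue(rawPacket);
            return sink.poll(5, TimeUnit.SECONDS); // null if nothing arrived in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            parse.shutdown();
            route.shutdown();
        }
    }
}
```

Swapping a stage in or out only means changing which `Stage` is passed as `next`, which is roughly where the extensibility comes from. A real SEDA framework also adds things like queue admission control and dynamic thread sizing that this sketch ignores.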

my 2c.

Noah

Pluggability would be nice at 2 levels

  1. the wire format i.e. be that plain text, zlib, fast info set, bnux, bnux with optional zlib compression. This would allow us to benchmark different compression algorithms.

  2. the comms architecture, i.e. standard io, nio, SEDA, emberio. This would allow us to benchmark different comms architectures. For large scale / high throughput SEDA seems like a likely option, but it would be nice to have the option to switch out to a different configuration.
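As a rough illustration of the first level, a pluggable wire format could be a small interface with one implementation per encoding. The sketch below is only that: the `WireFormat`, `PlainTextFormat` and `ZlibFormat` names are hypothetical, it covers just plain text and zlib (via the JDK's `java.util.zip`), and it says nothing about how the format would be negotiated on the stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical plug-in point: one implementation per wire format.
interface WireFormat {
    String name();
    byte[] encode(String xml);
    String decode(byte[] wire);
}

// Plain UTF-8 text, the baseline for benchmarks.
final class PlainTextFormat implements WireFormat {
    public String name() { return "plain"; }
    public byte[] encode(String xml) { return xml.getBytes(StandardCharsets.UTF_8); }
    public String decode(byte[] wire) { return new String(wire, StandardCharsets.UTF_8); }
}

// UTF-8 text compressed with zlib (deflate with the zlib header).
final class ZlibFormat implements WireFormat {
    public String name() { return "zlib"; }

    public byte[] encode(String xml) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(xml.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public String decode(byte[] wire) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(wire);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A benchmark harness could then loop over an array of `WireFormat` instances and record encoded size and encode/decode time for the same set of packets. Binary formats such as bnux or Fast Infoset would need their own implementations and are not shown.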

The benchmarking results will allow us to choose the best general strategy without fixing ourselves to a particular setup.

What do you think?

Conor.

Agreed. These are the two areas that are of most interest.

Item #1, I believe, requires some extensions to JEP-0144x to negotiate which level is appropriate.

Item #2: EmberIO provides a framework to switch between sio and nio. It’d be interesting to add SEDA as another option. We might also fork EmberIO (since there’s no activity on that project) and bring it over.

What metrics do you think we should look for? Here’s my off-the-cuff list:

Protocol:

  • Size of packets sent (avg, largest, smallest, last, total)

  • Round-trip time (if it’s possible)

  • Transformation time (avg, largest, smallest, last, total)
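Each of these metrics needs the same five figures, so one small accumulator could serve all of them. A minimal sketch follows; the `MetricStats` name and the choice of `long` values (bytes or nanoseconds) are assumptions, not anything from the metering plugin.

```java
// Hypothetical accumulator for the (avg, largest, smallest, last, total) figures.
final class MetricStats {
    private long count;
    private long total;
    private long last;
    private long largest = Long.MIN_VALUE;
    private long smallest = Long.MAX_VALUE;

    /** Records one observation, e.g. a packet size in bytes or a time in ns. */
    synchronized void record(long value) {
        count++;
        total += value;
        last = value;
        if (value > largest) largest = value;
        if (value < smallest) smallest = value;
    }

    synchronized long count() { return count; }
    synchronized long total() { return total; }
    synchronized long last() { return last; }

    /** Largest value seen; only meaningful once count() > 0. */
    synchronized long largest() { return largest; }

    /** Smallest value seen; only meaningful once count() > 0. */
    synchronized long smallest() { return smallest; }

    /** Arithmetic mean of all observations, or 0.0 before the first one. */
    synchronized double average() {
        return count == 0 ? 0.0 : (double) total / count;
    }
}
```

The methods are synchronized on the assumption that several I/O threads may record into the same instance.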

For IO, I see the following logical pipeline:

client -> packet in start -> packet in end -> packet routed

-> (packet serialized) -> packet out start -> packet out end -> {JM Round trip}

-> packet in start -> packet in end -> (packet deserialized) -> packet routed

-> packet out start -> packet out end.


Let me create some acronyms…these match 1 to 1 with the model above.

client -> PIsi -> PIei -> PRi -> PS -> POsi -> POei

-> {JM Round Trip} -> PIso -> PIeo -> PD -> PRo -> POso -> POeo


PI = Packet In

PO = Packet Out

PR = Packet Route (optional)

PS = packet serialized

PD = packet deserialized

s = start

e = end

i = in

o = out

IO:

  • Number of packets (in/out total)

  • PIsi to POeo time (average, max)

  • PIsi to PIeo time : Input time

  • POsi to POeo time : Outbound time

  • PIeo to POsi : Internal CM crunch time

  • Fragmentation of any PI or PO, i.e. the number of context switches that occur at this level (average, max, last)

  • PS and PD time.
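One way to collect these intervals would be to stamp each packet as it passes a checkpoint and subtract afterwards. The sketch below assumes the checkpoint names from the acronym model, with `PIso` taken to be "packet in start" on the return leg; the `Checkpoint` and `PacketTrace` types are hypothetical.

```java
import java.util.EnumMap;

// Checkpoints in pipeline order; PIso is assumed to be the return-leg "packet in start".
enum Checkpoint { PIsi, PIei, PRi, PS, POsi, POei, PIso, PIeo, PD, PRo, POso, POeo }

// Hypothetical per-packet trace: one timestamp per checkpoint reached.
final class PacketTrace {
    private final EnumMap<Checkpoint, Long> stamps =
            new EnumMap<Checkpoint, Long>(Checkpoint.class);

    /** Stamps a checkpoint with the current monotonic clock. */
    void mark(Checkpoint checkpoint) {
        mark(checkpoint, System.nanoTime());
    }

    /** Stamps a checkpoint with an explicit time (handy for tests). */
    void mark(Checkpoint checkpoint, long nanos) {
        stamps.put(checkpoint, nanos);
    }

    /** Nanoseconds between two checkpoints that have both been marked. */
    long elapsed(Checkpoint from, Checkpoint to) {
        Long start = stamps.get(from);
        Long end = stamps.get(to);
        if (start == null || end == null) {
            throw new IllegalStateException("missing checkpoint: " + from + " or " + to);
        }
        return end - start;
    }
}
```

For example, `trace.elapsed(Checkpoint.PIsi, Checkpoint.POeo)` would give the full pipeline time for one packet, and `elapsed(Checkpoint.PS, Checkpoint.POsi)` or similar pairs could feed the per-stage averages.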

This is just a first cut. We can always add more.

Noah

  1. the wire format i.e. be that plain text, zlib, fast info set, bnux, bnux with optional zlib compression. This would allow us to benchmark different compression algorithms.

I definitely like the idea of flexibility. However, two major things to note:

  1. If we go with nux (and bnux) we’ll likely need to change the core packet representation in Jive Messenger to use XOM instead of DOM4J. That may also preclude using fast infoset, etc.

  2. A major goal is to utilize as few resources on the core router as possible. That probably means that we won’t want to use compression, but we’ll still probably want to test it.

-Matt

This is a topic I brought up before regarding base packet representation. DOM4J is a great toolkit, but we should definitely look at alternative ways of parsing packets. Evaluating which of DOM, StAX, or SAX is better for this will take some effort. FastInfoset provides implementations for SAX and StAX (and probably DOM). Hopefully nux/bnux does as well.
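To show what "alternative ways of parsing" could mean in practice, here is a small sketch that hides the parser behind an interface and reads the same packet attribute with StAX and with DOM. It uses only the JDK's built-in XML APIs, not DOM4J, XOM or FastInfoset, and the `PacketParser` interface and its `recipient` method are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical abstraction: callers do not care which parser is underneath.
interface PacketParser {
    /** Returns the "to" attribute of the packet's root element, or null. */
    String recipient(String xml);
}

// Streaming (pull) parser: stops as soon as the root element is seen.
final class StaxPacketParser implements PacketParser {
    private final XMLInputFactory factory = XMLInputFactory.newInstance();

    public String recipient(String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getAttributeValue(null, "to");
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed packet", e);
        }
    }
}

// Tree parser: builds the whole document before answering.
final class DomPacketParser implements PacketParser {
    public String recipient(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            String to = doc.getDocumentElement().getAttribute("to");
            return to.isEmpty() ? null : to; // DOM returns "" for a missing attribute
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed packet", e);
        }
    }
}
```

With both behind one interface, the same benchmark can be run against each parser, and a SAX-based or binary-format implementation could be added later without touching the callers.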

What do you mean by core router… the router in JM or the router in CM?

Noah

Mina from the Apache directory project looks pretty interesting also: http://directory.apache.org/subprojects/network/index.html. It supports an abstraction of I/O and protocol, and supports SSL on NIO if used with Java 5.

“Thread pools are implemented as filters so that users can customize thread model.”

Pretty good examples to pique interest.

Rob