We might have uncovered the root of a compression-related problem. We’re running openfire 3.5.1 with client/server compression enabled. To isolate the problem, we disabled server-server compression and all tls options.
The results of the problem are a NULL pointer exception in jzlib and an inability of Alice (described next) to send/recv messages in a remotely hosted chat room.
Here’s a description:
Alice logs into server X and connects to chat room Z on server Y (Alice enables compression). Bob logs into server Y and joins chat room Z on server Y. At this point, we exercise Alice and Bob’s chat clients to send numerous, simultaneous messages to room Z. Alice suddenly loses the ability to send/receive from the room, albeit, she hasn’t been kicked out.
We believe interleaved access is being granted to Alice’s deflator object and that this interleaved access is the problem.
Here are some bits of evidence that lead us to this conclusion:
-
According to ServerSockerReader.java, server X can spawn threads to handle packets received from server Y.
-
Packets destined to Alice are deflated (using jzlib) before being transmitted.
-
According to our investigations, threads sometimes interleave access to the jzlib deflator object. (See the output at the end of this message.)
org.jivesoftware.openfire.net.ServerSocketReader is suspicious because it can create new threads (at the server) to handle incoming messages. What happens if two messages are received and destined for Alice? Could deflation of one message be incomplete before deflation of the second begins?
Here’s an example of a compression error we’ve seen. Exception messages vary, but usually end in a NULL pointer exception somewhere in jzlib deflate.
java.lang.NullPointerException
at com.jcraft.jzlib.ZStream.flush_pending(ZStream.java:162)
at com.jcraft.jzlib.Deflate.flush_block_only(Deflate.java:776)
at com.jcraft.jzlib.Deflate.deflate_slow(Deflate.java:1207)
at com.jcraft.jzlib.Deflate.deflate(Deflate.java:1567)
at com.jcraft.jzlib.ZStream.deflate(ZStream.java:133)
at org.apache.mina.filter.support.Zlib.deflate(Zlib.java:176)
at org.apache.mina.filter.CompressionFilter.filterWrite(CompressionFilter.java:191)
at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)
at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)
at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)
at org.apache.mina.filter.codec.ProtocolCodecFilter.filterWrite(ProtocolCodecFilter.java:205)
at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)
at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)
at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)
at org.apache.mina.common.IoFilterAdapter.filterWrite(IoFilterAdapter.java:90)
at org.jivesoftware.openfire.net.StalledSessionsFilter.filterWrite(StalledSessionsFilter.java:49)
at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)
at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)
at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)
at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterWrite(AbstractIoFilterChain.java:587)
at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)
at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterWrite(AbstractIoFilterChain.java:355)
at org.apache.mina.transport.socket.nio.SocketSessionImpl.write0(SocketSessionImpl.java:166)
at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:177)
at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:168)
at org.jivesoftware.openfire.nio.NIOConnection.deliver(NIOConnection.java:223)
at org.jivesoftware.openfire.session.LocalClientSession.deliver(LocalClientSession.java:787)
at org.jivesoftware.openfire.session.LocalSession.process(LocalSession.java:262)
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:235)
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:356)
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:101)
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:68)
at org.jivesoftware.openfire.net.SocketReader.processIQ(SocketReader.java:244)
at org.jivesoftware.openfire.net.ServerSocketReader.access$001(ServerSocketReader.java:47)
at org.jivesoftware.openfire.net.ServerSocketReader$1.run(ServerSocketReader.java:80)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:613)
We instrumented JZLib in com.jcraft.jzlib.Deflate.{longest_match,deflate} and noticed that all problems proceeded immediately after interleaved access to the deflation object. Here’s an example log output where one can see interleaved access to the same hashcode (the hashcode is the identifier of the deflator object).
85553 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 30368, cur_match 29945, best_len 240 85554 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 29945, cur_match 29522, best_len 240 85555 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 29522, cur_match 29099, best_len 240 85556 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 29099, cur_match 28676, best_len 240 85557 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 28676, cur_match 28253, best_len 240 85558 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 28253, cur_match 27830, best_len 240 85559 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 27830, cur_match 27407, best_len 240 85560 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 27407, cur_match 26984, best_len 240 85561 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 26984, cur_match 26436, best_len 240 85562 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 26436, cur_match 26201, best_len 240 85563 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 26201, cur_match 26011, best_len 240 85564 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 26011, cur_match 25776, best_len 240 85565 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 25776, cur_match 25586, best_len 240 85566 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 25586, cur_match 25351, best_len 240 85567 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 25351, cur_match 25161, best_len 240 85568 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 25161, cur_match 24926, best_len 240 85569 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 24926, cur_match 24736, best_len 240 85570 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 24736, cur_match 24501, best_len 240 85571 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, next_in 445, next_in_index 0, avail_in 445, next_out 458, next_out_index 0, avail_out 458 85572 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 24501, cur_match 24311, best_len 240 85573 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 24311, cur_match 24076, best_len 240 85574 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, longest_match: prev_length 2 85575 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 24076, cur_match 23886, best_len 240 85576 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, match 0, cur_match 47140, best_len 2 85577 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 23886, cur_match 23651, best_len 240 85578 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, longest_match: prev_length 2 85579 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, match 0, cur_match 53637, best_len 2 85580 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 23651, cur_match 23461, best_len 240 85581 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, match 53698, cur_match 53214, best_len 61 85582 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 23461, cur_match 23226, best_len 240 85583 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, match 53452, cur_match 52791, best_len 238 85584 thread Thread[pool-9-thread-1,5,main], hashcode 10114092, match 23226, cur_match 23036, best_len 240 85585 thread Thread[pool-9-thread-2,5,main], hashcode 10114092, match 52791, cur_match 52368, best_len 238
Thoughts?
Joe