Openfire 4.8.1 getting stuck

Hi, first let me thank you all for your work - we’ve been relying on openfire for more than 10 years now.

A little background: We’re using openfire for connecting with about 25k IoT devices (=clients, direct tls with client certificates, no websockets), plus an external component that wraps remote procedure calls in <iq>s. So in our scenario openfire has to handle mostly IQ and presence stanzas as well as constant reconnects due to unstable internet connections of some devices.

Until recently we were using a very old version of openfire (3.8.1), but we finally managed to upgrade to 4.8.1 only to run into strange issues: After some hours (sometimes days) openfire would stop accepting connections, with the existing sessions becoming unresponsive as well. Restarting would often lead to a similar issue where we would only connect 2k-5k devices and then be stuck at this number of sessions until another restart.

After switching to version 4.7.5 it seems to work fine so far (has been running for about 2 weeks without problems now). The only difference (configuration-wise) is that with 4.7.5 we have to deactivate TLSv1.3 due to OF-1861. Both versions have been running in a container with the cpu limited to 7 CPUs and 5 GiB of memory (up to 70% of that as heap memory). Worker threads are set to 6, listen backlog to 256, everything else is pretty much set to default. Also after a restart 4.7.5 is picking up the connections a lot faster than 4.8.1 (~2 minutes on 4.7.5 vs ~8 minutes on 4.8.1).

I’ve been trying to grep through the log files, but due to the constant reconnect-related warnings with stack traces - and a errors, see below - our log file is rolling over every ~30 minutes (trace logging fills a file within just a few minutes). I found two types of errors:

  1. org.jivesoftware.openfire.IQRouter - Could not route packet (observed in 3.8.1, 4.7.5 and 4.8.1)
Stack trace
java.lang.NullPointerException: Cannot invoke "org.dom4j.Element.createCopy()" because the return value of "org.xmpp.packet.IQ.getChildElement()" is null
	at org.jivesoftware.openfire.IQRouter.sendErrorPacket(IQRouter.java:474) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.IQRouter.routingFailed(IQRouter.java:524) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:372) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:447) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:106) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:74) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:392) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.net.ComponentStanzaHandler.processIQ(ComponentStanzaHandler.java:155) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:334) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.net.StanzaHandler.processStanza(StanzaHandler.java:222) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:114) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:142) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:50) ~[xmppserver-4.8.1.jar:4.8.1]
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.traffic.AbstractTrafficShapingHandler.channelRead(AbstractTrafficShapingHandler.java:506) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at java.lang.Thread.run(Thread.java:1583) [?:?]
  1. org.jivesoftware.openfire.nio.NettyConnection - Problem during connection close or cleanup (observed only in 4.8.1)
Stack trace 1
java.nio.channels.ClosedChannelException: null
	at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1155) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at java.lang.Thread.run(Thread.java:1583) [?:?]

and/or

Stack trace 2
java.io.IOException: Broken pipe
	at sun.nio.ch.SocketDispatcher.write0(Native Method) ~[?:?]
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62) ~[?:?]
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137) ~[?:?]
	at sun.nio.ch.IOUtil.write(IOUtil.java:81) ~[?:?]
	at sun.nio.ch.IOUtil.write(IOUtil.java:58) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:542) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:415) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:931) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:921) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:907) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:893) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.ssl.SslHandler.forceFlush(SslHandler.java:2236) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:825) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:802) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:925) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:907) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:893) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:125) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:925) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:966) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:934) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:984) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at org.jivesoftware.openfire.nio.NettyConnection.close(NettyConnection.java:215) ~[xmppserver-4.8.1.jar:4.8.1]
	at org.jivesoftware.openfire.nio.NettyXMPPDecoder.exceptionCaught(NettyXMPPDecoder.java:72) ~[xmppserver-4.8.1.jar:4.8.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.ssl.SslHandler.exceptionCaught(SslHandler.java:1203) ~[netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:177) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final.jar:4.1.100.Final]
	at java.lang.Thread.run(Thread.java:1583) [?:?]

I guess the first error could be a combination of an empty result IQ that cannot be routed back because the client has disconnected.

I don’t think either of these two errors seem to be responsible for blocking any threads (or something similar) as I can find other log entries on the same thread shortly after the errors.

We’ll stay on 4.7.5 for now, but we would like to stick as close to the most recent version as possible in the future. Does anyone have an idea why we run into these issues?

1 Like

While my response here may not be very helpful, it would be interesting to know if you have these same issues on 4.8.0. The issue you are describing may have crept in between 4.8.0 and 4.8.1.