Hello Dear Openfire-Community,
I was able to reproduce a bug where Openfire v4.9.2 cannot accept any new connections because all socket threads are in a blocked state.
Test setup:
- Establish 10,000 connections (40 users with 500 different resources each)
- Ungracefully terminate all connections
- Try to re-establish the connections at a rate of ~120/s
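For anyone who wants to reproduce this, the numbers above can be sketched roughly as follows. This is only an illustration of the test plan, not the actual test client: the class name, the `userN`/`resN` naming scheme, and the domain `example.org` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Openfire or of the real test tooling.
class ReconnectPlan {

    // Builds one full JID (user@domain/resource) per connection.
    static List<String> fullJids(int users, int resourcesPerUser, String domain) {
        List<String> jids = new ArrayList<>(users * resourcesPerUser);
        for (int u = 0; u < users; u++) {
            for (int r = 0; r < resourcesPerUser; r++) {
                jids.add("user" + u + "@" + domain + "/res" + r);
            }
        }
        return jids;
    }

    // Pause between two connection attempts for a given target rate.
    static long delayNanos(double ratePerSecond) {
        return Math.round(1_000_000_000L / ratePerSecond);
    }

    // Time needed to reconnect everything if the rate is sustained.
    static double secondsToReconnect(int connections, double ratePerSecond) {
        return connections / ratePerSecond;
    }

    public static void main(String[] args) {
        List<String> jids = fullJids(40, 500, "example.org");
        System.out.println("connections: " + jids.size());
        System.out.println("delay per attempt (ns): " + delayNanos(120));
        System.out.println("seconds to reconnect all: " + secondsToReconnect(jids.size(), 120));
    }
}
```

At ~120/s the reconnect phase takes a bit under a minute and a half if the server keeps up, so every reconnecting resource hits the "Kick" conflict policy while its old session is still registered.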
Relevant configurations:
Resource conflict: Kick
At some point during the third test step (re-establishing the connections), Openfire stopped accepting new connections. I captured a heap dump, analyzed the state of Openfire, and saw that every c2s socket thread was blocked with the following stack trace:
Thread 'socket_c2s-thread-17' with ID = 102
java.lang.Object.wait(Object.java)
java.lang.Object.wait(Object.java:328)
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:131)
io.netty.channel.DefaultChannelPromise.await(DefaultChannelPromise.java:30)
io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:405)
io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:119)
io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:30)
org.jivesoftware.openfire.nio.NettyConnection.close(NettyConnection.java:237)
org.jivesoftware.openfire.Connection.close(Connection.java:177)
org.jivesoftware.openfire.session.LocalSession$$Lambda$911.accept(Native method)
java.util.Optional.ifPresent(Optional.java:183)
org.jivesoftware.openfire.session.LocalSession.close(LocalSession.java:481)
org.jivesoftware.openfire.handler.IQBindHandler.handleIQ(IQBindHandler.java:131)
org.jivesoftware.openfire.handler.IQHandler.process(IQHandler.java:127)
org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:403)
org.jivesoftware.openfire.IQRouter.route(IQRouter.java:106)
org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:74)
org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:392)
org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:90)
org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:334)
org.jivesoftware.openfire.net.StanzaHandler.processStanza(StanzaHandler.java:222)
org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:114)
org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:142)
org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:50)
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.handler.traffic.AbstractTrafficShapingHandler.channelRead(AbstractTrafficShapingHandler.java:506)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
java.lang.Thread.run(Thread.java:829)
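One possible reading of this trace (a guess, not a confirmed diagnosis): the c2s thread is an event loop thread, and while handling the bind request it closes the old, conflicting session and blocks in `sync()` until that close has completed. If the close has to be executed by another event loop thread that is itself blocked in the same way, neither makes progress. The sketch below simulates only that pattern with plain JDK executors. It does not use Netty or Openfire code, all names are hypothetical, and a timeout is used so the demo terminates, whereas `sync()` in the trace waits without one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation: two single-threaded "event loops" that each
// wait for work which only the other loop's (busy) thread could run.
class CrossLoopWaitDemo {

    // Returns true if both loops timed out waiting on each other.
    static boolean bothLoopsStall(long timeoutMillis) throws Exception {
        ExecutorService loopA = Executors.newSingleThreadExecutor();
        ExecutorService loopB = Executors.newSingleThreadExecutor();
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Callable<Boolean> onA = () -> closeOnOtherLoop(loopB, bothBusy, timeoutMillis);
            Callable<Boolean> onB = () -> closeOnOtherLoop(loopA, bothBusy, timeoutMillis);
            Future<Boolean> a = loopA.submit(onA);
            Future<Boolean> b = loopB.submit(onB);
            return a.get() && b.get();
        } finally {
            loopA.shutdownNow();
            loopB.shutdownNow();
        }
    }

    // Runs on one loop: hands a "close" to the other loop, then blocks on it.
    static boolean closeOnOtherLoop(ExecutorService otherLoop,
                                    CountDownLatch bothBusy,
                                    long timeoutMillis) throws Exception {
        // Make sure both loop threads are occupied before either submits.
        bothBusy.countDown();
        bothBusy.await();
        Future<?> close = otherLoop.submit(() -> { /* stands in for the channel close */ });
        try {
            close.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // the other loop was free to run the close
        } catch (TimeoutException e) {
            return true;  // the other loop was busy waiting as well
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("both loops stalled: " + bothLoopsStall(200));
    }
}
```

If this is what happens, it would also fit the observation that every c2s thread ends up in the same stack: once all of them wait on each other, nobody is left to process the closes or any new connection.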
I can provide further information from the stack trace and can also try to reproduce the issue once more if any additional information is required.
Remark: The test was created while trying to reproduce a production issue in which Openfire does not accept new connections after a network interruption. During that production issue, we observed that the affected machines accumulated a large number of TCP connections in the CLOSE_WAIT state and that we slowly lost user sessions. What might be relevant for the analysis is that the initial problem also occurred on Openfire version 4.7.2 (before the switch to nio), but I have not had the chance to run the same test on such a machine.
Are there any configuration options available that could help with such an issue?
Best regards,
Nico