Hello, we’re running:
- Openfire 5.0.1
- Hazelcast plugin 5.5.0.1
- Two clustered server instances (8 CPUs, 32 GB RAM each)
This setup is intended to replace our original single-server deployment, which handles ~60,000 connections at its busiest periods.
With this cluster, as soon as the load starts to ramp up quickly (first thing in the morning, when clients begin to connect), we see continuous timeout and threading errors such as:
2025.10.13 08:31:56.514 INFO  [hz.openfire.cached.thread-248]: org.jivesoftware.openfire.session.RemoteSessionTask - An exception was logged while executing RemoteSessionTask to close session: LocalClientSession{address=204fda1d-0867-4781-85a9-7ea7e5936cee@xmpp.displaynote.com/I/5F3601E1F57AC2A2BE5E976DD537229D130B7155, streamID=373st5uxpw, status=CLOSED, isEncrypted=true, isDetached=false, serverName='xmpp.displaynote.com', isInitialized=true, hasAuthToken=true, peer address='168.254.25.129', presence='<presence from="204fda1d-0867-4781-85a9-7ea7e5936cee@xmpp.displaynote.com/I/5F3601E1F57AC2A2BE5E976DD537229D130B7155"><c xmlns="http://jabber.org/protocol/caps" hash="sha-1" node="https://github.com/qxmpp-project/qxmpp" ver="K3ag6SarEcZHRQXYiCJ4QixxRkE="></c><receiver-status xmlns="https://www.displaynote.com/ns/commands" receiver-platform="android" status="idle" version-tag="2.39.11"></receiver-status></presence>'}
java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:204) ~[?:?]
at org.jivesoftware.openfire.session.RemoteSessionTask.run(RemoteSessionTask.java:162) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.session.ClientSessionTask.run(ClientSessionTask.java:71) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:603) ~[hazelcast-5.5.0.1.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at com.hazelcast.executor.impl.DistributedExecutorService$Processor.run(DistributedExecutorService.java:286) ~[hazelcast-5.5.0.jar:5.5.0]
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:220) ~[hazelcast-5.5.0.jar:5.5.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) [?:?]
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) [hazelcast-5.5.0.jar:5.5.0]
at com.hazelcast.internal.util.executor.PoolExecutorThreadFactory$ManagedThread.executeRun(PoolExecutorThreadFactory.java:74) [hazelcast-5.5.0.jar:5.5.0]
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:111) [hazelcast-5.5.0.jar:5.5.0]
or (more timeouts):
2025.10.13 08:31:33.019 ERROR [socket_c2s-worker-303]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException: MemberCallableTaskOperation failed to complete within 59999955356 NANOSECONDS. Invocation{op=com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{serviceName='hz:impl:executorService', identityHash=2128197160, partitionId=-1, replicaIndex=0, callId=11441889, invocationTime=1760344233019 (2025-10-13 08:30:33.019), waitTimeout=-1, callTimeout=30000, tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0, name=openfire::cluster::executor}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=30000, firstInvocationTimeMs=1760344233019, firstInvocationTime='2025-10-13 08:30:33.019', lastHeartbeatMillis=1760344288852, lastHeartbeatTime='2025-10-13 08:31:28.852', targetAddress=[xmpp-cluster-prod-02.displaynote.com]:5701, targetMember=Member [xmpp-cluster-prod-02.displaynote.com]:5701 - 9e1acebd-a193-4667-a2df-bc40964eb2fd, memberListVersion=2, pendingResponse={VOID}, backupsAcksExpected=-1, backupsAcksReceived=0, connection=Connection[id=1, /10.18.45.4:5701->/10.18.45.5:53051, qualifier=null, endpoint=[xmpp-cluster-prod-02.displaynote.com]:5701, remoteUuid=9e1acebd-a193-4667-a2df-bc40964eb2fd, alive=true, connectionType=MEMBER, planeIndex=0]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newTimeoutException(InvocationFuture.java:85) ~[?:?]
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:657) ~[?:?]
at com.hazelcast.spi.impl.DelegatingCompletableFuture.get(DelegatingCompletableFuture.java:119) ~[?:?]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:433) ~[?:?]
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:779) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.handler.IQBindHandler.handleIQ(IQBindHandler.java:126) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.handler.IQHandler.process(IQHandler.java:125) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:403) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:106) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:74) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:392) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:90) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:334) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.net.StanzaHandler.processStanza(StanzaHandler.java:222) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:114) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:142) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at org.jivesoftware.openfire.nio.NettyConnectionHandler.channelRead0(NettyConnectionHandler.java:50) ~[xmppserver-5.0.1.DNRC1.jar:5.0.1.DNRC1]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[netty-codec-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[netty-codec-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289) ~[netty-handler-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.traffic.AbstractTrafficShapingHandler.channelRead(AbstractTrafficShapingHandler.java:506) ~[netty-handler-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1515) ~[netty-handler-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1378) ~[netty-handler-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1427) ~[netty-handler-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) ~[netty-codec-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) ~[netty-codec-4.1.118.Final.jar:4.1.118.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.118.Final.jar:4.1.118.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) ~[netty-common-4.1.118.Final.jar:4.1.118.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.118.Final.jar:4.1.118.Final]
at java.lang.Thread.run(Thread.java:1583) [?:?]
or (threading):
2025.10.13 08:31:37.598 WARN  [ForkJoinPool.commonPool-worker-263]: org.jivesoftware.openfire.nio.NettyConnection - Exception while invoking close listeners for NettyConnection{peer: /110.54.145.108:54898, state: CLOSED, session: LocalClientSession{address=xmpp.displaynote.com/2a710dde-d4c0-4365-b503-c8519504cb6f, streamID=67iyql8bau, status=CLOSED, isEncrypted=true, isDetached=false, serverName='xmpp.displaynote.com', isInitialized=false, hasAuthToken=true, peer address='110.54.145.108', presence='<presence type="unavailable"/>'}, Netty channel handler context name: NettyClientConnectionHandler#0}
java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874) [?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1810) [?:?]
at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1796) [?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387) [?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312) [?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843) [?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808) [?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188) [?:?]
Caused by: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
at java.util.concurrent.ForkJoinPool.tryCompensate(ForkJoinPool.java:2000) ~[?:?]
at java.util.concurrent.ForkJoinPool.compensatedBlock(ForkJoinPool.java:3737) ~[?:?]
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3723) ~[?:?]
at com.hazelcast.spi.impl.AbstractInvocationFuture.manageParking(AbstractInvocationFuture.java:692) ~[?:?]
at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:583) ~[?:?]
at com.hazelcast.internal.locksupport.LockProxySupport.lock(LockProxySupport.java:67) ~[?:?]
at com.hazelcast.internal.locksupport.LockProxySupport.lock(LockProxySupport.java:59) ~[?:?]
at com.hazelcast.map.impl.proxy.MapProxyImpl.lock(MapProxyImpl.java:321) ~[?:?]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCache$ClusterLock.doLock(ClusteredCache.java:438) ~[?:?]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCache$ClusterLock.lock(ClusteredCache.java:402) ~[?:?]
at org.jivesoftware.openfire.spi.RoutingTableImpl.removeClientRoute(RoutingTableImpl.java:989) ~[?:?]
at org.jivesoftware.openfire.SessionManager.removeSession(SessionManager.java:1286) ~[?:?]
at org.jivesoftware.openfire.SessionManager.removeSession(SessionManager.java:1262) ~[?:?]
at org.jivesoftware.openfire.SessionManager$ClientSessionListener.lambda$onConnectionClosing$2(SessionManager.java:1397) ~[?:?]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
... 9 more
We already have this configured in hazelcast-local-config.xml:
    <executor-service name="openfire::cluster::executor">
        <pool-size>500</pool-size>
        <queue-capacity>4000</queue-capacity>
        <statistics-enabled>true</statistics-enabled>
        <!-- <split-brain-protection-ref>splitbrainprotection-name</split-brain-protection-ref> -->
    </executor-service>
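In case it helps frame the question: the only other knob we have identified so far (not applied yet) is raising Hazelcast's operation call timeout through the <properties> element of hazelcast-local-config.xml. A minimal sketch, assuming Hazelcast 5.5 still honours hazelcast.operation.call.timeout.millis when set this way, and with 120000 ms purely as an example value:

    <hazelcast xmlns="http://www.hazelcast.com/schema/config">
        <!-- ...existing network / executor-service configuration... -->
        <properties>
            <!-- Example value only: we believe this property corresponds to the
                 callTimeout=30000 visible in the second trace. Raising it would not
                 change Openfire's own 30-second limit on synchronous cluster tasks. -->
            <property name="hazelcast.operation.call.timeout.millis">120000</property>
        </properties>
    </hazelcast>

We are also unsure whether the 30-second limit in the second trace is governed by the plugin's hazelcast.max.execution.seconds Openfire property (default 30 seconds, if we read the plugin documentation correctly), and whether the "Thread limit exceeded replacing blocked worker" error is simply the JVM's common ForkJoinPool hitting its java.util.concurrent.ForkJoinPool.common.maximumSpares cap while Hazelcast invocations block. Raising any of these feels like it would only mask the underlying contention, hence the question.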
Is there anything we might have missed in the configuration that would reduce contention and let the cluster handle this load?
Happy to provide more details if needed. Thanks!