Openfire clustered node scatter after some time

Hi all,

I have set up 3 standalone openfire server and clustered them successfully.

We can now capable to handle 50k user concurrently with each server configuration m4.4xlarge on Amazon.

We got satisfied performance with the clustering but some time in middle of somewhere one of them are disappeared from the clustering table and showing N/A i have also take snapshot of the error from the each node which is listed below so any one having the same kind of error and also gone through this and have some solution then please help us.

Server A:

org.jivesoftware.openfire.IQRouter - Could not route packet
com.hazelcast.core.HazelcastInstanceNotActiveException: State: SHUT_DOWN Operation: class com.hazelcast.map.impl.operation.ContainsKeyOperation
at com.hazelcast.spi.impl.operationservice.impl.Invocation.engineActive(Invocation.java:490)
at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:523)
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:513)
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:207)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnPartition(OperationServiceImpl.java:310)
at com.hazelcast.map.impl.proxy.MapProxySupport.containsKeyInternal(MapProxySupport.java:613)
at com.hazelcast.map.impl.proxy.MapProxyImpl.containsKey(MapProxyImpl.java:243)
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCache.containsKey(ClusteredCache.java:109)
at org.jivesoftware.util.cache.CacheWrapper.containsKey(CacheWrapper.java:110)
at org.jivesoftware.openfire.spi.RoutingTableImpl.hasComponentRoute(RoutingTableImpl.java:907)
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:321)
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:115)
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:78)
at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:341)
at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:99)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:306)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:198)
at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407)
at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterEvent.fire(IoFilterEvent.java:74)
at org.apache.mina.core.session.IoEvent.run(IoEvent.java:63)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTask(OrderedThreadPoolExecutor.java:769)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTasks(OrderedThreadPoolExecutor.java:761)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.run(OrderedThreadPoolExecutor.java:703)
at java.lang.Thread.run(Thread.java:748)
at ------ submitted from ------.(Unknown Source)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:127)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:147)
at com.hazelcast.map.impl.proxy.MapProxySupport.containsKeyInternal(MapProxySupport.java:614)
at com.hazelcast.map.impl.proxy.MapProxyImpl.containsKey(MapProxyImpl.java:243)
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCache.containsKey(ClusteredCache.java:109)
at org.jivesoftware.util.cache.CacheWrapper.containsKey(CacheWrapper.java:110)
at org.jivesoftware.openfire.spi.RoutingTableImpl.hasComponentRoute(RoutingTableImpl.java:907)
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:321)
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:115)
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:78)
at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:341)
at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:99)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:306)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:198)
at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407)
at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterEvent.fire(IoFilterEvent.java:74)
at org.apache.mina.core.session.IoEvent.run(IoEvent.java:63)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTask(OrderedThreadPoolExecutor.java:769)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTasks(OrderedThreadPoolExecutor.java:761)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.run(OrderedThreadPoolExecutor.java:703)
at java.lang.Thread.run(Thread.java:748)

Server B

2018.05.13 16:57:28 org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException: MemberCallableTaskOperation failed to complete within 30 SECONDS. Invocation{op=com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{serviceName=‘hz:impl:executorService’, identityHash=1772065000, partitionId=-1, replicaIndex=0, callId=77241427, invocationTime=1526230618721 (2018-05-13 16:56:58.721), waitTimeout=-1, callTimeout=30000, name=openfire::cluster::executor}, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeoutMillis=30000, firstInvocationTimeMs=1526230618100, firstInvocationTime=‘2018-05-13 16:56:58.100’, lastHeartbeatMillis=1526230646896, lastHeartbeatTime=‘2018-05-13 16:57:26.896’, target=[172.31.25.215]:5701, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=Connection[id=1, /172.31.24.84:13751->/172.31.25.215:5701, endpoint=[172.31.25.215]:5701, alive=true, type=MEMBER]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newTimeoutException(InvocationFuture.java:68)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:194)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88)
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:433)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:596)
at org.jivesoftware.openfire.plugin.session.RemoteSession.doSynchronousClusterTask(RemoteSession.java:194)
at org.jivesoftware.openfire.plugin.session.RemoteClientSession.incrementConflictCount(RemoteClientSession.java:154)
at org.jivesoftware.openfire.handler.IQBindHandler.handleIQ(IQBindHandler.java:134)
at org.jivesoftware.openfire.handler.IQHandler.process(IQHandler.java:66)
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:372)
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:115)
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:78)
at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:341)
at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:99)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:306)
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:198)
at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407)
at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterEvent.fire(IoFilterEvent.java:74)
at org.apache.mina.core.session.IoEvent.run(IoEvent.java:63)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTask(OrderedThreadPoolExecutor.java:769)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTasks(OrderedThreadPoolExecutor.java:761)
at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.run(OrderedThreadPoolExecutor.java:703)

Any suggestions?

Which node left the cluster? Was it Server C? It looks like something as simple as the node in question not being accessible; what do the logs on that server suggest?

Greg

it is serverA.

What could be it’s cause? do you have any hint for that from the logs?

The logs you include from Server A merely indicate that the cluster has been shutdown. The reason for it, if it is logged, will be before that.

Greg

hi thanks for you answer i am attaching all the logs could you please detect it where the problem occurs

Server A

error.log.3 (1.0 MB)
error.log.2 (1.0 MB)
error.log.5 (1.0 MB)
error.log.1 (1.0 MB)
error.log.4 (1.0 MB)

logs from server B

error.log.3 (1.0 MB)
error.log.4 (1.0 MB)
error.log.5 (1.0 MB)
error.log.1 (1.0 MB)
error.log.2 (1.0 MB)

Please let me know anything more you needed

Sorry, but I can’t see anything in there that indicates the reason for the failure,

Greg

Okay i will try once again to get the logs actually there are many concurrent users are connected so couldn’t log the exact error.

Let me try to increase log file size if we can capture any more.

Thanks,
Hiren