I found the problem after a long investigation.
When the first cluster node comes online, it becomes the senior member.
When a second cluster node comes online, the senior member asks it for its MUC rooms and their online occupants.
The senior member then adds those occupants to its own maps via OccupantAddedEvent. In doing so, however, it replaces the NodeID with its own NodeID.
When the member node later goes offline, the senior member receives a corresponding event and removes from the MUC rooms all occupants that came from the leaving node.
Because the stored NodeID is always that of the senior node, no occupant matches the leaving node, so the remote sessions are never removed. If the client then connects to the senior node, it cannot join because a remote session already exists.
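To make the sequence above easier to follow, here is a minimal sketch of the bookkeeping as I understand it. This is not Openfire code: the class OccupantMapSketch, its methods and the overwriteNodeId flag are hypothetical names made up for illustration, and node IDs and JIDs are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the senior member's occupant bookkeeping. */
class OccupantMapSketch {
    private final String localNodeId;
    // true reproduces the behaviour described above (origin NodeID is overwritten)
    private final boolean overwriteNodeId;
    // occupant JID -> ID of the node recorded as hosting the occupant's session
    private final Map<String, String> occupantToNode = new HashMap<>();

    OccupantMapSketch(String localNodeId, boolean overwriteNodeId) {
        this.localNodeId = localNodeId;
        this.overwriteNodeId = overwriteNodeId;
    }

    /** Called for each occupant reported by a node that just joined the cluster. */
    void occupantAdded(String occupantJid, String originNodeId) {
        String storedNodeId = overwriteNodeId ? localNodeId : originNodeId;
        occupantToNode.put(occupantJid, storedNodeId);
    }

    /** Called when a node leaves; drops every occupant recorded for that node. */
    int nodeLeft(String leftNodeId) {
        int before = occupantToNode.size();
        occupantToNode.values().removeIf(leftNodeId::equals);
        return before - occupantToNode.size();
    }

    boolean hasOccupant(String occupantJid) {
        return occupantToNode.containsKey(occupantJid);
    }

    public static void main(String[] args) {
        String jid = "alice@example.org/phone";

        OccupantMapSketch buggy = new OccupantMapSketch("senior", true);
        buggy.occupantAdded(jid, "member");
        buggy.nodeLeft("member");
        // The occupant was stored under "senior", so nothing matched "member".
        System.out.println("overwritten NodeID, stale occupant left: " + buggy.hasOccupant(jid));

        OccupantMapSketch fixed = new OccupantMapSketch("senior", false);
        fixed.occupantAdded(jid, "member");
        fixed.nodeLeft("member");
        System.out.println("origin NodeID kept, stale occupant left: " + fixed.hasOccupant(jid));
    }
}
```

With the NodeID overwritten, the cleanup for the leaving node finds nothing to remove and the stale occupant stays. With the origin NodeID kept, the occupant is removed as expected.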
I fixed the bug in https://github.com/igniterealtime/Openfire/pull/1748
How would you rate this fix? Is there anything else to consider?
The pull request is in “draft” state because there are some new problems with the fix.
If a node goes offline, it looks like all other nodes try to send an unavailable presence for the occupants of that node. This fails because the NodeID is no longer known in the cluster.
I will look into how this happens and add a corresponding fix to the branch. Help is welcome.
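As a rough illustration of this second problem, the sketch below models routing to a node that has already left the cluster, plus one possible guard. It is only an assumption about how this could be handled, not what the pull request does. ClusterRoutingSketch and all its methods are hypothetical names, not Openfire or Hazelcast APIs; only the exception message is copied from the log below.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of routing presences to cluster nodes by node ID. */
class ClusterRoutingSketch {
    private final Set<String> liveNodes = new HashSet<>();

    void nodeJoined(String nodeId) { liveNodes.add(nodeId); }

    void nodeLeft(String nodeId) { liveNodes.remove(nodeId); }

    /** Mimics the failure in the log: routing to an unknown node throws. */
    void routeToNode(String nodeId, String stanza) {
        if (!liveNodes.contains(nodeId)) {
            throw new IllegalArgumentException(
                "Requested node " + nodeId + " not found in cluster");
        }
        // A real server would hand the stanza to the remote node here.
    }

    /**
     * Guarded broadcast (an assumption, one option among several): occupants
     * whose node has already left are skipped instead of routed.
     * Returns the number of presences actually routed.
     */
    int broadcastUnavailable(Map<String, String> occupantToNode) {
        int sent = 0;
        for (Map.Entry<String, String> entry : occupantToNode.entrySet()) {
            String nodeId = entry.getValue();
            if (!liveNodes.contains(nodeId)) {
                continue; // node is gone, nothing to deliver to
            }
            routeToNode(nodeId, "<presence type='unavailable' to='" + entry.getKey() + "'/>");
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        ClusterRoutingSketch cluster = new ClusterRoutingSketch();
        cluster.nodeJoined("senior");
        cluster.nodeJoined("member");

        Map<String, String> occupants = new LinkedHashMap<>();
        occupants.put("alice@example.org/phone", "member");
        occupants.put("bob@example.org/desk", "senior");

        cluster.nodeLeft("member");
        System.out.println("presences routed: " + cluster.broadcastUnavailable(occupants));
    }
}
```

Whether skipping is correct depends on where the stale route comes from. If the routing table itself still points at the departed node, cleaning that up first may be the better fix.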
Best regards
chp
2020.11.03 17:07:45.965 ERROR [hz.openfire.cached.thread-4] (org.jivesoftware.openfire.spi.RoutingTableImpl:279) - Primary packet routing failed
java.lang.IllegalArgumentException: Requested node 5ed5407e-e7fb-4b87-8ef0-9aff16dc07a0 not found in cluster
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doClusterTask(ClusteredCacheFactory.java:397) ~[hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at org.jivesoftware.util.cache.CacheFactory.doClusterTask(CacheFactory.java:701) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.plugin.util.cluster.ClusterPacketRouter.routePacket(ClusterPacketRouter.java:45) ~[hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:359) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:262) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.PresenceRouter.handle(PresenceRouter.java:165) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.PresenceRouter.route(PresenceRouter.java:79) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:84) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.spi.LocalMUCRoom.broadcast(LocalMUCRoom.java:1585) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.BroadcastPresenceRequest$1.run(BroadcastPresenceRequest.java:87) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.MUCRoomTask.execute(MUCRoomTask.java:81) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.BroadcastPresenceRequest.run(BroadcastPresenceRequest.java:82) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_265]
at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]
2020.11.03 17:07:45.990 ERROR [hz.openfire.cached.thread-14] (org.jivesoftware.openfire.spi.RoutingTableImpl:279) - Primary packet routing failed
java.lang.IllegalArgumentException: Requested node 5ed5407e-e7fb-4b87-8ef0-9aff16dc07a0 not found in cluster
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doClusterTask(ClusteredCacheFactory.java:397) ~[hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at org.jivesoftware.util.cache.CacheFactory.doClusterTask(CacheFactory.java:701) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.plugin.util.cluster.ClusterPacketRouter.routePacket(ClusterPacketRouter.java:45) ~[hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:359) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:262) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.PresenceRouter.handle(PresenceRouter.java:165) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.PresenceRouter.route(PresenceRouter.java:79) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:84) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.spi.LocalMUCRoom.broadcast(LocalMUCRoom.java:1585) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.BroadcastPresenceRequest$1.run(BroadcastPresenceRequest.java:87) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.MUCRoomTask.execute(MUCRoomTask.java:81) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.muc.cluster.BroadcastPresenceRequest.run(BroadcastPresenceRequest.java:82) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.2-SNAPSHOT.jar!/:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_265]
at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]