Openfire Hazelcast clustering not working as expected

I’ve got two questions or issues regarding Hazelcast clustering.

  1. When the senior member server is restarted, none of the other servers is promoted to the senior member role. When the server that was the senior member finishes restarting, it starts a new cluster instance and stays completely disconnected from the other cluster members. What is the expected behavior with Hazelcast clustering? From what I’ve seen, it seems that after any communication disruption with the senior member, Openfire must be restarted on all servers.

  2. The communication between the cluster members and the senior member does not appear to be stable. When I look at the clustering page in the Openfire admin console on the senior member, I can see the other servers. When I’m on one of the other servers, I routinely get stats back only for the local server and not for any of the others. I occasionally get stats back for the other servers, but it is very inconsistent. When I do not get stats back, I get the following error in error.log. The error occurs immediately; it does not wait for the 30 seconds that the log message indicates.

2015.08.12 15:23:38 org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException: Call Invocation{ serviceName='hz:impl:executorService', op=com.hazelcast.executor.impl.operations.MemberCallableTaskOperation{serviceName='null', partitionId=-1, callId=265, invocationTime=1439393018757, waitTimeout=-1, callTimeout=30000}, partitionId=-1, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=30000, target=Address[x.x.x.x]:5701, backupsExpected=0, backupsCompleted=0} encountered a timeout
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:366)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:334)
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:225)
    at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:71)
    at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:335)
    at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:588)
    at org.jivesoftware.openfire.admin.system_002dclustering_jsp._jspService(system_002dclustering_jsp.java:123)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
    at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
    at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:159)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Unknown Source)

I’m running Openfire 3.10.2 and Hazelcast plugin 2.1.1. I’ve reproduced the problem on two separate setups: a set of 3 Windows servers and a set of 2 Linux servers. In each setup the servers are on the same subnet with no network devices between them. I’ve tried both the multicast and the tcp-ip join configuration, with the same results.
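
For reference, the tcp-ip variant follows the usual Hazelcast 3.x join layout, roughly as sketched below. This is an illustration rather than a copy of my file: the member addresses are placeholders, the port is assumed to be the default 5701 that shows up in the log, and everything outside the network section is left out.

```xml
<hazelcast>
  <network>
    <!-- 5701 is the port shown in the timeout message; assumed default -->
    <port auto-increment="true">5701</port>
    <join>
      <!-- multicast discovery switched off when listing members explicitly -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <!-- placeholder addresses, one entry per cluster node -->
        <member>10.0.0.1:5701</member>
        <member>10.0.0.2:5701</member>
        <member>10.0.0.3:5701</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```

The multicast variant is the same section with the two enabled flags swapped and no member entries.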

I’m looking for some insight into the expected Hazelcast behavior and, hopefully, some tips on making the cluster more stable.