Sessions tab unresponsive while clustered

We’re doing some tests with clustering Openfire with the hazelcast plugin. Our setup has ~11k connected clients and when running non-clustered, the Sessions tab in the admin UI loads in less than a second. When we activate clustering the Sessions tab takes minutes to load. Is this normal or is it indicative of other problems?

We’re running Openfire v4.6.6 in Docker, which in turn runs on Amazon linux on t3a.large instances on AWS. Hazelcast version is 2.6.0. Load is moderate.
We have tried updating to the latest version (which is 4.7.1 at time of writing) but the problem persists.

Hi Johan,

I do not expect that to happen. It sounds like you uncovered an inefficiency in the code.

Could you please provide a Java thread dump taken at the time when the session page is loading. This likely will help us pinpoint the source of the issue.

There are many ways to create a thread dump. For convenience, an Openfire plugin that is named “thread dump plugin” is made available to allow you to do this through the Openfire admin console.

I’ve installed the Thread Dump plugin and will post a dump file here as soon as I have the time to create one.

1 Like

Here is a thread dump I took with 3700 clients connected, the sessions page loads in ~20 seconds.

Thread-dump.txt (131.8 KB)

Thank you. I suspect that this is the culprit. I’ve created a new issue in our issue tracker for us to have a look at this: [OF-2439] - Ignite Realtime Jira

"Jetty-QTP-AdminConsole-7201" #7201 prio=5
   java.lang.Thread.State: WAITING
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:161)
	at com.hazelcast.map.impl.proxy.MapProxySupport.containsKeyInternal(MapProxySupport.java:662)
	at com.hazelcast.map.impl.proxy.MapProxyImpl.containsKey(MapProxyImpl.java:299)
	at org.jivesoftware.openfire.plugin.util.cache.ClusteredCache.containsKey(ClusteredCache.java:106)
	at org.jivesoftware.util.cache.CacheWrapper.containsKey(CacheWrapper.java:117)
	at org.jivesoftware.openfire.spi.RoutingTableImpl.isAnonymousRoute(RoutingTableImpl.java:866)
	at org.jivesoftware.openfire.SessionManager.isAnonymousRoute(SessionManager.java:794)
	at org.jivesoftware.openfire.plugin.session.RemoteClientSession.isAnonymousUser(RemoteClientSession.java:85)
	at org.jivesoftware.openfire.SessionResultFilter$SessionComparator.compare(SessionResultFilter.java:235)
	at org.jivesoftware.openfire.SessionResultFilter$SessionComparator.compare(SessionResultFilter.java:214)
	at java.util.TimSort.mergeLo(TimSort.java:717)
	at java.util.TimSort.mergeAt(TimSort.java:514)
	at java.util.TimSort.mergeCollapse(TimSort.java:441)
	at java.util.TimSort.sort(TimSort.java:245)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1464)
	at java.util.Collections.sort(Collections.java:177)
	at org.jivesoftware.openfire.SessionManager.getSessions(SessionManager.java:865)
	at org.jivesoftware.openfire.admin.session_002dsummary_jsp._jspService(session_002dsummary_jsp.java:240)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1452)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
	at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
	at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:73)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:49)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:226)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:234)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$205/865376010.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
	at java.lang.Thread.run(Thread.java:748)

Great that you’re looking into this.
I have two questions regarding this:

  1. Do you have some idea when this will be fixed?

  2. Do you think that there are some other adverse effects of this? Or is this isolated to the admin page?

I don’t know when this bug will be addressed. It’s now on a heap of issues that is being tackled by volunteers whenever they are available and willing to invest some time.

Assuming that this is indeed the cause of the slowdown: the inefficiency appears to be with looking up a characteristic of a session that ‘lives’ on a different cluster node than the one on which the lookup is performed. In theory, that affects many actions, but in practise I suspect that it’s noticeable only when many sessions are being evaluated at the same time (such as when a admin console is used to show a listing of all sessions).