Openfire increases threads and does not reduce them

DkW · December 17, 2023, 12:52pm

Hi All,
We have two servers clustered with the Hazelcast plugin, running Openfire 4.7.4.
We regularly see “Cluster task failure” errors, along with sudden spikes in the number of threads used by Openfire. Once the thread count spikes, it never comes back down.
We are not sure whether this is caused by network connectivity problems on the client connections, by an issue with the cluster, or by the VM that the Openfire service runs on.

guus · December 18, 2023, 7:42am

Can you please provide more information? Is there anything of interest being logged in the log files when this happens? After it happens, try creating a thread dump. This will tell us what all those threads are doing.
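If the spike is hard to catch by hand, thread dumps can be collected on a schedule. This is a minimal sketch, assuming the JDK's `jstack` tool is on the `PATH` and the script runs as the same OS user as the Openfire process. The script name, output directory and five-minute interval are hypothetical choices, not anything Openfire prescribes.

```python
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path


def dump_path(directory, pid, when):
    """Build a timestamped file name such as jstack-1234-20231218-102800.txt."""
    return Path(directory) / f"jstack-{pid}-{when:%Y%m%d-%H%M%S}.txt"


def capture(pid, directory):
    """Run `jstack -l <pid>` once and save its output; returns the file written."""
    out = subprocess.run(
        ["jstack", "-l", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    path = dump_path(directory, pid, datetime.now())
    path.write_text(out, encoding="utf-8")
    return path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python capture_dumps.py <openfire-pid> <output-dir>
    pid, directory = int(sys.argv[1]), sys.argv[2]
    Path(directory).mkdir(parents=True, exist_ok=True)
    while True:
        print("wrote", capture(pid, directory))
        time.sleep(300)  # one dump every 5 minutes; an arbitrary interval
```

A dump taken shortly before a spike, paired with one taken shortly after it, is usually more informative than a single dump on its own.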

DkW · December 18, 2023, 10:28am

Hi Guus,
Thank you for the reply.
Here are thread dumps taken before and after the spike:
xmpp-01-jstack1275202.txt (371.8 KB)
xmpp-01jstack_after.txt (411.0 KB)
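A quick way to see which threads account for the growth is to group the thread names in each dump by pattern and compare the counts. This is a minimal sketch, assuming standard `jstack` output, where each thread header line starts with the quoted thread name and contains `tid=`. Digit runs are collapsed to `#`, so that, for example, `pool-1-thread-7` and `pool-1-thread-8` fall into the same `pool-#-thread-#` group. The script name is hypothetical.

```python
import re
import sys
from collections import Counter

# A jstack thread header looks like: "name" #12 daemon prio=5 os_prio=0 tid=0x... nid=0x... state
# Requiring "tid=" skips other quoted lines, such as those in a deadlock report.
HEADER = re.compile(r'^"([^"]*)".*\btid=')


def thread_groups(text):
    """Count threads per name pattern, with digit runs collapsed to '#'."""
    counts = Counter()
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            counts[re.sub(r"\d+", "#", m.group(1))] += 1
    return counts


def growth(before, after):
    """Return (group, before, after) for groups that grew, largest increase first."""
    rows = [(g, before[g], after[g]) for g in after if after[g] > before[g]]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f:
        before = thread_groups(f.read())
    with open(sys.argv[2], encoding="utf-8") as f:
        after = thread_groups(f.read())
    print(f"total threads: {sum(before.values())} -> {sum(after.values())}")
    for group, b, a in growth(before, after):
        print(f"{a - b:+6d}  {b:5d} -> {a:5d}  {group}")
```

Run against the two attachments, for example `python thread_diff.py xmpp-01-jstack1275202.txt xmpp-01jstack_after.txt`, the groups at the top of the output are the pools that grew the most; the stack traces of those threads in the "after" dump are the natural next place to look.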

DkW · December 18, 2023, 10:31am

Also, the Openfire logs record cluster timeouts at the same time as the spike:

openfire (13).log (21.0 MB)

DkW · December 18, 2023, 10:46am

Also, we rarely have more than 1,700 client connections.
Our Openfire deployment runs on two VMs with 32 GB of RAM each, of which 20 GB is allocated to the heap.

DkW · January 2, 2024, 5:20am

We still haven’t been able to find out why the cluster times out, given our small number of connections and minimal configuration. Any help would be much appreciated.

guus · May 7, 2024, 10:32am

Apologies for taking so long to respond. This dropped off my radar.

Sadly, the information that you’ve provided doesn’t immediately show the source of the problem. One thing that does strike me as odd is that your log files are full of MUC-related errors, notably errors that say something like:

Unable to find occupant with nickname ‘<somevalue>’ in room ‘<somevalue>’

This may indicate that there is data inconsistency in your server. That may or may not be related to the problem at hand.
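To check whether these errors are concentrated in a few rooms, the log can be tallied per room. This is a minimal sketch, assuming each log line contains the message worded as quoted above. Whether the raw log uses straight or typographic quotes around the values is an assumption, so the pattern accepts both. The script name is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the message quoted in this thread; accepts straight or typographic
# quotes, since the exact quote characters in the raw log are an assumption.
OCCUPANT_ERR = re.compile(
    r"Unable to find occupant with nickname ['‘](.*?)['’] in room ['‘](.*?)['’]"
)


def rooms_with_missing_occupants(lines):
    """Count 'Unable to find occupant' errors per room name."""
    per_room = Counter()
    for line in lines:
        m = OCCUPANT_ERR.search(line)
        if m:
            per_room[m.group(2)] += 1
    return per_room


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python muc_errors.py "openfire (13).log"
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = rooms_with_missing_occupants(f)
    for room, n in counts.most_common(20):
        print(f"{n:8d}  {room}")
```

If a handful of rooms account for most of the errors, those rooms are a reasonable starting point for looking into the inconsistency.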

I’m afraid that more diagnostic effort is needed to get to the bottom of this problem.