Openfire 4.2.3: decreased performance, high CPU load on server

We have been using Openfire 4.2.3 on a server with 12 GB RAM and an 8-core CPU. It currently serves 1000-2500 simultaneous connections at any time of day. During peak hours (about 4 hours), when the server handles 2500 connections, we see a high CPU load of about 10-20 on the server.
Once the CPU load on the server gets high, Openfire performance decreases: new clients trying to connect get a timeout exception on connection, and messaging slows down.
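To put those numbers in perspective, a load figure is easier to judge relative to the core count. The following is a minimal sketch, assuming the 10-20 figure is a Unix load average and that the host is Unix-like (where Python's `os.getloadavg()` is available); the function names are made up for illustration.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def current_load_per_core():
    """Read the 1-minute load average of this machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

With the figures reported here, `load_per_core(10, 8)` gives 1.25 and `load_per_core(20, 8)` gives 2.5, so more tasks are waiting than there are cores. On Linux the load average also counts tasks blocked on I/O, so a high value does not by itself prove the CPU is the bottleneck.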

We are using MariaDB as the database for Openfire.
On checking the database query statistics, queries related to the ofUser table are taking significant time.

[Screenshot from 2019-02-21 10-35-40]

The ofUser table contains roughly 1 million entries.

Cache settings are in the image below.

From what I can see, the high CPU load is connected to the DB queries taking a long time. What should we do in this scenario?

  • Should we start purging inactive users' data on the server?

  • What steps should we take to optimize the cache?

  • Why are the queries taking so long? Is there no index on the ofUser table?

I’m sorry to hear you’re running into these issues.

Although I have no clear-cut solution, here are a couple of things that I suggest looking into first:

  • Openfire should be able to handle 2500 concurrent client connections (assuming that they’re regular chat users) with ease.
  • Openfire 4.3.2 offers much improved cache statistics.
  • The cache statistics show the size of the cache in bytes not in elements.
  • To what extent are you sure that the load that you’re experiencing is related to the ofUser table?
  • The username column of the ofUser database table should be the primary key of that table. It does not hurt to check this.
  • Is the database also experiencing high load?
  • Is the database located on a physically different machine? If not, you could consider doing so, to free up resources.
  • Is the database connection pool sufficiently large?
  • Your database will have various tools to evaluate its runtime performance (slowlogs, processlists, etc). Use these to determine if a problem is detected.
  • Use a Java profiler to determine what part of Openfire is under stress.

  • Openfire 4.3.2 offers much improved cache statistics.

We will upgrade as part of the solution to our high-load problem, but first we want to debug the existing version to find the root cause of why this is happening.

  • The cache statistics show the size of the cache in bytes not in elements.

Can you share the optimal cache settings to put in system.properties for a dataset of our size? We have about 1 million users on Openfire, plus their corresponding rosters, with 2500 connections at one time.

  • To what extent are you sure that the load that you’re experiencing is related to the ofUser table?

We will use a database profiler to check this.
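One way to run such a check with MariaDB's built-in tooling is the slow query log combined with `EXPLAIN`. The sketch below only builds the statements so they can be pasted into a MariaDB client. The helper name, the one-second threshold and the sample username are assumptions for illustration, and `SET GLOBAL` needs a sufficiently privileged account.

```python
from typing import List


def slow_query_checks(threshold_seconds=1, sample_username="alice") -> List[str]:
    """Build MariaDB statements for spotting slow queries against ofUser."""
    # Double any single quote so the illustrative literal stays valid SQL.
    safe_name = sample_username.replace("'", "''")
    return [
        # Log every statement that runs longer than the threshold.
        "SET GLOBAL slow_query_log = 1",
        f"SET GLOBAL long_query_time = {threshold_seconds}",
        # Show what the server is executing right now.
        "SHOW FULL PROCESSLIST",
        # Ask the optimizer how it resolves a lookup by username.
        f"EXPLAIN SELECT * FROM ofUser WHERE username = '{safe_name}'",
    ]


if __name__ == "__main__":
    for statement in slow_query_checks():
        print(statement + ";")
```

If the `EXPLAIN` output shows the lookup using the PRIMARY key and touching a single row, the lookup itself is cheap, and the time is more likely spent on the sheer number of calls or on other queries.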

  • The username column of the ofUser database table should be the primary key of that table. It does not hurt to check this.

Yes, it is in our case as well.
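For anyone wanting to verify this on their own installation, the output of `SHOW INDEX FROM ofUser` can be checked mechanically. This is a minimal sketch: the helper name is hypothetical, the rows are assumed to be dictionaries keyed by the column names MariaDB reports (`Key_name`, `Column_name`), and the sample rows are illustrative rather than copied from a real server.

```python
def has_primary_key_on(index_rows, column):
    """Return True if `column` is the only column of the PRIMARY key."""
    primary_columns = [
        row["Column_name"].lower()
        for row in index_rows
        if row["Key_name"] == "PRIMARY"
    ]
    return primary_columns == [column.lower()]


# Illustrative rows shaped like the output of SHOW INDEX FROM ofUser.
sample_rows = [
    {"Key_name": "PRIMARY", "Column_name": "username"},
    {"Key_name": "ofUser_cDate_idx", "Column_name": "creationDate"},
]

if __name__ == "__main__":
    print(has_primary_key_on(sample_rows, "username"))
```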

  • Is the database located on a physically different machine? If not, you could consider doing so, to free up resources.

It is currently located on the same machine. We will try this.

  • Is the database connection pool sufficiently large?

Currently it is 25. Should we increase this?

It’s hard to compute this, as it depends on the average number of contacts in a roster. I strongly suggest using the metrics provided by Openfire 4.3.2, as they will simply tell you what the performance of your cache is. Alternatively, you could try to analyze a heap dump, and see how big roster instances are in your setup.
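To illustrate why the number of contacts matters, here is a rough back-of-envelope sketch. Every number in it is a placeholder, not a measurement: the per-contact and per-roster byte counts would have to come from a heap dump of your own server, and the function name is made up for illustration.

```python
def estimate_roster_cache_bytes(cached_users, avg_contacts,
                                bytes_per_contact, bytes_per_roster=0):
    """Estimate the bytes needed to keep `cached_users` rosters in memory."""
    per_roster = avg_contacts * bytes_per_contact + bytes_per_roster
    return cached_users * per_roster


if __name__ == "__main__":
    # Placeholder inputs: 2500 online users, 50 contacts each,
    # 500 bytes per contact and 1000 bytes of overhead per roster.
    print(estimate_roster_cache_bytes(2500, 50, 500, 1000))
```

The result is in bytes, which matches the unit the cache statistics use. The system property that would take such a value is, as far as I recall, `cache.username2roster.size` for the roster cache; treat that name as an assumption and verify it against your Openfire version before setting it.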

There’s no point in increasing it without knowing if the 25 that you have right now are actually being used. Openfire shows the current usage of the connection pool in its admin console. Your database will provide monitoring solutions that you can use to verify this as well.
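On the database side, one way to see how many of the pooled connections are actually doing work is to summarize the output of `SHOW PROCESSLIST`. A minimal sketch, assuming the rows are available as dictionaries with the `User` and `Command` columns that MariaDB reports; the helper name and the database user name `openfire` are hypothetical.

```python
def summarize_connections(processlist_rows, db_user):
    """Count idle (sleeping) and busy connections opened by `db_user`."""
    idle = 0
    busy = 0
    for row in processlist_rows:
        if row["User"] != db_user:
            continue  # Ignore connections from other accounts.
        if row["Command"] == "Sleep":
            idle += 1
        else:
            busy += 1
    return {"idle": idle, "busy": busy, "total": idle + busy}


# Illustrative rows shaped like the output of SHOW PROCESSLIST.
sample_rows = [
    {"User": "openfire", "Command": "Sleep"},
    {"User": "openfire", "Command": "Query"},
    {"User": "root", "Command": "Query"},
]

if __name__ == "__main__":
    print(summarize_connections(sample_rows, "openfire"))
```

If the busy count sits close to the pool maximum of 25 during peak hours, the pool may be a bottleneck; if most connections are idle, increasing it is unlikely to help.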