Roster cache usage at 600%

Openfire is using its caches extensively.
I have 405 LDAP users and 60 LDAP groups. One of the groups is the roster group containing all members.
Since I added all members to that group, the members are provided to all users, but now I have a cache issue. Openfire is using 600% of the roster cache, which most likely is what causes joining a room to take almost 1 minute.
See the attached picture. Other caches are also used extensively.
How can I increase the cache size, or how can I avoid this issue?

Versions:
RedHat 8
MariaDB 8
OF 4.7.1
Openldap




Create or modify an Openfire system property named cache.username2roster.size and set it to a value in bytes.

From the screenshot, you also want to increase:

  • cache.LDAPUserDN.size
  • cache.DefaultNodeConfigurations.size

My advice is to be generous; see the sketch below for example values.
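For what it's worth, here is a minimal sketch of how those properties could be set from code, assuming you would rather script this from a small custom plugin than use the admin console's System Properties page; the byte values are only example guesses, and the new sizes may only take effect after a restart:

```java
import org.jivesoftware.util.JiveGlobals;

// Example values only: sizes are in bytes, pick what fits your deployment.
public class CacheSizeTweaks {
    public static void apply() {
        // Roster cache: 10 MB in this example.
        JiveGlobals.setProperty("cache.username2roster.size",
                String.valueOf(10L * 1024 * 1024));
        // The other two caches from the list above: 5 MB each in this example.
        JiveGlobals.setProperty("cache.LDAPUserDN.size",
                String.valueOf(5L * 1024 * 1024));
        JiveGlobals.setProperty("cache.DefaultNodeConfigurations.size",
                String.valueOf(5L * 1024 * 1024));
    }
}
```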

I think Openfire can be improved out-of-the-box in a number of ways, to help people avoid running into issues like these. I’ve created these trackers for that:

Thank you Guus,
after some deeper research I had already found that myself.
But joining a room still takes ages. Any clue on this?

I can’t tell what’s causing that without looking at additional analytics.

Please provide a new overview of the cache pages, and try to create a thread dump while a client is in the process of joining the room.

In the morning, on the first connection after a night of inactivity, joining took 30 seconds. noDegug.log

Then:
After a restart of the Openfire service, joining takes more than one minute. Debug.log
Since I am not patient, I reopened the client. Surprisingly, the joining process went fast. Debug1.log

And an XML log after 10 minutes of inactivity. Again 20 seconds of waiting. XML.log
User that was joining: andre.dombrowsky
Rooms: admins@spt-rooms…, cis@spt-rooms… and sysadm@spt-rooms…

Please find screenshots and dumps below.





logs.tar (16.4 MB)

I am not immediately seeing a reason for the system being slow. A common issue with LDAP / AD is that Openfire is configured to pull in all directory entries, as opposed to only those that it needs (users & groups). Sometimes, this pulls in data an order of magnitude higher than what Openfire needs, which is where slowdowns can occur.

Try looking at the “users” and “groups” that Openfire recognizes in the Openfire admin console. If those lists contain a lot of entries that aren’t really users or groups (stuff like administrative accounts, accounts for people that don’t use Openfire, etc), then you should reconfigure the LDAP integration by putting in place more filters.
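To make that concrete, here is a hedged sketch of the two properties that hold those filters (ldap.searchFilter for users, ldap.groupSearchFilter for groups), set programmatically for illustration; the objectClass values, the employeeType attribute and the cn pattern are purely hypothetical and must be adapted to your own directory:

```java
import org.jivesoftware.util.JiveGlobals;

// Purely illustrative filters; adjust attributes and values to your OpenLDAP tree.
public class LdapFilterTweaks {
    public static void apply() {
        // Hypothetical: only accounts flagged with employeeType=xmpp become Openfire users.
        JiveGlobals.setProperty("ldap.searchFilter",
                "(&(objectClass=inetOrgPerson)(employeeType=xmpp))");
        // Hypothetical: only posixGroups whose name starts with "spt-" become Openfire groups.
        JiveGlobals.setProperty("ldap.groupSearchFilter",
                "(&(objectClass=posixGroup)(cn=spt-*))");
    }
}
```

The same filter strings can of course be entered in the admin console's LDAP settings instead.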

Have you always had performance issues with joining a MUC? Or is this a new issue?

Java Memory - what is the percentage of use?

@speedy
At first I had around 60 users, of which about 50% were online. Joining a MUC took about 2 seconds.
I am using one posix group containing all users for the roster, and around 50 other groups for granting access to MUCs. Most groups were empty.
After the import of the rest of the users (350) … so we are talking about 405 users in total in the roster group, plus adding them to their related groups so they get access to their MUCs, joining takes as long as described above.
But the number of users actually logging in has not changed, because I stopped the roll-out. As soon as I noticed that joining took too long, I stopped the roll-out.

@WebGreg
Java memory stays between 20 and 50%, even when I am joining a MUC.
In total I have 2 GB just for Java.

How many users do you have signed in and joined to the MUC before you start seeing issues?
Are you running Openfire in a cluster?
Any custom plugins or clients?

@speedy
Before the issue I had 60 users joining 1 MUC (granted through an LDAP group).
Those 60 are/were also broken down into a few groups that grant access to other MUCs.
But those are just 5 more MUCs.
This worked perfectly.
After the import of 350 additional users the issue came up.

But I don’t think it’s the groups, the LDAP, or the MUCs.
It is the roster.
It has 405 users allowed to use Openfire. 350 of them haven’t logged in yet.
I have tried a different group that does not contain all 405 members. I just took the group with the 60 users, and joining takes just a jiffy.
So something is wrong with the global group, or Openfire has issues with users that have never had a session.

Thanks for the info. I’ll try to reproduce this.
My environment is a little different, as I’ll be using Windows and AD, but if it’s an Openfire bug, it should still be reproducible.

I think I’m going to need some bots…
I created 1000 test users in AD,
created a group called “BigGroup” and added all 1000 test users as members,
shared BigGroup as a roster group with everyone,
created a MUC with “members only” and selected BigGroup,
and was able to sign in without any delays.
@guus I may need your help with bots!

Can you create a couple of thread dumps while the join process is in progress? That should tell us what code is being executed, which is very likely the code that is slower than expected.
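The usual route is to run jstack against the Openfire process ID a few times while the join is hanging. If it is easier to capture this from code running inside the Openfire JVM (for example from a small custom plugin), a rough, hypothetical sketch could look like this:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {

    /** Builds a jstack-like text dump of all live threads in this JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            // Thread name and state, followed by its stack frames.
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```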

@speedy Bots, you say? Those will probably be Smack-based client connections. :slight_smile: What exactly do you want them to do?

I guess just connect, load the roster, join the MUC, and eventually disconnect after a few minutes, so I can see if I can reproduce the issue. I tried JMeter, but was not successful. When trying to make the initial connection, I keep getting an
error in xmpp sampler: org.jivesoftware.smack.SmackException$NoResponseException: null
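For reference, a bare-bones Smack bot doing roughly that (connect, load the roster, time the MUC join, wait a bit, disconnect) could look like the sketch below, assuming a recent Smack (4.4.x or similar); the domain, host, credentials and room address are placeholders:

```java
import org.jivesoftware.smack.AbstractXMPPConnection;
import org.jivesoftware.smack.roster.Roster;
import org.jivesoftware.smack.tcp.XMPPTCPConnection;
import org.jivesoftware.smack.tcp.XMPPTCPConnectionConfiguration;
import org.jivesoftware.smackx.muc.MultiUserChat;
import org.jivesoftware.smackx.muc.MultiUserChatManager;
import org.jxmpp.jid.impl.JidCreate;
import org.jxmpp.jid.parts.Resourcepart;

public class JoinTimerBot {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details: adjust domain, host and credentials.
        XMPPTCPConnectionConfiguration config = XMPPTCPConnectionConfiguration.builder()
                .setXmppDomain("example.org")
                .setHost("openfire.example.org")
                .setUsernameAndPassword("testuser1", "secret")
                .build();

        AbstractXMPPConnection connection = new XMPPTCPConnection(config);
        connection.connect().login();

        // Load the (shared) roster, as a regular client would after login.
        Roster roster = Roster.getInstanceFor(connection);
        roster.reloadAndWait();

        // Join the MUC and measure how long the join takes.
        MultiUserChatManager mucManager = MultiUserChatManager.getInstanceFor(connection);
        MultiUserChat muc = mucManager.getMultiUserChat(
                JidCreate.entityBareFrom("room@conference.example.org"));
        long start = System.nanoTime();
        muc.join(Resourcepart.from("testuser1"));
        System.out.println("Join took " + (System.nanoTime() - start) / 1_000_000 + " ms");

        // Stay connected for a few minutes, then disconnect.
        Thread.sleep(5 * 60 * 1000);
        connection.disconnect();
    }
}
```

Spinning up a few hundred of these (one connection per test user) should give an idea of whether the join slows down with a large shared roster.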

Will do so ASAP.
Today I played around with the groups.
I created a new group and added a few existing users plus 400 newly created users.
I used this group as the roster group.
The result is a little better:
iOS Snikket 5 secs
iOS Siskin 5 secs
iOS ChatSecure 5 secs
Android Siskin 4 secs

Then I increased the Java memory to 4096 MB and gave Openfire 2 more cores.
No change.

Then I added another index to LDAP, because it said there was no index for the field I am using in the Openfire filter.
No improvement.

Now I went back to the original group and also added the 400 test users, so there are now 800 users in it,
and it still takes around 5 secs to join a MUC.

So that sounds a lot better than the 30 secs at the beginning, but maybe it can still be improved, unless you say 5 secs is good.

Looks like the XMPP plugin for JMeter could use an update. It’s using Smack 4.1.0 alpha? Anyway, that’s beyond my capabilities to update.

4-5 seconds doesn’t seem unreasonable to me. It could be a client issue. What are the results if you use a desktop client like Spark?