Openfire 4.6.4 struggling with 500 users

I am trying to load test Openfire. With around 500 users in one room, the server becomes extremely slow (the admin portal becomes unresponsive) and I can’t connect any new users, not even to different rooms.

My current setup is a cluster with 3 replicas.

Has anyone had similar issues? What have you found to be the best way to load test and optimize it?

What database are you using? Are you able to instrument the database and confirm that it is keeping up with the query traffic Openfire is sending it?

I'm using PostgreSQL. I will try to collect more metrics and post them here.

What’s the recommended maximum number of users in a room? I am expecting to have huge rooms. I didn’t see the same issue when connecting a few thousand users (connections only), but with 500 users all in the same room the server is struggling.

Also, is there anything out of the box in Openfire that would identify this load as an attack and block it? I’m starting roughly 500 bots within a matter of seconds from the same IP.
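On the gap between a few thousand plain connections and 500 occupants in one room: a rough way to see why the two loads differ is to count stanzas. The sketch below assumes the default behaviour described in XEP-0045, where each join sends the presence of every existing occupant to the newcomer and the newcomer's presence to every occupant (including themselves), and where each groupchat message is delivered to every occupant. The class and method names are made up for illustration, and a room configured to restrict presence broadcast would produce lower numbers.

```java
public class MucFanout {

    /**
     * Presence stanzas the service sends while n users join an empty room one by one,
     * assuming full presence broadcast (an assumption, see above).
     */
    public static long presenceStanzasForJoins(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) {
            long existingToNewcomer = k - 1; // presence of each existing occupant, sent to the k-th joiner
            long newcomerToAll = k;          // the joiner's presence, sent to all k occupants (self included)
            total += existingToNewcomer + newcomerToAll;
        }
        return total;
    }

    /** Deliveries caused by a single groupchat message in a room with the given number of occupants. */
    public static long deliveriesPerMessage(int occupants) {
        return occupants;
    }

    public static void main(String[] args) {
        int occupants = 500;
        System.out.println("presence stanzas for " + occupants + " joins: "
                + presenceStanzasForJoins(occupants));
        System.out.println("deliveries per groupchat message: "
                + deliveriesPerMessage(occupants));
    }
}
```

Under these assumptions, 500 joins produce 250,000 presence stanzas (the count grows with the square of the room size), which all land within the few seconds the bots take to start. Plain connections without a room join have no comparable fan-out.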

I would expect Openfire to easily handle that, but we’re not explicitly testing for load in a MUC room. Maybe we’ve silently introduced some kind of bottleneck that you’ve now uncovered.

Is it possible for you to create thread dumps while the server is under load (ideally a couple of them, each a few seconds apart)? That would probably show where resource contention is occurring. There are various ways to create thread dumps of Openfire (which is a standard Java application). One of the ways to do this is to use the Openfire “Thread Dump” plugin (although that does involve the admin console, which you mentioned was unresponsive).
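As an illustration of the "couple of dumps, a few seconds apart" idea, here is a minimal sketch using the standard `java.lang.management` API. It inspects the JVM it runs in, so to look at Openfire it would have to run inside the server process (for example from a plugin); the class name, sample count, and pause are arbitrary choices for the example. From outside the process, the JDK tools `jstack <pid>` or `jcmd <pid> Thread.print` produce comparable dumps without involving the admin console.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadDumpSampler {

    /** Takes one dump of all live threads in the current JVM, with lock details where supported. */
    public static ThreadInfo[] dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.dumpAllThreads(bean.isObjectMonitorUsageSupported(),
                                   bean.isSynchronizerUsageSupported());
    }

    /** Counts threads per state; many BLOCKED threads in every sample hints at lock contention. */
    public static Map<Thread.State, Integer> countByState(ThreadInfo[] infos) {
        Map<Thread.State, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = 3;         // arbitrary: "a couple" of dumps
        long pauseMillis = 2000; // arbitrary: "a few seconds apart"
        for (int i = 1; i <= samples; i++) {
            ThreadInfo[] infos = dump();
            System.out.println("=== dump " + i + ": " + countByState(infos));
            for (ThreadInfo info : infos) {
                // Print only blocked threads; ThreadInfo.toString() shows the lock,
                // its owner, and the top stack frames.
                if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.print(info);
                }
            }
            if (i < samples) {
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

Threads that stay blocked on the same lock across all samples, together with the thread that owns that lock, are the usual starting point when looking for the kind of resource contention mentioned above.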

OK, will do that. Also, I just found the post about connection managers, and it looks like I should already be using them in my situation, right?

Development of the Connection Manager stopped a long, long time ago. I would not recommend using it. It was not designed to handle application logic anyway, only to offload network I/O.