We have been running performance tests on Openfire using JMeter with the XMPP Protocol Support plugin (by BlazeMeter), with some interesting results: there is a considerable bottleneck when the Hazelcast clustering plugin is in use. Let me share the details in case you have any ideas:
2 Openfire machines
6-core processors
7 GB RAM
100 contacts in each user's roster
Ramp-up: 30 seconds
Connect timeout: 10 seconds
1. Connect to server
2. Log in with user
3. Set presence
4. Get roster
Meaning that within the 30-second ramp-up, 4000 users must connect to the cluster, set their presence, and retrieve their rosters.
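For reference, the numbers above work out to roughly 133 new sessions per second cluster-wide (a quick back-of-the-envelope check, using only the figures from the test plan):

```java
public class RampUpMath {
    public static void main(String[] args) {
        int users = 4000;       // JMeter threads (users)
        int rampUpSeconds = 30; // JMeter ramp-up period
        int nodes = 2;          // Openfire cluster members

        // Logins per second across the cluster, and per node assuming even balancing
        int loginsPerSecond = users / rampUpSeconds;
        int loginsPerSecondPerNode = loginsPerSecond / nodes;

        System.out.println(loginsPerSecond);        // ~133
        System.out.println(loginsPerSecondPerNode); // ~66
    }
}
```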
The tests run smoothly without clustering on each node separately, and also run fine with clustering enabled but only one node connected. The problem arises with both nodes clustered, when the failed-transaction rate goes through the roof. These are the graphs of successful vs. failed transactions:
One node, clustering disabled: failure rate 1.3%
One node, clustering enabled (single member): failure rate 3% (still acceptable)
Two nodes, clustered: failure rate 65%
The interesting part of the clustering test came when we checked what is causing the failures: it is the "3. Set presence" step; for some reason that one accounts for the huge failure rate.
Without it, the failure rate drops to 0%.
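Our working theory (an assumption on our side, not something we have confirmed): each initial presence is broadcast to every contact in the sender's roster, so with 4000 users and 100-entry rosters the cluster has to deliver on the order of 400,000 presence stanzas during the ramp-up, and with two members roughly half of the recipients sit on the other node, turning each broadcast into cross-node Hazelcast traffic:

```java
public class PresenceFanOut {
    public static void main(String[] args) {
        int users = 4000;     // users connecting during ramp-up
        int rosterSize = 100; // contacts per user's roster
        int nodes = 2;        // cluster members

        // Each initial presence is broadcast to every roster contact
        long presenceDeliveries = (long) users * rosterSize;
        // Assuming users are balanced evenly, about half the recipients
        // are connected to the other cluster member
        long crossNodeDeliveries = presenceDeliveries * (nodes - 1) / nodes;

        System.out.println(presenceDeliveries);  // 400000
        System.out.println(crossNodeDeliveries); // 200000
    }
}
```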
We have been trying to tweak the Hazelcast cache configuration and garbage collection settings here and there, with no success. If you have any ideas, they would greatly help us understand what is going on.
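For completeness, this is the kind of tweak we have been experimenting with in the plugin's hazelcast-cache-config.xml (illustrative only; the map name and values here are just an example of switching backup replication from synchronous to asynchronous, not our exact settings):

```xml
<hazelcast>
  <map name="default">
    <!-- Replicate map writes asynchronously instead of blocking the caller -->
    <backup-count>0</backup-count>
    <async-backup-count>1</async-backup-count>
  </map>
</hazelcast>
```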