I’ve got a decently sized Openfire 3.7 instance. We’re using the service primarily for P2P communication between devices, not people, and we use few features beyond login and messaging: no file transfers, no offline messaging, and so on. It’s just a webserver communicating with a bunch of devices via XMPP.
The total connection count is nearly 100,000 concurrent right now. Openfire has a 10 GB heap assigned to it, and usage fluctuates between 5 and 8 GB between garbage collection cycles. GC cycles take a very long time.
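As a rough sanity check on whether that footprint is plausible, the post-GC heap floor divided by the connection count gives the live heap each connection retains. A minimal back-of-the-envelope sketch; the 5 GB floor and 100,000 connections are the figures from this post, and the per-connection cost is only an estimate:

```python
# Back-of-the-envelope check: live heap retained per connection.
# 100,000 connections and a ~5 GB post-GC floor are the figures above;
# the resulting per-connection number is just an estimate.
connections = 100_000
post_gc_heap_bytes = 5 * 1024**3   # heap floor between GC cycles, ~5 GB

per_connection_kb = post_gc_heap_bytes / connections / 1024
print(f"~{per_connection_kb:.0f} KB of live heap per connection")  # ~52 KB
```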
We’ve ended up at this set of Java options, although I don’t know for certain that they are optimal:
-server -Xmx10g -Xms10g -Xmn2500m -XX:MaxPermSize=128m -XX:PermSize=128m -XX:+AggressiveOpts -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -XX:CMSInitiatingOccupancyFraction=40 -XX:+CMSPermGenSweepingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+PrintGCDetails -Xloggc:/tmp/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
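For reference, this is roughly how we summarize wall-clock pauses from the gc.log those flags produce. A minimal sketch: the regex assumes the classic -XX:+PrintGCDetails line layout, which varies by JVM version, and sample_log is made-up data standing in for a real /tmp/gc.log:

```python
import re

# Sample lines in the HotSpot -XX:+PrintGCDetails format. The exact layout
# varies by JVM version, so the regex below may need adapting; this data is
# illustrative, not taken from our actual log.
sample_log = """\
2012-10-03T11:45:28.123+0000: 12345.678: [GC 12345.678: [ParNew: 2048000K->123456K(2304000K), 0.1234560 secs] 5000000K->4100000K(9728000K), 0.1235670 secs] [Times: user=0.40 sys=0.01, real=0.12 secs]
2012-10-03T11:46:02.456+0000: 12380.011: [Full GC 12380.011: [CMS: 6000000K->3500000K(7424000K), 8.5000000 secs] 7000000K->3500000K(9728000K), 8.5100000 secs] [Times: user=8.20 sys=0.10, real=8.51 secs]
"""

# Wall-clock ("real") pause time reported at the end of each GC record.
pause_re = re.compile(r"real=(\d+\.\d+) secs")

def summarize(log_text):
    """Return (count, total, max) of wall-clock GC pause seconds."""
    pauses = [float(m.group(1)) for m in pause_re.finditer(log_text)]
    return len(pauses), sum(pauses), max(pauses)

count, total, worst = summarize(sample_log)
print(f"{count} pauses, {total:.2f}s total, worst {worst:.2f}s")
```

Running this against the real log (e.g. open("/tmp/gc.log").read()) makes it easy to see whether the long pauses are the CMS full collections or the ParNew young-generation ones.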
The JVM process itself grows to ~20 GB of resident system memory:
daemon 11311 226 90.7 25727452 22393632 ? Sl Oct03 95286:30
And the system load average hovers between 5 and 6 on a 4-core server.
11:45:28 up 49 days, 13:34, 1 user, load average: 6.06, 5.93, 5.47
We’re thinking about moving to an 8-core server, but in any case there is probably some scaling issue we’re missing here.
We do use TLS with a real certificate as well, but we can disable it temporarily if it’s a problem. We had serious trouble with the compression setting in the past, so it is disabled.
Should we think about sharding the users at this point, or are there tweaks we can make to improve Openfire’s performance?
The base OS is CentOS 5.8.