Openfire 3.7 memory use

Hello,

I’ve got a decently sized Openfire 3.7 instance. We’re primarily using the service for point-to-point communication with devices rather than human users, and we’re not really using many features beyond login and messaging: no file transfers, no offline messages, etc. It’s essentially a webserver communicating with a large number of devices via XMPP.

The total concurrent connection count is nearly 100,000 right now. Openfire has a 10 GB heap assigned to it, and usage fluctuates between 5 and 8 GB between garbage collection cycles. The GC cycles take a very long time.
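(For what it’s worth, this is easy to watch live with jstat pointed at the Openfire pid; the pid below is the one from the ps output further down, and 5000 is a 5-second sample interval.)

jstat -gcutil 11311 5000
# prints survivor/eden/old/perm occupancy percentages plus young/full GC counts and accumulated times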

We’ve ended up at this set of Java options, although I don’t know for certain that they are optimal:

-server -Xmx10g -Xms10g -Xmn2500m -XX:MaxPermSize=128m -XX:PermSize=128m -XX:+AggressiveOpts -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -XX:CMSInitiatingOccupancyFraction=40 -XX:+CMSPermGenSweepingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+PrintGCDetails -Xloggc:/tmp/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

The JVM process itself bubbles up to ~ 20 GB of system memory:

daemon 11311 226 90.7 25727452 22393632 ? Sl Oct03 95286:30

And the system load average hovers between 5 and 6 on a 4-core server.

11:45:28 up 49 days, 13:34, 1 user, load average: 6.06, 5.93, 5.47

We’re thinking about upping it to an 8-core server, but in any case there is probably some scaling issue we’re missing here.

We do use TLS with a real certificate as well, but we can temporarily disable it if it’s a problem. We had big problems with the compression setting in the past, so it is disabled.

Should we think about sharding the users out at this point, or are there some magic tweaks we can make to increase Openfire’s performance?

The base OS is CentOS 5.8.

Hi, wow, those are impressive stats. I think you are doing very well for 100k connections! The upcoming 3.7.2 release will contain some fixes that should help performance with that many connections. If you can test Subversion trunk, that would be great.

Most of the 100,000 connections are devices on the consumer premises. There are a handful – let’s say 100 – from a webserver farm.

When a user connects to the website, they are able to access their device remotely: the site provides a web interface through which the user can issue commands to the device. To do this, the website signals the device over XMPP.

The device receives the command and performs an action.

This could range from checking the local storage of the device to asking it to stream video/audio to a remote RTSP server.

The customer devices are mostly just signalling that they are connected. We have a custom plugin for presence, which updates our database so the web farm is aware that the device is available.

Is it normal for the memory use to be so high? Is there anything we can do to shrink it in general?

After analyzing the garbage collection log, we’re finding that the stop-the-world pauses are generally quite long and are likely the culprit for our poor performance. We might try expanding the young generation, in the hope that objects are simply released in young gen before getting promoted.

Even so, on a large enough time scale the memory requirements seem pretty steep. With a 10 GB heap at 100K users, we’re tracking at roughly 100 KB of memory per user – when we’re not really requiring any per-user state inside Openfire.

Is there any way to offload whatever it’s hanging on to into memcached or some other dumb memory store? JVMs beyond a certain memory footprint are trouble.

That’s quite an impressive setup. I can’t think of any quick fixes, and I think some trial and error would probably be required for things to improve. Is there any way that you can do that without affecting your production environment?

Some things that come to mind (in no particular order):

My first question is perhaps a very obvious one, but just to be sure: is that host actually CPU bound, and not, for instance, IO bound? I bet that gc.log file is growing quite fast…
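(A minute of iostat/vmstat output usually answers that; keep an eye on %iowait and the wa column respectively.)

iostat -x 5
vmstat 5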

Apart from the plugins that you most likely did not load, there might be some benefit in unloading some modules that you don’t use - that, however, requires a custom build of Openfire. Not sure if that would give you any mileage though.

Did you ever check/profile which parts (lines of code) of Openfire are the most heavily used? Did you ever successfully use a profiler, or make some consecutive thread dumps of the running system? That might give us an idea of what bottleneck you’re running into.

How many threads are there in the first place? Perhaps the JVM is spending more time context switching than doing anything else?
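(Counting them is cheap, for example with the pid from your ps line:)

grep Threads /proc/11311/status
ps -o nlwp= -p 11311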

Did you consider using connection managers, to offload some of the IO-processing to different JVMs (possibly running on different hosts)?

And, obviously, a very useful thing to analyze is a full heapdump of a running system. That should tell us where all that memory is being allocated.

We are not in the 100,000 concurrent user range (we’re at about 3,500), but I can comment on the profiler. We had very good results using AppDynamics for profiling and memory bug hunting in a large project. It’s priced below $10,000 per JVM and has helped us tremendously in identifying issues in production.

I’ll take one licence from my production environment and report back with the setup and results.

By the way, one finding I had with the “lite” edition of AppDynamics was a delay in the DB driver caused by the test SQL statement that is active by default. Switching it off increased throughput for all methods that use a DB connection.

What kind of DB is used for the setup above? We are using PostgreSQL.

Guus:

These are some helpful pointers, thank you.

On the IO point, I ran iostat for a few ticks until GC triggered inside Openfire.

This is a loose correlation, but hopefully you get the idea.

avg-cpu: %user %nice %system %iowait %steal %idle

93.60 0.00 0.49 0.00 0.00 5.91

snippet of gc.log around the same time:

http://pastebin.com/z75HsLSE

The gc.log is 86 MB after almost a month. Nothing too big.

I’m pretty hesitant to attach a profiler/debugger to a 20 GB JVM; it may be superstition on my part. The real challenge we face in this arena is load testing. While we can simulate some of the load from the consumer devices, it’s tough to replicate the randomness of what real users are doing on the system at large scale. Compounding that, none of the staff (myself included) are very strong with Java.

There are 250 threads running now. There are two that stand out as being particularly busy compared to the rest:

http://pastebin.com/D7K7Bryu

Specifically, 11324 and 11325.
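(If those are native LWP ids, as I assume, converting them to hex should let us match them against the nid= field in a Java thread dump:)

printf '%x\n' 11324    # 2c3c
printf '%x\n' 11325    # 2c3d
# then look for nid=0x2c3c / nid=0x2c3d in the dump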

I haven’t looked at the connection manager module because the concurrency implied by its documentation seems a bit low compared to where I need to be.

On the topic of the full heap dump, does Openfire support a built-in dump-to-disk feature? I have seen some Java apps support thread stack dumps and even memory dumps via kill -1 and kill -2.

Walter:

I will have a look at the AppDynamics solution, looks interesting.

Our SQL backend is currently MSSQL, but we’re not married to it.

Here’s a screenshot of our Database Statistics page from Openfire itself:

https://docs.google.com/open?id=0B-hpFpGxShgJcXpSODdoQi1Udzg

Since we’re not really using many of the native features of Openfire except realtime signalling, I imagine that it’s not contacting SQL very often.

We’re also using LDAP for authentication, forgot to mention that originally.

Just did a test with disabling TLS.

We had all inbound connections coming on 4530 for both clear/encrypted.

Server -> Server Settings -> Security Settings -> Custom -> TLS not available

After most of the users had reconnected, the memory use and load average dropped dramatically – I have less than 1.0 load now, which is pretty crazy.

We’re going to investigate offloading explicit SSL with haproxy and having the clients use a dedicated SSL port rather than STARTTLS on the shared port. It would be interesting to hear whether anyone else has stories of problems with SSL at large volumes.
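(Roughly what we have in mind is something like the following; just a sketch, assuming a haproxy build with native SSL termination (1.5+), otherwise stunnel or similar would sit in front. The port numbers and certificate path are placeholders for our real values:)

frontend xmpp_ssl
    bind :5223 ssl crt /etc/haproxy/xmpp.pem
    mode tcp
    default_backend openfire_plain

backend openfire_plain
    mode tcp
    server openfire1 127.0.0.1:4530 check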

Hi Scott,

Creating heap and thread dumps is handled by the JVM implementation - there is nothing specifically needed in an application (such as Openfire) run by the JVM for it to be able to generate a heap or thread dump. You can use standard Sun/Oracle tools, among others, to generate them.

In short: to have a thread dump written to standard output, issue kill -3 on the process ID. This won’t abort the process itself.

To create a (hprof) dump of the heap, use a tool called jmap: http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html Please be aware that this will pause the running process for the duration of the dumping activity - which will be at least a few minutes.
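(For example, with the pid from your ps output; jmap -histo:live is a lighter-weight alternative that only prints an object histogram, although it too pauses the process while it runs:)

jmap -dump:format=b,file=/tmp/openfire-heap.hprof 11311
jmap -histo:live 11311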

It helps to execute the commands as the same user that is running the process; this avoids permission issues.

I can appreciate your reluctance to hook up a debugger. As a lightweight alternative, you could consider creating a few thread dumps in succession. As the thread IDs are printed, you can get some idea of what code is being executed. Not nearly as detailed as profiling, but it could give you some hints.
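(Something along these lines is enough; the dumps end up wherever the daemon’s standard output is redirected:)

for i in 1 2 3 4 5; do kill -3 11311; sleep 10; done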

What exactly do you mean by the limited concurrency of connection managers? As far as I know, they should be as concurrent as Openfire itself (much of the relevant code is an exact copy). Using connection managers, you can offload much of the IO-related overhead (compression, SSL, etc.) to a different JVM, optionally running on a different host. You’re not limited to using just one, either - if you can load balance your traffic (do your clients resolve hosts through DNS SRV?), you can easily add multiple CMs, allowing you to scale out the components that do a good part of the expensive number crunching.
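(By way of illustration, spreading clients over multiple connection managers through DNS is just a matter of publishing several SRV targets; the hostnames here are obviously placeholders:)

_xmpp-client._tcp.example.com. 86400 IN SRV 5 50 5222 cm1.example.com.
_xmpp-client._tcp.example.com. 86400 IN SRV 5 50 5222 cm2.example.com.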

Quick update. We took an outage last night, updated the system to 8 CPUs, and adjusted the JVM options. Here are our new flags:

-server -Xmx14g -Xms14g -Xmn4g -XX:MaxPermSize=128m -XX:PermSize=128m -XX:+AggressiveOpts -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=40 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:/tmp/gc.log

We bumped up Xmn and adjusted Xmx a bit further to accommodate the change. The theory was that even if the larger young generation had no impact, we would have overcompensated on overall heap size, so it wouldn’t matter.

We added UseCompressedOops, which seems like a best practice we missed along the way.

We changed MaxTenuringThreshold to 1 (from 31).

Lastly, we switched from SunJDK6 to OpenJDK6.

Our goal was to have the bulk of GC work happen in young gen, which is super fast.
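(One way we can keep an eye on that is to pull the wall-clock pause times out of the PrintGCDetails output; this just lists the five longest "real=" times recorded so far:)

grep -o 'real=[0-9.]*' /tmp/gc.log | cut -d= -f2 | sort -n | tail -5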

We’ve been live now for about 14 hours and it seems to be working well. In the previous setup, immediately after restarting Openfire we would spike to a load average of 4 and basically stay there.

XMPP load average for the past month (2 hour avg RRA): https://docs.google.com/open?id=0B-hpFpGxShgJdW1lU1dDdi1VY2c

XMPP connections for same period (2 hour avg RRA): https://docs.google.com/open?id=0B-hpFpGxShgJWkYwYXdMczlIZlE

New load average (5 minute avg RRA): https://docs.google.com/open?id=0B-hpFpGxShgJaDRHYzQwZmlrQjA

The two-hour spikes are basically bad logic in our device firmware: every 2 hours the devices send a message over XMPP to update our database (via our custom plugin).

The connection manager documentation speaks along the lines of “several thousands of connections”, which unfortunately is pretty far below the scale I need to think about. I guess we can try it out in a test case and see how it scales for us. We’ll also be trying to get the devices to use explicit SSL so we can offload it via haproxy.

Our clients ping a webservice endpoint to retrieve configuration for XMPP, RTSP, etc. We can balance things out there in the web logic, especially based on existing connection counts, health, and so on.

I think we’re back on a good track with Openfire – in fact, I’m quite happy with the way it’s performing, since there don’t appear to be many deployments running at the kind of concurrency we have.

Good to hear things are taking a turn for the better.

I do not believe that the number you’re referring to for connection manager concurrency applies to your setup - they should scale roughly as well as Openfire itself, since they’re essentially doing the same work as Openfire, but less of it.

An added bonus of connection managers is that you can deploy as many of them as you need. The only drawback is that you’ll need more hardware - but that might be a good trade-off compared to upgrading the existing hardware, and you’ll get more mileage out of the one Openfire instance that you have running. I suggest you start playing around with at least one instance.