Helped but same issue happens after longer period of time
Added “-Xms256m -Xmx768m -XX:+UseG1GC”
No issues for 48 hours
This may not work for everyone: we have used some pretty restrictive memory settings, but we also have a very small system. It's worked for us, and until there is some kind of actual fix to the package we will leave it like this. Just thought I'd post in case anyone else wants to test this.
Where (what file) should I add the “-Xms256m -Xmx768m -XX:+UseG1GC” parameter?
Follow up: Never mind, found it. Tried this parameter and it didn’t work. In fact, things got worse. Messages/iChat on my Mac (OS X Yosemite) made the fan start going nuts.
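When a change to the JVM options seems to have no effect, it can be worth checking whether the flags actually reached the running Java process. Here is a minimal sketch in Python, assuming a Linux host where `/proc/<pid>/cmdline` is readable. The helper names `missing_flags` and `check_pid` are made up for illustration and are not part of Openfire.

```python
from __future__ import annotations

from pathlib import Path


def missing_flags(cmdline: bytes, expected: list[str]) -> list[str]:
    """Return the expected JVM flags that are absent from a process command line.

    `cmdline` is the raw content of /proc/<pid>/cmdline, where the
    arguments are separated by NUL bytes.
    """
    args = [a.decode(errors="replace") for a in cmdline.split(b"\0") if a]
    return [flag for flag in expected if flag not in args]


def check_pid(pid: int, expected: list[str]) -> list[str]:
    """Read the command line of a live process (Linux only) and check it."""
    return missing_flags(Path(f"/proc/{pid}/cmdline").read_bytes(), expected)
```

Calling `check_pid(<pid of the Openfire java process>, ["-Xms256m", "-Xmx768m", "-XX:+UseG1GC"])` returns an empty list when all three flags were picked up. Anything it returns was never passed to the JVM, which usually means the options file that was edited is not the one the startup script reads.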
Thought I had this issue fixed. Server was up 13 days, but then the CPU pegged and all clients got booted, though the management web interface was still functional.
nohup.out attached (generated with ‘kill -3’ after CPU pegged).
CentOS 6.6 VM on ESXi, 2GB RAM, 1 CPU.
OF 3.10.0, embedded database, AD auth, default JVM (1.7.0_76 Oracle). OPENFIRE_OPTS="-Djava.net.preferIPv4Stack=true -Xms256m -Xmx1024m -XX:+UseG1GC"
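A thread dump taken with ‘kill -3’ is easier to read once you know which thread is burning the CPU. One common approach is to take the busy thread's decimal ID from `top -H -p <pid>`, convert it to hex, and look for the matching `nid=0x...` entry in the dump. The sketch below assumes a HotSpot-style dump where each thread header line carries a `nid=0x...` field and stack blocks are separated by blank lines. The function names are illustrative only.

```python
from __future__ import annotations

import re


def tid_to_nid(tid: int) -> str:
    """Convert a decimal OS thread ID (as shown by `top -H`) to the dump's nid form."""
    return f"nid={tid:#x}"


def find_thread(dump: str, tid: int) -> list[str]:
    """Return the header and stack lines of the thread whose nid matches `tid`.

    The block ends at the first blank line after the header. An empty
    list means no thread with that ID appears in the dump.
    """
    lines = dump.splitlines()
    # \b keeps nid=0x1a2 from matching inside nid=0x1a2b
    pattern = re.compile(re.escape(tid_to_nid(tid)) + r"\b")
    for i, line in enumerate(lines):
        if pattern.search(line):
            block = []
            for entry in lines[i:]:
                if not entry.strip():
                    break
                block.append(entry)
            return block
    return []
```

For example, `find_thread(open("nohup.out").read(), 6699)` would print the stack of the thread that `top -H` lists as 6699 (a made-up ID here), which makes it easier to tell whether the spinning thread belongs to the network layer or to something else.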
We currently have 2 instances of Openfire running on 2 separate servers: one is for our internal use and one is for our external clients. We have about 60 users on our server and 25 on the client server. Since the upgrade to 3.10.0 we have had to restart the Openfire service on the internal server 3 times and on the external client server twice. They both have the same specs as well. This time, though, the external client server has been up for 2 weeks without needing a restart, while I have to restart the internal Openfire service tonight after only 7 days of uptime, with %CPU for java hovering around 198-200% today. The %MEM for Openfire is at 35.2% on the internal server and 6.1% on the client server. I have Java on both machines set to OPENFIRE_OPTS="-Xms768m -Xmx3096m", so memory has never been an issue for me.
When I upgraded from 3.9.3 to 3.10, I suddenly started seeing the CPU% max out over time, which seems to be the same as everyone else. Currently both machines are running centos-release-6.5.e16.centos.11.2.x86_64. Both are fully updated and patched as well.
So I restarted the Openfire service on both machines last night, and already today, less than a day later, our internal Openfire server has java running at %CPU 200. But our other server is still perfectly fine. It's getting very frustrating that this is happening.
Can confirm I am seeing this as well: the process jumps to 100% CPU and kicks all users from the server. It's happened 3 times in quick succession today.
Does anyone know if you can safely downgrade to 3.9.3 which seems to be a more stable version according to some posts I’ve seen? This is a new install as of 3.10.0.
We saw this issue as well within 24 hours of upgrading. The openfire server was using 100% of a CPU core, and it stopped allowing new connections. It also generated a traceback whenever I attempted to load the sessions page in the web console (though the console otherwise worked).
Server config is:
OS: CentOS 5.11
Openfire: 3.10.0 (from RPM)
JRE: Oracle Java 1.8.0_45.
The previous 3.9.3 was using 1.8.0_45 as well without issues.
Plugins:
Broadcast 1.9.0
Content Filter 1.7.0
Monitoring Service 1.4.2
Packet Filter 3.2.0
Registration 1.6.0
Search 1.6.0
User Import Export 2.4.0
I did not do any further diagnostics and immediately reverted to 3.9.3, at which point the issue was resolved. (I realize that’s not especially helpful, but there are definitely tons of other people sticking with 3.10.0 that can reproduce this issue, I just want to provide my configuration in case it helps.)
We’re hitting the same issue. Our 3.9.3 server was running on an old squeeze KVM with 1 core and 2GB RAM, and never missed a beat. Built a new KVM on Ubuntu with 2 cores, 4GB of RAM and OpenJDK 1.7.0_79 - it appeared to work fine when switched over one evening, but by the next morning it had stopped responding. Have now tried all the suggestions in this thread (garbage collection, memory allocation and disabling IPv6) and the problem persists on 3.10.0, even with only 15 users - will wait for a point release to try again, and for now have rolled back to our older 3.9.3 server.
Happy to provide any further info if anyone needs it for troubleshooting purposes.
Finally adding my +1: I’m also seeing this on one of my 2 Jabber servers (the “active” one in an active-passive pair). That said, this server isn’t under any real load, as thus far it’s only been used by me for testing, and it’s running on a Dell R320 with 32GB RAM and a quad-core Xeon processor. The operating system is RHEL 6.5 with OpenJDK 1.7.0_79.
I’ve currently got the following JVM options set (most of these are for Garbage Collection):
So we ran into this issue as well while trying to run some load tests on one of our middleware components that connects users to Openfire. With 3.10 it basically failed to log in users after about 5 or 6, and at that point Openfire spins. We can still log in to the admin console, but you can no longer get the list of user sessions. When we click on the Sessions tab, it just spins.
Running the same tests on 3.9.3 works fine.
Now since some comments seemed to indicate an issue with the Apache MINA libraries (Openfire 3.10.0 Beta - High CPU usage, https://issues.apache.org/jira/browse/DIRMINA-1011), I went ahead and removed the org/apache/mina folder from the openfire.jar and put the 5 Apache MINA jars into the lib folder directly. I first tried 2.0.9 and saw the same issue, then I tried 2.0.8 and still saw the same issue. Then I tried 2.0.7 and boom, now it’s working properly. Now we’re able to run the load test and get well past a hundred users.
So I’d say this is definitely related to some bugs in the 2.0.8 and 2.0.9 versions of the Apache MINA core library.
EDIT: I created and attached a zip file which contains the modified openfire.jar and the 5 Apache MINA 2.0.7 jars if anyone else wants to try this out. All I did to the openfire.jar was unzip it, remove the ‘org/apache/mina’ folder, zip it back up, rename it to .jar and copy it back to the Openfire lib folder. Then I copied the 5 MINA jars there and restarted the server.
If you do this, make sure that you either rename the original openfire.jar to something else so it doesn’t end in .jar (e.g. openfire.jar.original) or move it to a different folder.
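For anyone who would rather rebuild the jar than use the attachment, the unzip/remove/re-zip step can be scripted. This is only a sketch of the procedure described above, not an official tool: the function name `strip_package` is made up for illustration, and the file names in the usage note are examples.

```python
import zipfile


def strip_package(src_jar: str, dst_jar: str, prefix: str = "org/apache/mina/") -> int:
    """Copy src_jar to dst_jar, skipping every entry under `prefix`.

    Returns the number of entries that were dropped. The source jar is
    left untouched, so it can be kept as a backup.
    """
    dropped = 0
    with zipfile.ZipFile(src_jar) as src, zipfile.ZipFile(dst_jar, "w") as dst:
        for item in src.infolist():
            if item.filename.startswith(prefix):
                dropped += 1
                continue
            # Reusing the original ZipInfo keeps each entry's name,
            # timestamp and compression method.
            dst.writestr(item, src.read(item.filename))
    return dropped
```

A call such as `strip_package("openfire.jar.original", "openfire.jar")` writes the trimmed jar next to the renamed original. A return value of 0 means nothing matched the prefix, so the jar probably does not bundle MINA the way this thread assumes, and swapping the jars in lib would not change which classes get loaded.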
Thanks for the sleuthing. I’m running your build right now and will report back with my findings. Based on the recent update in the OF-883 issue, it seems very likely that this is caused by Mina or OF’s integration of Mina.
If so, it’s confusing, because that bug is marked fixed, and the comments within indicate they would be releasing MINA 2.0.10 right away.
Looking at git, they were about to release 2.0.10 on December 22nd, then changed their minds and now it’s been 6 months with no changes to the 2.0.x branch:
ASF Git Repos - mina.git/shortlog
It might be an interesting test case to try building the almost-2.0.10 code and see if that addresses the issue similarly to Andi’s trials.
Edit: While I was bouncing around the MINA mailing lists I noticed this:
With regards to DIRMINA-1001, it’s possible that it’s related but I doubt it because I saw the same issue when using 2.0.8 and it seems that new logic was added in 2.0.9. I might try out a local build of the latest 2.0.10 and see if that works.