100% CPU Usage

Guus found this:

https://issues.apache.org/jira/browse/DIRMINA-785

The comments on that MINA issue are very interesting, and the issue also matches the change from 2.0.7 to 2.0.8.

Actually, Daryl Herzmann updated OF-883 noting https://issues.apache.org/jira/browse/DIRMINA-785 which appears to be the change that caused DIRMINA-1001, and lists fix version as 2.0.8, so the comment about 2.0.9 being where the change was introduced might have been incorrect.

DIRMINA-785 acknowledges that they should probably have called that release 2.1.0 because it was a substantial API and behavior change.

Yeah, so I cloned the MINA repo (ASF Git Repos - mina.git/summary) and checked out the 2.0 branch, where the DIRMINA-995 patch was applied, but when running our load tests with those 2.0.10 builds we still see the same issue with Openfire.

Guus just made some changes to how Openfire handles sessions, based on the changes in the newest MINA versions. These should probably be in the 05.29 or 05.30 nightly build (Ignite Realtime: Openfire Nightly Builds).

Or in build 1704 for Windows, when it shows up at Openfire - Nightly Windows Build (trunk): Plan summary - Atlassian Bamboo (then click a build number, open the Artifacts tab, follow the Project Windows distribution files link, and download the first exe).

Unfortunately these haven’t cured the issue, so you’re better off staying where you are for now. Although this seems like a disaster, we’re pretty sure we know where the bug lies now, and we’re zeroing in on it fast. For now, though, we’ve replaced an unbounded loop hit randomly with a message delivery failure hit pretty consistently.

If anybody is feeling lucky and wants to try the 3.10.1 Release Candidate to see if the 100% CPU issues are resolved, please have at it!

Ignite Realtime: Beta Downloads

Spoke too soon. IMs were not being delivered and CPU usage on the server spiked to around 200% again. The Messages app on my OS X Mac goes nuts too, spiking local CPU usage and sending the fan into overdrive.

So I tried the 3.9.3 build, since other people said it was working for them. It has the same CPU issue as 3.10: once again, CPU usage spiked to almost 200%.

Ryan,

That should not be happening. Please double check the /opt/openfire/lib/ folder and ensure that all of the jar files have roughly the same timestamp.

Strange, so before running rpm -Uvh the date/time stamps were May 28th.

[root@im ~]# ls -la /opt/openfire/lib/
total 12380
drwxr-xr-x 2 daemon daemon 4096 Jun 2 2013 .
drwxr-x--- 14 daemon daemon 4096 Jun 2 2013 ..
-rw-r--r-- 1 daemon daemon 54829 May 28 2013 activation.jar
-rw-r--r-- 1 daemon daemon 1815677 May 28 2013 bouncycastle.jar
-rw-r--r-- 1 daemon daemon 112341 May 28 2013 commons-el.jar
-rw-r--r-- 1 daemon daemon 641570 May 28 2013 hsqldb.jar
-rw-r--r-- 1 daemon daemon 407502 May 28 2013 jasper-compiler.jar
-rw-r--r-- 1 daemon daemon 77056 May 28 2013 jasper-runtime.jar
-rw-r--r-- 1 daemon daemon 74639 May 28 2013 jdic.jar
-rw-r--r-- 1 daemon daemon 294726 May 28 2013 jtds.jar
-rw-r--r-- 1 daemon daemon 2848 May 28 2013 log4j.xml
-rw-r--r-- 1 daemon daemon 362975 May 28 2013 mail.jar
-rw-r--r-- 1 daemon daemon 540852 May 28 2013 mysql.jar
-rw-r--r-- 1 daemon daemon 7521230 May 28 2013 openfire.jar
-rw-r--r-- 1 daemon daemon 448014 May 28 2013 postgres.jar
-rw-r--r-- 1 daemon daemon 132425 May 28 2013 servlet.jar
-rw-r--r-- 1 daemon daemon 9750 May 28 2013 slf4j-log4j12.jar
-rw-r--r-- 1 daemon daemon 71213 May 28 2013 startup.jar

Post upgrade all but one, bouncycastle.jar, changed to May 6th.

[root@im ~]# ls -la /opt/openfire/lib/
total 16524
drwxr-xr-x 2 daemon daemon 4096 May 31 18:14 .
drwxr-x--- 14 daemon daemon 4096 May 31 18:14 ..
-rw-r--r-- 1 daemon daemon 54829 May 6 2014 activation.jar
-rw-r--r-- 1 daemon daemon 260437 May 6 2014 bcpg-jdk15on.jar
-rw-r--r-- 1 daemon daemon 598674 May 6 2014 bcpkix-jdk15on.jar
-rw-r--r-- 1 daemon daemon 2732684 May 6 2014 bcprov-jdk15on.jar
-rw-r--r-- 1 daemon daemon 1815677 May 28 2013 bouncycastle.jar
-rw-r--r-- 1 daemon daemon 112341 May 6 2014 commons-el.jar
-rw-r--r-- 1 daemon daemon 641570 May 6 2014 hsqldb.jar
-rw-r--r-- 1 daemon daemon 407502 May 6 2014 jasper-compiler.jar
-rw-r--r-- 1 daemon daemon 77056 May 6 2014 jasper-runtime.jar
-rw-r--r-- 1 daemon daemon 74639 May 6 2014 jdic.jar
-rw-r--r-- 1 daemon daemon 294726 May 6 2014 jtds.jar
-rw-r--r-- 1 daemon daemon 2848 May 6 2014 log4j.xml
-rw-r--r-- 1 daemon daemon 362975 May 6 2014 mail.jar
-rw-r--r-- 1 daemon daemon 954041 May 6 2014 mysql.jar
-rw-r--r-- 1 daemon daemon 7603130 May 6 2014 openfire.jar
-rw-r--r-- 1 daemon daemon 588974 May 6 2014 postgres.jar
-rw-r--r-- 1 daemon daemon 132425 May 6 2014 servlet.jar
-rw-r--r-- 1 daemon daemon 9750 May 6 2014 slf4j-log4j12.jar
-rw-r--r-- 1 daemon daemon 71479 May 6 2014 startup.jar

Ryan,

The output you show makes little sense. Why are the files dated 2013 and 2014? Having that old bouncycastle.jar will cause CPU spinning, so please try updating to 3.10.1 RC, then ensure that jar file is not in /lib/ and that the files are dated May 31, 2015.
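For anyone who wants to automate that timestamp check, here is a rough sketch in Python. It only compares modification times of the .jar files in one directory and reports any that are much older than the newest one, which is how a leftover like bouncycastle.jar would stand out. The function name `find_stale_jars` and the one-day tolerance are my own assumptions, not anything shipped with Openfire.

```python
from datetime import datetime, timedelta
from pathlib import Path


def find_stale_jars(lib_dir, max_skew=timedelta(days=1)):
    """Return names of .jar files much older than the newest jar in lib_dir.

    max_skew is an assumed tolerance: jars from one install should have
    roughly the same timestamp, so anything older by more than this is
    reported as a possible leftover from a previous version.
    """
    jars = sorted(Path(lib_dir).glob("*.jar"))
    if not jars:
        return []
    # Map each jar name to its last-modified time.
    mtimes = {jar.name: datetime.fromtimestamp(jar.stat().st_mtime) for jar in jars}
    newest = max(mtimes.values())
    # Keep only the jars that lag behind the newest one by more than max_skew.
    return [name for name, mtime in mtimes.items() if newest - mtime > max_skew]
```

Calling `find_stale_jars("/opt/openfire/lib")` on the post-upgrade listing above would be expected to report bouncycastle.jar, since it is the only jar still dated May 28 2013. It says nothing about whether the remaining jars come from the right release, so the dates still need a look by eye.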

Haven’t tried the 3.10.1 RC yet… no time for more problems now… I want to be sure it’s safe to update.

As I remember, the problem for me is with OF 3.9.3 on CentOS 5.11 with 4 GB of RAM (2 GB reserved for Openfire).

We have migrated some apps to another server to take some load off the server where Openfire runs… the problem seemed to have disappeared for some weeks, but now it has started again… it ‘normally’ happens 2 or 3 times a day.

Here are some more logs.

I’m using 3.10.1 RC on my test box (usually only 2-3 users connected) and I have to reboot the server every hour or more often, as it doesn’t allow new logins anymore.

I didn’t have such problem with 3.10.0 betas and final release.

And it looks like the issue arises sooner if I do many disconnects and logins in my clients.

FWIW, I’ve been using the build from Andi Heusser (see page 8) with success for nearly two weeks now. He back-revved Mina to 2.0.7 and left everything else alone. It seems to me that OF 3.10.1 should be released with Mina 2.0.7 while all the kinks with the newest version of Mina get worked out in the meantime. Just my two cents, and I certainly appreciate the developers’ and volunteers’ efforts.

I tried Andi’s patched release but it breaks my admin console: users can’t log in via LDAP and Jetty throws a 500 error.

As a result I’ve rolled back and will wait for an official OF patched release.

I installed the 3.10.1 RC on our server and changed the networking to point at the new server. Whilst it appeared fine after hours, when people started to connect to it this morning it soon locked up and refused connections.

Any update for this problem?

My users are going to kill me, because I have to restart the Openfire service 2-3 times every day.

I set up a cron job to restart the service every day, but sometimes I need to restart it again after 3-4 hours.
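As a stopgap alongside a timed restart, a small probe could tell you when the server has actually stopped taking connections. The sketch below is only an assumption about how one might do that: it tries a plain TCP connect to the XMPP client port (5222 is the standard one) and reports whether it succeeded. The function name `accepts_connections` is made up for illustration. Note that a hung server can sometimes still complete a TCP handshake, so this catches refused connections but not every kind of lock-up.

```python
import socket


def accepts_connections(host, port=5222, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    5222 is the standard XMPP client-to-server port; pass a different
    port if the server is configured otherwise.
    """
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A cron job could call `accepts_connections("localhost")` every few minutes and restart the service only when it returns False, instead of restarting on a fixed schedule.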

Me too. My Openfire 3.10.0 is on RHEL/CentOS 6.6 x64.

Also thanks for his fix suggestion.

I too am having problems. We are looking to deploy Openfire to approx. 100 users and then serve another 4000 via the Fastpath add-in.