100% CPU Usage

Bill_Roland · May 1, 2015, 7:05pm

Interesting. I also enabled garbage collection and that really shrunk my memory footprint as well.

dwd · May 1, 2015, 7:34pm

It’ll hopefully be fixed in the 3.10.X series (with luck 3.10.1), but if this is a Mina bug it might be tricky.

JLG1 · May 2, 2015, 7:11am

Same issue here, 3.10.0 on Mac OS X 10.9.5, hits 100% CPU immediately. I’ve implemented the “-Djava.net.preferIPv4Stack=true” parameter and also “org.jitsi.videobridge.ofmeet.sip.enabled=false” to be sure the SIP spamming wasn’t the issue. The admin interface doesn’t respond, though XMPP clients can connect.
dump.txt.zip (7369 Bytes)

Phil_Chetcuti · May 2, 2015, 7:21pm

Environment

Openfire 3.10 (first stable release)

Linux Ubuntu 14.04 64bit

1.8.0_45 Oracle 64 bit Java VM.

LDAP and MySQL for the backend

180 total users, 30 users consistently online, peaking at 120 simultaneous users in the evenings

Plugins

Broadcast, Client Control, Fastpath Service, Jingle Nodes, MoTD, Openfire Meetings (note: the Fastpath service was added after seeing the spike errors)

Issue

Like many others, was seeing CPU peaking, even passing 500% in just an hour or two (no more than two hours)

Applied the “java.net.preferIPv4Stack=true” fix, CPU peak would still occur after about 4 to 5 hours.

I reverted my server back to the 3.10 beta of 3-24-2015 and my issues are gone, no more CPU peak. It’s maxing out at 7%, and averaging 2% now.

I did notice that others who did the same are saying the CPU will peak again after a day in service. I can’t confirm that, since I have a cron job restarting the server once a day for maintenance.

ageekhere · May 4, 2015, 11:09pm

Any updates on what is causing this 100% issue?

mc12 · May 5, 2015, 2:37pm

Here goes one more thread dump… hoping these logs can help someone who is looking into this

openfire 3.9.3

centos 5.11

java 8 (with java 7 same problem)

I removed many plugins trying to find out which one was causing the problem, but the problem persists

These are now the only plugins installed:

Presence Service

STUN server plugin

User Service
nohup3.out.zip (7374 Bytes)

speedy · May 5, 2015, 3:15pm

I’m still NOT seeing this issue

Guest OS on VMWare

Openfire 3.10.0

MS SQL 2012 express

Windows 2008r2

java 1.7.79 (32 bit)

SSO with Active Directory/with SSL

Only the search plug in is currently being used.

All clients are running spark nightly 2.7.0 build 669 with jre 1.8.45

For those having the issue… Could it be AV related? If you temporarily disable or uninstall your AV, do you still see the spike? I’ve seen issues with some AV that have caused applications to freak out due to a driver they install for inspecting network traffic.

speedy · May 5, 2015, 3:17pm

Bill,

currently, java 8 breaks SSO. If you are looking to implement SSO, please keep that in mind.

akrherz · May 5, 2015, 3:24pm

@speedy, I suspect you’d need to be using httpbind/bosh in order to see this issue

speedy · May 5, 2015, 3:35pm

Daryl,

I don’t use httpbind/bosh, and currently have it disabled in system properties. If you can PM me instructions on how to test whether it is causing any spikes, I would be happy to.

Philip_Lembo · May 5, 2015, 4:00pm

We’re also not experiencing this problem. We use HTTP-Bind/BOSH heavily for a web client implemented with Converse.js. Most of our other clients are a modified build of Spark for Windows, plus a few Pidgins and Trillian. Both CPUs are basically at idle most of the time (around 0.2%) and RAM usage is around 200 MB for around a dozen test users. In production, where we’re on 3.9.3, CPU is around 2% and RAM use is about 800 MB to service around 200 regular users a day.

Our upgrade path in dev has been pretty constant. Basically I’ve waited a month or so before going to the next available version. We started on 3.8.1 and moved up from there, including the early 3.10.0 Alpha.

My current dev system is:

RHEL 6.6 x86_64 on VMware ESX

2 CPU, 8GB RAM

MySQL 5.1.73-3.el6_5

Oracle JDK 1.8.0_31-b13

Openfire 3.10.0 Release tar.gz (not rpm, we like to put our app bits on a SAN volume)

We’re using LDAP authentication against an old Sun Directory 5 (load balanced) environment, but have successfully tested with a newer OpenDJ 2.6.0 instance. No photos are allowed on the LDAP directory (keeps the size of entries returned on searches manageable), we encourage users to install their avatars to their local profile from where they propagate out to their rosters.

Plugins:

Search

Broadcast

Content Filter

Load Statistic

Monitoring Service

Openfire Meetings

Packet Filter

Presence Service

REST API

Subscription

Init script (/etc/init.d/openfired - taken from extra/openfired) variables:

export OPENFIRE_HOME=/b001/app/openfire/im-server

export OPENFIRE_USER=jive

export JAVA_HOME=/usr/java/default

Variables defined in $OPENFIRE_HOME/bin/openfire:

OPENFIRE_HOME=/b001/app/openfire/im-server

INSTALL4J_JAVA_HOME_OVERRIDE=/usr/java/default

INSTALL4J_ADD_VM_PARAMS="-server -Xms256m -Xmx2048m -XX:+UseG1GC -Dcom.sun.management.jmxremote"

A couple of things to point out: I have been using G1GC since the beginning. We also switched from OpenJDK to the Oracle JDK (1.8.0_31-b13) when we first started testing 3.10 when it was in Alpha.

Malte1 · May 8, 2015, 8:33am

I have the same problem.

After 2 days 100% CPU Usage

My platform:

Server Uptime:
2 days, 20 hours, 52 minutes – started May 5, 2015 1:10:21 PM
Version:
Openfire 3.10.0
Server Directory:
C:\Program Files (x86)\Openfire
Environment
Java Version:
1.7.0_76 Oracle Corporation – Java HotSpot™ Client VM
Appserver:
jetty/9.2.z-SNAPSHOT
Host Name:
OS / Hardware:
Windows Server 2008 R2
Java Memory
115,49 MB of 247,50 MB (46,7%) used

Stable Openfire 3.10.0 with Java.

10-15 Users

Logging in to the admin page works, but no user can connect to the chat server.

I stopped Openfire on the console and started it again. CPU runs normally.

Brad_McClave · May 8, 2015, 11:54am

I was just reading the bug report for the BETA version of this. It seems many found a fix by disabling IPv6. Has anyone tried this to see if it resolves the issue?

Sebastien_Weber · May 11, 2015, 6:00am

Disabling IPv6 support doesn’t solve anything unfortunately.

workwyz · May 11, 2015, 6:10am

me too

win5hit · May 12, 2015, 10:43am

Hi there!

I’m running Openfire 3.10.0 on Ubuntu 14.04 LTS. The host is a virtual machine.

Basically, I’m experiencing high load as well, but users are still able to log in even though the CPU is at 100%.

My Users reside in AD and DB is mysql. I’ve got very few users on it: ~10.

I’ve got Kraken and Openfire Meetings installed.

I read the whole thread and also tried to get a nohup.out:

Logged in as root
identified openfire process with “ps aux | grep openfire”

openfire 53717 56.9 30.7 ...

issued “kill -3 53717 > nohup.out”

The command finished “immediately” and no output is produced. (nohup.out is empty and openfire log dir is also not showing any new files)

So, I guess no debug output for me? (Or I have to wait for a crazy amount of time to get this output… Right now it’s ~1hr after issuing the command)

The other thing that I’ve tried is to identify the hung thread:

root@openfire:/var/log/openfire# ps -T -p 53717 -o pid,tid,pri,time | grep -v '00:00:00' | grep 53810
53717 53810  19 1-01:11:15
root@openfire:/var/log/openfire# ps -T -p 53717 -o pid,tid,pri,time | grep -v '00:00:00' | grep 53810
53717 53810  19 1-01:11:17
root@openfire:/var/log/openfire# ps -T -p 53717 -o pid,tid,pri,time | grep -v '00:00:00' | grep 53810
53717 53810  19 1-01:11:17
root@openfire:/var/log/openfire# ps -T -p 53717 -o pid,tid,pri,time | grep -v '00:00:00' | grep 53810
53717 53810  19 1-01:11:19
root@openfire:/var/log/openfire# ps -T -p 53717 -o pid,tid,pri,time | grep -v '00:00:00' | grep 53810
53717 53810  19 1-01:31:05

And 53810 looks like the hung one.
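For anyone who does manage to get a thread dump: HotSpot prints each thread's native ID as a hexadecimal `nid=0x...` field, so the decimal TID from `ps -T` has to be converted before it can be matched. A minimal Python sketch of that lookup follows; the two sample dump lines and the helper names are made up for illustration and are not taken from a real Openfire dump.

```python
import re

# Hypothetical thread-dump header lines, invented for this example only.
SAMPLE_DUMP = '''\
"hypothetical-worker-1" daemon prio=5 tid=0x00007f0000000001 nid=0xd232 runnable [0x00007f0000001000]
"hypothetical-worker-2" daemon prio=5 tid=0x00007f0000000002 nid=0xd233 waiting on condition [0x00007f0000002000]
'''


def tid_to_nid(tid: int) -> str:
    """Convert a decimal Linux thread ID to the hex form used after `nid=`."""
    return hex(tid)


def find_thread_headers(dump_text: str, tid: int) -> list[str]:
    """Return the dump lines whose nid field matches the given thread ID."""
    pattern = re.compile(r"\bnid=" + re.escape(tid_to_nid(tid)) + r"\b")
    return [line for line in dump_text.splitlines() if pattern.search(line)]


if __name__ == "__main__":
    print(tid_to_nid(53810))
    for header in find_thread_headers(SAMPLE_DUMP, 53810):
        print(header)
```

With a real dump, the stack trace printed under the matching header would show which code the spinning thread is executing.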

I did get a strace via:

“strace -p 53810”

...
read(225, "", 256)                  = 0
gettimeofday({1431427222, 304226}, NULL) = 0
gettimeofday({1431427222, 304254}, NULL) = 0
gettimeofday({1431427222, 304362}, NULL) = 0
epoll_wait(49, {{EPOLLIN, {u32=225, u64=225}}}, 4096, 1000) = 1
gettimeofday({1431427222, 304525}, NULL) = 0
read(225, "", 256)                  = 0
gettimeofday({1431427222, 304686}, NULL) = 0
gettimeofday({1431427222, 304715}, NULL) = 0
gettimeofday({1431427222, 304824}, NULL) = 0
epoll_wait(49, {{EPOLLIN, {u32=225, u64=225}}}, 4096, 1000) = 1
gettimeofday({1431427222, 305026}, NULL) = 0
...

and this keeps flying past me at awesome speed!!!

not like a few a second, feels like multiple dozen per second!

I’m not any further in finding the cause why the thread loops… but maybe this helps somebody else (with knowledge of openfire code) to debug further.

To me it’s looking like a totally crazy time function
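The timestamps in the excerpt give a rough idea of how fast the loop spins. The Python sketch below uses the first `gettimeofday` after each `read()` as an iteration marker and divides by the elapsed time. It assumes the two visible iterations are representative, and the function name is just illustrative. A `read()` that returns 0 normally means end-of-file, so one possible reading is that fd 225 has been closed by the other side while `epoll_wait` keeps reporting it as readable, but that is a guess from the trace and not a confirmed diagnosis.

```python
import re

# Excerpt copied from the strace output above.
STRACE_EXCERPT = '''\
read(225, "", 256)                  = 0
gettimeofday({1431427222, 304226}, NULL) = 0
gettimeofday({1431427222, 304254}, NULL) = 0
gettimeofday({1431427222, 304362}, NULL) = 0
epoll_wait(49, {{EPOLLIN, {u32=225, u64=225}}}, 4096, 1000) = 1
gettimeofday({1431427222, 304525}, NULL) = 0
read(225, "", 256)                  = 0
gettimeofday({1431427222, 304686}, NULL) = 0
'''

# Matches either the start of a read() call or a gettimeofday() timestamp.
EVENT = re.compile(r"(read)\(|gettimeofday\(\{(\d+), (\d+)\}")


def loop_rate_per_second(trace: str) -> float:
    """Estimate iterations per second from the timestamp following each read()."""
    marks = []
    saw_read = False
    for match in EVENT.finditer(trace):
        if match.group(1):
            saw_read = True
        elif saw_read:
            # First timestamp after a read(): record it in microseconds.
            marks.append(int(match.group(2)) * 1_000_000 + int(match.group(3)))
            saw_read = False
    if len(marks) < 2:
        raise ValueError("need at least two loop iterations")
    elapsed_us = marks[-1] - marks[0]
    return (len(marks) - 1) * 1_000_000 / elapsed_us


if __name__ == "__main__":
    print(f"{loop_rate_per_second(STRACE_EXCERPT):.0f} iterations per second")
```

On this excerpt the two markers are 460 microseconds apart, which works out to roughly two thousand iterations per second, and strace itself slows the thread down, so the untraced rate is probably higher.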

I’ve not restarted the openfire service yet, if you need some more stuff let me know!
Kind regards!

Sebastien_Weber · May 12, 2015, 11:13am

Hi,

About the kill command, you just have to type “kill -3 53717”, the output will be automatically written in the nohup.out.
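For context, `kill -3` sends SIGQUIT, and a HotSpot JVM reacts by printing the thread dump on its own standard output. The `kill` command itself prints nothing, which is why redirecting it captures no dump. A minimal Python sketch of the same request, assuming a Unix host; `request_thread_dump` is only an illustrative name:

```python
import os
import signal


def request_thread_dump(pid: int) -> None:
    """Send SIGQUIT (signal 3) to a process, the same thing `kill -3 <pid>` does.

    A HotSpot JVM answers by writing a thread dump to its own stdout.
    Most other programs simply terminate, so only point this at a JVM.
    """
    os.kill(pid, signal.SIGQUIT)
```

Calling `request_thread_dump(53717)` would be equivalent to the command above; the dump still ends up wherever the JVM's stdout points.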

Kind regards,

win5hit · May 12, 2015, 11:58am

Hi,

thanks for your reply. I’ve also issued this command (without stdio redirect to file) in the meantime.(also over an hour ago)

Up to now there is still no nohup file.

Kind regards

akrherz · May 12, 2015, 8:07pm

When you ran the kill -3 53717 > nohup.out, you ruined that file for writing by openfire. You’d have to restart openfire to re-establish its writing ability to that file

dwd · May 12, 2015, 8:51pm

Maybe. But Linux is strange:

proc - Linux file deleted recovery - Stack Overflow

Or:

You might well find that /proc/53717/fd/1 has the original file contents you want.
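A small Python sketch of that `/proc` trick, for Linux only: while a process still holds a file open, `/proc/<pid>/fd/<n>` gives access to the contents even after the file's name has been removed. The helper names and the temporary file are invented for the demonstration, which covers the deleted-file case from the linked answer and not the truncation case discussed above.

```python
import os
import tempfile


def read_via_proc(pid: int, fd: int) -> bytes:
    """Read the file behind descriptor `fd` of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/fd/{fd}", "rb") as handle:
        return handle.read()


def demo() -> bytes:
    """Write a file, delete its name, then recover the data through /proc."""
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "nohup.out")
    writer = open(path, "wb")
    try:
        writer.write(b"thread dump goes here\n")
        writer.flush()
        os.unlink(path)  # the name is gone, but the open descriptor remains
        return read_via_proc(os.getpid(), writer.fileno())
    finally:
        writer.close()
        os.rmdir(tmp_dir)


if __name__ == "__main__":
    print(demo())
```

For the case in this thread the call would be `read_via_proc(53717, 1)`, run as root or as the openfire user.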