Openfire 4.3.2 CPU goes to 100% after a few hours on Linux


#1

I am running an XMPP server using Openfire 4.3.2 with a MySQL database. The Java memory I have set is 12 GB. The problem is that on Linux (I tried Ubuntu, Debian and CentOS) CPU usage goes up to 100% after a few hours and stays at 100%. I have also installed it on Windows Server, where the Openfire process runs smoothly for days without any issues. (Windows and Linux have the same active user count, but CPU usage on Linux is far too high.)

Hardware: 4 CPUs, 16 GB RAM. MySQL is on an external server and no slow queries are logged.


#2

Today I identified the threads that push the CPU to 100%. They are all NioProcessor threads (Apache MINA). Here is the thread dump:

"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x00007f0408002000 nid=0x2a6a runnable [0x00007f0476dee000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x00000004c486b748> (a sun.nio.ch.Util$3)
	- locked <0x00000004c486b738> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x00000004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)

#3

Are you sure that these cause the CPU spikes? The thread in your dump seems to be blocked waiting for input, and that should not cause CPU load.


#4

Doesn’t Thread.State: RUNNABLE mean that the thread is running, possibly with full load?

This looks like https://jira.apache.org/jira/browse/DIRMINA-678

Netty has a similar issue:

See also this:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6403933

Which JDK are you using?
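
On the RUNNABLE question: the JVM also reports RUNNABLE for a thread that is parked inside a native call such as epollWait, so the state alone does not show whether the thread is burning CPU. A minimal sketch to illustrate this (the class and method names are made up for the example; it only assumes a standard JDK):

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical demo class, not part of Openfire or MINA.
final class BlockedSelectState {

    /** Starts a thread that blocks in select() and returns the state the JVM reports for it. */
    static Thread.State stateWhileBlockedInSelect() throws IOException, InterruptedException {
        try (Selector selector = Selector.open()) {
            Thread t = new Thread(() -> {
                try {
                    selector.select(); // no channels registered: blocks until wakeup()
                } catch (IOException ignored) {
                    // not expected in this demo
                }
            }, "demo-selector");
            t.setDaemon(true);
            t.start();

            Thread.sleep(300); // give the thread time to enter the native call
            Thread.State state = t.getState();

            selector.wakeup(); // release the blocked thread before closing the selector
            t.join();
            return state;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stateWhileBlockedInSelect());
    }
}
```

The demo thread is idle the whole time it sits in select(), yet its reported state is RUNNABLE. That is why a dump has to be correlated with per-thread CPU usage from the operating system before drawing conclusions.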


#5

See also this issue:

Apparently the issue is gone with JDK 11.


#6

Yes, I am sure. I have had this problem for a month, and only on Linux servers. Today I took a dump of all the Openfire threads and compared it with the IDs of the threads that take the full CPU load. All of the CPU load came from NioProcessor threads. There is no such problem on the Windows server.
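
For anyone who wants to do the same comparison: the nid field in a HotSpot thread dump is the native thread ID in hexadecimal, while tools such as top -H show it in decimal. A small sketch of the conversion (the helper name is hypothetical; the sample line is the NioProcessor-3 header from the dump above):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading thread dump header lines.
final class ThreadDumpNid {
    // Matches the hexadecimal native thread ID, e.g. "nid=0x2a6a".
    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    /** Returns the native thread ID in decimal, or -1 if the line has no nid field. */
    static long decimalNid(String headerLine) {
        Matcher m = NID.matcher(headerLine);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    public static void main(String[] args) {
        String line = "\"NioProcessor-3\" #55 prio=5 os_prio=0 "
                + "tid=0x00007f0408002000 nid=0x2a6a runnable [0x00007f0476dee000]";
        System.out.println(decimalNid(line));
    }
}
```

For the line above this gives 10858, which is the thread ID to look for in the per-thread view of top.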


#7

I have tried both OpenJDK 8 and Oracle JDK 8. Same problem.


#8

I’d love to find a way to reproduce this problem. I’m running various versions of Linux, but I don’t see that issue occur.

What you are seeing matches what CSH has found. Could you double-check the JARs in your lib folder and see which version of Apache MINA your Openfire is running with? Maybe a weird upgrade/deploy scenario left multiple versions behind?
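
One way to do that check, as a rough sketch: list every mina-core JAR in the lib folder and flag the case where more than one is present. The class name and the default path "lib" are assumptions for illustration; pass the real lib directory of your installation as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not shipped with Openfire.
final class MinaJarCheck {

    /** Returns the sorted file names of all mina-core JARs directly inside libDir. */
    static List<String> minaCoreJars(Path libDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "mina-core-*.jar")) {
            for (Path jar : jars) {
                names.add(jar.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // "lib" is only a placeholder default; the real location depends on the install.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        List<String> jars = minaCoreJars(libDir);
        jars.forEach(System.out::println);
        if (jars.size() > 1) {
            System.out.println("More than one mina-core JAR found; only one should be on the classpath.");
        }
    }
}
```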


#9

The MINA version is 2.1.2. I have now reverted to MINA 2.0.7 to see whether the problem persists.


#10

Do you see "Create a new selector. Selected is 0, delta = " in your logs?


#11

or: “Broken connection” ?
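
For context: as far as I know, both messages are logged by MINA's guard against a spinning selector, where select() keeps returning zero keys almost immediately instead of waiting. The general idea can be sketched as below. This is an illustration of the technique only, not MINA's actual code; the class name, the threshold and the minimum wait are all made up for the example.

```java
// Hypothetical spin detector; names and thresholds are illustrative, not MINA's.
final class SelectorSpinDetector {
    private final long minWaitMillis; // a select() shorter than this counts as premature
    private final int threshold;      // consecutive premature returns before rebuilding
    private int consecutivePremature;

    SelectorSpinDetector(long minWaitMillis, int threshold) {
        this.minWaitMillis = minWaitMillis;
        this.threshold = threshold;
    }

    /**
     * Records the outcome of one select() call.
     * Returns true when the caller should replace the selector with a fresh one.
     */
    boolean record(int selectedKeys, long elapsedMillis, boolean wakeupRequested) {
        boolean premature = selectedKeys == 0
                && !wakeupRequested
                && elapsedMillis < minWaitMillis;
        if (!premature) {
            consecutivePremature = 0; // a normal return resets the streak
            return false;
        }
        consecutivePremature++;
        if (consecutivePremature >= threshold) {
            consecutivePremature = 0; // start counting again after the rebuild
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SelectorSpinDetector detector = new SelectorSpinDetector(100, 3);
        for (int i = 0; i < 3; i++) {
            // Simulated select() results: no keys, no wakeup, returned after 0 ms.
            System.out.println(detector.record(0, 0, false));
        }
    }
}
```

When record() returns true, the caller would open a new Selector, re-register the existing channels on it with the same interest ops and attachments, and close the old one. Seeing the "Create a new selector" message repeatedly in the logs would suggest that this kind of guard is firing.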


#12

The logs contain a lot of “Connection reset by peer” exceptions. One like this is thrown every second:

2019.05.16 10:10:41 WARN [socket_c2s-thread-10]: org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00001172: nio socket, server, null => 0.0.0.0/0.0.0.0:5222) 
java.io.IOException: Connection reset by peer 
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_212] 
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_212] 
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_212] 
at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_212] 
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_212] 
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124) ~[mina-core-2.0.7.jar:?] 
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) ~[mina-core-2.0.7.jar:?] 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212] 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212] 
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212] 

#13

I have found the issue: Apache MINA 2.1.2 caused it. I have tested MINA 2.0.7 and 2.0.21, and the Openfire server now works smoothly with MINA 2.0.21.


#14

Maybe I am missing something in the thread here, but how were you using MINA 2.1.2 with Openfire 4.3.2? MINA was only upgraded recently, in https://issues.igniterealtime.org/browse/OF-1740, and that change is scheduled for Openfire 4.4.0.


#15

We have more than one server. One runs version 4.4.0 alpha and the other 4.3.2. We upgraded the Apache MINA version on both servers.


#16

So are we going to see this issue with 4.4.0?


#17

Yes, right. I have been struggling with this issue for a month. At first I thought the high CPU usage was caused by Java GC, but Apache MINA 2.1.2 is the problem. Now, with MINA 2.0.21 and Openfire 4.4.0, everything works smoothly and CPU usage is normal even with a high number of active users.
With MINA 2.1.2, CPU usage was normal for the first 400 active users. Once active users exceeded 500, CPU usage rose to 30-40%, and when the count reached 700 the CPU spiked to 90-100%. Even if the active user count went back under 500, the CPU load stayed at 90-100% until we restarted Openfire.


#18

The MINA update was required to get support for Java 10+ in Openfire (https://issues.igniterealtime.org/browse/OF-1697), but that was 2.0.20. Then there was a bug with compression in that version and we had to update to the latest version (https://issues.igniterealtime.org/browse/OF-1718). 2.1.2 was just a minor update later.


#19

We disabled compression a while ago. I have a suggestion, but I do not know whether it is feasible: why don't we develop a custom network layer instead of using Apache MINA?


#20

Actually, one contributor in the chat suggested that, and he said he would try to provide a patch to move from MINA to some other framework (I don’t remember the name). But that was many months ago and there is no patch yet :)