I am running an XMPP server using Openfire 4.3.2 with a MySQL database. Java memory is set to 12 GB. The problem is that on Linux (I tried Ubuntu, Debian and CentOS) CPU usage climbs to 100% after a few hours and stays there. I have also installed it on a Windows server, where the Openfire process runs smoothly for days without any issues. (Both the Windows and Linux servers have the same number of active users, but CPU usage on Linux is far too high.)
Hardware: 4 CPUs, 16 GB RAM. MySQL is on an external server and no slow queries are logged.
Today I identified the threads that drive the CPU to 100%. They are all NioProcessor threads (Apache MINA). Here is the thread dump:
"NioProcessor-3" #55 prio=5 os_prio=0 tid=0x00007f0408002000 nid=0x2a6a runnable [0x00007f0476dee000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000004c486b748> (a sun.nio.ch.Util$3)
- locked <0x00000004c486b738> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000004c420ccb0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:112)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:616)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x00000004c4d03530> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Yes, I am sure. I have had this problem for a month, and only on Linux servers. Today I took a dump of all the Openfire threads and compared it against the IDs of the threads that take the full CPU load. All of the CPU load came from NioProcessor threads. There is no such problem on the Windows server.
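The comparison described above can be sketched in a few lines of Java. A HotSpot thread dump prints each native thread ID in hexadecimal as `nid=0x...`, while OS tools such as `top -H` show thread IDs in decimal, so the two have to be converted before they can be matched. The class and method names here (`NidLookup`, `toNid`, `findThreads`) are hypothetical and only illustrate the idea; this is not necessarily how the poster did it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches a decimal OS thread ID against the hex nid values in a thread dump.
class NidLookup {
    // Thread header lines look like: "NioProcessor-3" #55 prio=5 ... nid=0x2a6a runnable [...]
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*\\bnid=(0x[0-9a-fA-F]+)");

    /** Converts a decimal OS thread ID to the hex form used for nid in the dump. */
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    /** Returns the names of all threads in the dump whose nid equals the given OS thread ID. */
    static List<String> findThreads(List<String> dumpLines, long osThreadId) {
        String wanted = toNid(osThreadId);
        List<String> names = new ArrayList<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find() && m.group(2).equalsIgnoreCase(wanted)) {
                names.add(m.group(1));
            }
        }
        return names;
    }
}
```

For the dump posted earlier, `nid=0x2a6a` corresponds to decimal thread ID 10858, so `NidLookup.findThreads(dumpLines, 10858)` would return the `NioProcessor-3` entry.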
I’d love to find a way to reproduce this problem. I’m running various versions of Linux, but I don’t see that issue occur.
What you are seeing matches what CSH has found. Could you double-check the JARs in your lib folder, and see what version of Apache MINA your Openfire is running with? Maybe a weird upgrade/deploy scenario left multiple versions behind?
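A quick way to do that check is to list every MINA JAR in the lib folder and see whether more than one version shows up. The sketch below assumes the JARs follow the `mina-*.jar` naming seen in this thread (for example `mina-core-2.0.7.jar`); the class name `MinaJarCheck` and the directory you pass in are placeholders, not anything Openfire provides.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists MINA JARs in a lib directory so duplicate versions stand out.
class MinaJarCheck {
    /** Returns the sorted file names in libDir that match the assumed pattern mina-*.jar. */
    static List<String> findMinaJars(Path libDir) throws IOException {
        List<String> found = new ArrayList<>();
        // The glob is matched against file names directly inside libDir (no recursion).
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "mina-*.jar")) {
            for (Path jar : jars) {
                found.add(jar.getFileName().toString());
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

If the returned list contains two entries for the same artifact with different version numbers, a leftover from an earlier upgrade is a likely explanation.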
The logs contain a great many “Connection reset by peer” exceptions. An exception like this is thrown every second:
2019.05.16 10:10:41 WARN [socket_c2s-thread-10]: org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00001172: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_212]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_212]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_212]
at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_212]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_212]
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124) ~[mina-core-2.0.7.jar:?]
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) ~[mina-core-2.0.7.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
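To put a number on how often this exception appears, the log can be tallied per minute. This is only a rough sketch: it assumes every log entry starts with a timestamp in the `2019.05.16 10:10:41` form shown above and that the exception message appears on a following line, and the class name `ResetCounter` is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts log entries containing "Connection reset by peer" per minute.
class ResetCounter {
    // Matches the start of a log entry; group 1 is the timestamp truncated to the minute.
    private static final Pattern ENTRY =
            Pattern.compile("^(\\d{4}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2}):\\d{2} ");

    static Map<String, Integer> resetsPerMinute(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        String currentMinute = null;
        boolean counted = false; // ensures each log entry is counted at most once
        for (String line : logLines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                // A new entry begins: remember its minute and reset the per-entry flag.
                currentMinute = m.group(1);
                counted = false;
            } else if (currentMinute != null && !counted
                    && line.contains("Connection reset by peer")) {
                counts.merge(currentMinute, 1, Integer::sum);
                counted = true;
            }
        }
        return counts;
    }
}
```

Feeding it the lines of the Openfire log file gives a sorted map from minute to count, which makes it easier to see whether the resets line up with the periods of high CPU usage.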
I have found the issue: Apache MINA 2.1.2 caused it. I have tested MINA 2.0.7 and 2.0.21, and the Openfire server now works smoothly with MINA 2.0.21.
Maybe I am missing something in the thread here, but how were you using MINA 2.1.2 with Openfire 4.3.2? MINA was only upgraded recently, in OF-1740 (Ignite Realtime Jira), and that change is scheduled for Openfire 4.4.0.
Yes, right. I have been struggling with this issue for a month. At first I thought the high CPU usage was caused by Java GC, but Apache MINA 2.1.2 is the problem. Now, with MINA 2.0.21 and Openfire 4.4.0, everything works smoothly and CPU usage is normal even with a high number of users.
With MINA 2.1.2, CPU usage was normal for the first 400 active users. Once active users exceeded 500, CPU usage rose to 30-40%, and when the count reached 700 the CPU spiked to 90-100%. Even if the active user count went back under 500, the CPU load remained at 90-100% until we restarted Openfire.
We disabled compression a while ago. I have a suggestion, but I do not know whether it is feasible: why don't we develop a custom network layer instead of using Apache MINA?
Actually, one contributor in the chat suggested that, and said he would try to provide a patch to move from MINA to some other framework (I don’t remember the name). But that was many months ago and there is no patch yet.