I have found the issue: Apache MINA 2.1.2 caused it. I have tested MINA 2.0.7 and 2.0.21, and the Openfire server now works smoothly with MINA 2.0.21.
Maybe I am missing something in the thread here, but how were you using MINA 2.1.2 with Openfire 4.3.2? MINA was only upgraded recently (https://issues.igniterealtime.org/browse/OF-1740), and that upgrade is scheduled for Openfire 4.4.0.
We have more than one server. One server runs version 4.4.0 alpha and the other runs 4.3.2. We upgraded the Apache MINA version on both servers.
So are we going to see this issue with 4.4.0?
Yes, right. I have been struggling with this issue for a month. At first I thought the high CPU usage was caused by the Java GC, but Apache MINA 2.1.2 is the problem. Now, with MINA 2.0.21 and Openfire 4.4.0, everything works smoothly and CPU usage is normal even with a high number of users.
With MINA 2.1.2, CPU usage was normal for the first 400 active users. Once active users exceeded 500, CPU usage rose to 30-40%, and when the active user count reached 700 the CPU spiked to 90-100%. Even when the active user count dropped back under 500, the CPU load stayed at 90-100% until we restarted Openfire.
The MINA update was required to get support for Java 10+ in Openfire (https://issues.igniterealtime.org/browse/OF-1697), but that was 2.0.20. Then there was a bug in that version related to compression, and we had to update to the latest version (https://issues.igniterealtime.org/browse/OF-1718). 2.1.2 was just a minor update after that.
We disabled compression a while ago. I have a suggestion, but I do not know whether it is feasible: why don't we develop a custom network layer instead of using Apache MINA?
Actually, one contributor in the chat suggested that, and he said he would try to provide a patch to move from MINA to some other framework (I don't remember the name). But that was many months ago, and there is no patch yet.
I have logged an issue with Apache MINA: https://issues.apache.org/jira/browse/DIRMINA-1111
Openfire 4.4.0 has yet to be released. Which exact nightly build / master build are you using?
I’m a MINA committer. A few comments:
- A RUNNABLE thread may be doing nothing at all if it is executing a native method. The JVM has no way to know what is being done in a native call, so it just marks the thread as RUNNABLE. The thread may simply be waiting for a resource.
- It would be useful to run 'top -H -p <pid>' followed by 'jstack <pid>' to get more information about the thread that is burning your CPU.
MINA 2.1 is pretty much the same as MINA 2.0: we just added an event method to the API and fixed an issue caused by the presence of a compression filter in the chain.
I'm following this thread and the MINA JIRA issue. Side note: Netty faces the exact same problem, and it is frequently due to external causes.
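To match the two outputs: 'top -H' lists native thread IDs in decimal, while 'jstack' prints the same ID in hexadecimal as the "nid=0x..." field of each thread header. A minimal sketch (Java 11+; the class and method names are hypothetical, and the sample dump lines are invented for illustration) that converts the ID and pulls the matching thread's stack out of a jstack dump:

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: maps a busy thread from `top -H -p <pid>` to its `jstack <pid>` entry.
public class NidLookup {

    // top shows the native thread ID in decimal; jstack shows it as lowercase hex "nid=0x...".
    public static String toNid(long decimalTid) {
        return "nid=0x" + Long.toHexString(decimalTid);
    }

    // Returns the thread header line plus the following lines up to the next blank line.
    public static Optional<String> findThread(String jstackOutput, long decimalTid) {
        // \b stops "nid=0x1a2b" from also matching "nid=0x1a2bc".
        Pattern header = Pattern.compile(Pattern.quote(toNid(decimalTid)) + "\\b");
        List<String> lines = jstackOutput.lines().collect(Collectors.toList());
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Thread headers in a jstack dump start with the quoted thread name.
            if (line.startsWith("\"") && header.matcher(line).find()) {
                StringBuilder block = new StringBuilder(line);
                for (int j = i + 1; j < lines.size() && !lines.get(j).isBlank(); j++) {
                    block.append('\n').append(lines.get(j));
                }
                return Optional.of(block.toString());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Invented sample dump, only to illustrate the format.
        String dump = String.join("\n",
            "\"NioProcessor-3\" #42 prio=5 os_prio=0 tid=0x00007f10 nid=0x1a2b runnable [0x00007f20]",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)",
            "",
            "\"main\" #1 prio=5 os_prio=0 tid=0x00007f30 nid=0x1a2bc waiting on condition [0x00007f40]",
            "   java.lang.Thread.State: WAITING");
        System.out.println(toNid(6699)); // nid=0x1a2b
        System.out.println(findThread(dump, 6699).orElse("not found"));
    }
}
```

Note that a thread found this way may still be parked in a native call (epollWait above), which is why the stack alone does not prove where CPU time is going.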
The master branch. I build from source because I extended some parts of the Openfire core, such as:
- Persisting the room creator, so that no other owner can remove or change the room creator's affiliation.
- A brute-force check to prevent password guessing.
- A limit on how many sessions each IP address can open at once (mainly to prevent flooding).
- A minimum interval between MUC messages from the same occupant (for example, an occupant cannot send two messages within a 500 ms period).
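The last item, a per-occupant minimum interval between MUC messages, can be sketched roughly as follows. This is not the poster's code and not an Openfire API: the class name, the string occupant key, and passing the current time in explicitly (to keep it testable) are all assumptions, and hooking it into Openfire's MUC message routing is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical throttle: accepts a message only if at least minIntervalMillis have passed
// since the same occupant's last accepted message. Not part of the Openfire API.
public class OccupantMessageThrottle {
    private final long minIntervalMillis;
    // Key: occupant identifier (assumed here to be room JID + nickname); value: last accepted time.
    private final ConcurrentMap<String, Long> lastAccepted = new ConcurrentHashMap<>();

    public OccupantMessageThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true if the message may be delivered. Rejected messages do not reset the timer.
    public boolean tryAccept(String occupantKey, long nowMillis) {
        boolean[] accepted = {false};
        // compute() runs atomically per key, so concurrent messages from one occupant are serialized.
        lastAccepted.compute(occupantKey, (key, last) -> {
            if (last == null || nowMillis - last >= minIntervalMillis) {
                accepted[0] = true;
                return nowMillis;
            }
            return last;
        });
        return accepted[0];
    }

    // Drops state for an occupant who left the room so the map does not grow without bound.
    public void forget(String occupantKey) {
        lastAccepted.remove(occupantKey);
    }

    public static void main(String[] args) {
        OccupantMessageThrottle throttle = new OccupantMessageThrottle(500);
        String alice = "room@conference.example.org/alice";
        System.out.println(throttle.tryAccept(alice, 1_000)); // true: first message
        System.out.println(throttle.tryAccept(alice, 1_300)); // false: only 300 ms later
        System.out.println(throttle.tryAccept(alice, 1_500)); // true: 500 ms after the last accepted one
    }
}
```

Measuring from the last accepted message (rather than the last attempt) means a client that keeps retrying is still let through once per interval instead of being locked out.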
I followed this guide because I initially thought the Java GC was causing this problem, but all the threads burning CPU were NioProcessor threads.
@suf126a Just because NioProcessor is burning CPU does not mean MINA is at fault: NioProcessor also executes non-MINA application code. Please generate a flame graph to help us determine which methods are using the CPU time. https://github.com/brendangregg/FlameGraph
@suf126a, would you mind testing this fix? The commit of interest is 9274ddad3edce5b8796d98fdb0a9ccbe487a9b9e. I have built these MINA libraries, which include the fix (but feel free to build your own if you prefer):
mina-core-2.1.3-SNAPSHOT.jar (651.7 KB)
mina-filter-compression-2.1.3-SNAPSHOT.jar (13.0 KB)
mina-integration-beans-2.1.3-SNAPSHOT.jar (40.5 KB)
mina-integration-jmx-2.1.3-SNAPSHOT.jar (28.6 KB)
mina-integration-ognl-2.1.3-SNAPSHOT.jar (15.8 KB)
I've observed that the 100% CPU issue is prevented by a very similar fix applied to another environment that suffers from the same problem. Of interest is that the bug causing the 100% CPU usage is triggered primarily when an irregular situation occurs (in the other environment, it appears to be triggered by events timing out, although we're still investigating). In other words, it is not unthinkable that, with this fix applied, another issue surfaces, which would be the 'root cause' of the problems you're seeing.
Thanks for your efforts to fix this bug. I will test this solution today and report the results.
I have tested it and can confirm the problem has been fixed. The server has now been running for 8 hours with more than 800 active users, and CPU usage is normal.
Thank you for your efforts.
Excellent! Thanks for testing.
I have filed a ticket for this: https://issues.igniterealtime.org/browse/OF-1786
The new version of MINA (2.1.3) has been released. Openfire 4.4.0 will use this version.