Scaling out a single-instance Openfire server used only for sending notifications

Dear All,

We have a system that needs to send notifications to end users. To accomplish this, we have created desktop clients that are installed on the users' machines and connect to an Openfire server. Our application server manages a pool of admin users (using Smack) connected to this server. Whenever it needs to send a notification to a given client, our app server sends it via an admin account (using Smack). So basically, we don't need any groups, rosters, or chats as such. Our one and only requirement is sending a message (notification) to a particular user via our app server.

For our Openfire server:
Openfire version 4.2.3
CentOS 7
6 cores, 32 GB RAM. Min heap: 4 GB, max heap: 12 GB

We have close to 70K users, and all of them log in using the desktop client (though they do nothing else once connected). Over the past couple of days we have noticed that our server stops responding (we can't open the Openfire login page) and we are not able to connect. We observe multiple instances of the following error:

START OF ERROR
2021.03.22 13:35:45 WARN [socket_c2s-thread-12]: org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00021363: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273) …
END

Once we restart Openfire, things get back to normal after some time. This is causing a huge issue because it keeps recurring. I have read a lot of literature about Openfire (but sadly, all old posts), and much of it points to a maximum of 5K concurrent connections. I want to understand:

  1. What can I do to resolve this issue?
  2. Can Openfire scale to a production-grade system supporting a million-plus users (given my basic requirement)?
  3. If yes, how can I tune it for better performance?

Apologies if the question is missing some trivial details. I am happy to provide settings/config details as required.

Thanks
Abhi

Openfire 4.2.3 is rather old, and it is difficult to ask for free support here when running such an old version. If you are able to reproduce the issues with the current release, you are more likely to get responses. The 5K issue is typically a Linux kernel tuning issue, with the kernel not allowing more than that many TCP connections per running process. What sysctl tuning have you done for CentOS 7?
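To answer the sysctl question with actual numbers, the current kernel values can be read straight from /proc/sys. Below is a minimal Java sketch for doing that; the class name KernelLimits and the particular selection of settings are my own assumptions about what is worth reviewing, not an official Openfire tuning list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class KernelLimits {
    // Settings often reviewed for high connection counts; this selection is an assumption.
    static final List<String> KEYS = Arrays.asList(
            "fs/file-max",                    // system-wide cap on open file handles
            "fs/nr_open",                     // ceiling for any per-process open-file limit
            "net/core/somaxconn",             // upper bound on a listen socket's accept backlog
            "net/ipv4/tcp_max_syn_backlog",   // queue of half-open (SYN received) connections
            "net/ipv4/ip_local_port_range");  // ephemeral ports used for outgoing connections

    // Converts a path below /proc/sys into the dotted name that sysctl prints.
    static String toSysctlName(String relativePath) {
        return relativePath.replace('/', '.');
    }

    // Reads one setting; returns "unavailable" if the file is missing or unreadable.
    static String read(Path root, String key) {
        try {
            byte[] raw = Files.readAllBytes(root.resolve(key));
            return new String(raw, StandardCharsets.UTF_8).trim().replaceAll("\\s+", " ");
        } catch (IOException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get("/proc/sys");
        for (String key : KEYS) {
            System.out.println(toSysctlName(key) + " = " + read(root, key));
        }
    }
}
```

Run it on the Openfire host and post the output; it only reads values and changes nothing.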

Hi akrherz,

Thanks for your prompt reply. We haven't done any explicit sysctl tuning. Our ulimit (-n) is set to 100000.

1. Can you please let me know which ulimit parameters I should be looking at?
2. Also, I see that there is high RAM consumption. I tried a 28 GB max heap size and it is fully consumed. I am afraid it may run out of memory soon. I can always increase memory, but right now the load is only around 30K users. Are there certain processes/functionalities in Openfire that I can stop to get some benefit?
3. Can you point me in the direction of some literature on how to scale out Openfire for 100K concurrent connections? (As I said earlier, my requirement is just sending notifications.)

I will also look at upgrading my server, though I don't think I can do that very soon.

Many Thanks
Abhi

Greetings,

  1. ulimit -n looks good, but you have it set to 100k and are hoping to scale to 100k connections, so it seems like you need to go higher. You may also need to look at the TCP port range to allow for more connections. Articles on the C10M problem on Linux may be useful.
  2. You should take heap dumps and inspect them for memory leaks. An Openfire release that old has issues with memory leaks and invalid cache memory size accounting. If you want support for such an old Openfire, you should consider professional partners.
  3. Searching around in this forum is about the best there is to offer. Openfire is not really well known for scaling well, unlike ejabberd or some other XMPP servers. It can be done, but you have to understand how Java works for things like memory and GC.
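One thing worth verifying on the ulimit point: the value printed by ulimit -n in a login shell is that shell's limit, and a service started by an init script or systemd can run with a different one. The limit that actually applies to the running Openfire process is listed in /proc/<pid>/limits. Here is a small Java sketch that pulls out the "Max open files" row; the class name OpenFileLimit is made up for illustration, and the Openfire pid has to be supplied by you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class OpenFileLimit {
    private static final String ROW = "Max open files";

    /** Returns {soft, hard} parsed from the text of /proc/<pid>/limits, or null if the row is missing. */
    static long[] parseMaxOpenFiles(String limitsText) {
        for (String line : limitsText.split("\n")) {
            if (line.startsWith(ROW)) {
                // After the row label come the soft limit, the hard limit and the unit.
                String[] cols = line.substring(ROW.length()).trim().split("\\s+");
                return new long[] { toLong(cols[0]), toLong(cols[1]) };
            }
        }
        return null;
    }

    // The kernel prints "unlimited" when no cap is set.
    static long toLong(String value) {
        return "unlimited".equals(value) ? Long.MAX_VALUE : Long.parseLong(value);
    }

    public static void main(String[] args) throws IOException {
        // Pass the Openfire process id as the first argument; defaults to this JVM ("self").
        String pid = args.length > 0 ? args[0] : "self";
        byte[] raw = Files.readAllBytes(Paths.get("/proc", pid, "limits"));
        long[] limits = parseMaxOpenFiles(new String(raw, StandardCharsets.UTF_8));
        if (limits == null) {
            System.out.println("No '" + ROW + "' row found");
            return;
        }
        System.out.println("soft=" + limits[0] + " hard=" + limits[1]);
    }
}
```

Every client connection holds one file descriptor, and the JVM needs more on top of that for jars, log files and database connections, so the soft limit should sit comfortably above the connection count you are aiming for.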

To add to what @akrherz said:

  • for the use-case that you’re describing, the hardware should suffice.
  • the high memory usage is something that I’d advise you to look into, as that seems out-of-place. Memory inspection, as was already suggested, seems like a good first thing to do.
  • the error that you’re showing is most likely caused by a client dropping the connection without logging off cleanly (that is, without sending a closing </stream:stream> element). It is probably benign.
  • for concurrent connections over many tens-of-thousands, look at clustering solutions, as offered by the Hazelcast plugin.
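On the memory point, a fully used heap does not by itself prove a leak: the JVM is free to let the heap fill up toward the configured maximum before it collects, so the more telling number is how much stays in use after a collection. The sketch below prints both figures per heap pool using the standard java.lang.management beans. Note the assumptions: the class name HeapAfterGc is hypothetical, and the code reports on the JVM it runs in, so to observe Openfire it would have to run inside that process or the same beans would have to be read over JMX. For an actual heap dump of a running server, JDK tools such as jmap can be pointed at the process id.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapAfterGc {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Formats current usage next to usage measured right after the pool's last collection.
    static String describe(String pool, MemoryUsage now, MemoryUsage afterGc) {
        // afterGc is null for pools that do not support collection usage.
        String after = afterGc == null ? "n/a" : toMiB(afterGc.getUsed()) + " MiB";
        return pool + ": used=" + toMiB(now.getUsed()) + " MiB, after last GC=" + after;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isValid()) {
                continue; // skip non-heap pools such as metaspace and code cache
            }
            System.out.println(describe(pool.getName(), pool.getUsage(), pool.getCollectionUsage()));
        }
    }
}
```

If the "after last GC" figure for the old generation keeps climbing across samples taken hours apart, that points toward a leak or an oversized cache, and a heap dump is the next step. If it stays flat while "used" goes up and down, the heap is probably just being filled and collected normally.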