Urgent: help needed to scale before our server explodes?

drottemberg · March 14, 2020, 2:54pm

Hello community, our application is based on Openfire and I think we are reaching its limits. I need help to scale, and I am happy to hire a consultant too.

We have about 450,000 users and are getting close to 5,000 simultaneous (XMPP/websocket) connections at any given time, plus thousands of requests every second to the HTTPS servlet that our plugin hosts on Openfire.

I am running a single Openfire 4.1.1 server.

Does anyone know the best way to scale? Openfire Connection Managers, load balancing, clustering?

I need to be able to handle 15,000+ simultaneous connections within the next month.

Thank you so much for your help

guus · March 15, 2020, 10:32am

Let’s see if we can make that work for you. To manage expectations: I don’t think that there is a “one-size-fits-all” solution to your problem, especially since you have custom code.

As a side note: your version of Openfire is quite old. It is worth considering an upgrade before you try to tackle the capacity issue. The upgrade itself can be a significant effort, so you might want to treat it as an explicitly separate step/project.

As for the capacity challenge: my first thought would be to see whether it is possible to offload the "many thousands of servlet requests" that your custom plugin is handling, as those add considerable load to the server. Whether this is feasible at all depends on your architecture and implementation, of course. Not knowing anything about that, I can only give you very broad suggestions. See if your plugin is (or can be migrated to) a Component-based implementation; if so, there's a good chance that you can turn it into an external component (using the Whack library). This allows you to run the component on a different server, offloading its resource usage to another machine and freeing up cycles for Openfire.
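For context, an external component does not run inside Openfire: it opens its own TCP connection to the server and authenticates with a shared secret, as defined by the Jabber Component Protocol (XEP-0114). Whack takes care of this for you; the stand-alone sketch below only illustrates the handshake step in plain Java, with no Whack dependency. The component domain, stream id and secret are made-up example values, and a real component would also read the server's reply and then exchange stanzas over the same stream.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ComponentHandshake {

    /** Stream header an external component sends to open its connection (XEP-0114). */
    static String streamHeader(String componentDomain) {
        return "<stream:stream xmlns='jabber:component:accept'"
             + " xmlns:stream='http://etherx.jabber.org/streams'"
             + " to='" + componentDomain + "'>";
    }

    /** Handshake value: lowercase hex SHA-1 of (stream id + shared secret). */
    static String handshakeDigest(String streamId, String sharedSecret) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest((streamId + sharedSecret).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required to be available in every JRE", e);
        }
    }

    public static void main(String[] args) {
        String domain = "myplugin.example.com"; // hypothetical component subdomain
        String streamId = "3BF96D32";           // example value of the id in the server's reply header
        String secret = "changeme";             // hypothetical shared secret configured on the server
        System.out.println(streamHeader(domain));
        System.out.println("<handshake>" + handshakeDigest(streamId, secret) + "</handshake>");
    }
}
```

The server computes the same digest from the secret it has configured for that component; if the values match, it replies with an empty `<handshake/>` element and starts routing packets addressed to the component's domain over this connection.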

Stay away from Openfire Connection Managers. These were very useful 10 years ago, but development on them stopped at that time. I'm not even sure they support HTTP-Bind, and even if they do, I wonder whether the Connection Manager still functions at all. Any connectivity bugs fixed in Openfire over the last 10 years won't have been applied to the Connection Manager code.

Can your clients use TCP sockets instead of websockets? Those might scale better.

Try attaching a profiler to Openfire to diagnose where the resource bottlenecks are. This might give you hints as to which configuration settings to modify or, in the case of custom code, whether the bottlenecks are in that code.
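A profiler gives the full picture. As a lightweight complement, the sketch below uses the standard `java.lang.management` API to rank the threads of the JVM it runs in by CPU time consumed. Note that it only sees its own JVM: to inspect Openfire this way, the code would have to run inside the Openfire process (for example from a plugin) or be adapted to connect over remote JMX, if that is enabled; neither is shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ThreadCpuReport {

    /** One row of the report: a thread's name and the CPU time it has used so far. */
    static final class ThreadCpu {
        final String name;
        final long cpuNanos;

        ThreadCpu(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} live threads of this JVM, highest CPU time first. */
    static List<ThreadCpu> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<ThreadCpu> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // not every JVM/platform exposes per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // -1 if the thread has died or measurement is disabled
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has died
            if (cpu >= 0 && info != null) {
                rows.add(new ThreadCpu(info.getThreadName(), cpu));
            }
        }
        rows.sort(Comparator.comparingLong((ThreadCpu t) -> t.cpuNanos).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (ThreadCpu t : topThreads(10)) {
            System.out.printf("%8d ms  %s%n", t.cpuNanos / 1_000_000, t.name);
        }
    }
}
```

Threads that keep climbing to the top between runs are good candidates to examine more closely with a real profiler.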

Clustering will also help, but it is complex, especially when you're running custom code. Your code must either be stateless or be aware that it's running on a cluster (or be deployed on only one cluster node, which often defeats the purpose of running a cluster). Also, clustering does not instantly "double" your capacity. The overhead of running a cluster is significant, so two Openfire servers in a cluster have less capacity than twice that of a single, non-clustered Openfire instance. Running in a cluster will also introduce subtle changes to Openfire's behavior (for instance, in the timing of certain events, which now sometimes have to be evaluated on multiple cluster nodes). Your mileage will vary. Clustering can certainly be deployed to increase the capacity of Openfire, but it's not as easy as "switching it on". My advice is to tackle this as a development project, with proper time for deployment and integration testing.
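To illustrate the "stateless or cluster-aware" point: the usual trap is plugin code that keeps state in ordinary fields, because each cluster node then has its own copy. A minimal sketch, assuming a hypothetical `SharedStore` interface of your own: the handler keeps no per-node fields and routes all state through the store. The `LocalStore` shown is only correct on a single node; on a cluster it would have to be replaced by an implementation backed by something all nodes share (for example a clustered cache or the database), which is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ClusterSafeCounter {

    /** Hypothetical abstraction over state that must be visible to every node. */
    interface SharedStore {
        long incrementAndGet(String key);
    }

    /** Single-node implementation: only correct when one JVM handles all requests. */
    static final class LocalStore implements SharedStore {
        private final ConcurrentMap<String, Long> values = new ConcurrentHashMap<>();

        @Override
        public long incrementAndGet(String key) {
            return values.merge(key, 1L, Long::sum); // atomic per key within this JVM
        }
    }

    /** Handler with no per-node fields: all state goes through the injected store. */
    static final class RequestHandler {
        private final SharedStore store;

        RequestHandler(SharedStore store) {
            this.store = store;
        }

        long handle(String userId) {
            return store.incrementAndGet("requests:" + userId);
        }
    }

    public static void main(String[] args) {
        RequestHandler handler = new RequestHandler(new LocalStore());
        handler.handle("alice");
        System.out.println(handler.handle("alice")); // 2
    }
}
```

Because the handler depends only on the interface, switching from one node to a cluster means swapping the store implementation rather than rewriting the request-handling code.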

speedy · March 15, 2020, 3:45pm

From an infrastructure point of view: what kind of machine are you running Openfire on? What database, OS, etc. are you using?

guus · March 15, 2020, 4:06pm

… which reminds me: moving the database to a machine that is different from the one that is running Openfire makes sense. That could be an easy win, if you’re not there already.

speedy · March 15, 2020, 4:51pm

Also, moving to a 64-bit Java with more memory allocated might be an easy win, if you're not there already.
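A quick way to confirm what a JVM actually has to work with is to print its architecture and maximum heap. The sketch below uses only standard JDK calls; the `sun.arch.data.model` property is HotSpot-specific and may be absent on other JVMs, in which case the sketch reports "unknown". It describes the JVM it runs in, so run it with the same `java` binary and the same memory options (such as `-Xmx`) that Openfire is started with; where those options are set depends on how Openfire was installed.

```java
public class JvmCheck {

    /** "64" or "32" on HotSpot-based JVMs; null where this non-standard property is absent. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model");
    }

    /** Maximum heap this JVM will attempt to use, in MiB (reflects -Xmx or the JVM's default). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String model = dataModel();
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
        System.out.println("data model: " + (model == null ? "unknown" : model + "-bit"));
        System.out.println("max heap:   " + maxHeapMiB() + " MiB");
        System.out.println("cpu cores:  " + Runtime.getRuntime().availableProcessors());
    }
}
```

On a large machine, a maximum heap that is a small fraction of physical RAM is a sign that the memory settings, not the hardware, are the limit.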

drottemberg · March 16, 2020, 12:36am

Thank you all.

I am running a server with 40 cores and 256 GB of RAM. The DB is on SSD drives.

@guus, could you explain the part about making the plugin external? I think I use a lot of internal Openfire APIs, like the user managers, XMPP…

Do you do any consulting, by any chance?

guus · March 16, 2020, 8:26am

Plugins often use one of these three implementation flavors (or combine multiple):

PacketInterceptor
IQHandler
Component

In the case of Component (which is an addressable entity), it is often feasible to rewire the implementation to run as a separate process, on a different machine even (using the Jabber Component Protocol). If the implementation depends on a lot of the Openfire API, then it becomes more complex to go down that path, but not necessarily impossible.
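One way to make that path less painful is to stop calling Openfire classes (such as the user manager) directly from your business logic and to go through small interfaces of your own instead. A minimal sketch, with a hypothetical `UserDirectory` interface and an in-memory implementation: inside Openfire an adapter would delegate to the Openfire API, while in an external component an adapter would have to obtain the same data some other way (over XMPP, or from the database). Those adapters are not shown.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class DecoupledLogic {

    /** Hypothetical port: the only user-related calls the business logic may make. */
    interface UserDirectory {
        Optional<String> displayName(String username);
    }

    /** In-memory implementation, useful for tests or as a stand-in for a remote lookup. */
    static final class InMemoryUserDirectory implements UserDirectory {
        private final Map<String, String> names = new ConcurrentHashMap<>();

        void put(String username, String displayName) {
            names.put(username, displayName);
        }

        @Override
        public Optional<String> displayName(String username) {
            return Optional.ofNullable(names.get(username));
        }
    }

    /** Business logic with no compile-time dependency on Openfire classes. */
    static String greeting(UserDirectory users, String username) {
        return "Hello, " + users.displayName(username).orElse(username) + "!";
    }

    public static void main(String[] args) {
        InMemoryUserDirectory users = new InMemoryUserDirectory();
        users.put("alice", "Alice Liddell");
        System.out.println(greeting(users, "alice")); // Hello, Alice Liddell!
        System.out.println(greeting(users, "bob"));   // Hello, bob!
    }
}
```

The more of the plugin's logic that sits behind interfaces like this, the less code has to change when it moves out of the Openfire process.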

Yes, I’m the “GoodBytes” listed in the Professional Partners section of this website. My contact information is available through that.