Let’s see if we can make that work for you. To manage expectations: I don’t think that there is a “one-size-fits-all” solution to your problem, especially since you have custom code.
As a side note: your version of Openfire is quite old. It is worth considering an upgrade before you try to tackle the capacity issue. The upgrade itself can be a significant effort - you might want to treat it as an explicitly separate step/project.
As for the capacity challenge: My first thought would be to see if it is possible to offload the “many thousands of servlet requests” that your custom plugin is doing (as that will add considerable load to the server). Whether this is feasible at all depends on your architecture and implementation, of course. Not knowing anything about that, I can only give you very broad suggestions. See if your plugin is (or can be migrated to) a Component-based implementation. If so, there’s a good chance that you can turn it into an external component (using the Whack library). This allows you to run the component on a different server, moving its resource usage to another machine and freeing up cycles for Openfire.
Stay away from Openfire Connection Managers. These were very useful 10 years ago, but development stopped around that time. Also, I’m not sure if they even support HTTP-Bind. Even if they do, I wonder if the Connection Manager still functions at all. Any connectivity bugs that were fixed in Openfire in the last 10 years won’t have been fixed in the Connection Manager code.
Can your clients use TCP sockets instead of websockets? Those might scale better.
Try attaching a profiler to Openfire to diagnose where the resource bottlenecks are. This might give you hints as to which configuration settings to modify, or, in the case of custom code, whether there are bottlenecks there.
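If attaching a full profiler is not immediately possible, a very crude first look at where CPU time goes can be had from the JVM's own management API. The sketch below is only an illustration and no replacement for a real profiler: it lists the threads of the JVM it runs in, ordered by consumed CPU time. The class name ThreadCpuSnapshot and the idea of calling it from inside the Openfire JVM (for example from your own plugin code) are my assumptions, not something Openfire provides.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of Openfire): snapshot of per-thread CPU usage.
public class ThreadCpuSnapshot {

    /** Name of a thread and the CPU time it has consumed so far, in milliseconds. */
    public static final class Entry {
        private final String threadName;
        private final long cpuMillis;

        Entry(String threadName, long cpuMillis) {
            this.threadName = threadName;
            this.cpuMillis = cpuMillis;
        }

        public String threadName() { return threadName; }
        public long cpuMillis() { return cpuMillis; }
    }

    /** Returns at most 'limit' live threads of the current JVM, highest CPU time first. */
    public static List<Entry> topThreads(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<Entry> entries = new ArrayList<>();
        // Not every JVM supports (or has enabled) per-thread CPU time measurement.
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return entries;
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread is no longer alive
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpuNanos >= 0 && info != null) {
                entries.add(new Entry(info.getThreadName(), cpuNanos / 1_000_000));
            }
        }
        entries.sort(Comparator.comparingLong(Entry::cpuMillis).reversed());
        return new ArrayList<>(entries.subList(0, Math.max(0, Math.min(limit, entries.size()))));
    }

    public static void main(String[] args) {
        for (Entry e : topThreads(10)) {
            System.out.printf("%8d ms  %s%n", e.cpuMillis(), e.threadName());
        }
    }
}
```

Threads that stand out here (for instance those serving your servlet requests) are a hint as to where to point a proper profiler first.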
Clustering will also help, but is complex - especially when you’re running custom code. Your code must either be stateless, or aware that it’s running on a cluster (or be deployed on only one cluster node, which often defeats the purpose of running on a cluster). Also, with clustering, you’re not simply doubling the capacity. The overhead of running a cluster is significant, meaning that two Openfire servers in a cluster have less than twice the capacity of a single, non-clustered Openfire instance. Running in a cluster will also introduce subtle changes to the behavior of Openfire (for instance, in the timing of certain events, which now sometimes have to be evaluated on multiple cluster nodes). Your mileage will vary. Clustering can certainly be deployed to increase the capacity of Openfire, but it’s not as easy as “switching it on”. My advice is to tackle this as a development project, with proper time for deployment and integration testing.
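To put the “not doubling” point in numbers, here is a back-of-the-envelope sketch. The model (every clustered node loses a fixed fraction of its capacity to cluster coordination) and the figures of 10,000 sessions per node and 25% overhead are made up purely for illustration; they are not measurements of Openfire. Real numbers have to come from your own load tests.

```java
// Illustrative arithmetic only: the overhead model and all numbers are assumptions.
public class ClusterCapacityEstimate {

    /**
     * Estimates the total capacity of a cluster in which every node loses a
     * fixed fraction of its stand-alone capacity to cluster coordination.
     */
    public static double estimate(int nodes, double singleNodeCapacity, double overheadFraction) {
        if (nodes < 1) {
            throw new IllegalArgumentException("nodes must be at least 1");
        }
        if (overheadFraction < 0.0 || overheadFraction >= 1.0) {
            throw new IllegalArgumentException("overheadFraction must be in [0, 1)");
        }
        if (nodes == 1) {
            // A single, non-clustered instance pays no cluster overhead.
            return singleNodeCapacity;
        }
        return nodes * singleNodeCapacity * (1.0 - overheadFraction);
    }

    public static void main(String[] args) {
        double single = estimate(1, 10_000, 0.25);
        double pair = estimate(2, 10_000, 0.25);
        System.out.println("single node: " + single);
        System.out.println("two-node cluster: " + pair + " (naive doubling: " + 2 * single + ")");
    }
}
```

With these assumed figures, the second node adds 5,000 sessions of capacity rather than 10,000, which is why the gain should be measured rather than assumed.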