BOSH latency with Apache and Openfire

I am experiencing latency issues with my BOSH setup and am looking for advice/ideas on where the bottleneck might be.

My setup:

  • Windows
  • Apache 2.23
  • Openfire 3.7.1
  • Candy 1.0.9

When I first had users test the service, it was great until about 50 connections; then I had major reports of latency and some dropped connections. It quickly became unusable.

This test was running on a virtual server, so I replicated it on one of our internal servers to rule out a hardware issue, and a similar thing happened.

During both tests, CPU and RAM were fine, so it did not point to a hardware issue, and two different networks were involved (one internal, one external), so I think the network is also in the clear.

To determine where the bottleneck might be, I set up two identical rooms: one in Openfire’s spank directory and the other in Apache’s htdocs directory.

I then had 50-100 users join the room **http://example.com:7070/testroom**, which had no issues. It was fast and performed very well with the data being served up by Openfire’s Jetty server. (Good news for Openfire.)

I then had 50-100 users join the room http://example.com/testroom, which began to have problems again at about 50 users; the issues were latency and disconnects. This room was served up by Apache, with a .htaccess file proxying the connection over to port 7070 on the same server (to Openfire).
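
For reference, the proxying is along these lines (a sketch; the exact path is illustrative). ProxyPass is not allowed in .htaccess, so the [P] rewrite flag does the forwarding, which requires mod_rewrite and mod_proxy to be loaded:

```apache
# .htaccess in Apache's htdocs: proxy the room (and the BOSH traffic
# behind it) through to Openfire's Jetty listener on port 7070.
RewriteEngine On
RewriteRule ^testroom/(.*)$ http://localhost:7070/testroom/$1 [P]
```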

So based on this test, it would appear the issue is an Apache bottleneck.

I am just wondering if anyone has any thoughts on this.

Also, if you have a large BOSH deployment, have you seen this with Apache, and are there any Apache tweaks/settings that may fix this issue?

Ultimately I am looking to connect about 300 users, each in 1-2 rooms. I would use ports 7070/7443 directly (Jetty), but they will be locked in our live setup, so I am limited to port 443 (7443 is blocked externally because of PCI compliance; Openfire’s port 7443 uses a weak cipher).
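
For the live setup, the idea is to have Apache terminate TLS on 443 with a decent cipher list and proxy BOSH through to Openfire’s plain HTTP-bind port internally. A minimal sketch, assuming the standard /http-bind endpoint and placeholder certificate paths:

```apache
# Sketch: Apache terminates TLS on 443 (avoiding Openfire's weak
# cipher on 7443) and proxies BOSH to Jetty's plain port 7070.
# Certificate paths and the cipher list are placeholders.
<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile    conf/ssl/server.crt
    SSLCertificateKeyFile conf/ssl/server.key
    SSLCipherSuite        HIGH:!aNULL:!MD5

    ProxyPass        /http-bind/ http://localhost:7070/http-bind/
    ProxyPassReverse /http-bind/ http://localhost:7070/http-bind/
</VirtualHost>
```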

Any help is appreciated. Thanks.

I believe I found the issue. I am testing tomorrow, but the numbers line up.

For those interested, in Apache’s httpd.conf there is an include for server-pool management:

# Server-pool management (MPM specific)
#Include conf/extra/httpd-mpm.conf

Within that file, there is a section containing:

ThreadsPerChild 150
MaxRequestsPerChild 0

In researching Apache’s ThreadsPerChild setting, I found that the default when not specified is 64, which matches my issue: each BOSH client parks a long-poll request on a worker thread, so one thread is tied up per connected user, and factoring in the overhead of initial page loads, 64 threads runs out right around 50 users.

To fix it (I hope), I have uncommented the httpd-mpm.conf include and set ThreadsPerChild to 1000.
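
Roughly, the change looks like this; a sketch, assuming the stock Apache 2.2 file layout (the module check is spelled mpm_winnt_module in Apache 2.4):

```apache
# httpd.conf: uncomment the server-pool management include
Include conf/extra/httpd-mpm.conf

# conf/extra/httpd-mpm.conf, Windows MPM section: every connected
# BOSH client parks a long-poll request on a worker thread, so the
# pool needs to comfortably exceed the target concurrent user count.
<IfModule mpm_winnt.c>
    ThreadsPerChild      1000
    MaxRequestsPerChild     0
</IfModule>
```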

Upgrade to Openfire 3.7.2 and use websockets directly to Openfire.

I am doing some websockets testing on a test server to get ready for that option, but in our production environment I don’t want to upgrade to an alpha release of Openfire, for obvious reasons.

The current nightly build will become 3.7.2 sooner rather than later. It is certainly way more stable than 3.7.1.

Curious, stable in what sense? We have had no issues with 3.7.1 as far as stability goes.

I know about the Jetty update and websockets.

It is much less likely to crash due to PEP memory leaks.

Good to know. Luckily we haven’t been affected by that.

Quote from Pat Santora (Feb 2012)

Dele and team,

I’ve just placed the websockets plugin into our production environment with 10+ clustered servers and a few thousand concurrent connections. At this point it’s running pretty well. However, I made a few modifications to account for resource accountability, as we needed multiple sockets open on a per-user/resource basis. I’ve also adjusted the plugin to work with the default WebSocketServlet rather than a general HttpServlet with the WebSocketsFactory. This was done for simplicity for now.

I’ll send something to you and Guus shortly for review. I just want to make sure it’s working for a day or two first.

http://community.igniterealtime.org/blogs/ignite/2012/02/14/webrtc-websockets-and-openfire