BOSH latency with Apache and Openfire

I am experiencing latency issues with my BOSH setup and am looking for advice/ideas on where the bottleneck might be.

My setup:

  • Windows
  • Apache 2.23
  • Openfire 3.7.1
  • Candy 1.0.9

When I first had users test the service, it was great until about 50 connections; then I had major reports of latency and some dropped connections. It quickly became unusable.

This test was running on a virtual server, so I replicated it on one of our internal servers to rule out a hardware issue, and a similar thing happened.

During both tests, CPU and RAM were fine, so it did not point to a hardware issue, and two different networks were involved (one internal, one external), so I think the network is also in the clear.

To determine where the bottleneck might be, I set up two identical rooms: one in Openfire’s spank directory and the other in Apache’s htdocs directory.

I then had 50-100 users join the room **http://example.com:7070/testroom**, which had no issues. It was fast and performed very well with the data being served up by Openfire’s Jetty server. (Good news for Openfire.)

I then had 50-100 users join the room http://example.com/testroom, which began to have problems again at about 50 users; the issues were latency and disconnects. This room was served up by Apache, with a .htaccess file proxying the connection over to port 7070 on the same server (to Openfire).
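
For reference, the proxying is along these lines (a sketch; the exact path is illustrative). ProxyPass is not allowed in .htaccess, so the [P] rewrite flag does the forwarding, which requires mod_rewrite and mod_proxy to be loaded:

```apache
# .htaccess in Apache's htdocs: proxy the room (and the BOSH traffic
# behind it) through to Openfire's Jetty listener on port 7070.
RewriteEngine On
RewriteRule ^testroom/(.*)$ http://localhost:7070/testroom/$1 [P]
```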

So based on this test, it would appear the issue is an Apache bottleneck.

I am just wondering if anyone has any thoughts on this.

Also, if you have a large BOSH deployment, have you seen this with Apache, and are there any Apache tweaks/settings that may fix this issue?

Ultimately I am looking to connect about 300 users, each in 1-2 rooms. I would use ports 7070/7443 directly (Jetty), but they will be locked in our live setup, so I am limited to port 443 (7443 is blocked externally because of PCI compliance; Openfire’s port 7443 uses a weak cipher).
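
For the live setup, the idea is to have Apache terminate TLS on 443 with a decent cipher list and proxy BOSH through to Openfire’s plain HTTP-bind port internally. A minimal sketch, assuming the standard /http-bind endpoint and placeholder certificate paths:

```apache
# Sketch: Apache terminates TLS on 443 (avoiding Openfire's weak
# cipher on 7443) and proxies BOSH to Jetty's plain port 7070.
# Certificate paths and the cipher list are placeholders.
<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile    conf/ssl/server.crt
    SSLCertificateKeyFile conf/ssl/server.key
    SSLCipherSuite        HIGH:!aNULL:!MD5

    ProxyPass        /http-bind/ http://localhost:7070/http-bind/
    ProxyPassReverse /http-bind/ http://localhost:7070/http-bind/
</VirtualHost>
```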

Any help is appreciated. Thanks.

I believe I found the issue. I am testing tomorrow, but the numbers line up.

For those interested, in Apache’s httpd.conf there is an include for server-pool management:

# Server-pool management (MPM specific)
#Include conf/extra/httpd-mpm.conf

Within that file, there is a section containing:

ThreadsPerChild 150
MaxRequestsPerChild 0

In researching Apache’s ThreadsPerChild setting, I found that the default when not specified is 64, which matches my issue: each BOSH client parks a long-poll request on a worker thread, so one thread is tied up per connected user, and factoring in the overhead of initial page loads, 64 threads runs out right around 50 users.

To fix it (I hope), I have uncommented the httpd-mpm.conf include and set ThreadsPerChild to 1000.
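
Roughly, the change looks like this; a sketch, assuming the stock Apache 2.2 file layout (the module check is spelled mpm_winnt_module in Apache 2.4):

```apache
# httpd.conf: uncomment the server-pool management include
Include conf/extra/httpd-mpm.conf

# conf/extra/httpd-mpm.conf, Windows MPM section: every connected
# BOSH client parks a long-poll request on a worker thread, so the
# pool needs to comfortably exceed the target concurrent user count.
<IfModule mpm_winnt.c>
    ThreadsPerChild      1000
    MaxRequestsPerChild     0
</IfModule>
```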

Upgrade to Openfire 3.7.2 and use websockets directly to Openfire.

I am doing some websockets testing on a test server to get ready for that option, but in our production environment I don’t want to upgrade to an alpha release of Openfire, for obvious reasons.

The current nightly build will become 3.7.2 sooner rather than later. It is certainly way more stable than 3.7.1.

Curious, stable in what sense? We have had no issues with 3.7.1 as far as stability goes.

I know about the Jetty update and websockets.

It is much less likely to crash due to PEP memory leaks.

Good to know. Luckily we haven’t been affected by that.

Quote from Pat Santora (Feb 2012)

Dele and team,

I’ve just placed the websockets plugin into our production environment with 10+ clustered servers and a few thousand concurrent connections. At this point it’s running pretty well. However, I made a few modifications to account for resource accountability, as we needed multiple sockets open on a per-user/resource basis. I’ve also adjusted the plugin to work with the default WebSocketServlet rather than a general HttpServlet with the WebSocketsFactory. This was done for simplicity for now.

I’ll send something to you and Guus shortly for review. I just want to make sure it’s working for a day or two first.

http://community.igniterealtime.org/blogs/ignite/2012/02/14/webrtc-websockets-and-openfire