Stability concerns with Openfire

Hello,

My company has been trying to adopt Openfire as the primary method for internal IM communication, but we’ve been plagued by our users being frequently disconnected from the server.

Here’s some background on our setup:

-Openfire 3.6.3 installed on a VM with the most up-to-date version of CentOS

-MySQL database on the same server

-LDAP authentication is enabled with shared groups

-SSL enabled

-Conferencing server set up with 5 persistent chat rooms, usually with about 5-10 users per room

-Approximately 50 concurrent users during peak hours

-xmpp.client.idle = -1

I polled our users to see how widespread the problem was and nearly half of them reported frequent disconnects (more than once per day), while many others reported disconnects several times a week. The primary client of choice for our users is Pidgin, but the problem exists on Spark as well.

It seems that users who are constantly in our chat rooms suffer the worst of it, being disconnected upwards of 3-5 times per day. I do not frequent these chat rooms, but I am usually disconnected about once per day (Pidgin reports it as a “ping timeout”). I remain connected to several other services when my connection drops, so I can confirm it’s not a problem with my internet connection. I have also checked the logs on the server on several occasions and see nothing out of the ordinary or anything that may give me clues about the disconnects.
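For anyone who wants to rule out the network in a more systematic way, here is a minimal sketch of a probe that periodically opens a TCP connection to the server's client port and records when that fails. It assumes the standard XMPP client port 5222; the function names and the host name in the usage note are made up for illustration. A successful connect only shows the server is reachable at that moment, not that an existing session would have stayed up, so treat the output as a rough hint.

```python
import socket
import time
from datetime import datetime


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def monitor(host, port=5222, interval=30.0, checks=10):
    """Probe host:port `checks` times; return timestamps of the failed probes."""
    failures = []
    for i in range(checks):
        if not port_is_reachable(host, port):
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append(stamp)
            print(f"{stamp} cannot reach {host}:{port}")
        if i < checks - 1:
            time.sleep(interval)
    return failures
```

Calling something like `monitor("im.example.com", checks=120)` from a workstation would probe for about an hour. If a client reports a ping timeout during that window and the probe logs no failures, the problem is more likely on the server side than on the network path.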

We like Openfire a lot and it fits all of our needs if we could solve these disconnect issues. Does anyone have any ideas about what might be the root cause and possible solutions for this?

Would it be possible to get in touch with some of the developers to have a more detailed discussion about our implementation and the problems we’re currently facing?

Thanks!

-Adam

My server is vastly more populated than that and does not suffer from any downtime other than my weekly scheduled reboot of the server. I have heard of VMs having packet issues with Openfire. I would look at your VM first.

Hi Adam,

I wonder whether you see exceptions in the client log or in the Openfire log.

Try to disable compression on https://yourserver:9091/compression-settings.jsp - this could help.

LG

Hey LG/Todd,

We already had those compression settings disabled. We migrated our VM host to a different ESX host with less load on it and faster storage. Hopefully that will improve performance. We would prefer not to have to put this on a physical box.

Are you guys aware of anyone successfully running Openfire in a VM environment?

Thanks!

-Adam

Adam,

We have an Openfire box running on our ESX 3.5U4 cluster, been up for about 18 months now, zero issues like you are describing.

CentOS 5.2 VM, base install (no gui or extra services)

Openfire 3.6.3 installed via RPM, MySQL 5 backend, LDAP connector, shared groups, etc (very similar to your setup)

120 users connecting via Miranda IM client, SSL enabled.

no conferences, just user to user chats.

The box has been ROCK solid - other than restarting Openfire to apply updates we haven’t had a single issue. In fact, our entire core environment (domain controllers, Exchange, internal and external web servers, JIRA box, etc.) is running on ESX and we’ve had very few issues overall - what issues we have had are almost always due to our SAN.

In any case, not sure if this helps but our VM environments for all of our servers have been extremely reliable and administration friendly.

Hello!

We have exactly the same behavior that you describe, multiple disconnects for chat users, more if they’re in chat rooms. Did you find a good solution for this issue?

I disabled compression today to see if that will help.

– edit –

I was just kicked from the server, so apparently disabling compression doesn’t help.

–Matthew

Hi Matthew,

Do you use LDAP groups for your roster? We tracked the root cause to a large shared LDAP group that was being synchronized with Openfire every couple of seconds. This was causing a huge amount of stress on the server. We removed the sharing for this group and our disconnect problems went away.

We would still like to use this group, but I haven’t found a way to make Openfire synchronize LDAP groups less often.

Hope this helps,

-Adam

We do use LDAP group sharing. I’ll look into that. So it was a single large group in LDAP that was causing the issue?

Did you disable roster sharing, or exclude that group in Openfire somehow…?

We just disabled roster sharing for the group. We still have the group in our Openfire installation.