Openfire 3.4.5 on Linux freezes repeatedly

I’ve installed Openfire and set it up to allow users to create group chat rooms. For some reason it hangs completely at least once a day, to the point where the web admin interface doesn’t respond and no messages get passed between clients. I can’t see any obvious problems in the log files, and sending the process a -3 signal (SIGQUIT) doesn’t seem to yield a stack trace.
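
One thing worth checking when signal 3 appears to do nothing: a HotSpot JVM writes the thread dump to its own standard output, not to the terminal that sent the signal, so the dump may be sitting in whatever file the startup script redirects stdout to. The sketch below is a hypothetical helper, not part of Openfire. It assumes a Sun JDK is installed and prefers `jstack` when it is on the PATH, since that prints the dump directly.

```shell
#!/bin/sh
# Hypothetical helper, not part of Openfire: request a thread dump from a JVM.
# Usage: dump_threads <pid>
dump_threads() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: dump_threads <pid>" >&2
        return 2
    fi
    if command -v jstack >/dev/null 2>&1; then
        # jstack ships with the JDK (not the JRE) and prints to this terminal.
        jstack "$pid"
    else
        # SIGQUIT: the JVM keeps running and writes the dump to its own stdout.
        kill -3 "$pid"
    fi
}
```

Taking two or three dumps a few seconds apart while the server is frozen makes it easier to see which threads are stuck in the same place.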

In the end I have to use kill -9 to terminate the process and then wait for more than a minute before I restart it, otherwise it doesn’t get as far as reporting that the admin console is bound to port 9090.

Has anyone else seen this sort of behaviour, or has anyone got any ideas for how I could diagnose it? The users are getting rather impatient with me…

Niall

  • Openfire 3.4.5

  • Using active directory as the user database

If there are any other details that might be useful, please let me know.

I’m having similar problems on Windows. Normally, Openfire just boots me from the server (and probably everyone else), and trying to access the Web Admin just shows “Connecting to…” followed by “Connected to…” in a loop in the Firefox status bar. Testing it with wget indicates that it can’t start the SSL session, and that regular HTTP requests don’t make it through either.

The logs give no obvious explanation of what’s behind it either.

Thought I would chime in and report a similar problem. I’m running Openfire on an Xserve G5, so I’ve been attributing some of the flakiness to Apple’s Java VM. Here’s a rundown of what happened today, Friday, and Thursday:

  1. Users reported that they can no longer join group chat

  2. Some users report that messages are not getting through (but I can send/receive fine)

  3. I observe that my messages are not getting through.

  4. Restarting the Chat client will connect, and some presence is communicated, but no messages are sent/received

  5. No messages/presence information is sent or received

  6. Admin console shows memory fully allocated

  7. Unable to connect to Jabber service

After a restart it seems fine (for another 24 hours).
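
Since the admin console reports memory as fully allocated before the service stops responding, it may help to watch the heap from outside the JVM in the hours leading up to a freeze. This is a minimal sketch assuming a Sun JDK, whose `jstat` tool can sample a running process; the function name and the 5000 ms default are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sample heap occupancy and GC counts of a running JVM.
# Usage: watch_heap <pid> [interval_ms]
watch_heap() {
    pid="$1"
    interval="${2:-5000}"
    if [ -z "$pid" ]; then
        echo "usage: watch_heap <pid> [interval_ms]" >&2
        return 2
    fi
    # jstat ships with the JDK; -gcutil prints one line of percentages per sample.
    jstat -gcutil "$pid" "$interval"
}
```

If the old generation column (O) stays close to 100 while the full collection count (FGC) keeps climbing, the heap is probably too small for the load or something is leaking.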

There is nothing in the error log to help. Just something about the vCard directory being read-only. As it’s Active Directory, I would expect that. I’ve seen no changes in the kinds of clients connecting (iChat, Pidgin, GAIM). Not sure why this is happening.

We had a similar problem months ago. I just had a quick talk with our sysadmin, but neither of us can remember what caused it. We did play around a bit with types of garbage collectors around that time - try using another one, see what happens…

Thanks for your post, Guus. It’s certainly something I’ll look at. I’m seeing a brief slowdown every 7 minutes, which would seem to indicate a regular process, so the GC fits the bill. Could you find out what was done to the GC settings? I fear that much more instability is going to lead to the users revolting.

This was all some time ago, before we made good use of an issue tracking mechanism. We’re currently running Openfire with these arguments (which should only output statistics, not change functionality):

-verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintClassHistogram
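
With these flags on, every GC event is logged with the number of seconds since JVM start, so a suspected regular pause (such as one every 7 minutes, i.e. 420 seconds) can be checked against the log. The sketch below assumes the usual HotSpot layout in which a full collection is logged as a line starting with `<seconds>: [Full GC`. The function name is made up, and `gc.log` in the usage note is a hypothetical file name for wherever the JVM's standard output ends up.

```shell
#!/bin/sh
# Print the seconds elapsed between consecutive full collections in a GC log.
# Assumes each event line starts with "<seconds-since-start>: [Full GC".
full_gc_intervals() {
    awk -F: '/\[Full GC/ {
        if (seen) print $1 - prev   # gap since the previous full collection
        prev = $1; seen = 1
    }' "$@"
}
```

Run it as `full_gc_intervals gc.log`. A column of values close to 420 would support the idea that the slowdowns are full collections.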

I notice that these three options are explicitly commented out in the startup script. If applicable, you might want to disable them:

-XX:+UseParallelGC -XX:ParallelGCThreads=4
-XX:+UseConcMarkSweepGC

From what I remember, we switched JVMs a couple of times. We’re currently running with this:

guus:~$ java -version
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b19, mixed mode)

I hope this gets you somewhere!

Hello,

I am experiencing similar problems on our 3.4.5 installation on SUSE Linux with a DB2 backend. Have you been able to solve the issue?

Kind regards,

Walter

Eventually, I just had to eliminate variables, although I’m experiencing some different issues now. I moved it to a Windows 2003 machine running Sun’s Java, and was finally able to customize the instance to what it needed to be. My suggestion would be to check the version of Java that Openfire is using; 1.6.x seems to be better than 1.5. Also increase the memory available to Openfire (I’m currently using 512 MB for 250 users). I haven’t played around with garbage collectors yet, but plan to. I recently had my server freeze with a GC Overhead error, which led to an out-of-memory condition.
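
For anyone on Linux wanting to try the same heap size, the maximum heap is set with the standard `-Xmx` JVM option. Where that option goes depends on how Openfire was installed. The sketch below assumes a startup script that reads extra JVM options from an environment variable named `OPENFIRE_OPTS`; treat the variable name as an assumption and check your own script for the one it actually uses.

```shell
#!/bin/sh
# Assumption: the startup script reads extra JVM options from OPENFIRE_OPTS.
# Check your own script for the variable it actually uses.
# -Xmx512m caps the heap at 512 MB; the second flag writes a heap dump
# if the JVM ever runs out of memory, which helps with later diagnosis.
OPENFIRE_OPTS="-Xmx512m -XX:+HeapDumpOnOutOfMemoryError"
export OPENFIRE_OPTS
```

The admin console's memory figure should reflect the new limit after a restart if the option was picked up.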

Sorry I couldn’t give you more useful information,

Tony