Openfire 3.7.1 Java deadlock after server reboot

Hello! I’m using Openfire 3.7.1 on Debian 6 32-bit. It is configured with MySQL and had been running for quite some time.

About a week ago my vserver rebooted unexpectedly (I’m still analyzing that) and everything but Openfire came back up.

Since then I have not been able to start Openfire successfully. The configuration is unchanged, and MySQL is running and being used successfully by other services.

When I try to start Openfire, one of three things happens: nothing seems to happen at all (no java process, although the pidfile is created); the process is running (java with the expected options) but no ports are opened; or some or all ports are opened and it looks like a successful start. Even in the last case, logging in to the admin web page may or may not work, connecting with a Jabber client may or may not work, and the server is highly unstable, mostly crashing after serving one connection.

To make matters worse, most of the time nothing is logged at all!

The config looks OK and the database is OK too. I reinstalled the package anyway, without any change.

Has anyone encountered this or can give me a hint where to look?

Thanks in advance!

I’m not 100% sure how the Debian init.d scripts are set up, but at least on Red Hat the JVM writes stdout to openfire/logs/nohup.out. That should give you some indication of what is failing, even if it doesn’t get far enough to start using log4j to write to the regular log files.
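If there is no nohup.out on your system, one option is to start the server by hand and capture the console yourself. The following is only a minimal sketch: run_logged is a hypothetical helper name and the log path is just an example, neither comes from the Openfire package.

```shell
# Run a command and send both stdout and stderr to a log file, so that
# early JVM output is kept even if log4j never starts.
# run_logged is a hypothetical helper, not part of Openfire.
run_logged() {
    log_file=$1
    shift
    # "$@" is the command to run; 2>&1 merges stderr into the same file.
    "$@" >"$log_file" 2>&1
}

# Example usage (the log path is an assumption, adjust to your setup):
# run_logged /tmp/openfire-console.log \
#     /usr/lib/jvm/java-6-sun/bin/java -server \
#     -DopenfireHome=/usr/share/openfire \
#     -jar /usr/share/openfire/lib/startup.jar
```

The function returns the exit status of the command it ran, so a non-zero status together with the log contents should show how far startup got.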

Thanks for your thoughts.

I’ve been looking for nohup.out too and didn’t find it. However, I have now started Openfire without the init script and in verbose mode:

/usr/lib/jvm/java-6-sun/bin/java -server -DopenfireHome=/usr/share/openfire -Dopenfire.lib.dir=/usr/share/openfire/lib -classpath /usr/share/openfire/lib/startup.jar -jar /usr/share/openfire/lib/startup.jar -v

This mostly outputs:

Openfire 3.7.1 [Sep 3, 2012 8:38:18 PM]

A fatal error has been detected by the Java Runtime Environment:

Internal Error[thread -1908933776 also had an error] (safepoint.cpp:610)

, pid=11983, tid=2400787312

fatal error: Deadlock in safepoint code. Should have called back to the VM before blocking.

JRE version: 6.0_26-b03

Java VM: Java HotSpot™ Server VM (20.1-b02 mixed mode linux-x86 )

An error report file with more information is saved as:

/root/hs_err_pid11983.log

[thread -1907623056 also had an error]

If you would like to submit a bug report, please visit:

http://java.sun.com/webapps/bugreport/crash.jsp

I have found some threads about Openfire problems on hosted VMs, but a) it worked until recently and b) I’m not giving up that easily.

Could this be related? (I’m no Java guy, so I’m really guessing here.)

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7187046

Okay, so I have progressed quite a bit further.

First off, the Java deadlock is gone today and I have no idea why (the only notable thing I changed was installing sun-java6-fonts). Nonetheless, there are still issues.

When running ‘/usr/lib/jvm/java-6-sun/bin/java -server -DopenfireHome=/usr/share/openfire -Dopenfire.lib.dir=/usr/share/openfire/lib -classpath /usr/share/openfire/lib/startup.jar -jar /usr/share/openfire/lib/startup.jar -v’ now, everything works, but only as root. When invoking the init script, it does not. Manually switching to the openfire user and running it does not work either, as is to be expected.

I then noticed several things wrong with users and file permissions. The UID and GID were not unique. I fixed that, and also reset the file permissions using the same commands as the post-install script of the original package.

Nonetheless, I get no console output, errors, or logs when running as the openfire user.
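For anyone wanting to check for the same kind of clash, here is a minimal sketch that lists numeric IDs used more than once. It assumes the standard colon-separated layout of /etc/passwd and /etc/group (ID in the third field); find_duplicate_ids is a hypothetical helper name, not something from the package.

```shell
# Print every numeric ID (third colon-separated field) that appears more
# than once in a passwd- or group-style file.
# find_duplicate_ids is a hypothetical helper name.
find_duplicate_ids() {
    cut -d: -f3 "$1" | sort -n | uniq -d
}

echo "Duplicate UIDs:"
find_duplicate_ids /etc/passwd
echo "Duplicate GIDs:"
find_duplicate_ids /etc/group
```

No output under a heading means every ID in that file is unique.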

Anyone got a clue?

This is where it gets really freaky.

I dabbled a bit, getting nowhere. I noticed that nothing in my previous post seemed to be true anymore, and was baffled, pondering why.

Then it hit me: I had installed openjdk-6-jre, just to see if it made a difference. And it does! Somehow, that seems to satisfy some requirements that aren’t covered by the package dependencies!? And why did it work previously without a hitch?

I have no idea.

Anyway, after installing openjdk-6-jre (Openfire is still using sun-java6-jre!) everything works again.

(I tested this by uninstalling openjdk-6-jre and its dependencies, after which it stopped working again.)

I swear, every time I think I’m on to something, test the theory and type it here, everything’s completely different by the time I finish the post. Sigh.

So no, nothing.

All I have is rumors about OpenVZ VMs having issues with Java, even though I had it running for quite some time…

I had the same problem on CentOS 6 with Java 1.6.0_35 (IPv6 disabled, iptables off) on a Proxmox OpenVZ VM.

It runs for about 5 minutes and then always crashes.

I have reinstalled it many times, but I always get the same problem.

Any suggestions?

The Java error message is:

A fatal error has been detected by the Java Runtime Environment:

Internal Error (objectMonitor.cpp:1318), pid=695, tid=2535693168

guarantee(Self == _owner) failed: complete_exit not owner

JRE version: 6.0_35-b10

Java VM: Java HotSpot™ Server VM (20.10-b01 mixed mode linux-x86 )

An error report file with more information is saved as:

/opt/openfire/logs/hs_err_pid695.log

If you would like to submit a bug report, please visit:

http://java.sun.com/webapps/bugreport/crash.jsp

Well, your Java error is a totally different one. Besides, I no longer get any Java errors.

Maybe my hoster was messing with the OpenVZ configuration.

Nevertheless, I never got it to work. I even reinstalled the Debian package from scratch, with default config, new database… no dice. The Java errors are gone (without me really changing anything, hence I assume my hoster has done something), but Openfire still does not start reliably. It just seems to stop working at some random point during startup. Since it’s been a month now (since it went down, not since I posted), I opted to install ejabberd, which works flawlessly.

Please don’t misinterpret this – I love openfire and I’ve been using it for around 2 years. Something must have happened to my OpenVZ Server VM to introduce major openfire issues. I am just circumventing this now by using ejabberd.

Thank you

I finally found a way to run Openfire.

The solution was:

  • In OpenVZ, assign the container the same number of CPUs as the host really has!

My OpenVZ runs on Proxmox 2.1. I had read that many users had this problem on v1.9, so I adapted their solution to v2.1, and now it works.

The problem lies in the interaction between Jetty, Java, and the number of emulated CPUs!

The problem doesn’t exist on OpenVZ 1.8! (That is the only difference between before and now.)
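To compare the CPU count the container sees with the real one, something like the following can be run once inside the container and once on the host. It is only a sketch: count_cpus is a hypothetical helper name, and it assumes a Linux x86 system where /proc/cpuinfo has one "processor" line per CPU.

```shell
# Count the CPUs visible to the current environment.
# Assumption: Linux x86, where /proc/cpuinfo lists one "processor" line
# per CPU. count_cpus is a hypothetical helper, not an OpenVZ tool.
count_cpus() {
    # Optional first argument: another cpuinfo-style file to read instead.
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Run inside the container and on the host, then compare the two numbers.
echo "CPUs visible here: $(count_cpus)"
```

If the two numbers differ, the container is configured with a different CPU count than the hardware has, which is the situation described above.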