Openfire 3.10.2 stops processing messages

We are running Openfire 3.10.2 on CentOS with MySQL. We have roughly 250-275 active users and a similar number of conversations.

After several years of near-flawless operation, we now have to restart the Openfire service nearly every day on our server. The symptom before each restart is that our users, who communicate with one another using Spark, start reporting that the other side is not receiving their messages.

We were experiencing issues with the CPU hitting 100% utilization, but we have since added an additional CPU. That no longer seems to be a problem, but unfortunately the primary issue remains.

There do seem to be some errors in the various logs, but I’m not sure what to look for. I’ve also seen some other discussions that seem like they may be experiencing similar issues with this release. What should I look for or what can I provide to assist in troubleshooting and resolving this issue?

Is the Openfire process running out of memory?

Hi,

We had the same trouble with Openfire 3.9.1. We used JProfiler to monitor the JVM's run state and eventually found that the Log4j logging module was consuming all of the CPU. We now use Logback in place of Log4j, and we modified the method setDebugEnabled() in class org.jivesoftware.util.Log like this:

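    // Imports needed once Logback replaces Log4j:
    import ch.qos.logback.classic.Level;
    import ch.qos.logback.classic.LoggerContext;
    import org.slf4j.LoggerFactory;

    // Body of setDebugEnabled(): choose the root level from the flag
    final Level newLevel;
    if (enabled) {
        newLevel = Level.ALL;
    } else {
        newLevel = Level.INFO;
    }
    // Apply the level to Logback's root logger
    LoggerContext loggerContext = (LoggerContext) LoggerFactory.getILoggerFactory();
    loggerContext.getLogger("ROOT").setLevel(newLevel);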

Hmm, not that I’m aware of, but it is quite possible I’m not looking in the right place. I haven’t seen any errors mentioning memory specifically and didn’t notice anything unusual when monitoring with top. The server has 2GB allocated and 1GB is usable as Java Memory. When client messages stop being sent, I can still log into the administrative portal and navigate around in it without any issues. The Java memory is not shown as fully utilized nor is the CPU when this occurs.

If you think it is worthwhile, I could try providing additional memory to the VM running Openfire and potentially double it from 2GB to 4GB.

Please review all logs for memory-related issues, and yes, try 2 GB to see if that helps.

I thought I might have resolved this issue when I performed some MySQL database repairs (a few tables had issues showing up in the logs), but it appears not. After a couple of days of operation, the server stopped responding again this morning. I looked at the error and warn logs and noticed this:

at org.apache.lucen

2015.08.17 08:41:36 org.jivesoftware.openfire.filetransfer.proxy.ProxyConnectionManager - Error creating server socket

java.net.BindException: Address already in use

This is followed by other errors such as:

2015.08.17 08:41:36 org.jivesoftware.openfire.FlashCrossDomainHandler - Could not listen on port: 5229

java.net.BindException: Address already in use

At the point when this occurred, the server had been running for a little over 3 days and had already been used this morning for a couple of hours. When this occurred, no one’s messages were being processed by the server any more.

I attempted to resolve this by performing a “service openfire stop” followed by a “service openfire start”. I was surprisingly met by a message stating it was already running. The next step was a server restart.
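The "Address already in use" entries above, together with the "already running" message, are consistent with some process still holding Openfire's ports. One way to test that is to try binding the port yourself. Below is a minimal Java sketch, not a definitive diagnostic: PortProbe and isPortFree are hypothetical names that are not part of Openfire, and 5229 is used as the default only because it is the port named in the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not part of Openfire.
class PortProbe {

    /** Returns true if a listening socket can be bound to the port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // Disable address reuse so a port held by another listener is reported as busy.
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            // java.net.BindException ("Address already in use") is an IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to 5229, the port from the FlashCrossDomainHandler log entry.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 5229;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is in use"));
    }
}
```

If this reports the port as in use while the Openfire service is supposedly stopped, an earlier instance is probably still running and needs to be ended before starting the service again.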

On the first restart, the server hung starting mysqld. I restarted again and mysqld started correctly. Openfire also appeared to start, but when trying to access the management portal, I was met by the setup wizard. I checked /opt/openfire/conf and saw that openfire.xml and security.xml were now owned by root rather than daemon, so I used chown to set them back to daemon. I restarted openfire again, this time using /etc/init.d/openfire stop and /etc/init.d/openfire start. It again said that openfire was already running after successfully stopping it.
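The ownership check described above can also be scripted so it is easy to repeat after each restart. This is only a sketch under the assumptions stated in this post: the files live in /opt/openfire/conf and should be owned by the daemon user. OwnerCheck and isOwnedBy are hypothetical names, not part of Openfire.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Openfire.
class OwnerCheck {

    /** Returns true if the file's owner name equals expectedUser. */
    static boolean isOwnedBy(Path file, String expectedUser) throws IOException {
        return Files.getOwner(file).getName().equals(expectedUser);
    }

    public static void main(String[] args) {
        // Paths and expected owner are taken from the post above.
        String[] names = {"openfire.xml", "security.xml"};
        for (String name : names) {
            Path file = Paths.get("/opt/openfire/conf", name);
            try {
                String status = isOwnedBy(file, "daemon") ? "ok" : "NOT owned by daemon";
                System.out.println(file + ": " + status);
            } catch (IOException e) {
                // Missing or unreadable files are reported instead of aborting the check.
                System.out.println(file + ": could not read owner (" + e + ")");
            }
        }
    }
}
```

Any file reported as not owned by daemon is a candidate for the chown fix described above.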

At this point, messages appear to be flowing and the archiving plugin is showing conversations and traffic. I’m at a loss as to what is going on with this server all of a sudden, though. It seems like the ownership of security.xml and openfire.xml changes to root on restart, which causes some issues. It also seems like multiple instances of Openfire are trying to start up for some reason. I don’t know if this is a new issue with 3.10.2 or not, but I suspect it is. Prior to that we had a CPU utilization issue, but this is something different entirely.

What’s my best course of action at this point? Can this be repaired fairly easily? Am I better off building a new, clean Openfire server and migrating the MySQL database and Openfire settings somehow (I’m not an expert with this)? Or should I roll back to 3.9.3 (how would I do this)?

Thanks for your help with this!

It appears that our issues were caused by the storage volume filling up. I found a 40GB nohup.out file located in /etc/init.d. I deleted it and restarted the server, and mysqld had no issues starting. However, when I log into the web admin portal now, the archiving tab where I would search and view active conversations is gone. The monitoring plugin is still installed, though. Would this have anything to do with the nohup.out file, or do I just need to reload the plugin or something?
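For anyone wanting to catch a full volume earlier, here is a minimal Java sketch that prints the free space on the volume holding a directory and lists oversized files directly inside it. It is an illustration only: DiskCheck and filesLargerThan are hypothetical names, the 1 GiB threshold is an arbitrary assumption, and /etc/init.d is the default only because that is where the nohup.out file was found.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Openfire.
class DiskCheck {

    /** Lists regular files directly inside dir that exceed minBytes, largest first. */
    static List<Path> filesLargerThan(Path dir, long minBytes) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.toFile().length() > minBytes)
                .sorted(Comparator.comparingLong((Path p) -> p.toFile().length()).reversed())
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default directory is where the 40GB nohup.out was found in this thread.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/etc/init.d");
        if (!Files.isDirectory(dir)) {
            System.out.println(dir + " is not a directory");
            return;
        }
        long gib = 1024L * 1024 * 1024;
        FileStore store = Files.getFileStore(dir);
        System.out.printf("%s: %d GiB usable of %d GiB%n",
                dir, store.getUsableSpace() / gib, store.getTotalSpace() / gib);
        // 1 GiB threshold is an assumption; adjust to taste.
        for (Path p : filesLargerThan(dir, gib)) {
            System.out.printf("%s  %d GiB%n", p, p.toFile().length() / gib);
        }
    }
}
```

The listing is not recursive, so it only inspects the one directory given; pass a different directory as the first argument to check elsewhere.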

I believe at this point I’m going to build a new CentOS server, install Openfire, and migrate the MySQL database. Do I need to do anything to move the archive data over as well?

Thanks for your time and attention.