Service stops responding and clients remain unresponsive until the server is rebooted

Running Openfire 3.6.4 on Windows Server 2003 x64 R2 SP2 with SQL Server 2005. The server is dedicated solely to Openfire. It has 8 GB of RAM, with 4 GB allocated to SQL Server and 1 GB allocated to Openfire via the vmoptions file. Between 150 and 400 clients connect over a 24-hour period, and the server has been in service for about one month. In the past week the service has developed the issue described below. PEP is disabled and the username2roster cache is set to 20 minutes, following the memory leak discussions elsewhere. Neither change has solved the issue.

The issue is that when the Openfire service appears to stop functioning, clients stop responding but are not disconnected. The Web GUI is still accessible, but clients get no response from the server. The Openfire service cannot be stopped or restarted from the Windows Services console, so the server must be rebooted to restart it. After a reboot, during periods of heavier traffic, everything is fine for 15 or 20 minutes until the service again stops responding to clients. During the night shift (9 pm to 9 am), when approximately 150 users are connected, the service runs well. A few hours after the day shift comes on and traffic builds, the cycle of unresponsiveness and rebooting starts again.

Thoughts?

Hi Cody, I'm having a similar issue. Did you find a solution?

Thanks.

Gustavo,

We've found no resolution yet. We even switched servers and are still having the same issue. We're now going to switch servers a second time and install Windows Server 2008 x64, SQL Server 2008 x64, and 64-bit Java 7 with Openfire 3.7.1, plus the executable at the following link for running Openfire on 64-bit Java: http://community.igniterealtime.org/docs/DOC-1331

Would you happen to have a copy of the 3.7.1 zip file for Windows? The download link on this site is broken.

Yes Cody, I do have a copy of the 3.7.1 zip file for Windows. I suggest you check the link on the site again; it has been fixed. If you still can't download the zip file, I can send it to you by e-mail.

I've been struggling with Openfire for 2 weeks. Every time I solve one issue, another one shows up.

I have tested it on different operating systems: W2K3R2/W2K8/CentOS/Debian…

I gave up on my W2K3 setup and went back to CentOS 5.6 with JDK 7, only to find that I was having similar issues!

I added these lines in System Properties:

cache.group.size = 10485760
cache.userGroup.size = 10485760
cache.username2roster.size = 10485760
cache.vcardCache.size = 10485760
xmpp.pep.enabled = false

and under Client Connections => Client Ports, I set SSL to Disabled.
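To put the 10485760 values in perspective, here is a small arithmetic sketch. It assumes (my assumption, please check against the Openfire cache documentation) that Openfire reads each cache.*.size property as a size in bytes. Under that assumption each entry is 10 MiB, and the four caches together are capped at 40 MiB, which is small next to a 1 GB heap like the one mentioned at the top of the thread. The class name CacheSizeCheck is hypothetical.

```java
import java.util.Locale;

public class CacheSizeCheck {
    // Property names copied from the System Properties listed above.
    static final String[] CACHES = {
        "cache.group.size", "cache.userGroup.size",
        "cache.username2roster.size", "cache.vcardCache.size"
    };

    // Value used for every cache above.
    // Assumption: Openfire interprets cache.*.size as a byte count.
    static final long SIZE_BYTES = 10485760L;

    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long total = 0;
        for (String name : CACHES) {
            // Locale.ROOT keeps the decimal separator as '.' on every system.
            System.out.printf(Locale.ROOT, "%s = %d bytes (%.1f MiB)%n",
                    name, SIZE_BYTES, toMiB(SIZE_BYTES));
            total += SIZE_BYTES;
        }
        System.out.printf(Locale.ROOT, "Combined upper bound: %d bytes (%.1f MiB)%n",
                total, toMiB(total));
    }
}
```

Running it prints each property at 10.0 MiB and a combined upper bound of 40.0 MiB.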

For the moment the server is doing well. I hope it lasts, and I hope this is useful for your Windows setup.

Regards.

Gustavo,

We had our first full day without the "unresponsive" issue. The only change we made that might have solved the problem was syncing the clocks on two machines: the local server running Openfire and the SQL server it points to. The clocks on these servers were out of sync with each other and with the Domain Controllers on the network. We used the following Windows command to sync the time:

W32tm /config /syncfromflags:DOMHIER

We’ll see how things work tomorrow during our normal blackout time period.

What type of database are you using, and is it on the local machine?

Are you on a network that has Domain Controllers?

Also, what does the value 10485760 in your entries above represent?