Clients lock up and can't reconnect

Almost daily, a random client locks up after a status change and can't reconnect. Once this happens, no new connections are possible and we are forced to restart the Wildfire service. We authenticate using LDAP but still use MySQL for groups. Below are our configuration file and the error that appears in the debug log when this happens.

java.net.SocketException: Connection reset
	at java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.net.SocketOutputStream.write(Unknown Source)
	at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(Unknown Source)
	at com.sun.net.ssl.internal.ssl.OutputRecord.write(Unknown Source)
	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(Unknown Source)
	at com.sun.net.ssl.internal.ssl.AppOutputStream.write(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(Unknown Source)
	at sun.nio.cs.StreamEncoder.flush(Unknown Source)
	at java.io.OutputStreamWriter.flush(Unknown Source)
	at java.io.BufferedWriter.flush(Unknown Source)
	at org.jivesoftware.util.XMLWriter.flush(XMLWriter.java:190)
	at org.jivesoftware.wildfire.net.XMLSocketWriter.flush(XMLSocketWriter.java:31)
	at org.jivesoftware.wildfire.net.SocketConnection.deliver(SocketConnection.java:478)
	at org.jivesoftware.wildfire.ClientSession.deliver(ClientSession.java:756)
	at org.jivesoftware.wildfire.ClientSession.process(ClientSession.java:750)
	at org.jivesoftware.wildfire.roster.Roster.broadcastPresence(Roster.java:481)
	at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.broadcastUpdate(PresenceUpdateHandler.java:258)
	at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:96)
	at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:153)
	at org.jivesoftware.wildfire.PresenceRouter.handle(PresenceRouter.java:92)
	at org.jivesoftware.wildfire.PresenceRouter.route(PresenceRouter.java:61)
	at org.jivesoftware.wildfire.PacketRouter.route(PacketRouter.java:73)
	at org.jivesoftware.wildfire.net.SocketReader.processPresence(SocketReader.java:445)
	at org.jivesoftware.wildfire.net.ClientSocketReader.processPresence(ClientSocketReader.java:56)
	at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:242)
	at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:119)
	at java.lang.Thread.run(Unknown Source)

I would be thankful for any ideas or help.

Thanks,

Tim Schroeder

Hey Tim,

The next time it happens, could you generate a thread dump of the JVM? Follow this KB document, which explains how to get a thread dump. Once you have the thread dump, check whether it contains one or more threads whose name is "Socket Listener at port ". Since clients are no longer able to log in, my guess is that those threads are gone. If they are still there, post them here so I can review them, or send me the thread dump so I can analyze it. However, if they are no longer there, you may want to use the nightly build, which includes a possible fix for this problem.
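A captured thread dump can be long, so the check described above can be scripted. The Java sketch below assumes a HotSpot-style dump (as produced by kill -3 or jstack) in which each thread section starts with a header line beginning with the quoted thread name, for example "Socket Listener at port 5222" daemon prio=... The class name ListenerThreadCheck and its file-path argument are hypothetical and not part of Wildfire.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ListenerThreadCheck {

    // Thread-name prefix mentioned in the post; the port suffix varies per listener.
    static final String LISTENER_PREFIX = "Socket Listener at port";

    /**
     * Returns the names of all threads in the dump whose name starts with prefix.
     * Assumes (hypothetically) that each thread section begins with a header line
     * of the form: "Thread Name" daemon prio=... tid=...
     */
    static List<String> findThreadsByPrefix(List<String> dumpLines, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String line : dumpLines) {
            if (!line.startsWith("\"")) {
                continue; // not a thread header line (e.g. a stack frame)
            }
            int end = line.indexOf('"', 1);
            if (end < 0) {
                continue; // malformed header, skip it
            }
            String name = line.substring(1, end);
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a file holding the captured thread dump (hypothetical input).
        // ISO-8859-1 never rejects a byte sequence, so odd characters cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<String> listeners = findThreadsByPrefix(lines, LISTENER_PREFIX);
        if (listeners.isEmpty()) {
            System.out.println("No listener threads found; they may have died.");
        } else {
            listeners.forEach(n -> System.out.println("Found: " + n));
        }
    }
}
```

An empty result would match the situation suspected above, where the listener threads are gone.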

Regards,

– Gato

If you are going to upgrade to the latest nightly build, follow these steps:

  1. Back up your database.

  2. Back up config/wildfire.xml, lib/wildfire.jar, plugins/admin and resources\database.

  3. Unzip the nightly build in a temporary folder.

  4. Start the nightly build once, just to force the packed files to be unpacked. You can stop the server after the files have been unpacked. Check the lib folder; you should now see .jar files.

  5. Copy lib/wildfire.jar, plugins/admin and resources\database over your existing Wildfire installation.

  6. Optional step: you can manually upgrade your database by executing the database script located in resources\database\upgrade\6. If you don't upgrade the database, Wildfire will do it for you when the server starts.

  7. Remove all log files before starting the server so we can capture any information generated by Wildfire 2.6.0 beta.

  8. Start the server and test.
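Step 2 can be automated. The following Java sketch copies the four items listed in that step from a Wildfire installation into a backup folder, preserving their relative paths; it assumes the layout implied by those paths (Java's Path accepts forward slashes on Windows, so resources\database is written as resources/database). The class name WildfireBackup and its command-line arguments are hypothetical. It does not back up the database itself (step 1), which depends on your database tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Stream;

public class WildfireBackup {

    // Items from step 2, relative to the Wildfire install directory.
    static final List<String> ITEMS = List.of(
        "config/wildfire.xml", "lib/wildfire.jar", "plugins/admin", "resources/database");

    /** Copies each item (a file or a whole directory tree) from installDir into backupDir. */
    static void backup(Path installDir, Path backupDir) throws IOException {
        for (String item : ITEMS) {
            Path source = installDir.resolve(item);
            if (!Files.exists(source)) {
                System.err.println("Skipping missing item: " + source);
                continue;
            }
            // Files.walk yields the item itself and, for directories, everything beneath it.
            try (Stream<Path> paths = Files.walk(source)) {
                for (Path p : (Iterable<Path>) paths::iterator) {
                    Path target = backupDir.resolve(installDir.relativize(p));
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        Files.copy(p, target,
                            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java WildfireBackup <wildfire-install-dir> <backup-dir>
        backup(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Missing items are reported and skipped rather than aborting the whole backup, so a partial installation still gets whatever files it has copied.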

Regards,

– Gato

It happened again, but I wasn't able to get a thread dump by following the KB document; nothing new showed up on standard output. I've now started Wildfire under the script command inside a screen session. Hopefully this will let me capture the needed information. I'll let you know.

Thanks,

Tim

Hey Gato,

I now have the thread dump but didn't see any threads whose name is "Socket Listener at port ". I have emailed you the dump file in case you want to take a look at it. Instead of upgrading to the nightly build, I'll probably just wait for the official release if you believe it will address the problem.

Thanks again,

Tim

Hey Tim,

In your thread dump I found a deadlock in the server, so threads started to pile up. I also found more than 1.2K concurrent connections (threads), so under these circumstances the server may have run out of memory, with the listener at port 5222 (c2s) being the first thread to hit the out-of-memory error. Today I checked in a fix for JM-615, so you may want to use the next nightly build. Let me know if the problem persists.
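A deadlock like the one found in the dump can also be detected from inside a JVM with the standard java.lang.management API. The sketch below uses ThreadMXBean.findDeadlockedThreads() (Java 6 and later) and reports the live thread count as a rough signal of threads piling up. It only inspects the JVM it runs in, so it would have to be invoked from within the server process (for example, from a plugin); that wiring is an assumption and is not shown, and the class name DeadlockCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class DeadlockCheck {

    /** Returns the names of threads in this JVM that are currently deadlocked (empty if none). */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread terminated after the check
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Live thread count; a steadily climbing value hints that threads are piling up. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("Live threads: " + liveThreadCount());
        List<String> stuck = deadlockedThreadNames();
        System.out.println(stuck.isEmpty() ? "No deadlock detected." : "Deadlocked: " + stuck);
    }
}
```

Run standalone, it only checks its own JVM, which is mainly useful to confirm the check works before wiring it into the server.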

Steps to install the latest nightly build:

  1. Back up your database.

  2. Back up config/wildfire.xml, lib/wildfire.jar, plugins/admin and resources\database.

  3. Unzip the nightly build in a temporary folder.

  4. Start the nightly build once, just to force the packed files to be unpacked. You can stop the server after the files have been unpacked. Check the lib folder; you should now see .jar files.

  5. Copy lib/wildfire.jar, plugins/admin and resources\database over your existing Wildfire installation.

  6. Optional step: you can manually upgrade your database by executing the database script located in resources\database\upgrade\6. If you don't upgrade the database, Wildfire will do it for you when the server starts.

  7. Remove all log files before starting the server so we can capture any information generated by Wildfire 2.6.0 beta.

  8. Start the server and test.

Regards,

– Gato

I have exactly the same problem!