Wildfire completely unresponsive

Hi, I had my own little horror story tonight. All of a sudden, Wildfire stopped doing anything. Clients were unable to log in (clients that were already logged in weren’t disconnected, but got no response from the domain). I was unable to get into the web admin page (I got to the login screen, but logging in timed out every time). What’s particularly frightening is that I’ve found no indication of failure in any log. Finally, I tried to restart Wildfire, but I noticed that the “/etc/init.d/wildfired stop” command wouldn’t stop the server either. I ended up doing a kill -6 on the process. All in all, it looks like the entire process just froze for some reason.

I’m at a loss as to the cause of my problem. Does anyone have tips on how to track it down?

-edit-

The thread below describes the same symptoms: http://www.jivesoftware.org/community/thread.jspa?threadID=20090&tstart=0

Message was edited by: Guus

Hi Guus,

Now that you have restarted Wildfire, it’s hard to tell whether you hit a deadlock or what state Wildfire was in. A bunch of javacore files (kill -3) would have been nice, but it’s really hard to think of this when you need to get production up and running again.

Was your Wildfire process consuming CPU cycles or completely idle?

LG

No, as far as I know the machine was virtually idle. I did make a jstat -gc dump, but none of the memory spaces had been completely filled.

During the time the server was unresponsive, two messages were added to the error log. I think they’re unrelated, but maybe they give someone a hint:

2006.08.19 08:46:35 org.jivesoftware.wildfire.net.BlockingReadingMode.run(BlockingReadingMode.java:104) Connection closed before session established
Socket[addr=/64.233.166.129,port=42182,localport=5269]

2006.08.19 09:08:20 org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:141) Internal server error. Triggered by packet:
java.lang.NullPointerException
    at org.jivesoftware.wildfire.SessionManager.changePriority(SessionManager.java:864)
    at org.jivesoftware.wildfire.ClientSession.setPresence(ClientSession.java:666)
    at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:98)
    at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:153)
    at org.jivesoftware.wildfire.PresenceRouter.handle(PresenceRouter.java:92)
    at org.jivesoftware.wildfire.PresenceRouter.route(PresenceRouter.java:61)
    at org.jivesoftware.wildfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:75)
    at org.jivesoftware.wildfire.net.SocketReader.processPresence(SocketReader.java:296)
    at org.jivesoftware.wildfire.net.ClientSocketReader.processPresence(ClientSocketReader.java:57)
    at org.jivesoftware.wildfire.net.SocketReader.process(SocketReader.java:191)
    at org.jivesoftware.wildfire.net.BlockingReadingMode.readStream(BlockingReadingMode.java:156)
    at org.jivesoftware.wildfire.net.BlockingReadingMode.run(BlockingReadingMode.java:62)
    at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:123)
    at java.lang.Thread.run(Thread.java:595)

We had a freeze again today. This time, I made a full thread dump. Sadly, there’s no report of a deadlock in it.

We noticed something strange. I reported earlier that I was unable to log in to the admin console when the server froze. This time, I was already logged in. I was able to navigate through a couple of pages successfully. Looking at user details wouldn’t work, though.

Our first freeze, on Saturday, came a couple of days after we started using connection managers more intensively than before. As the dates seem to coincide, I removed all connection managers as a precaution.
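On the thread dump mentioned above: the JVM only reports a deadlock when threads are waiting on each other's locks in a cycle. A thread that is stuck waiting on I/O (a socket read, for example) is not a deadlock in that sense, so a dump can look "clean" while the process is still hung. The following is a minimal sketch of how the same information could be gathered from inside a JVM. It assumes Java 6 or later; the class name ThreadReport and its methods are made up for illustration and are not part of Wildfire.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Wildfire. Assumes Java 6+.
final class ThreadReport {
    private ThreadReport() {}

    // Returns the IDs of threads in a lock cycle, or an empty array if there are none.
    static long[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? new long[0] : ids;
    }

    // Lists every live thread with its state and the top frame of its stack.
    // Many threads parked in the same frame is often a stronger hint than
    // the absence of a deadlock report.
    static String describeThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append(info.getThreadName())
              .append(" [").append(info.getThreadState()).append("]\n");
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length > 0) {
                sb.append("    at ").append(stack[0]).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + findDeadlocks().length);
        System.out.print(describeThreads());
    }
}
```

This only inspects the JVM it runs in, so for a server that is already hung, an external dump (kill -3, as suggested earlier in the thread) remains the practical route.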

Guus,

I took a look at the thread dump you sent me by email. It looks like your connection to the database died. One possible scenario is that your database server got restarted without Wildfire being restarted. We have an issue filed to make the connection pool logic better at recovering from database outages. For now, we always recommend restarting Wildfire after restarting your database.

Regards,

Matt
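To illustrate the kind of recovery logic described above, here is a minimal sketch of a check that a pool could run before handing out a connection, so that a dead connection is discarded instead of leaving callers waiting on it. This is not how Wildfire's connection pool works. The class ConnectionCheck and its method are hypothetical, and Connection.isValid requires JDBC 4.0 (Java 6 or later) and a driver that supports it.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper, not part of Wildfire. Assumes a JDBC 4.0 driver.
final class ConnectionCheck {
    private ConnectionCheck() {}

    // Returns true only if the connection is non-null, still open, and
    // answers a validity check within timeoutSeconds. Any SQLException
    // is treated as "not usable".
    static boolean isUsable(Connection con, int timeoutSeconds) {
        if (con == null) {
            return false;
        }
        try {
            return !con.isClosed() && con.isValid(timeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }
}
```

A pool using a check like this would close and replace any connection for which isUsable returns false, at the cost of one extra round trip to the database per checkout.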

Thanks for your help, Matt. It’s still somewhat puzzling to me how the database connection could have died (the database itself has been up all the time). In any case, I voted for JM-343.