Wilfire 2.4.0 hanging

Hi,

I’‘m having periodic hangs (seems to be every 12 hours or so) of the server. It gets to the point where I can’‘t even connect anymore, and the admin console doesn’‘t respond. I’‘m using Sun’'s Java version 1.5.0_06-b05. I also have pyMSNt and pyAIMt transports connected. The memory the server seems to be using according to top is: Virt/Res/Shr: 308m/141m/19m.

I would blindly migrate to 2.4.2, but as I don’‘t see any related changes in that version, I don’'t think that would solve any of my problems.

I am attaching a stacktrace (produced by kill -3 pid), in hopes that it will help.

Thanks.

  • Amir

Full thread dump Java HotSpot™ Server VM (1.5.0_06-b05 mixed mode):

“Server SR - 28381082” daemon prio=1 tid=0x0873af08 nid=0x4565 in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x433bd0e8> (a sun.misc.SoftCache)
    at java.lang.Object.wait(Object.java:474)
    at java.util.ResourceBundle.findBundle(ResourceBundle.java:903)
  • locked <0x433bd0e8> (a sun.misc.SoftCache)

at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:743)

at java.util.ResourceBundle.getBundle(ResourceBundle.java:576)

at org.jivesoftware.util.LocaleUtils.getLocalizedString(LocaleUtils.java:400)

at org.jivesoftware.util.LocaleUtils.getLocalizedString(LocaleUtils.java:341)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:158)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 30104828” daemon prio=1 tid=0x085e8780 nid=0x4417 runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:284)

at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:319)

at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:720)

  • locked <0x42cbf9f8> (a java.lang.Object)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:67 7)
    at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
  • locked <0x42cbfa08> (a com.sun.net.ssl.internal.ssl.AppInputStream)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
  • locked <0x42f21ac8> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 14606772” daemon prio=1 tid=0x0873a650 nid=0x7185 runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)

at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)

at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)

  • locked <0x42e20678> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 20055805” daemon prio=1 tid=0x08234f18 nid=0xf0c runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)

at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)

at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)

  • locked <0x465e1298> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:322)

at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)

at org.jivesoftware.wildfire.net.SocketReader.createSession(SocketReader.java:567)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:109)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 6971605” daemon prio=1 tid=0x08989f98 nid=0xd1c runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)

at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)

at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)

  • locked <0x4372dbc0> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“SocketListener0-1” prio=1 tid=0xb4e39d10 nid=0x3fbe in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x4367f008> (a org.mortbay.util.ThreadPool$PoolThread)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:522)
  • locked <0x4367f008> (a org.mortbay.util.ThreadPool$PoolThread)

“SocketListener0-0” prio=1 tid=0xb4e39550 nid=0x53a in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x4367f080> (a org.mortbay.util.ThreadPool$PoolThread)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:522)
  • locked <0x4367f080> (a org.mortbay.util.ThreadPool$PoolThread)

“SessionScavenger” daemon prio=1 tid=0x0883b2b0 nid=0x327b waiting on condition

at java.lang.Thread.sleep(Native Method)

at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractS essionManager.java:588)

“Client SR - 20859482” daemon prio=1 tid=0x0883ad00 nid=0x3a7a runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:284)

at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:319)

at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:720)

  • locked <0x4351d550> (a java.lang.Object)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:67 7)
    at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
  • locked <0x4351d560> (a com.sun.net.ssl.internal.ssl.AppInputStream)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
  • locked <0x4365d850> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 6964063” daemon prio=1 tid=0x0873cb08 nid=0x3fdb runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)

at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)

at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)

  • locked <0x436205c0> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“Client SR - 6141358” daemon prio=1 tid=0x088a00d8 nid=0x458e runnable

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:196)

  • locked <0x435d8bb8> (a java.lang.Object)
    at org.jivesoftware.wildfire.net.TLSStreamReader.doRead(TLSStreamReader.java:62)
    at org.jivesoftware.wildfire.net.TLSStreamReader.access$000(TLSStreamReader.java:2 7)
    at org.jivesoftware.wildfire.net.TLSStreamReader$1.read(TLSStreamReader.java:153)
  • locked <0x4363e530> (a org.jivesoftware.wildfire.net.TLSStreamReader$1)
    at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
  • locked <0x436361b0> (a java.io.InputStreamReader)

at java.io.InputStreamReader.read(InputStreamReader.java:167)

at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)

at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)

at org.jivesoftware.wildfire.net.MXParser.nextImpl(MXParser.java:67)

at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)

at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:285)

at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:172)

at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:118)

at java.lang.Thread.run(Thread.java:595)

“Rollover” daemon prio=1 tid=0x08595808 nid=0x3ec3 waiting on condition

at java.lang.Thread.sleep(Native Method)

at org.mortbay.util.RolloverFileOutputStream$Rollover.run(RolloverFileOutputStream .java:302)

“DestroyJavaVM” prio=1 tid=0xb4e0f558 nid=0x6e49 waiting on condition

“pool-1-thread-1” prio=1 tid=0xb4e0e1a0 nid=0x2dba waiting on condition

at sun.misc.Unsafe.park(Native Method)

at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)

at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Abs tractQueuedSynchronizer.java:1767)

at java.util.concurrent.DelayQueue.take(DelayQueue.java:131)

at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Schedule dThreadPoolExecutor.java:504)

at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Schedule dThreadPoolExecutor.java:497)

at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:674)

at java.lang.Thread.run(Thread.java:595)

“Auditor” prio=1 tid=0x0835aa90 nid=0x575e in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x4342c0f0> (a java.util.TaskQueue)
    at java.util.TimerThread.mainLoop(Timer.java:509)
  • locked <0x4342c0f0> (a java.util.TaskQueue)

at java.util.TimerThread.run(Timer.java:462)

“MUC cleanup” prio=1 tid=0x08477458 nid=0x2201 in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x434258e0> (a java.util.TaskQueue)
    at java.util.TimerThread.mainLoop(Timer.java:509)
  • locked <0x434258e0> (a java.util.TaskQueue)

at java.util.TimerThread.run(Timer.java:462)

“Sessions cleanup” prio=1 tid=0x0847bc20 nid=0x53d5 in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x43408c10> (a java.util.TaskQueue)
    at java.util.TimerThread.mainLoop(Timer.java:509)
  • locked <0x43408c10> (a java.util.TaskQueue)

at java.util.TimerThread.run(Timer.java:462)

“Low Memory Detector” daemon prio=1 tid=0x08101848 nid=0x734d runnable

“CompilerThread1” daemon prio=1 tid=0x081003c0 nid=0x7753 waiting on condition

“CompilerThread0” daemon prio=1 tid=0x080ff360 nid=0x7c04 waiting on condition

“AdapterThread” daemon prio=1 tid=0x080fe220 nid=0x258d waiting on condition

“Signal Dispatcher” daemon prio=1 tid=0x080fd400 nid=0x2e2f waiting on condition

“Finalizer” daemon prio=1 tid=0x080f31b8 nid=0x55b2 in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x43376230> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
  • locked <0x43376230> (a java.lang.ref.ReferenceQueue$Lock)

at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)

at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

“Reference Handler” daemon prio=1 tid=0x080f2d28 nid=0x4b5 in Object.wait()

at java.lang.Object.wait(Native Method)

  • waiting on <0x433762b0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:474)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
  • locked <0x433762b0> (a java.lang.ref.Reference$Lock)

“VM Thread” prio=1 tid=0x080f0890 nid=0x5f7b runnable

“VM Periodic Task Thread” prio=1 tid=0x08102ce8 nid=0x5f3d waiting on condition

/code

Hi,

do you have some cpu stats like a “vmstat 60” log? It would be good to know if the server is busy (100% cpu -> which process?) or paging or if it is idle.

LG

Hey Amir,

Thanks for providing the thread dump.

I don’'t see the the threads responsible for accepting client, server or external component new connections neither the threads responsible for accepting new connections to the admin console. Could you check for relevant errors in the log files? Is it possible that for some reason the network (firewall or something) closed those ports or the listening process at the OS level?

BTW, today we are going to release Wildfire 2.4.3 that includes many important bug fixes so you may want to migrate to the latest release (once its out).

Regards,

– Gato

I don’'t have such a log, but it seems to be that the process is sitting idle, memory growth has not been increasing as I have been ‘‘monitoring’’ it visually. I ran vmstat for about two minutes, and here is what I got:

procs -


memory----


-swap -


io---- system -


cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

0 0 132344 6468 57136 390676 0 0 1 2 1 2 8 1 89 2

0 0 132344 9100 55932 389184 0 0 522 372 1797 0 12 3 68 17

0 0 132344 7152 54372 392464 0 0 459 168 1932 0 13 3 75 8

0 1 132344 5588 54424 394232 0 0 362 55 1722 0 12 3 76 8

0 0 132348 6640 51188 396244 0 0 786 693 1775 0 13 5 69 13

0 0 132348 7564 51604 395056 0 0 339 280 1677 0 11 3 77 9

0 0 132348 6072 52520 395668 0 0 644 164 1759 0 13 4 67 15

0 0 132348 7144 51688 395304 0 0 519 319 1624 0 10 3 80 7

0 1 132348 9604 51248 393212 0 0 341 409 1828 0 11 3 74 11

0 0 132348 5536 51556 396924 2 0 375 57 1662 0 15 4 75 6

0 0 132352 10116 51276 392776 0 0 459 596 1870 0 13 4 71 13

1 0 132352 5712 51556 396816 0 0 364 466 1723 0 14 3 77 7

0 0 132352 6184 51480 396296 0 0 547 186 1717 0 14 4 74 8

/code

I do have other processes running on the machine, but here are stats from top:

top - 13:12:58 up 276 days, 3:37, 2 users, load average: 0.20, 0.31, 0.42

Tasks: 174 total, 1 running, 173 sleeping, 0 stopped, 0 zombie

Cpu(s): 7.6% us, 1.7% sy, 0.0% ni, 83.7% id, 5.3% wa, 1.0% hi, 0.7% si

Mem: 970280k total, 962192k used, 8088k free, 45108k buffers

Swap: 498004k total, 132352k used, 365652k free, 402208k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

28233 jabber 16 0 308m 141m 19m S 0.0 14.9 9:36.16 java

/code

Thanks.

  • Amir

Gato,

According to netstat, the process is still listening and has some connections in the ESTABLISHED state. The way I know it is dead is I tried to telnet to localhost on port 9090, and I get the TCP connection, but it ignores everything I say (eg. GET / HTTP/1.0). In debug.log, I see a lot of Logging off user (org.jivesoftware.wildfire.ClientSession) due to presence becoming unavailable (TCP socket timeout I suppose?). I will wait for 2.4.3 though

Thanks!

  • Amir

Hi Amir,

did you run vmstat while the process was not responding or is this the normal state? You could run it with “nohup vmstat 30 >/tmp/vmstat.log &”.

LG