Hi all.
Maybe I should have posted this message here in the first place: http://www.jivesoftware.org/community/thread.jspa?threadID=17714&tstart=0 .
As it2000 advised, I am going to take dumps when the problem arises again. Can I do anything else?
Regards.
Yes, whatever diagnostic information you can get would be great.
Regards,
Matt
Argh. The JVM crashes on the kill -3… Too bad. However, I hope there is enough information.
I’ve got the dump.hprof file; what can I do now?
You can send it by email to Matt or Gato. Check their profiles for addresses.
OK, here is the hat result for the dump: http://ondolinde.dyndns.org:7000/ . I don’t know if it will help; I can’t really read the information. 4 dumps were taken: 3 shortly after startup, one after running for a while (approx. 75% of the heap was used), and the last one (which caused the crash).
I’m trying another approach: the JRockit memory leak detector.
Regards.
Hi Aurélien,
To be honest, I can’t figure out how this dump information could be useful, but I only took a short look and lack the time for a detailed review. I did not find an option to list more than the direct references and their sizes. Calculating the total size of everything referenced from one element would be very hard work.
Regards
Hello.
First of all, sorry for the unhelpful dump. I launched Wildfire with JRockit 1.5 and enabled the Memory Leak Detector. After 13 hours, the char[] type is taking 58% of the memory, up from 44% at the start.
See screenshots at http://zorel.org/static/wildfire/
Capture-1/4 were taken at different times (the last one after 13 hours).
It seems org.jivesoftware.wildfire.net.MXParser references a lot of char[] (see 1.png).
I will look into making better use of this tool so that I can provide more information.
Hey Aurélien,
Thanks for the bug report. I created JM-558 for this problem and checked in a fix. You may want to try again with the next nightly build. I’m now profiling other parts of the server to confirm that there are no more leaks.
Thanks,
– Gato
Great! No wonder I use Wildfire: the responsiveness of the community and the developers is perfect.
Hi Gaston,
You know about this issue already. I changed the OS (now RHEL4 on x64; previously RHEL3 on i386) and the VM keeps crashing. I wonder if it is related to the memory leak described in this thread.
Java VM: Java HotSpot™ 64-Bit Server VM (1.5.0_06-b05 mixed mode)
Problematic frame:
C 0x0000002b263077f4
T H R E A D -
Current thread (0x0000002b244a8f40): JavaThread “Client SR - 318402945” daemon
siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x0000002b263077f4
Registers:
RAX=0x0000000000000000, RBX=0x000000000000002c, RCX=0x0000002b263077f4, RDX=0x000000004307f570
RSP=0x000000004307f568, RBP=0x0000000000000000, RSI=0x000000004307f6a0, RDI=0x000000000000000d
R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0x0000000000000000, R11=0x0000000000000246
R12=0x000000000000001d, R13=0x000000004307fa70, R14=0x0000000000000000, R15=0x0000002b244a8f40
RIP=0x0000002b263077f4, EFL=0x0000000000010246, CSGSFS=0x0000000000005918, ERR=0x0000000000000014
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x000000004307f568)
0x000000004307f568: 00000030f492e410 0000000000000000
…
0x000000004307f748: 0000ffff00001fa0 0000000000000000
0x000000004307f758: 0000000000000000 0000000000000000
Instructions: (pc=0x0000002b263077f4)
0x0000002b263077e4:
Stack: [0x0000000042f81000,0x0000000043082000), sp=0x000000004307f568, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C 0x0000002b263077f4
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.net.SocketOutputStream.socketWrite0(Ljava/io/FileDescriptor;[BII)V+0
v ~C2IAdapter
J java.net.SocketOutputStream.write([BII)V
v ~I2CAdapter
j com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(Ljava/io/OutputStream;[BII)V+5
j com.sun.net.ssl.internal.ssl.OutputRecord.write(Ljava/io/OutputStream;)V+339
j com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(Lcom/sun/net/ssl/internal/ssl/OutputRecord;)V+143
j com.sun.net.ssl.internal.ssl.SSLSocketImpl.sendAlert(BB)V+223
j com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(BLjava/lang/String;Ljava/lang/Throwable;)V+77
j com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(BLjava/lang/Throwable;)V+4
j com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(Ljava/lang/Exception;)V+108
j com.sun.net.ssl.internal.ssl.AppInputStream.read([BII)I+82
v ~C2IAdapter
J sun.nio.cs.StreamDecoder$CharsetSD.readBytes()I
J sun.nio.cs.StreamDecoder$CharsetSD.implRead([CII)I
J sun.nio.cs.StreamDecoder.read([CII)I
v ~I2CAdapter
j java.io.InputStreamReader.read([CII)I+7
j org.xmlpull.mxp1.MXParser.fillBuf()V+319
v ~C2IAdapter
J org.xmlpull.mxp1.MXParser.more()C
v ~I2CAdapter
j org.jivesoftware.wildfire.net.MXParser.nextImpl()I+1169
j org.xmlpull.mxp1.MXParser.nextToken()I+6
j org.dom4j.io.XMPPPacketReader.parseDocument()Lorg/dom4j/Document;+26
j org.jivesoftware.wildfire.net.SocketReader.readStream()V+16
j org.jivesoftware.wildfire.net.SocketReader.run()V+72
j org.jivesoftware.wildfire.net.SocketReader.run()V+72
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
P R O C E S S -
Java Threads: ( => current thread )
0x0000002b24478b90 JavaThread “Client SR - 2098097341” daemon
0x0000002b22f92100 JavaThread “Client SR - 509897723” daemon
0x0000002b22f3b750 JavaThread “Client SR - 1218988871” daemon
0x0000002b1f59d100 JavaThread “pool-10-thread-1”
0x0000002b1f59c650 JavaThread “Server SR - 158746702” daemon
0x0000002b23907020 JavaThread “Outgoing Server Reader” daemon
0x0000002b22f92ab0 JavaThread “pool-4-thread-5”
0x0000002b23906580 JavaThread “pool-9-thread-1”
0x0000002b23d07880 JavaThread “Server SR - 560983781” daemon
=>0x0000002b244a8f40 JavaThread “Client SR - 318402945” daemon
0x0000002b22661f20 JavaThread “pool-4-thread-4”
0x0000002b2386cb40 JavaThread “Client SR - 1778163592” daemon
0x0000002b226a6ca0 JavaThread “pool-4-thread-3”
0x0000002b22edc1f0 JavaThread “Client SR - 1561232967” daemon
0x0000002b22ed7f90 JavaThread “Client SR - 1746565407” daemon
0x0000002b229012f0 JavaThread “pool-4-thread-2”
0x0000002b22905960 JavaThread “pool-4-thread-1”
0x0000002b2269b4a0 JavaThread “Queued Packets Processor” daemon
0x0000002b226bb240 JavaThread “Client SR - 702703106” daemon
0x0000002b2291a330 JavaThread “Client SR - 865040129” daemon
…
Any clues?
Hi Luis,
this is usually a JVM bug. RHEL4 uses NPTL (Native POSIX Thread Library), and as one can see from “Current thread … _thread_in_native”, the thread was executing native code when the segmentation violation (SIGSEGV) occurred.
You may want to run the JVM with the “-verbose:gc” option to monitor the memory usage and make sure that this is not a memory issue.
A kernel update (or a downgrade if no update is available, or using an earlier 1.5.0 JRE release) could solve the problem (or make it even worse).
Red Hat and Sun should be the ones to solve this problem.
Regards
Hi it2000,
That’s exactly what Gato told me some time ago. I already reported the bug to Sun without much luck. Enabling verbose output on the garbage collector makes the crash happen sooner, but does not show any relevant information (so, as you said, no memory leaks). I already tried different JRE versions and different kernels (from both the 2.4 and 2.6 series), so I guess this is one of those really uncommon bugs. I wonder how many people are running Wildfire on RHEL 3/4 systems.
Thanks anyway. Regards,
Solved the problem. It was really awful.
We are using the NativeAuth provider with shaj, which provides PAM authentication on UNIX systems. It was configured to authenticate through a custom PAM module, which in turn used a custom authentication library. The whole problem was due to this authentication library installing a signal handler (via the signal() function) for SIGPIPE. When a Wildfire socket was closed unexpectedly, the authentication library’s handler for the signal was called instead of the JVM’s.
I removed the signal() call from our authentication library and now Wildfire is running fine.
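For anyone whose native library is loaded into the JVM in a similar way: a rough sketch of one way such a library can cope with broken pipes on its own sockets without installing a process-wide SIGPIPE handler. This is not the code of our authentication library; the function names lib_send_no_sigpipe and demo_send_to_closed_peer are hypothetical, and it assumes a Linux system where send() accepts the MSG_NOSIGNAL flag.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for a native library loaded into another process
 * (for example a JVM). It writes to a socket with MSG_NOSIGNAL, so a
 * closed peer is reported as EPIPE instead of raising SIGPIPE, and the
 * library never has to call signal() or sigaction() for SIGPIPE.
 * Returns 0 on success, or the errno value of the failed send().
 */
int lib_send_no_sigpipe(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the send */
            return errno;           /* e.g. EPIPE when the peer is gone */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Demonstration only: create a connected socket pair, close one end,
 * then send on the other end. Returns the result of the helper, or -1
 * if the socket pair could not be created.
 */
int demo_send_to_closed_peer(void)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                   /* simulate the peer disappearing */
    rc = lib_send_no_sigpipe(sv[0], "x", 1);
    close(sv[0]);
    return rc;
}
```

The point is that the SIGPIPE disposition stays whatever the host process (here the JVM) set it to, and the library handles the error through the return value.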
Thanks for all your help.
Hi Luis,
Excellent news!!!
Regards,
– Gato