Memory leak

Hi all.

Maybe I should have posted this message here first: http://www.jivesoftware.org/community/thread.jspa?threadID=17714&tstart=0 .

As it2000 advised, I am going to take dumps when the problem arises again. Is there anything else I can do?

Regards.

Yes, whatever diagnostic information you can get would be great.

Regards,

Matt

Argh. The JVM crashes on kill -3. Too bad. I hope there is enough information anyway.

I've got the dump.hprof file. What can I do now?

You can send it by email to Matt or Gato. Check their profiles for addresses.

OK, here is the hat result for the dump: http://ondolinde.dyndns.org:7000/ . I don't know if it will help, as I can't really read the information. Four dumps were taken: three shortly after startup, one after running for a while (approx. 75% of the heap was used), and the last one (which caused the crash).

I'm trying another approach: the JRockit memory leak detector.

Regards.

Hi Aurélien,

To be honest, I can't figure out how this dump information could be useful, but I only took a short look and lack the time for a detailed review. I did not find an option to list more than the direct references and their sizes. Calculating the total size of everything referenced from one element would be very hard work.

Best regards

Hello.

First of all, sorry for the useless dump. I launched Wildfire with JRockit 1.5 and enabled the Memory Leak Detector. After 13 hours, the char[] type is taking 58% of the memory, up from 44% at the start.

See screenshots at http://zorel.org/static/wildfire/

Capture-1 to Capture-4 were taken at different times (the last one after 13 hours).

It seems org.jivesoftware.wildfire.net.MXParser references a lot of char[] (see 1.png).

I will try to make better use of this tool so I can provide more information.

Hey Aurélien,

Thanks for the bug report. I created JM-558 for this problem and checked in a fix. You may want to try again with the next nightly build. I'm now profiling other parts of the server to confirm that there are no more leaks.

Thanks,

– Gato

Great! No wonder I use Wildfire: the responsiveness of the community and the developers is perfect.

Hi Gaston,

You know about this issue already. I changed the OS (now RHEL4 on x64, previously RHEL3 on i386) and the VM keeps crashing. I wonder if it is related to the memory leak described in this thread.

# An unexpected error has been detected by HotSpot Virtual Machine:

# SIGSEGV (0xb) at pc=0x0000002b263077f4, pid=7061, tid=1124604256

# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.5.0_06-b05 mixed mode)

# Problematic frame:

# C 0x0000002b263077f4


--------------- T H R E A D ---------------


Current thread (0x0000002b244a8f40): JavaThread “Client SR - 318402945” daemon

siginfo:si_signo=11, si_errno=0, si_code=1, si_addr=0x0000002b263077f4

Registers:

RAX=0x0000000000000000, RBX=0x000000000000002c, RCX=0x0000002b263077f4, RDX=0x000000004307f570

RSP=0x000000004307f568, RBP=0x0000000000000000, RSI=0x000000004307f6a0, RDI=0x000000000000000d

R8 =0x0000000000000000, R9 =0x0000000000000000, R10=0x0000000000000000, R11=0x0000000000000246

R12=0x000000000000001d, R13=0x000000004307fa70, R14=0x0000000000000000, R15=0x0000002b244a8f40

RIP=0x0000002b263077f4, EFL=0x0000000000010246, CSGSFS=0x0000000000005918, ERR=0x0000000000000014

TRAPNO=0x000000000000000e

Top of Stack: (sp=0x000000004307f568)

0x000000004307f568: 00000030f492e410 0000000000000000

0x000000004307f748: 0000ffff00001fa0 0000000000000000

0x000000004307f758: 0000000000000000 0000000000000000

Instructions: (pc=0x0000002b263077f4)

0x0000002b263077e4:

Stack: [0x0000000042f81000,0x0000000043082000), sp=0x000000004307f568, free space=1017k

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

C 0x0000002b263077f4

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)

j java.net.SocketOutputStream.socketWrite0(Ljava/io/FileDescriptor;[BII)V+0

v ~C2IAdapter

J java.net.SocketOutputStream.write([BII)V

v ~I2CAdapter

j com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(Ljava/io/OutputStream;[BII)V+5

j com.sun.net.ssl.internal.ssl.OutputRecord.write(Ljava/io/OutputStream;)V+339

j com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(Lcom/sun/net/ssl/internal/ssl/OutputRecord;)V+143

j com.sun.net.ssl.internal.ssl.SSLSocketImpl.sendAlert(BB)V+223

j com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(BLjava/lang/String;Ljava/lang/Throwable;)V+77

j com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(BLjava/lang/Throwable;)V+4

j com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(Ljava/lang/Exception;)V+108

j com.sun.net.ssl.internal.ssl.AppInputStream.read([BII)I+82

v ~C2IAdapter

J sun.nio.cs.StreamDecoder$CharsetSD.readBytes()I

J sun.nio.cs.StreamDecoder$CharsetSD.implRead([CII)I

J sun.nio.cs.StreamDecoder.read([CII)I

v ~I2CAdapter

j java.io.InputStreamReader.read([CII)I+7

j org.xmlpull.mxp1.MXParser.fillBuf()V+319

v ~C2IAdapter

J org.xmlpull.mxp1.MXParser.more()C

v ~I2CAdapter

j org.jivesoftware.wildfire.net.MXParser.nextImpl()I+1169

j org.xmlpull.mxp1.MXParser.nextToken()I+6

j org.dom4j.io.XMPPPacketReader.parseDocument()Lorg/dom4j/Document;+26

j org.jivesoftware.wildfire.net.SocketReader.readStream()V+16

j org.jivesoftware.wildfire.net.SocketReader.run()V+72

j org.jivesoftware.wildfire.net.SocketReader.run()V+72

j java.lang.Thread.run()V+11

v ~StubRoutines::call_stub


--------------- P R O C E S S ---------------


Java Threads: ( => current thread )

0x0000002b24478b90 JavaThread “Client SR - 2098097341” daemon

0x0000002b22f92100 JavaThread “Client SR - 509897723” daemon

0x0000002b22f3b750 JavaThread “Client SR - 1218988871” daemon

0x0000002b1f59d100 JavaThread “pool-10-thread-1”

0x0000002b1f59c650 JavaThread “Server SR - 158746702” daemon

0x0000002b23907020 JavaThread “Outgoing Server Reader” daemon

0x0000002b22f92ab0 JavaThread “pool-4-thread-5”

0x0000002b23906580 JavaThread “pool-9-thread-1”

0x0000002b23d07880 JavaThread “Server SR - 560983781” daemon

=>0x0000002b244a8f40 JavaThread “Client SR - 318402945” daemon

0x0000002b22661f20 JavaThread “pool-4-thread-4”

0x0000002b2386cb40 JavaThread “Client SR - 1778163592” daemon

0x0000002b226a6ca0 JavaThread “pool-4-thread-3”

0x0000002b22edc1f0 JavaThread “Client SR - 1561232967” daemon

0x0000002b22ed7f90 JavaThread “Client SR - 1746565407” daemon

0x0000002b229012f0 JavaThread “pool-4-thread-2”

0x0000002b22905960 JavaThread “pool-4-thread-1”

0x0000002b2269b4a0 JavaThread “Queued Packets Processor” daemon

0x0000002b226bb240 JavaThread “Client SR - 702703106” daemon

0x0000002b2291a330 JavaThread “Client SR - 865040129” daemon

Any clues?

Hi Luis,

This is usually a JVM bug. RHEL4 uses NPTL (Native POSIX Thread Library), and as one can see from “Current thread … _thread_in_native”, the thread was executing native code when the segmentation violation (SIGSEGV) occurred.

You may want to run the JVM with the “-verbose:gc” option to monitor memory usage and make sure that this is not a memory issue.

A kernel update (or a downgrade if none is available, or using an earlier 1.5.0 JRE version) could solve the problem (or make it even worse).

Red Hat and Sun should be the ones to solve this problem.

Best regards

Hi it2000,

That's exactly what Gato told me some time ago. I already reported the bug to Sun without much luck. Enabling verbose garbage collection makes the crash happen sooner, but does not show any relevant information (so, as you said, no memory leaks). I already tried different JRE versions and different kernels (from both the 2.4 and 2.6 series), so I guess this is one of those really uncommon bugs. I wonder how many people are running Wildfire on RHEL 3/4 systems.

Thanks anyway. Regards,

Solved the problem. It was really awful.

We are using the NativeAuth provider with shaj, which provides PAM authentication on UNIX systems. It was configured to authenticate through a custom PAM module, which in turn used a custom authentication library. The whole problem was due to this authentication library installing a signal handler for SIGPIPE (via the signal() function). When a Wildfire socket was closed unexpectedly, the authentication library's handler for the signal was called instead of the JVM's.

I removed the signal() call from our authentication library and now Wildfire is running fine.
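For anyone who hits the same thing but cannot simply drop the handler: below is a minimal sketch in C of how a native library could limit the damage by saving the SIGPIPE disposition it finds and putting it back when it is done. This is not the code of our library. The names auth_lib_begin, auth_lib_end and auth_lib_on_sigpipe are hypothetical and only serve the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names: none of these come from the real authentication library. */

static struct sigaction saved_sigpipe; /* disposition found when we started */
static int sigpipe_saved = 0;          /* 1 while our handler is installed */

/* Handler the library wants active only during its own work. */
static void auth_lib_on_sigpipe(int signo) {
    (void)signo; /* do nothing, and nothing that is not async-signal-safe */
}

/* Install our handler and remember whatever was there before
 * (inside a JVM, that is the JVM's own SIGPIPE disposition). */
int auth_lib_begin(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = auth_lib_on_sigpipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPIPE, &sa, &saved_sigpipe) != 0) {
        return -1;
    }
    sigpipe_saved = 1;
    return 0;
}

/* Put the previous disposition back so the host process keeps its handler. */
int auth_lib_end(void) {
    assert(sigpipe_saved); /* auth_lib_begin() must have succeeded first */
    if (sigaction(SIGPIPE, &saved_sigpipe, NULL) != 0) {
        return -1;
    }
    sigpipe_saved = 0;
    return 0;
}
```

Signal dispositions are process-wide, so even this is not safe if another thread gets a SIGPIPE between the two calls. Inside a multithreaded JVM, not touching SIGPIPE at all is probably the simplest option.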

Thanks for all your help.

Hi Luis,

Excellent news!!!

Regards,

– Gato