BUG in NIOConnection?

Hi,

My quite busy server, powered by a slightly modified Wildfire 3.2.4 (Sun JVM 1.6.0_02), crashes almost every night with OutOfMemoryError: Java heap space. Investigating the issue, I dumped the full heap and can see that 94% of the 1.5GB heap is retained by byte[] objects created by SimpleByteBufferAllocator. I'm looking at the deliver() method of the NIOConnection class:

public void deliver(Packet packet) throws UnauthorizedException {
    if (isClosed()) {
        backupDeliverer.deliver(packet);
    }
    else {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        buffer.setAutoExpand(true);
        boolean errorDelivering = false;
        try {
            //XMLWriter xmlSerializer = new XMLWriter(buffer.asOutputStream(), new OutputFormat());
            XMLWriter xmlSerializer = new XMLWriter(new ByteBufferWriter(buffer, encoder), new OutputFormat());
            xmlSerializer.write(packet.getElement());
            xmlSerializer.flush();
            if (flashClient) {
                buffer.put((byte) '\0');
            }
            buffer.flip();
            ioSession.write(buffer);
        }
        catch (Exception e) {
            Log.debug("Error delivering packet" + "\n" + this.toString(), e);
            errorDelivering = true;
        }
        if (errorDelivering) {
            close();
            // Retry sending the packet again. Most probably if the packet is a
            // Message it will be stored offline
            backupDeliverer.deliver(packet);
        }
    }
}

According to the MINA documentation, we should release a ByteBuffer manually if we allocated it but didn't pass it to any of IoSession.write(Object), IoFilter.NextFilter.filterWrite(IoSession, IoFilter.WriteRequest) or ProtocolEncoderOutput.write(ByteBuffer). So, theoretically, if an exception is thrown anywhere before ioSession.write(buffer); the buffer is never released? Shouldn't we call buffer.release() in the catch block?
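Something like this is what I have in mind (just a sketch against the MINA 1.x org.apache.mina.common.ByteBuffer API, I haven't tested it):

catch (Exception e) {
    Log.debug("Error delivering packet" + "\n" + this.toString(), e);
    errorDelivering = true;
    // The buffer never reached ioSession.write(), so according to the MINA
    // docs it has to be released manually here, otherwise it stays retained.
    buffer.release();
}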

It's just my speculation; I have been trying to figure out what the problem is for almost a week now. I have set "xmpp.socket.heapBuffer" to false to see what happens with memory usage tonight. Please let me know what you think about it.

Thanks,

Eugene.

Quick update:

With direct buffers the server crashes after an hour or so.

Crash log:

===================================================

# An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 32756 bytes for ChunkPool::allocate. Out of swap space?
#
# Internal Error (414C4C4F434154494F4E0E4350500065), pid=1916, tid=4184
#
# Java VM: Java HotSpot(TM) Client VM (1.6.0_02-b05 mixed mode)
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp


---------------  T H R E A D  ---------------


Current thread (0x68356400): JavaThread “CompilerThread0” daemon

Stack: [0x68510000,0x68560000)

Current CompileTask:

C1:1896 sun.nio.ch.Util.releaseTemporaryDirectBuffer(Ljava/nio/ByteBuffer;)V (130 bytes)


---------------  P R O C E S S  ---------------


Java Threads: ( => current thread )

0x6eff8400 JavaThread “btpool0-27”

0x74e86c00 JavaThread “btpool0-25”

0x683a6800 JavaThread “client-17” daemon

0x688a8400 JavaThread “client-16” daemon

0x686b8800 JavaThread “client-15” daemon

0x683c7400 JavaThread “timer-wildfire” daemon

0x6838d000 JavaThread “client-14” daemon

0x687be400 JavaThread “client-13” daemon

0x68899400 JavaThread “client-12” daemon

0x68879400 JavaThread “client-11” daemon

0x687b6400 JavaThread “SocketAcceptorIoProcessor-0.3”

0x68868800 JavaThread “client-10” daemon

0x68795400 JavaThread “client-9” daemon

0x6838cc00 JavaThread “client-8” daemon

0x6877dc00 JavaThread “client-7” daemon

0x687b8800 JavaThread “client-6” daemon

0x68367000 JavaThread “client-5” daemon

0x68b63800 JavaThread “SocketAcceptorIoProcessor-0.2”

0x68b62800 JavaThread “client-4” daemon

0x68b24800 JavaThread “client-3” daemon

0x68a77400 JavaThread “SocketAcceptorIoProcessor-0.1”

0x68b5b800 JavaThread “client-2” daemon

0x68b5a800 JavaThread “client-1” daemon

0x68b14000 JavaThread “SocketAcceptorIoProcessor-0.0”

0x68b12800 JavaThread “btpool0-2 - Acceptor0 AdminConsolePlugin$JiveSslConnector @ 0.0.0.0:9091”

0x68b0e000 JavaThread “btpool0-1 - Acceptor0 SelectChannelConnector @ 0.0.0.0:9090”

0x68a9d800 JavaThread "btpool0-0 - Invalidator - "

0x68a8c400 JavaThread “pool-10-thread-1”

0x68a8d400 JavaThread “SocketSendingTracker” daemon

0x68a6d800 JavaThread “SocketAcceptor-1”

0x68a60000 JavaThread “SocketAcceptor-0”

0x68a6e400 JavaThread “PooledByteBufferExpirer-0” daemon

0x68a4d000 JavaThread “PubSub maintenance”

0x689d2800 JavaThread “Auditor”

0x687a2c00 JavaThread “MUC cleanup”

0x68a27400 JavaThread “Sessions cleanup”

0x689da400 JavaThread “Thread-0” daemon

0x683ac800 JavaThread “Exe4JStartupThread” daemon

0x6835b000 JavaThread “Low Memory Detector” daemon

=>0x68356400 JavaThread “CompilerThread0” daemon

0x68355400 JavaThread “Attach Listener” daemon

0x68347400 JavaThread “Finalizer” daemon

0x68343000 JavaThread “Reference Handler” daemon

0x003c6c00 JavaThread “main”

Other Threads:

0x6833e800 VMThread

0x68365400 WatcherThread

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

Heap

def new generation total 9216K, used 7020K [0x02aa0000, 0x03490000, 0x0a000000)

eden space 8256K, 77% used [0x02aa0000, 0x030e1738, 0x032b0000)

from space 960K, 64% used [0x032b0000, 0x03349bf0, 0x033a0000)

to space 960K, 0% used [0x033a0000, 0x033a0000, 0x03490000)

tenured generation total 121024K, used 99152K [0x0a000000, 0x11630000, 0x620a0000)

the space 121024K, 81% used [0x0a000000, 0x100d41b8, 0x100d4200, 0x11630000)

compacting perm gen total 16896K, used 16662K [0x620a0000, 0x63120000, 0x660a0000)

the space 16896K, 98% used [0x620a0000, 0x630e5990, 0x630e5a00, 0x63120000)

No shared spaces configured.

Dynamic libraries:

0x00400000 - 0x0043b000 D:\Servers\wildfire\bin\wildfire-service.exe

0x7c800000 - 0x7c8c0000 C:\WINDOWS\system32\ntdll.dll

0x77e40000 - 0x77f42000 C:\WINDOWS\system32\kernel32.dll

0x77f50000 - 0x77feb000 C:\WINDOWS\system32\ADVAPI32.DLL

0x77c50000 - 0x77cef000 C:\WINDOWS\system32\RPCRT4.dll

0x76f50000 - 0x76f63000 C:\WINDOWS\system32\Secur32.dll

0x77c00000 - 0x77c48000 C:\WINDOWS\system32\GDI32.dll

0x77380000 - 0x77411000 C:\WINDOWS\system32\USER32.dll

0x77ba0000 - 0x77bfa000 C:\WINDOWS\system32\msvcrt.dll

0x7c8d0000 - 0x7d0ce000 C:\WINDOWS\system32\SHELL32.DLL

0x77da0000 - 0x77df2000 C:\WINDOWS\system32\SHLWAPI.dll

0x76290000 - 0x762ad000 C:\WINDOWS\system32\IMM32.DLL

0x77420000 - 0x77523000 C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.3790.3959_x-ww_D8713E55\comctl32.dll

0x6d7c0000 - 0x6da09000 c:\program files\java\jre1.6.0_02\bin\client\jvm.dll

0x76aa0000 - 0x76acd000 C:\WINDOWS\system32\WINMM.dll

0x7c340000 - 0x7c396000 c:\program files\java\jre1.6.0_02\bin\MSVCR71.dll

0x71bc0000 - 0x71bc8000 C:\WINDOWS\system32\rdpsnd.dll

0x771f0000 - 0x77201000 C:\WINDOWS\system32\WINSTA.dll

0x71c40000 - 0x71c97000 C:\WINDOWS\system32\NETAPI32.dll

0x76b70000 - 0x76b7b000 C:\WINDOWS\system32\PSAPI.DLL

0x6d310000 - 0x6d318000 c:\program files\java\jre1.6.0_02\bin\hpi.dll

0x6d770000 - 0x6d77c000 c:\program files\java\jre1.6.0_02\bin\verify.dll

0x6d3b0000 - 0x6d3cf000 c:\program files\java\jre1.6.0_02\bin\java.dll

0x6d7b0000 - 0x6d7bf000 c:\program files\java\jre1.6.0_02\bin\zip.dll

0x6d570000 - 0x6d583000 C:\Program Files\Java\jre1.6.0_02\bin\net.dll

0x71c00000 - 0x71c17000 C:\WINDOWS\system32\WS2_32.dll

0x71bf0000 - 0x71bf8000 C:\WINDOWS\system32\WS2HELP.dll

0x71b20000 - 0x71b61000 C:\WINDOWS\System32\mswsock.dll

0x76ed0000 - 0x76efa000 C:\WINDOWS\system32\DNSAPI.dll

0x76f70000 - 0x76f77000 C:\WINDOWS\System32\winrnr.dll

0x76f10000 - 0x76f3e000 C:\WINDOWS\system32\WLDAP32.dll

0x76f80000 - 0x76f85000 C:\WINDOWS\system32\rasadhlp.dll

0x68d40000 - 0x68d9a000 C:\WINDOWS\system32\hnetcfg.dll

0x71ae0000 - 0x71ae8000 C:\WINDOWS\System32\wshtcpip.dll

0x6d590000 - 0x6d599000 C:\Program Files\Java\jre1.6.0_02\bin\nio.dll

0x69030000 - 0x69065000 C:\WINDOWS\system32\rsaenh.dll

0x6d750000 - 0x6d758000 C:\Program Files\Java\jre1.6.0_02\bin\sunmscapi.dll

0x761b0000 - 0x76243000 C:\WINDOWS\system32\CRYPT32.dll

0x76190000 - 0x761a2000 C:\WINDOWS\system32\MSASN1.dll

VM Arguments:

jvm_args: -Dexe4j.isInstall4j=true -Dexe4j.isService=true -Dexe4j.moduleName=D:\Servers\wildfire\bin\wildfire-service.exe -Dexe4j.processCommFile=C:\WINDOWS\TEMP\e4j_p1916.tmp -Dexe4j.tempDir= -Dexe4j.unextractedPosition=0 -Dexe4j.consoleCodepage=cp0 -Xrs -Xms128m -Xmx1526m -XX:+HeapDumpOnOutOfMemoryError

java_command:

Launcher Type: generic

Environment Variables:

PATH=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\Microsoft SQL Server\90\Tools\binn;c:\program files\java\jre1.6.0_02\bin

OS=Windows_NT

PROCESSOR_IDENTIFIER=x86 Family 6 Model 15 Stepping 6, GenuineIntel


---------------  S Y S T E M  ---------------


OS: Windows Server 2003 family Build 3790 Service Pack 2

CPU:total 4 (2 cores per cpu, 1 threads per core) family 6 model 15 stepping 6, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3

Memory: 4k page, physical 2097151k(2097151k free), swap 4194303k(4194303k free)

vm_info: Java HotSpot™ Client VM (1.6.0_02-b05) for windows-x86, built on Jun 14 2007 15:45:56 by “java_re” with unknown MS VC++:1310

Hey Eugene,

Direct buffers are not as fast and efficient as heap buffers. Try adding the following parameter -XX:+HeapDumpOnOutOfMemoryError to the startup script so that a heap dump is created when Java runs out of memory. You can use jhat to load and analyze the dump. You can read more about heap dumps here.
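For example, once the dump is written (by default it lands in the working directory as java_pid<pid>.hprof), you can load it with something like:

jhat -J-Xmx2g java_pid1916.hprof

(1916 is just the pid from your crash log; -J-Xmx2g gives jhat's own JVM enough heap to hold a large dump) and then browse to http://localhost:7000 to see which objects are retaining the memory.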

Regards,

– Gato

Hey Gato,

I know that direct buffer allocation/deallocation might be slower than heap buffers, thanks. At this point I'm not even looking for efficiency and performance; I need stability. I want to sleep at night.

You can see in my first post that I have already analyzed the dumped heap (with YourKit, not with jhat). Again, the memory leak shows up as 1.5GB of byte[] objects created by SimpleByteBufferAllocator. I believe that http://www.igniterealtime.org/forum/thread.jspa?threadID=25528 refers to a similar JVM bug.

The server eats up the whole 1.5GB of memory only at peak time. There are about 500 concurrent users. Each of them, after coming online, builds its roster from a shared group. BTW, I had to modify the Group class, as the current implementation doesn't really account for concurrent modifications of the shared group members; we already had a discussion about it here. After building the roster, my client requests a vCard for each roster item. This flow produces a lot of requests to the server, at least (online_users)^2, so with ~500 users that is on the order of 250,000 vCard requests in a short window. It seems that at some point the connection manager goes crazy, takes all the available memory and freezes the server.

My original question was: doesn't it look right to you to call buffer.release() in the catch block of the deliver() method of the NIOConnection class?

Regards,

Eugene.

Hey Eugene,

I see that you are using Wildfire 3.2.4. Attached you will find the load statistics plugin, which prints information about the server queues. I'm wondering whether your incoming MINA queues are filling up quickly. Anyway, a few OOM problems have been fixed since 3.2.4. Do you think you could download 3.3.2 and give it a try?

Regards,

– Gato
loadStats.jar (8404 Bytes)

Hi Gato,

I have been collecting statistics for a few days, and the results are as expected: at some point all the MINA queues fill up, as you guessed. The question now is what triggers it. I didn't find anything unusual on the network when it happens. The heap dump looks as if there is either a MINA or a JVM bug.

I have made a few amendments to the code. Do you think the changes between 3.2.4 and 3.3.2 are significant enough to be worth the effort of porting my patches to the latest code?

Thanks,

Eugene.

Hey Eugene,

I see that the outgoing queue is the one that is filling up. You may want to read this wiki entry that explains how to handle this situation.

Regards,

– Gato

Hey Gato,

Well, setting the socket linger value to 3 helped. The MINA queue still sporadically fills up to 100% of the 1.5GB heap, but GC is able to bring it back down to ~600-700MB. I think this means that at some point the server stops closing sockets properly and the allocated buffers remain retained. This is with 3.2.4.

But the picture changed dramatically when I eventually upgraded to 3.3.2. The socket settings are: linger 3 sec, send buffer 64K, core and max threads both 32, and TCPNoDelay true. It has been stable for almost two days, memory usage is about 100MB, and CPU utilization went down to 5-7% from the 40-45% I saw in Wildfire 3.2.4! Hallelujah!

The wiki entry helped me a lot, thanks. I would add a link there to the full list of Openfire Properties; I only found it after searching the code for the parameter responsible for the max threads value. It would also be very nice to document each parameter's units (for example, bytes for the send and receive buffers) and its default value. I understand that the socket parameters by default come from the hosting OS; it would be nice if the server printed them to the Info log on startup, along the lines of the sketch below.
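Even something as simple as this would do (a quick sketch using plain java.net.Socket just to show the defaults the OS hands out; the real thing would of course read the values from the MINA session config instead):

import java.net.Socket;

public class SocketDefaultsLogger {

    // Sketch only: dump the OS socket defaults so they could be written to the
    // Info log on startup instead of being a guessing game for admins.
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();   // unconnected socket, created just to read defaults
        try {
            System.out.println("SO_SNDBUF   = " + s.getSendBufferSize() + " bytes");
            System.out.println("SO_RCVBUF   = " + s.getReceiveBufferSize() + " bytes");
            System.out.println("SO_LINGER   = " + s.getSoLinger() + " (-1 means disabled)");
            System.out.println("TCP_NODELAY = " + s.getTcpNoDelay());
        }
        finally {
            s.close();
        }
    }
}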

The most important issues for me in the existing code are in the Group implementation. The lack of a MembersCollection.remove() method makes manipulating group members programmatically impossible, and the many ConcurrentModificationExceptions don't look good either (see the sketch below for what I mean). Do you have any plans to look at this in future versions?
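To illustrate the ConcurrentModificationException part (a simplified, hypothetical sketch, not the actual Group code; GroupMembersSketch, broadcast() and deliverTo() are names I made up for the example):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class GroupMembersSketch {

    // Stand-in for the shared group's member collection; in the real server
    // another thread may add or remove members while a roster is being built.
    private final Set<String> members = Collections.synchronizedSet(new HashSet<String>());

    public void addMember(String jid) {
        members.add(jid);
    }

    public void broadcast() {
        // Iterating the live set while another thread modifies it throws
        // ConcurrentModificationException. Iterating a snapshot copy (or using
        // a concurrent collection) sidesteps the problem.
        for (String member : new ArrayList<String>(members)) {
            deliverTo(member);
        }
    }

    private void deliverTo(String member) {
        System.out.println("delivering roster update to " + member);
    }
}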

Many Thanks,

Eugene.