Can you help me reproduce the problem on my end? The combination of file transfer and MUC room confuses me a little. Exactly what does your code do?
I’m currently looking at the heap dumps that you provided. What makes you conclude that the FileTransfer objects are the cause of the memory leak?
Although both heap dumps show a fair number of FileTransfer objects (both have slightly over 1,000 of them), their retained heap (see below) is under half a megabyte. That’s well under 0.5% of the total heap size.
From JProfiler’s help files:
Shallow vs. Retained Heap
Shallow heap is the memory consumed by one object. An object needs 32 or 64 bits (depending on the OS architecture) per reference, 4 bytes per Integer, 8 bytes per Long, etc. Depending on the heap dump format the size may be adjusted (e.g. aligned to 8, etc…) to model better the real consumption of the VM.
Retained set of X is the set of objects which would be removed by GC when X is garbage collected.
Retained heap of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
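To make that distinction concrete with a made-up example: in the sketch below, a Holder object has a tiny shallow size (an object header plus one reference), but a retained heap of roughly one megabyte, because the byte array becomes unreachable as soon as the Holder does. The FileTransfer objects are the opposite case: many instances, but very little memory kept alive through them.

// Illustration only (not Openfire code): shallow vs. retained heap.
public class Holder {
    // The Holder's shallow size is just the object header and this reference;
    // its retained heap also includes the ~1 MB array, since collecting the
    // Holder would make the array collectible as well.
    private final byte[] payload = new byte[1024 * 1024];
}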
In both heap dumps, I’m seeing a more likely candidate for a memory leak: both dumps contain exactly 51 NioSocketSession instances (each of which represents a TCP connection). Their retained heap is significant: 61% of one heap and 72% of the other.
A significant number of these instances have a retained heap larger than one megabyte. In the arbitrary selection that I reviewed, all of that memory was held by the writeRequestQueue property.
From MINA’s javadoc on writeRequestQueue’s getter:
(…) the queue that contains the message waiting for being written. As the reader might not be ready, it’s frequent that the messages aren’t written completely, or that some older messages are waiting to be written when a new message arrives. This queue is used to manage the backlog of messages.
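If it helps to confirm this on a running server rather than in heap dumps, MINA exposes counters for this backlog on each session. A rough sketch (it assumes you can get hold of the IoSession instances that back these 51 connections):

import org.apache.mina.core.session.IoSession;

// Sketch: print the pending write backlog for each session. Large and growing
// values here would match what the heap dumps show for writeRequestQueue.
public final class WriteBacklogLogger {
    public static void log(Iterable<IoSession> sessions) {
        for (IoSession session : sessions) {
            System.out.println(session.getRemoteAddress()
                    + ": " + session.getScheduledWriteMessages() + " queued messages, "
                    + session.getScheduledWriteBytes() + " queued bytes");
        }
    }
}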
I’ve used OQL to extract the buffered values of all of these messages, using this query:

SELECT r.originalMessage.buf.hb.toString() FROM org.apache.mina.core.write.DefaultWriteRequest r
This query returns data from all DefaultWriteRequest instances, which is the type that’s put in the queue. If this type is also used elsewhere (which I doubt), the results might be skewed.
This is a dump of the results: dump.txt (28.3 MB)
It seems they consist solely of stanzas related to file transfers. I’ve performed the following grep to count the lines that do not include a stanza ID matching something like id="transfer3_1145492":
$ grep -v id=\"transfer dump.txt | wc -l
7
$ wc -l dump.txt
28939 dump.txt
Of the 7 lines that do not match, most are empty. In other words, practically all of the 28,939 buffered messages are file-transfer stanzas.
How should we interpret all this? The problem appears to be located in the per-connection queues that hold outbound stanzas. Why these queues are filling up needs further analysis, and a good part of that will involve looking at the code that generates the data.
From the data, it’s clear that some kind of test is being performed. I’m thinking that I’m seeing clients that loop over a bit of code that performs a file transfer. My first thought is that the test code itself might be causing the problem: it appears that the client code is not reading the data fast enough, causing the server-side write buffers to fill up.
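To illustrate what I mean (this is a made-up receiver, not your code): if the receiving side of the test client does something like the loop below, the server can flush its write queue; if it never gets around to reading, TCP back-pressure pushes the data back into the per-session writeRequestQueue on the server, which is exactly what the heap dumps show.

import java.io.IOException;
import java.io.InputStream;

// Hypothetical test-client receiver: drain the incoming transfer as it arrives.
public final class DrainingReceiver {
    public static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read; // consuming the data lets the server empty its queue
        }
        return total;
    }
}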
Were all 51 socket connections still active at the time that the dumps were created? What happens to the memory after they disconnect? What happens if you keep them connected for a couple of minutes, but do not have them continuously push new file transfer requests any more? I wonder if things “catch up”.