Memory leak in Openfire 4.4.4

Hello,

I have further evidence of memory leaks in Openfire 4.4.4 (please see attachments) to support another finding mentioned here by suf126a.

Loadtest:
OS: Solaris 10 sparc
JDK: 1.8.0_191
50 concurrent users
5 chatrooms

The memory leak appears specifically when sending attachments (file transfers). A large number of FileTransfer objects appear to be kept in memory. A large number of DomainPairs also seem to exist (I'll send screenshots in the near future).

StanzaHandler.processIQ() calls MetaFileTransferInterceptor.interceptPacket() for each IQ it processes (see profiler screenshot), which in turn calls:

FileTransfer transfer = createFileTransfer(from, to, childElement);
and
acceptIncomingFileTransferRequest(transfer)

These file transfers seem to be cached in method acceptIncomingFileTransferRequest(transfer):
cacheFileTransfer(ProxyConnectionManager.createDigest(streamID, from, to), transfer)
and the Cache seems to work fine (it autocleans itself after a while).
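
To make that concrete, here is a minimal, self-contained sketch of how such an auto-expiring cache behaves. This is an illustration only, not Openfire's Cache implementation; the class and method names are mine, and the cached value itself is left out for brevity.

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of the caching behaviour described above (illustration, not Openfire code).
public class ExpiringCacheSketch {

    private final Map<String, Long> insertedAt = new LinkedHashMap<>();
    private final long maxLifetimeMillis;

    public ExpiringCacheSketch(long maxLifetimeMillis) {
        this.maxLifetimeMillis = maxLifetimeMillis;
    }

    // Stands in for cacheFileTransfer(digest, transfer); the transfer itself is omitted,
    // only the insertion time matters for the expiry behaviour shown here.
    public void cacheFileTransfer(String digestKey) {
        insertedAt.put(digestKey, System.currentTimeMillis());
    }

    // The 'autoclean' step: drop everything older than the maximum lifetime. As long as
    // this runs often enough relative to the rate of new transfers, the cache cannot
    // grow without bound.
    public void purgeExpired() {
        long cutoff = System.currentTimeMillis() - maxLifetimeMillis;
        Iterator<Map.Entry<String, Long>> it = insertedAt.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
            }
        }
    }

    public int size() {
        return insertedAt.size();
    }
}

In other words, as long as the purge keeps pace with new transfers, the cache by itself should not explain unbounded growth.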

So, I wonder who keeps references to these FileTransfer objects.

FileTransfer references escape only via the retrieveFileTransfer() method, but that is called by DefaultFileTransferManager.registerProxyTransfer() and the transfer is cached again.

I hope further discussion of the topic will help find the problem.

openfirememoryleak.zip (309.3 KB)

Kind regards,

John


Interesting. Can you provide a memory dump that contains the instances that are leaking? I’d like to find out what is keeping references to them.

I'm afraid I won't be able to provide you with a heap dump (I tried a number of things, but it won't be possible to take it out of our system). So I'm afraid we'll have to continue with screenshots.

attachments(1).zip (257.8 KB)

If these don't help, I can provide more later.

Thanks.

Please find two heap dumps attached.
heapdump-1580507303881-new.zip (14.3 MB) heapdump-1580506589266-new.zip (14.9 MB)

Hello Guus. May I ask what exactly is being cached? I sent the same file 3 times between the same 2 users, and DefaultFileTransferManager.cacheFileTransfer() created a different key each time; as a result, the same file is cached 3 times in the fileTransferMap Cache. If the purpose was to reuse the cached file transfer, why use a different key each time? And what is the purpose of caching file transfers in the first place?
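
For what it's worth, my working assumption (not verified against the Openfire source) is that the digest is computed over the per-transfer stream ID as well as the two JIDs, so repeating the same transfer between the same users still yields a new key each time. A self-contained illustration of that idea follows; the hash algorithm and the concatenation order are assumptions made for this demo.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DigestKeyDemo {

    // Illustrative stand-in for ProxyConnectionManager.createDigest(streamID, from, to):
    // hash the concatenation of stream ID and both JIDs. Algorithm and order are assumed.
    static String createDigest(String streamID, String from, String to) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] hash = sha1.digest((streamID + from + to).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String from = "alice@example.org/laptop";
        String to = "bob@example.org/phone";
        // Each negotiation uses a fresh stream ID, so the cache key changes even though
        // the users (and the file) are the same three times in a row.
        for (String streamID : new String[] {"stream-1", "stream-2", "stream-3"}) {
            System.out.println(createDigest(streamID, from, to));
        }
    }
}

If that is indeed how the key is built, then each negotiation gets its own cache entry by design, and reuse across transfers was presumably never the goal; the entry only needs to live long enough for the proxy connection to claim it.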

The Cache ‘autocleans’ itself after a while, of course, but when you ‘bombard’ the server with attachments, the cache can fill up if the autoclean doesn’t happen quickly enough.

Can you help me reproduce the problem on my end? The combination of file transfer and MUC room confuses me a little. Exactly what does your code do?

I’m currently looking at the heap dumps that you provided. What makes you conclude that the FileTransfer objects are the cause of the memory leak?

Although both heap dumps show a fair number of FileTransfer objects (both have slightly over 1,000 instances), the retained heap (see below) for these is under half a megabyte. That’s well under 0.5% of the total heap size.

From JProfiler’s help files:

Shallow vs. Retained Heap

Shallow heap is the memory consumed by one object. An object needs 32 or 64 bits (depending on the OS architecture) per reference, 4 bytes per Integer, 8 bytes per Long, etc. Depending on the heap dump format the size may be adjusted (e.g. aligned to 8, etc…) to model better the real consumption of the VM.

Retained set of X is the set of objects which would be removed by GC when X is garbage collected.

Retained heap of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
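
As a small, made-up example of that distinction (class names are mine):

public class RetainedHeapDemo {

    // 'Holder' itself is tiny (its shallow size is just an object header plus one
    // reference), but it is the only object referencing a 10 MB byte array, so its
    // retained heap is roughly 10 MB: collecting the Holder would also free the array.
    static class Holder {
        final byte[] payload = new byte[10 * 1024 * 1024];
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        System.out.println("payload length: " + holder.payload.length);
        // In a heap dump, Holder shows up with a small shallow size and a ~10 MB
        // retained heap, which is the difference the quoted definitions describe.
    }
}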

In both heap dumps, I’m seeing a more likely candidate for a memory leak: both dumps have exactly 51 NioSocketSession instances (each a representation of a TCP connection). Their retained heap is significant: 61% of one heap, 72% of the other.

A significant number of these instances have a retained heap that’s larger than one megabyte. In the arbitrary selection that I reviewed, that memory was in every case held by the writeRequestQueue property.

From MINA’s javadoc on writeRequestQueue's getter:

(…) the queue that contains the message waiting for being written. As the reader might not be ready, it’s frequent that the messages aren’t written completely, or that some older messages are waiting to be written when a new message arrives. This queue is used to manage the backlog of messages.
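
If it helps to watch this at runtime rather than in a dump, MINA exposes the per-session backlog via IoSession. Below is a sketch of a probe, assuming you can get hold of the relevant MINA IoService inside Openfire; the class and method names are mine, and how to obtain that IoService depends on the Openfire version.

import java.util.Map;

import org.apache.mina.core.service.IoService;
import org.apache.mina.core.session.IoSession;

public class WriteBacklogProbe {

    // Sketch: logs the write backlog for every session managed by the given MINA
    // IoService (for example, the acceptor that handles client connections). Large
    // and growing numbers here correspond to the writeRequestQueue contents seen
    // in the heap dumps.
    public static void logBacklog(IoService service) {
        for (Map.Entry<Long, IoSession> entry : service.getManagedSessions().entrySet()) {
            IoSession session = entry.getValue();
            System.out.printf("session %d: %d queued messages, %d queued bytes%n",
                    entry.getKey(),
                    session.getScheduledWriteMessages(),
                    session.getScheduledWriteBytes());
        }
    }
}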

I’ve used OQL to extract the buffered values of all of these messages, using this query:

SELECT r.originalMessage.buf.hb.toString() FROM org.apache.mina.core.write.DefaultWriteRequest r

This query returns data from all DefaultWriteRequest instances, the type that’s put in the queue. If this type is used elsewhere (which I doubt), the results might be skewed.

This is a dump of the results: dump.txt (28.3 MB)

They consist solely of stanzas related to file transfers, it seems. To verify this, I’ve performed the following grep to count how many lines do not include a stanza ID matching something like id="transfer3_1145492":

$ grep -v id=\"transfer dump.txt | wc -l
7
$ wc -l dump.txt
28939 dump.txt

Of those 7 lines that do not match, most of them are empty.

How to interpret all this? The problem appears to be located in the queue that holds outbound stanzas. Why these queues are filling up needs further analysis, but a good part of that will involve looking at the code that generates the data.

From the data, it’s clear that some kind of test is being performed. I’m thinking that I’m seeing clients that loop over a bit of code that performs a file transfer. My first thought is that it might be the test code itself that’s causing the problem: it appears that the client code is not reading all data fast enough, causing the server-side write buffers to fill up.
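
To illustrate the kind of client behaviour I have in mind, here is a deliberately bad, purely hypothetical client: it keeps writing requests but never reads the responses, which is exactly what would make the server-side write queues grow. The host, port and payload below are placeholders, not a working XMPP conversation.

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SlowReaderDemo {

    public static void main(String[] args) throws Exception {
        // Connect to the server, then write continuously while never reading from
        // socket.getInputStream(). Whatever the server sends back has nowhere to go
        // once the receive buffers are full, so the server's outbound queue for this
        // connection keeps growing.
        try (Socket socket = new Socket("localhost", 5222)) {
            OutputStream out = socket.getOutputStream();
            byte[] request = "<presence/>".getBytes(StandardCharsets.UTF_8);
            while (true) {
                out.write(request);
                out.flush();
                Thread.sleep(50);
            }
        }
    }
}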

Were all 51 socket connections still active at the time the dumps were created? What happens with the memory after they disconnect? And what happens if you keep them connected for a couple of minutes, but no longer have them continuously push new file transfer requests? I wonder if things “catch up”.