powered by Jive Software

Lot of Memory Leaks with Smack 3.4.0

Hi,

I’ve been experiencing a lot of Memory Leaks with Smack 3.4.0.

If I run the following test class, a lot of instances are kept in memory. See the attached VisualVM image.

I suspect the listeners are not cleared during disconnecting.

These problem didn’t occur with Smack 3.3.1.

public class SmackTest {

public static void main(String[] args) {

// Create the main application thread, that will run forever and will keep the program alive.

System.out.println(SmackConfiguration.getVersion());

Executors.newFixedThreadPool(1, new ThreadFactory() {

@Override

public Thread newThread(Runnable r) {

return new Thread(r, “Main Application Thread”);

}

}).execute(new Runnable() {

@Override

public void run() {

for (int i = 0; i < 100; i++) {

Connection connection = new XMPPConnection(new ConnectionConfiguration(“server”, 5222));

System.out.println(“Connecting…”);

try {

connection.connect();

} catch (XMPPException e) {

e.printStackTrace();

} finally {

connection.disconnect();

}

}

}

});

}

}

The idea is that the listeners should get gc’ed once their connection instance is gc’ed.

You create 100 Connection instances, but I only see max 4 instances per class type. I guess that if there was a real memory leak, we would see much more.

But let’s try to investigate this further. Could you:

  • use a tool like Eclipse MAT to see if there are GC roots for the instances, and if so, what these roots are
  • Add something like System.gc() at the start of the loop (maybe the instances where just not gc’ed yet)

I have currently not the time and motivation to use Eclipse MAT because I don’t know it.

I’ve pressed the System.gc() button in VisialVM to see if somthing changes, but it doesn’t.

(using Smack 3.3.1 it releases the instances).

The problem is probably that the connection cannot be GCed because it still has a listener references to the anonymous listener classes created by the managers.

I think this bug was introduces with SMACK-373

Not removing listeners at all is obviously also a bad idea.

EDIT:

Just tested my code above with System.gc() in every loop. Same result.

The problem is probably that the connection cannot be GCed because it still has a listener references to the anonymous listener classes created by the managers.
Why should that prevent the connection from being gc’ed? Neither the listeners nor their managers should have a strong reference to any GC root now. (Side note: That is why we use WeakHashMaps for the manager instances).

Not removing listeners at all is obviously also a bad idea.

That is not yet proven, so please don’t jump to quick conlusions. And there is good reason why we decided to keep the state between disconnect() and connect().

We still don’t know why, if there is really a memory leak, there are not thousends of instances, because every Connection instance has multiple listeners and managers (I’d say at least more then 10).

Just tested my code above with System.gc() in every loop. Same result.

System.gc() is only a suggestion to the VM to run the garbage collector. It’s by no means deterministic.

I’ll run some tests and have a look at the heap.

Why should that prevent the connection from being gc’ed? Neither the listeners nor their managers should have a strong reference to any GC root now. (Side note: That is why we use WeakHashMaps for the manager instances).

I believe it’s due to some cyclic referencing. The connection can’t be GCed because it has a reference to the listener, which in turn has a reference to the manager, which has again a reference to the connection.

connection ==> listener ==> manager ==> connection.

So the garbage collection can’t GC anything. I am no expert with GC, but I think this causes a leak.

I also think the WeakHashMap approach is dangerous, or at least I am not sure, you are aware of the following problem: Since the value (manager class) holds a strong reference to its key (connection), the map entry can’t be GCed in case the key (connection) has no other reference anymore.

See http://stackoverflow.com/questions/8051912/will-a-weakhashmaps-entry-be-collecte d-if-the-value-contains-the-only-strong-re

“Thus care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded.”

It works now, because managers manually delete the key on disconnection.

But actually, WeakHashMap doesn’t make sense then, if you manually remove keys anyway.

Good news, I found the problem. Until the fix is verified and commited, please don’t use 3.4.0 if you are affected by the memory leak. Logged as SMACK-540

Compare the shallow heap size of 100 Connections after connect() disconnect() without the fix

smack_w_ml1.png

and with the fix

smack_wo_ml1.png

What was the problem? In order to analyse memory leaks, we need to have a look at the heap and the objects within the heap.

Analyzing the heap:

smack_w_ml2.png

we found that there is still a strong reference to a Connection instance. This reference is from the nodeInformationProvider that MultiUserChat registered at ServiceDiscoveryManager. This should become a weak reference, like (most) references of Smack components to a Connection instance.

The same view with the fix:

smack_wo_ml2.png

Thanks for reporting the issue CSH.

1 Like

CSH wrote:

Why should that prevent the connection from being gc’ed? Neither the listeners nor their managers should have a strong reference to any GC root now. (Side note: That is why we use WeakHashMaps for the manager instances).

I also think the WeakHashMap approach is dangerous, or at least I am not sure, you are aware of the following problem:

Since the value (manager class) holds a strong reference to its key (connection), the map entry can’t be GCed in case the key (connection) has no other reference anymore.

It works now, because managers manually delete the key on disconnection.

But actually, WeakHashMap doesn’t make sense then, if you manually remove keys anyway.

No manager should hold a strong reference to the connection. Those where converted to weak references a while ago. If you still spot one, please notify us so that it can be fixed.

And if you see Managers deleting themselves from their instances map, it’s also a old design that hasn’t converted to the new one. They should not do that any more.

KeepAliveManager is probably using old design. Though I don’t know if this can be considered a “real” manager, but uses HashMap and removes itself.

KeepAliveManager is going to get removed anyways, as his functionality is replaced by PingManager

I’ve got two reports that the memory leak is fixed with the latest HEAD of the 3.4 branch (05ccada). Until 3.4.1 is officialy released, please use the nighlty builds. Here is a direct link to the build artifacts that contains the fix:

http://bamboo.igniterealtime.org/browse/SMACK-NIGHTLYBRANCH-1277/artifact/shared /Project-binary-files/

Also available on the nightly builds page off of downloads.

http://www.igniterealtime.org/downloads/nightly_smack.jsp

The fixed version is not yet available from that page (but soon).