Potential MUC Memory Leak?

I believe that MultiUserChatServerImpl contains a small memory leak if the MUC idle timeout is set to -1 (disabled).

It appears that removeUser() is only ever called from the UserTimeoutTask, which is not run if that option is disabled. This means that over time, if you are generating lots of new user accounts like we are and have heavy MUC usage, you will run out of memory, since the ‘users’ map in MultiUserChatServerImpl grows without bound.
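
Here is a simplified, self-contained sketch of the pattern as I read it. This is not the actual MultiUserChatServerImpl source; the names ‘users’ and ‘user_idle’ mirror the fields discussed here, and everything else is illustrative. The point is just that when the property is -1 the timeout task is never scheduled, so nothing ever calls removeUser() and the map only grows:

```java
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only -- not the real MultiUserChatServerImpl.
class MucUserMapSketch {
    // One entry per user who has sent a MUC packet; only removeUser() ever shrinks it.
    private final Map<String, Object> users = new ConcurrentHashMap<String, Object>();
    private final long user_idle; // xmpp.muc.tasks.user.timeout, -1 = disabled
    private final Timer timer = new Timer(true);

    MucUserMapSketch(long userIdle) {
        this.user_idle = userIdle;
        if (user_idle != -1) {
            // The timeout task is the only caller of removeUser(), so when the
            // property is -1 nothing ever removes entries and the map only grows.
            timer.schedule(new TimerTask() {
                public void run() {
                    checkForTimedOutUsers();
                }
            }, user_idle, user_idle);
        }
    }

    private void checkForTimedOutUsers() {
        // In the real class this kicks idle users and calls removeUser();
        // it never runs when the idle timeout is disabled.
    }

    private void removeUser(String jid) {
        users.remove(jid);
    }
}
```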

I was able to fix this by modifying the source code to make removeUser() public and calling it manually when a user goes offline, but this is obviously a poor solution. Maybe a better one would be to have the user cleanup task always run and remove users who are not in any rooms regardless of the value of the ‘xmpp.muc.tasks.user.timeout’ property.
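
For anyone who needs a stop-gap before a proper fix lands, this is the shape of what we did. The MucUserRegistry interface and the onUserWentOffline hook are hypothetical stand-ins for the patched MultiUserChatServerImpl and for wherever your code learns that a session has closed, so adapt them to your setup:

```java
// Rough sketch of the stop-gap: patch removeUser() to be public and call it
// whenever a user goes offline. MucUserRegistry and onUserWentOffline are
// hypothetical names, not part of Openfire.
interface MucUserRegistry {
    // Private in the stock class; the stop-gap patch makes it public.
    void removeUser(String bareJid);
}

class MucOfflineCleanup {
    private final MucUserRegistry muc;

    MucOfflineCleanup(MucUserRegistry muc) {
        this.muc = muc;
    }

    // Call this from your session/presence handling when a user disconnects,
    // so the corresponding entry in the 'users' map does not linger forever.
    void onUserWentOffline(String bareJid) {
        muc.removeUser(bareJid);
    }
}
```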

Any thoughts on this?

No thoughts on a permanent fix for this issue? Should I just create a ‘best effort’ patch and post it? Would that be helpful?

Hey BenV,

I like your last proposal. If you can implement it and send us the patch then we will gladly review it and incorporate it.

Thanks,

– Gato

I’ll take a stab at it!

Here is a solution that fixes the leak for us; let me know whether this seems sane to you or not, Gato. This is the checkForTimedOutUsers() function in MultiUserChatServerImpl.java:

```java
private void checkForTimedOutUsers() {
    final long deadline = System.currentTimeMillis() - user_idle;
    for (LocalMUCUser user : users.values()) {
        try {
            // If the user is not present in any room then remove the user from
            // the list of users
            if (!user.isJoined()) {
                removeUser(user.getAddress());
                continue;
            }

            // Do nothing if this feature is disabled (i.e. user_idle equals -1)
            if (user_idle == -1) {
                continue;
            }

            if (user.getLastPacketTime() < deadline) {
                // Kick the user from all the rooms that he/she had previously joined
                MUCRoom room;
                Presence kickedPresence;
                for (LocalMUCRole role : user.getRoles()) {
                    room = role.getChatRoom();
                    try {
                        kickedPresence =
                                room.kickOccupant(user.getAddress(), null, null);
                        // Send the updated presence to the room occupants
                        room.send(kickedPresence);
                    }
                    catch (NotAllowedException e) {
                        // Do nothing since we cannot kick owners or admins
                    }
                }
            }
        }
        catch (Throwable e) {
            Log.error(LocaleUtils.getLocalizedString("admin.error"), e);
        }
    }
}
```

You’ll notice that basically the if (user_idle == -1) check has been moved after the logic that checks whether a user is still in any rooms. This allows users who are not in any rooms to be removed from the users map, which should fix the leak. Let me know what you think!

Not sure why this code window is showing up so narrow, sorry about that

Don’t want this to get lost in all the 3.4.4 excitement, bump!

Hey BenV,

This issue was filed as JM-1251 and your fix was checked in for Openfire 3.5.0.

Thanks,

– Gato

Gato, just an update on this. I have been running with this patch in production for a little over a week now, and it has GREATLY helped our memory consumption. Looking forward to having this in the official release!

The implemented patch fixes only part of the problem. See JM-1312.

Hey Guus,

Thanks for catching that one. Your suggested solution was correct. The fix was applied to Openfire 3.5.0 RC2 to be released later today.

Regards,

– Gato

Hey Guus, just wanted to say thanks a lot for noticing this. I FINALLY got around to installing 3.4.5 and noticed this incorrect fix (the continue having been switched to a return), and was about to write up a report for it (which would sadly have been too late for 3.5.0), so thanks for taking charge on this issue.
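
In case it helps anyone else reading this thread, here is a minimal, self-contained illustration (my own toy code, not the Openfire source) of why the return only partially fixed the leak: with the idle timeout disabled, the sweep aborts at the first user who is still joined, so any no-longer-joined users later in the iteration are never removed. Running it shows ‘carol’ surviving the buggy sweep but not the correct one (‘bob’ is legitimately kept in both, since he is still in a room):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy demonstration of return vs. continue in the user cleanup sweep.
class ContinueVsReturnDemo {
    static final long USER_IDLE = -1; // idle timeout disabled

    public static void main(String[] args) {
        System.out.println("return leaves behind:   " + sweep(sampleUsers(), true));
        System.out.println("continue leaves behind: " + sweep(sampleUsers(), false));
    }

    // Maps a user to whether they are still joined to at least one room.
    static Map<String, Boolean> sampleUsers() {
        Map<String, Boolean> users = new LinkedHashMap<String, Boolean>();
        users.put("alice", false); // left every room: should always be removable
        users.put("bob", true);    // still in a room: legitimately kept
        users.put("carol", false); // left every room: leaks with the buggy return
        return users;
    }

    static Map<String, Boolean> sweep(Map<String, Boolean> users, boolean buggyReturn) {
        for (Iterator<Map.Entry<String, Boolean>> it = users.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String, Boolean> user = it.next();
            if (!user.getValue()) {
                it.remove();           // not in any room: always safe to drop
                continue;
            }
            if (USER_IDLE == -1) {
                if (buggyReturn) {
                    return users;      // aborts the whole sweep at 'bob' (JM-1312)
                }
                continue;              // correct: only skip the idle-kick for 'bob'
            }
            // ... idle-kick logic would go here ...
        }
        return users;
    }
}
```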