Openfire 4.8.1 - Memory leak?


I can see that the preAuthenticatedSessions map in the LocalSessionManager class holds thousands of sessions that are never removed. It even contains sessions in the CLOSED state. Is this expected?

Please note: I have upgraded Netty in my setup to 4.1.108-Final (which was the fix for another memory issue in 4.8.1).

Interesting! I’ve never run into this. Can you share how you found this issue, and what made you think that upgrading Netty would resolve it?

Hi Guus
Netty has nothing to do with this issue. As we know, 4.8.1 already has a memory issue, and the fix for it is in 4.8.2 ([OF-2818] - Ignite Realtime Jira).
I added the note to clarify that this is not the same issue we are hitting here.

We hit this problem in our load setup, where around 3k users log in. We got an out-of-memory error (unable to create a native thread). I then compared two heap dumps, one taken before the OOM and one after: the first has around 3k entries (6 MB) in preAuthenticatedSessions, and the later one has 48k entries (190 MB of retained heap).

Thanks

From a cursory look through the code, I’m not seeing an immediate problem. Can you share exactly how this problem is reproduced? Do the clients that connect finish authentication? Do the entries in preAuthenticatedSessions (eventually) disappear when timeouts occur?

I think we are experiencing this same issue, with a 4.9.0 deployment.

We run an Openfire instance that peaks at ~50k sessions daily during the busiest hours. We have both web and native clients, connecting with Strophe.js and the Qt XMPP library, respectively.

The server runs with a ~40 GB heap and, after a few days of operation, runs out of heap. This was not happening with our previous 4.7.3 deployment; please see the following capture of our Grafana monitor for 30 days. The flat part is before the 4.9 upgrade. The downward spikes are due to manual restarts of the service.

We have managed to analyze a heap dump using MAT (I can provide further details/files). I am attaching the three reports the tool generates, in HTML form. “Leak_Suspects” points to preAuthenticatedSessions, as per this image:

We cannot reproduce this problem, but it shows up on our production server after just a few days of operation: heap usage increases every day and never goes back down.

Our XMPP clients sometimes disconnect/log out explicitly, and at other times just go away due to network errors or application shutdowns. The session count does decrease every day, as seen in this other Grafana panel (last week period):


I am attaching three zip files with the MAT-generated reports, where all of this can be seen in more detail.

Could this be a leak in the server present in recent versions? Again, this was not happening for us in 4.7.3.

Count on us for anything else we can help with. Thanks!

openfire-memory_Leak_Suspects.zip (115.3 KB)
openfire-memory_System_Overview.zip (80.6 KB)
openfire-memory_Top_Components.zip (493.4 KB)


@guus (hope it's okay to @-cite you :pray:)

I have been looking at the Openfire code, and there seem to be some inconsistencies in how preAuthenticatedSessions specifically is handled. Let me try to explain:

Entries are put into the map in only two places:

  • org.jivesoftware.openfire.SessionManager#createClientSession(org.jivesoftware.openfire.Connection, org.jivesoftware.openfire.StreamID, java.util.Locale)
  • org.jivesoftware.openfire.SessionManager#createClientHttpSession

Both puts use session.getAddress().getResource() as the key.

But after authentication, org.jivesoftware.openfire.session.LocalClientSession#setAuthToken(org.jivesoftware.openfire.auth.AuthToken, java.lang.String) is called, which calls setAddress() with a new, post-authentication address.

At this point, the preAuthenticatedSessions map contains an entry keyed by the resource of an address that has just been replaced, so that key is lost.

Subsequent operations on the map (most importantly, removals) that use the resource of the new address will have no effect, so the entries are never removed and remain forever.

Also, the remove operation on preAuthenticatedSessions in org.jivesoftware.openfire.SessionManager#addSession uses the stream ID as the key. But nothing is ever put into the map with the stream ID as the key, only with the resource.

I think this explains why entries are never deleted from preAuthenticatedSessions.
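
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use any Openfire classes: FakeSession, simulateLogins and the "client-N" resources are hypothetical stand-ins, and it assumes (as described above) that the temporary resource and the stream ID are different strings.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PreAuthLeakDemo {

    /** Hypothetical stand-in for a client session; not an Openfire class. */
    static final class FakeSession {
        // Assumption: the stream ID and the temporary resource are distinct values.
        final String streamId = UUID.randomUUID().toString();
        String resource = UUID.randomUUID().toString();

        /** Models setAuthToken(): the address, and with it the resource, is replaced. */
        void authenticate(String boundResource) {
            this.resource = boundResource;
        }
    }

    /** Runs 'count' simulated logins and returns how many map entries are left behind. */
    static int simulateLogins(int count) {
        Map<String, FakeSession> preAuthenticatedSessions = new ConcurrentHashMap<>();
        for (int i = 0; i < count; i++) {
            FakeSession session = new FakeSession();
            // Session creation: the entry is keyed by the temporary resource.
            preAuthenticatedSessions.put(session.resource, session);
            // Authentication replaces the resource; the original key is no longer reachable.
            session.authenticate("client-" + i);
            // Removal by stream ID: no entry was ever stored under this key.
            preAuthenticatedSessions.remove(session.streamId);
            // Removal by the new resource: no entry under this key either.
            preAuthenticatedSessions.remove(session.resource);
        }
        return preAuthenticatedSessions.size();
    }

    public static void main(String[] args) {
        System.out.println("Entries left after 1000 logins: " + simulateLogins(1000));
    }
}
```

In this model every login leaves one entry behind, which would match a map that only grows with the number of logins.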


Thanks for the detailed analysis Miguel! I’ve created a new ticket in our issue tracker for this: [OF-2896] - Ignite Realtime Jira


Great! Glad I could help.

@dhina_apec ping just in case you’re not notified here

And, for the time being, this is how we have patched the server (patch file attached, too):

OF-2896.patch (3.2 KB)

diff --git a/xmppserver/src/main/java/org/jivesoftware/openfire/SessionManager.java b/xmppserver/src/main/java/org/jivesoftware/openfire/SessionManager.java
index 79a2c34b3..b69f2ba33 100644
--- a/xmppserver/src/main/java/org/jivesoftware/openfire/SessionManager.java
+++ b/xmppserver/src/main/java/org/jivesoftware/openfire/SessionManager.java
@@ -691,11 +691,12 @@ public class SessionManager extends BasicModule implements ClusterEventListener
      *
      * @param session the session that was authenticated.
      */
-    public void addSession(LocalClientSession session) {
+    public void addSession(LocalClientSession session, JID previousAddress) {
         // Add session to the routing table (routing table will know session is not available yet)
         routingTable.addClientRoute(session.getAddress(), session);
         // Remove the pre-Authenticated session but remember to use the temporary ID as the key
-        localSessionManager.getPreAuthenticatedSessions().remove(session.getStreamID().toString());
+        localSessionManager.getPreAuthenticatedSessions().remove(previousAddress.getResource());
+
         SessionEventDispatcher.EventType event = session.getAuthToken().isAnonymous() ?
                 SessionEventDispatcher.EventType.anonymous_session_created :
                 SessionEventDispatcher.EventType.session_created;
diff --git a/xmppserver/src/main/java/org/jivesoftware/openfire/session/LocalClientSession.java b/xmppserver/src/main/java/org/jivesoftware/openfire/session/LocalClientSession.java
index c977d26e1..f4c279a78 100644
--- a/xmppserver/src/main/java/org/jivesoftware/openfire/session/LocalClientSession.java
+++ b/xmppserver/src/main/java/org/jivesoftware/openfire/session/LocalClientSession.java
@@ -579,6 +579,7 @@ public class LocalClientSession extends LocalSession implements ClientSession {
         } else {
             jid = new JID(auth.getUsername(), getServerName(), resource);
         }
+        final JID previousAddress = getAddress();
         setAddress(jid);
         authToken = auth;
         setStatus(Session.Status.AUTHENTICATED);
@@ -588,7 +589,7 @@ public class LocalClientSession extends LocalSession implements ClientSession {
             setDefaultList( PrivacyListManager.getInstance().getDefaultPrivacyList( auth.getUsername() ) );
         }
         // Add session to the session manager. The session will be added to the routing table as well
-        sessionManager.addSession(this);
+        sessionManager.addSession(this, previousAddress);
     }

     @Override
@@ -607,11 +608,12 @@ public class LocalClientSession extends LocalSession implements ClientSession {
     public void setAnonymousAuth() {
         // Anonymous users have a full JID. Use the random resource as the JID's node
         String resource = getAddress().getResource();
+        final JID previousAddress = getAddress();
         setAddress(new JID(resource, getServerName(), resource, true));
         setStatus(Session.Status.AUTHENTICATED);
         authToken = AuthToken.generateAnonymousToken();
         // Add session to the session manager. The session will be added to the routing table as well
-        sessionManager.addSession(this);
+        sessionManager.addSession(this, previousAddress);
     }

     /**
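
The idea behind the patch, reduced to a toy model: remember the key under which the entry was stored before the address changes, then remove by that key. As before, this is only a sketch. FakeSession, simulateLoginsPatched and the "client-N" resources are hypothetical names, not Openfire code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PreAuthFixDemo {

    /** Hypothetical stand-in for a client session; not an Openfire class. */
    static final class FakeSession {
        String resource = UUID.randomUUID().toString(); // temporary pre-auth resource
    }

    /** Runs 'count' simulated logins and returns how many map entries are left behind. */
    static int simulateLoginsPatched(int count) {
        Map<String, FakeSession> preAuthenticatedSessions = new ConcurrentHashMap<>();
        for (int i = 0; i < count; i++) {
            FakeSession session = new FakeSession();
            // Session creation: the entry is keyed by the temporary resource.
            preAuthenticatedSessions.put(session.resource, session);
            // Capture the old key BEFORE the address is replaced (previousAddress in the patch).
            String previousResource = session.resource;
            session.resource = "client-" + i;
            // Remove using the key the entry was actually stored under.
            preAuthenticatedSessions.remove(previousResource);
        }
        return preAuthenticatedSessions.size();
    }

    public static void main(String[] args) {
        System.out.println("Entries left after 1000 logins: " + simulateLoginsPatched(1000));
    }
}
```

With the old key captured up front, the map is empty again after each login in this model.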

I’ve opted for a different approach, which can be evaluated in OF-2896: Fix memory leak when dealing with pre-authenticated Sessions by guusdk · Pull Request #2568 · igniterealtime/Openfire · GitHub

I would appreciate a review!

This change will likely be part of Openfire 4.9.1.

Openfire 4.9.1, which contains the fix for this problem, has now been released! Please upgrade and let us know if that fixes your problem once and for all!


That is great, thanks so much!

We will update our server ASAP and will report back with our observations.