Stream Management bug?

I’ve been testing a lot of new clients lately that support XEP-0198 Stream Management. I am not sure if it is a bug in Openfire or the clients (Monal, ChatSecure, Siskin IM) but it seems like when the connection is broken, sometimes they go into a “detached” state until Openfire determines the connection is dead and closes it (which I believe is the intended behavior), and sometimes (more often than not) it goes into an “Invalid session/connection” which stays open forever and cannot be closed from the Client Session screen. Curiously, if/when the client reconnects, the “session” returns to active for the duration the client is active, then usually returns to “Invalid session/connection.”

If I disable Stream Management, the connections are terminated by Openfire as soon as the connection drops. There seems to be no difference to the end user client though in terms of re-connection speed or message delivery, although from what I understand, XEP-0198 is desirable to have enabled…

The only thing that I see logged when this happens is this:

|2020.04.25 18:36:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/73kzn8fzzy’ (73kzn8fzzy) that appears to have been replaced by another session.
2020.04.25 18:36:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/2pzueibpxg’ (2pzueibpxg) that appears to have been replaced by another session.
2020.04.25 18:37:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/6yrflxpr7w’ (6yrflxpr7w) that appears to have been replaced by another session.
2020.04.25 18:37:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/83s28is4ns’ (83s28is4ns) that appears to have been replaced by another session.
2020.04.25 18:38:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/75fpty9nx3’ (75fpty9nx3) that appears to have been replaced by another session.
2020.04.25 18:39:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/dc3s8opt1’ (dc3s8opt1) that appears to have been replaced by another session.
2020.04.25 18:39:00 WARN [TaskEngine-pool-357]: org.jivesoftware.openfire.SessionManager - Not removing detached session ‘im.fqdn.com/60p35cd0z5’ (60p35cd0z5) that appears to have been replaced by another session.|

I have noticed this too. It seems to happen a lot more with iOS based clients than with Conversations (the only Android based client that I see in heavy use). I’m guessing it’s a bug in Openfire.

Yes, I am only observing this on iOS clients. Both Windows (using Miranda NG) and Android (Pix-Art Messenger) clients don’t seem to cause this behavior.

One thing I have anecdotally noticed, it seems like the first time an iOS client connects and disconnects from Openfire, it correctly goes into the “Detached” state until it times out and is closed, but after it reconnects and loses connection, it enters the “Invalid session/connection” and is un-killable without a restart.

Hi @Bill_Roland and @guus, I’d like to revive this tread, we are seeing this problem as well after we upgraded to 4.5.1 (from 4.1.3) and enabled stream management. In fact we are seeing it quite a bit and it is affecting our service. Here are some observations that may help diagnose and fix this problem:

In the admin console under Sessions the user name and resource appear correctly however in the log we have mangled JID’s (see below). See lines 1,2 and 4,5 and notice how user@ is missing and (streamID) is used instead of the resource. Also notice some correct deletes of Detached sessions (line 3, 6, 11, 12,13,14), notice how the FULL JID includes user@service.com/resource and how the (streamID) is not repeated inside the full jid.

  1. 18:31:37.933 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘service.com/p36iqpvxo’ (p36iqpvxo) has been detached for longer than PT3S and will be cleaned up.
  2. 18:31:37.933 [TaskEngine-pool-2864] WARN org.jivesoftware.openfire.SessionManager - Not removing detached session ‘service.com/p36iqpvxo’ (p36iqpvxo) that appears to have been replaced by another session.
  3. 18:31:37.933 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘userone@service.com/mobile’ (58tc2lpj4f) has been detached for longer than PT3S and will be cleaned up.
  4. 18:31:37.950 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘service.com/539mbil51’ (539mbil51) has been detached for longer than PT3S and will be cleaned up.
  5. 18:31:37.950 [TaskEngine-pool-2864] WARN org.jivesoftware.openfire.SessionManager - Not removing detached session ‘service.com/539mbil51’ (539mbil51) that appears to have been replaced by another session.
  6. 18:31:37.950 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘usertwo@service.com/mobile’ (2cs9dj5oj5) has been detached for longer than PT3S and will be cleaned up.
  7. 18:31:37.966 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘service.com/2b9ek0uc4g’ (2b9ek0uc4g) has been detached for longer than PT3S and will be cleaned up.
  8. 18:31:37.966 [TaskEngine-pool-2864] WARN org.jivesoftware.openfire.SessionManager - Not removing detached session ‘service.com/2b9ek0uc4g’ (2b9ek0uc4g) that appears to have been replaced by another session.
  9. 18:31:37.966 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘service.com/81nqao7706’ (81nqao7706) has been detached for longer than PT3S and will be cleaned up.
  10. 18:31:37.967 [TaskEngine-pool-2864] WARN org.jivesoftware.openfire.SessionManager - Not removing detached session ‘service.com/81nqao7706’ (81nqao7706) that appears to have been replaced by another session.
  11. 18:31:37.967 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘userthree@service.com/mobile’ (am00w1099r) has been detached for longer than PT3S and will be cleaned up.
  12. 18:31:37.983 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘userfour@service.com/mobile’ (9wz90343a4) has been detached for longer than PT3S and will be cleaned up.
  13. 18:31:38.006 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘userfive@service.com/mobile’ (2pjh2a3xtu) has been detached for longer than PT3S and will be cleaned up.
  14. 18:31:38.026 [TaskEngine-pool-2864] DEBUG org.jivesoftware.openfire.SessionManager - Detached session ‘usersix@service.com/mobile’ (98hem6gyld) has been detached for longer than PT3S and will be cleaned up.

It is as if something is mangling/screwing-up the session. The log lines above come from here: https://github.com/igniterealtime/Openfire/blob/ede343d2ca453a2920935ccabe6a4a58fcc3eb77/xmppserver/src/main/java/org/jivesoftware/openfire/SessionManager.java (lines 1727 and 1753).

We have tried to add an if statement to fix the “address” in the session so that it can be deleted:

                    if (!session.getAddress().toString().contains("@")) {
                        Log.debug("Mangled session "+ session.toString() + " Address: "+ session.getAddress());
                        JID newJID = new JID(session.getAddress().getNode() + "@" +
                            session.getAddress().getDomain() + "/" +
                            session.getAddress().getResource());

                        session.setAddress(newJID);
                        Log.debug("Fixed Mangled session Address: "+ session.getAddress() + " Stream:" + session.getStreamID() + "Servername:" + session.getServerName());

                    }

but this did not work. we got back NULL for address:

22:54:46.063 [TaskEngine-pool-67] DEBUG org.jivesoftware.openfire.SessionManager - Mangled session LocalClientSession{address=service.com/515mvh4ttf, streamID=515mvh4ttf, status=1 (connected), isSecure=false, isDetached=true, serverName=‘service.com’, isInitialized=false, hasAuthToken=true, peer address=’(not available)’, presence=’
22:54:46.063 [TaskEngine-pool-67] DEBUG org.jivesoftware.openfire.SessionManager - Fixed Mangled session Address: null@service.com/515mvh4ttf Stream:515mvh4ttfServername:service.com
22:54:46.065 [TaskEngine-pool-67] WARN org.jivesoftware.openfire.SessionManager - Not removing detached session ‘null@service.com/515mvh4ttf’ (515mvh4ttf) that appears to have been replaced by another session.

We tried this hack because while we were seeing the correct name and resource in the openfire admin gui (under Sessions), despite the session being in Client IP = “Invalid session/connection”. We looked at
Openfire-4.5.1/xmppserver/src/main/webapp/session-row.jspf and noticed that

<% String name = sess.getAddress().getNode(); %>
<%= StringUtils.escapeHTMLTags(sess.getAddress().getResource()) %>

so we thought the data in the session object may still be correct. But almost feels like there are two different session objects, one for the admin console and one that is broken in the back-end.

Here are our two questions:

  1. What could be mangaling the session so that it produced a broken JID as you saw in log lines above?
  2. In the cleanup code mentioned above (or somewhere else) how can we remove these sessions without a restart?

Any help would be appreciated.
Thank you.
DT

FYI: service.com is fictitious for our domain name. Also userone … usersix are fake to protect our user names.

be removed correctly. We got JSP file shows

Hi Daniel,

This, to me looks like https://issues.igniterealtime.org/browse/OF-2015 which has been fixed after the 4.5.1 release. Some other SM-related fixes went in, too.

Are you able to reproduce this with the 4.6.0-beta release that we did a couple of weeks ago?

hi @guus, thank you for the tips. Upgrading to 4.5.3 (from 4.5.1) fixed our Invalid Session/Connection problems it seems (after testing in dev for a day and just rolling it out to prod). Thank you that did the trick.

We looked at changelog but skipped over OF-2015 without reading inside to see that it has to do with Session Management.

Cheers,
DT.

1 Like

Good to hear that this resolved things for you! As you’re apparently in a position to quickly run tests on a dev environment, I’d like to invite you to try either 4.6.0-beta, or a late nightly build. I’d love to get more feedback on that, before we perform the full release! Would you be able to help us out with that?

What’s the timeline for 4.6.0 to go GA?

I looked at the changelog and it looks like some very interesting fixes. We will test it for sure, as there are some fixes in the RoutingTableImpl and OF-1888, OF-2012, OF-1975 may help. We have found a bug that may be in RoutingTableImpl, we have found that some “bad” sessions remain in the RoutingTable - see problem description here: Group presence and group messages not getting delivered, loosing customers

Do you think this will fix that problem?

Will report back here. Thx for all the help.
DT.