Openfire 4.2.3 fails to detect disconnection and continues to show user as online

I have Openfire 4.0.2, which works fine: when a Smack client disconnects from the server, Openfire detects the disconnection and immediately shows the user as offline.

After I upgraded Openfire to 4.2.3, the user continues to show as online even after the same Smack client has disconnected from the server. No change was made on the client.

To isolate the issue, I verified both Openfire versions on fresh installations with just one user created on the server. The issue remains.

One thing worth noting is that Stream Management was enabled on the client side in the above test. If I disable Stream Management on the client, Openfire 4.2.3 detects the disconnection and shows the user as offline.

It seems like a bug in Openfire 4.2.3: once Stream Management is enabled on the client side, Openfire fails to detect the disconnection properly.

Has anyone encountered this problem? If so, how did you resolve it?

It might be related to https://issues.igniterealtime.org/browse/OF-1497
You can test with the latest alpha version or wait for a beta of 4.3.0.

I don’t know what OS you are using; here are today’s builds for Windows: https://bamboo.igniterealtime.org/browse/OPENFIRE-NIGHTLYMAVEN-574/artifact/shared/install4j-generated-media/

Ok, so it seems like there is a bug.

Will Openfire 4.3.0 be available for Linux? We may need to deploy it on a Linux machine.

P.S. I have just tested Openfire 4.3.0. The result is not positive: the Openfire server continues to show the user as online even after the user has disconnected.

Then it is something new, but still related to Stream Management. Maybe you can provide the code showing how you enable SM in your app and how you disconnect (is it a programmatic disconnect, what stanzas are sent to the server, what errors are received). You may also check the Openfire logs for anything relevant and post it here; then maybe some of the Openfire developers (I am not one) could look into this and maybe file a new bug.

By “the client disconnects” I mean that the client app is closed abruptly, without programmatically disconnecting the XMPP connection. No stanzas are sent (because the resources have already been released); there is only a burst of TCP-level traffic, which I captured in Wireshark.

So my guess is that Openfire relies on the TCP close to decide that the client has gone offline, but this does not happen when Stream Management is enabled on the client side.

I have checked the Openfire logs. Openfire 4.2.3 (with or without SM enabled) logs an additional DNS exception compared to Openfire 4.0.2, but I don’t think this is the cause of the failure to detect the disconnection. Here is the exception:
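
To illustrate the guess above, here is a minimal Java sketch (plain java.net sockets, not Openfire or Smack code) of how a server normally notices that the peer has closed its TCP connection: a blocking read returns -1. The class and method names are made up for this illustration. Whether Openfire acts on this signal when Stream Management is active is exactly the open question in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustration only: hypothetical class, unrelated to Openfire's real networking code.
class TcpCloseDemo {

    /**
     * Opens a loopback connection, lets the "client" side close it without
     * sending any application data, and returns what the server-side read() reports.
     */
    static int readAfterPeerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // The client goes away without any XMPP-level goodbye.
            client.close();
            // Safety net so the demo cannot block forever.
            accepted.setSoTimeout(5000);
            // read() returns -1 once the peer's close has been received.
            return accepted.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int result = readAfterPeerClose();
        System.out.println(result == -1
                ? "server saw end-of-stream: peer closed the connection"
                : "unexpected byte: " + result);
    }
}
```

If the operating system resets the connection instead of closing it in an orderly way, the read fails with an IOException rather than returning -1; either way the server side is told that the connection is gone.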

2018.11.16 11:46:36 ERROR [Jetty-QTP-AdminConsole-47]: org.jivesoftware.openfire.net.DNSUtil - Can't process DNS lookup!
javax.naming.OperationNotSupportedException: DNS service refused [response code 5]; remaining name '_xmpp-client._tcp.imserver.'
	at com.sun.jndi.dns.DnsClient.checkResponseCode(Unknown Source)
	at com.sun.jndi.dns.DnsClient.isMatchResponse(Unknown Source)
	at com.sun.jndi.dns.DnsClient.doUdpQuery(Unknown Source)
	at com.sun.jndi.dns.DnsClient.query(Unknown Source)
	at com.sun.jndi.dns.Resolver.query(Unknown Source)
	at com.sun.jndi.dns.DnsContext.c_getAttributes(Unknown Source)
	at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(Unknown Source)
	at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(Unknown Source)
	at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(Unknown Source)
	at javax.naming.directory.InitialDirContext.getAttributes(Unknown Source)
	at org.jivesoftware.openfire.net.DNSUtil.srvLookup(DNSUtil.java:203)
	at org.jivesoftware.openfire.admin.index_jsp._jspService(index_jsp.java:312)
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
	at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
	at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:73)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:49)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:226)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:215)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:499)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Unknown Source)

I’m not sure this isn’t simply how SM is intended to work. What problems do you have with the current behavior? Do you lose messages? I don’t have much experience with SM, but I imagine that the session should be kept alive on the server when the client drops its connection, so that it can reconnect faster when the network is up again (it is designed for mobile clients on weak mobile networks).

It is strange, though, that your problem appeared with 4.2.3, as SM was introduced in version 4.0.0. If it really is a bug, it might be worth testing every version between 4.0.2 and 4.2.3 to narrow down which change caused it. That could be a lot of work, though. All versions are here: https://github.com/igniterealtime/Openfire/releases
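
To make that reasoning concrete, here is a toy Java model of a server-side session with and without stream resumption. It is only a sketch of the idea described above, not Openfire code: the class, the state names and the 5-minute resume window used in main are all assumptions chosen for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Toy model: every name and value here is hypothetical.
class SessionModel {
    enum State { ONLINE, DETACHED, OFFLINE }

    private final boolean resumable;     // did the client negotiate SM with resumption?
    private final Duration resumeWindow; // how long the server waits for a resume
    private State state = State.ONLINE;
    private Instant detachedAt;

    SessionModel(boolean resumable, Duration resumeWindow) {
        this.resumable = resumable;
        this.resumeWindow = resumeWindow;
    }

    /** The TCP connection went away without a proper stream close. */
    void onConnectionLost(Instant now) {
        if (state != State.ONLINE) {
            return;
        }
        if (resumable) {
            state = State.DETACHED; // keep the session around for a possible resume
            detachedAt = now;
        } else {
            state = State.OFFLINE;  // nothing to wait for
        }
    }

    /** The client came back and asked to resume; returns true if that worked. */
    boolean onResume(Instant now) {
        expireIfNeeded(now);
        if (state == State.DETACHED) {
            state = State.ONLINE;
            detachedAt = null;
            return true;
        }
        return false;
    }

    /** What an observer of the user's status would see at the given time. */
    boolean appearsOnline(Instant now) {
        expireIfNeeded(now);
        return state != State.OFFLINE;
    }

    private void expireIfNeeded(Instant now) {
        if (state == State.DETACHED && !now.isBefore(detachedAt.plus(resumeWindow))) {
            state = State.OFFLINE;
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.now();
        SessionModel withSm = new SessionModel(true, Duration.ofMinutes(5));
        withSm.onConnectionLost(t0);
        System.out.println("with SM, 1 min later: online=" + withSm.appearsOnline(t0.plusSeconds(60)));
        System.out.println("with SM, 6 min later: online=" + withSm.appearsOnline(t0.plusSeconds(360)));

        SessionModel withoutSm = new SessionModel(false, Duration.ofMinutes(5));
        withoutSm.onConnectionLost(t0);
        System.out.println("without SM, immediately: online=" + withoutSm.appearsOnline(t0));
    }
}
```

In this model a resumable session keeps showing as online until the resume window runs out, which would look like the reported behaviour for a while; a session that never goes offline at all would still point to a bug.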

@guus @gdt @akrherz what do you think?

Well, the posted exception is a red herring; it’s from the admin console checking if the DNS lookup of the specified XMPP domain is working or not.

Unfortunately, I can’t help with SM at all, but it certainly sounds like there’s another bug there.

Greg

What amount of time did you wait for the client to disconnect? Openfire should periodically check if a connection is still viable, but that might take up to a few minutes.
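
As a rough picture of what such a periodic viability check could look like, here is a small Java sketch. It is not taken from Openfire: the class, the enum, and in particular the rule of pinging at half the maximum idle time are assumptions made for illustration only.

```java
import java.time.Duration;

// Hypothetical sketch of an idle-connection check; not Openfire's implementation.
class IdleCheck {
    enum Action { NONE, SEND_PING, CLOSE }

    /**
     * Decides what a periodic sweep should do with one connection.
     *
     * @param idleFor         time since the last traffic was seen on the connection
     * @param maxIdle         configured maximum idle time (for example 6 minutes)
     * @param pingAlreadySent whether a ping is already outstanding
     */
    static Action decide(Duration idleFor, Duration maxIdle, boolean pingAlreadySent) {
        if (idleFor.compareTo(maxIdle) >= 0) {
            return Action.CLOSE; // idle for too long: treat the connection as dead
        }
        // Assumed rule: probe the client once it has been idle for half the limit.
        if (!pingAlreadySent && idleFor.compareTo(maxIdle.dividedBy(2)) >= 0) {
            return Action.SEND_PING;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        Duration maxIdle = Duration.ofMinutes(6);
        for (int minutes = 0; minutes <= 6; minutes++) {
            System.out.println(minutes + " min idle -> "
                    + decide(Duration.ofMinutes(minutes), maxIdle, minutes > 3));
        }
    }
}
```

With a check like this, a dead connection is only noticed once the idle limit has passed, which is why waiting a few minutes before concluding that the server missed the disconnect matters.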

Do you have a unit test available to reproduce the bug? If so, would you be willing to share?

Did you try a recent nightly build of (the as of yet unreleased) 4.3.0? Some things have changed in the last few weeks.

Does the problem go away if you disable stream management on the server side, by setting the property ‘stream.management.active’ to ‘false’?

Thinking about it, this may be fixed by https://github.com/igniterealtime/Openfire/pull/1209 (which has yet to be merged - I’m not sure any testing I can do would be worthwhile).

Greg

The only problem with the current behaviour arises when the client app shuts down abruptly without properly closing the XMPP connection. I believe Android is designed such that it closes the TCP connection during app shutdown, so I believe the TCP close is what signals the Openfire server that the client has gone offline. Otherwise there is no way the server could know almost instantly that the client is offline.

6 minutes, as configured for the ping idle timeout in the server settings.

I will try to create one, but give me some time.

Yes, I tried it. The result is not positive.

The problem disappeared.

In Openfire 4.5.0 and 4.4.5, another Stream Management issue has been fixed: OF-1923

With this, I’m fairly confident that Stream Management functionality is in order.