ReconnectionManager issues

Steffen_Conrad · December 1, 2016, 11:11am

Hi there,

we are developing a Smack-based XMPP communication stack in an IoT environment, where the reliability of network communication is out of our control. Therefore we need highly automated, flawless maintenance of the XMPP connection by the Smack client. We must make sure that the device tries to reconnect to the network in all circumstances. How often is not an issue, as long as there is a determinable time frame for retries.

Regarding these requirements, I found some issues with ReconnectionManager:

Make ReconnectionManager a (Re)ConnectionManager
If the first connect of the XMPPConnection fails, ReconnectionManager is never triggered. If the device restarts after a power outage and there is no network available in the first place, the device would remain offline forever.

That means you need to implement the first retry loop yourself. The corresponding logic is already in ReconnectionManager, so it would be great to be able to use it for that scenario instead of copying parts of it. Suggestion: Add another ConnectionListener callback that triggers at the start of a connect (not on success) and/or call connectionClosedOnError when a connect fails (without having been connected beforehand), so the ReconnectionManager can take over after the first connect has failed.
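To illustrate what "implementing the first retry loop yourself" could look like, here is a minimal sketch using only the Java standard library. It is not Smack code: the class InitialConnectRetry, its method names, and the doubling backoff with a cap are all assumptions made for this example, and the actual connect()/login() calls would have to be supplied by the application as the Callable.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of Smack): retries a first connect attempt with capped backoff. */
final class InitialConnectRetry {

    /** Delay before the next try after the given 1-based attempt: doubles each time, up to capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    /**
     * Runs connectAction until it succeeds or maxAttempts is used up.
     * Returns the number of attempts needed; rethrows the last failure otherwise.
     */
    static int connectWithRetry(Callable<Void> connectAction, int maxAttempts,
                                long baseMillis, long capMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connectAction.call();
                return attempt;
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption of the calling thread
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last;
    }
}
```

An application would pass its own connect-and-login code as the Callable. Using Integer.MAX_VALUE as maxAttempts approximates "retry under all circumstances" while the cap keeps the retry interval determinable.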

Add an option that makes ReconnectionManager retry connecting under all circumstances
Currently, ReconnectionManager stops trying to reconnect on the StreamError condition ‘conflict’. I can see the reason for doing so, but there is a (theoretical) scenario that would make a device fail to communicate:

What if there are two devices with the same XMPP credentials due to a misconfiguration? One will connect/login, the other device will fail to login and never retry to connect. If someone solves the issue by fixing the first device, the second device will remain unconnected, since its ReconnectionManager has already stopped retrying.

Minor issue: The ReconnectionManager notifies reconnectingIn(0) twice
We use reconnectingIn(0) to signal “we are currently trying to connect” to our application, so I have to filter out the second notification. The while loop checks for >0 first, then sleeps and decrements the seconds, then notifies. After the while loop it notifies reconnectingIn(0) again. Notifying first, then sleeping and decrementing would solve the issue.
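The reordering proposed above can be sketched as follows. This is a standalone illustration, not the actual ReconnectionManager source: the class ReconnectCountdown, the tickMillis parameter, and the use of an IntConsumer in place of the listener's reconnectingIn(int) callback are assumptions for the example.

```java
import java.util.function.IntConsumer;

/** Hypothetical countdown loop (not Smack code) that notifies before sleeping. */
final class ReconnectCountdown {

    /**
     * Counts down from the given number of seconds, reporting each remaining value once.
     * The value 0 is reported exactly once, after the loop has finished.
     */
    static void countdown(int seconds, long tickMillis, IntConsumer reconnectingIn)
            throws InterruptedException {
        int remaining = seconds;
        while (remaining > 0) {
            reconnectingIn.accept(remaining); // notify first ...
            Thread.sleep(tickMillis);         // ... then wait one tick ...
            remaining--;                      // ... then decrement
        }
        reconnectingIn.accept(0); // single "connecting now" notification
    }
}
```

With the order described in the post (sleep, decrement, notify), the last loop iteration already reports 0 and the call after the loop reports it a second time. Notifying at the top of the loop avoids that without any filtering in the application.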

I am looking forward to your comments.

Regards,

Steffen

Flow · December 1, 2016, 2:36pm

We must make sure that the device tries to reconnect to the network in all circumstances.

I usually implement the “reliably keep an established and stable XMPP connection up and running” logic on top of Smack. But there is nothing wrong with improving ReconnectionManager. However, the manager was initially built with desktop PCs in mind decades ago, and thus is often not suitable for today’s mobile scenarios.

If the first connect of the XMPPConnection fails, ReconnectionManager is never triggered.
I didn’t have a close look, but does ReconnectionManager.setEnabledPerDefault(true) help? What prevents reconnection if the first connect()/login() fails?

Currently, ReconnectionManager stops trying to reconnect on the StreamError condition ‘conflict’. I can see the reason for doing so, but there is a (theoretical) scenario that would make a device fail to communicate:

Never use client-assigned resources, especially in an IoT/M2M XMPP scenario.

Notifying first, then sleeping and decrementing would solve the issue.
Patches welcome

Steffen_Conrad · December 1, 2016, 3:38pm

I usually implement the “reliably keep an established and stable XMPP connection up and running” logic on top of Smack. But there is nothing wrong with improving ReconnectionManager. However, the manager was initially built with desktop PCs in mind decades ago, and thus is often not suitable for today’s mobile scenarios.

Ok, will do so then.

I didn’t have a close look, but does ReconnectionManager.setEnabledPerDefault(true) help? What prevents reconnection if the first connect()/login() fails?

ReconnectionManager reacts to the connectionClosedOnError(Exception e) call on the ConnectionListener. It seems that this callback isn’t invoked before a connection has been established successfully, which is never the case when the first connect fails.

Never use client-assigned resources, especially in an IoT/M2M XMPP scenario.

Can you elaborate on that? I do not have that much XMPP experience (I did study RFC 6120 though, especially on JIDs and stanzas).

It is actually a requirement in a standard specification (currently in draft phase), that basically states: “use this fixed resource ID and reject communication with any other resource ID”. XMPP serves only for transport of protocol messages between preconfigured JIDs, there is no presence exchanged or anything else. I actually have no idea if this is a bad idea or not.

Flow · December 1, 2016, 6:49pm

It is actually a requirement in a standard specification (currently in draft phase), that basically states: “use this fixed resource ID and reject communication with any other resource ID”.
Which standard are we talking about? I would consider such a requirement bad practice. You run into all sorts of issues if you require a fixed string resource.

Steffen_Conrad · December 2, 2016, 7:55am

Which standard are we talking about? I would consider such a requirement bad practice. You run into all sorts of issues if you require a fixed string resource.

It is the draft for IEC 61850-8-2 (using XMPP in power utility communications), and after checking again, it does not actually require that, but there have been comments on the draft suggesting the use of a fixed resource ID. We are currently building a corresponding implementation to see if it’s doable, but it seems that may be a bad idea. Maybe we can provide feedback on the draft that this proposal is not a good idea.

I think the motivation behind that proposal is to reduce the overhead on connection establishment, since every device knows beforehand who to talk to. Using a fixed JID here means you do not need any roster or presence handling.

Flow · December 2, 2016, 9:00am

I think the motivation behind that proposal is to reduce the overhead on connection establishment, since every device knows beforehand who to talk to. Using a fixed JID here means you do not need any roster or presence handling.
The bare JID can still be fixed. Just the resource should be dynamic (and possibly server-assigned).

Steffen_Conrad · December 2, 2016, 9:05am

The bare JID can still be fixed. Just the resource should be dynamic (and possibly server-assigned).

I still have limited experience with XMPP. I assume this then requires processing presence information on both clients to find out which full JID is currently online to talk to, am I right?

Flow · December 2, 2016, 9:14am

I still have limited experience with XMPP. I assume this then requires processing presence information on both clients to find out which full JID is currently online to talk to, am I right?

No, data can be exchanged in XMPP without requiring the full JID of the partner(s), and without involving presence or the roster (RFC 6121, XMPP-IM).

Flow · December 2, 2016, 11:17am

It is the draft for IEC 61850-8-2 (using XMPP in power utility communications), and after checking again, it does not actually require that, but there have been comments on the draft suggesting the use of a fixed resource ID. We are currently building a corresponding implementation to see if it’s doable, but it seems that may be a bad idea. Maybe we can provide feedback on the draft that this proposal is not a good idea.
Unrelated side note: It appears the draft was written without consulting the XMPP community. There are probably further bad practices in it, but since it’s not freely available, I can’t comment on it. We are currently rebooting the IoT efforts within the XMPP community in the form of a new Special Interest Group (SIG). Happy to see people from the IEC there.