
Openfire behaviour with multiple resources with same priority

Hi,

I was wondering how Openfire handles multiple active connections for the same JID with different resources that all have the same priority. When a message is sent to the JID (without specifying a resource), the server has several options for handling this.

According to the XMPP spec (RFC 3921, section 11.1) the options in this case are

“…If two or more available resources have the same priority, the server MAY use some other rule (e.g., most recent connect time, most recent activity time, or highest availability as determined by some hierarchy of values) to choose between them or MAY deliver the message to all such resources…”.

I have noticed that there is a setting (route.all-resources) to switch to the second option (deliver to all). But what is the default strategy of Openfire to select a resource to deliver to?

Without digging too much into the code: is it possible to achieve a kind of round-robin message delivery in this case? This might not make sense for instant messages, but I would like to use it for M2M communication where the messages contain data that has to be processed by a service. In order to cluster the service, it would be great to have every cluster node connect with the same JID but a different resource, and to have the messages distributed among the cluster nodes. The server would do some kind of load-balancing job.

Thanks,

Ingo

According to the XMPP spec (RFC 3921, section 11.1)

Note that RFC 3921 is obsoleted by RFC 6121.

But what is the default strategy of Openfire to select a resource to deliver to?
Quoting javadoc of openfire/spi/RoutingTableImpl.routeToBareJID:

Deliver the message sent to the bare JID of a local user to the best connected resource. If the target user is not online then messages will be stored offline according to the offline strategy. However, if the user is connected from only one resource then the message will be delivered to that resource. In the case that the user is connected from many resources the logic will be the following:

  1. Select resources with highest priority
  2. Select resources with highest show value (chat, available, away, xa, dnd)
  3. Select resource with most recent activity
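The three selection steps above might be sketched as follows. This is an illustrative stand-in, not Openfire's actual code: the Session class, its fields, and the numeric show ranking are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the selection logic described in the javadoc;
// the real implementation lives in RoutingTableImpl.routeToBareJID.
public class BestResourceSelector {

    public static class Session {
        public final String resource;
        public final int priority;       // presence priority
        public final int showRank;       // chat=5, available=4, away=3, xa=2, dnd=1
        public final long lastActivity;  // millis since epoch

        public Session(String resource, int priority, int showRank, long lastActivity) {
            this.resource = resource;
            this.priority = priority;
            this.showRank = showRank;
            this.lastActivity = lastActivity;
        }
    }

    // 1. highest priority, 2. highest show value, 3. most recent activity
    public static Session selectBest(List<Session> sessions) {
        return sessions.stream()
                .max(Comparator.comparingInt((Session s) -> s.priority)
                        .thenComparingInt(s -> s.showRank)
                        .thenComparingLong(s -> s.lastActivity))
                .orElse(null);
    }
}
```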

The server would do some kind of load-balancing job.

Why not simply let the clients randomly select a resource and do the load-balancing that way? This would come with the advantage that you don't depend on the server implementation, allowing you to switch the server (more) easily.
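That client-side approach could be sketched roughly like this; the bare JID, resource names, and helper method are illustrative, not taken from any real library:

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch: a sending client picks one of the service's known
// resources at random and addresses the resulting full JID directly.
public class ClientSideBalancer {
    public static String pickFullJid(String bareJid, List<String> resources, Random rnd) {
        // Uniformly choose one resource; the caller ensures the list is non-empty.
        String resource = resources.get(rnd.nextInt(resources.size()));
        return bareJid + "/" + resource;
    }
}
```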

Hi,

thanks for your answers.

But what is the default strategy of Openfire to select a resource to deliver to?
Quoting javadoc of openfire/spi/RoutingTableImpl.routeToBareJID:

Deliver the message sent to the bare JID of a local user to the best connected resource. If the target user is not online then messages will be stored offline according to the offline strategy. However, if the user is connected from only one resource then the message will be delivered to that resource. In the case that the user is connected from many resources the logic will be the following:

  1. Select resources with highest priority
  2. Select resources with highest show value (chat, available, away, xa, dnd)
  3. Select resource with most recent activity

I see, this means that if all cluster nodes themselves are sending messages at regular intervals, they would become the connection with the most recent activity and so the server will do some kind of distribution.

The server would do some kind of load-balancing job.

Why not simply let the clients randomly select a resource and do so the load-balacning? This would come with the advantage that you don’t depend on the server implementation and allow you to switch the server (more) easily.

This would mean that every cluster node uses a different resource and the clients that produce the messages (you can think of them as a set of sensor nodes that report measurements to the cluster) randomly pick one of the cluster node resources to send their reports to. Conceptually, this would imply that every client has to be aware of the cluster nodes, which is something I want to prevent. Another thing is that the cluster nodes also subscribe to PubSub nodes using their bare JID. In that case the load-balancing wouldn't work either. Depending on the server implementation is fine for me.

I see, this means that if all cluster nodes themselves are sending messages at regular intervals, they would become the connection with the most recent activity and so the server will do some kind of distribution.

Hmm? Every cluster node has a single connection. I would expect the effect to be quite the opposite: the connection (== “cluster node” in your terms) which handled the last request will likely also be the one with the most recent activity, which means that the algorithm will select it for the next stanza, resulting in a concentration of requests on it, not a distribution.

This would mean that every cluster node uses a different resource

That should already be the case, as every connection needs a different resource.

Another thing is that the cluster nodes also subscribe to PubSub nodes using their bare JID. In that case the load-balancing wouldn’t work either. Depending on the server implementation is fine for me.

Yes, PubSub is once again the show stopper here.

I sure don’t know your requirements well enough, but doing the load balancing on the clients, by subscribing to the cluster nodes’ JID and randomly selecting a resource, and having PubSub messages sent undistributed to the cluster nodes, appears to be your best option.

Another thing is that the cluster nodes also subscribe to PubSub nodes using their bare JID. In that case the load-balancing wouldn’t work either. Depending on the server implementation is fine for me.

Yes, PubSub is once again the show stopper here.

I sure don’t know your requirements well enough, but doing the load balancing on the clients, by subscribing to the cluster nodes’ JID and randomly selecting a resource, and having PubSub messages sent undistributed to the cluster nodes, appears to be your best option.

Unfortunately the overall design, which I cannot change, is that every client has its own PubSub node and publishes on its own node. The clustered service is a subscriber to every node. Putting the load-balancing logic into the clients is not an option.

In my opinion the XMPP server is the best place for doing the load-balancing, as it is aware of the connected resources (of the cluster nodes). After having a look at the RoutingTableImpl class, I am thinking of adding an additional strategy (comparable to route.all-resources) that simply picks one of the connections at random to forward the message to. This would still be compliant with the XMPP standard and would also work for the PubSub items that have to be forwarded to one of the cluster nodes. What do you think? Or am I missing something?
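A minimal sketch of what such a random strategy could look like; the Session interface and string packet are stand-ins, not Openfire's real ClientSession and Packet types:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical "random" routing strategy: pick one eligible session
// uniformly at random and deliver the stanza to it.
public class RandomRoutingStrategy {
    public interface Session { void process(String packet); }

    // Returns the chosen index so the choice can be inspected.
    // Caller guarantees the session list is non-empty.
    public static int routeRandomly(List<? extends Session> sessions, String packet) {
        int idx = ThreadLocalRandom.current().nextInt(sessions.size());
        sessions.get(idx).process(packet);
        return idx;
    }
}
```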

In my opinion the XMPP server is the best place for doing the load-balancing
I completely agree.

After having a look at the RoutingTableImpl class I am thinking of adding an additional strategy
Yes, that would be a good short-mid term solution.

For a long-term solution I’ve created an early draft of an XEP that allows the message routing algorithm to be customizable: http://paste.debian.net/plain/118723

What do you think?

For a long-term solution I’ve created an early draft of an XEP that allows the message routing algorithm to be customizable: http://paste.debian.net/plain/118723

What do you think?

Cool. I am not that familiar with the process of how XEPs evolve until they get a final “approved” label. Do you think there is a good chance that this standardization effort will be accepted? From my point of view this would be a cool feature. I think that, with the Internet-of-Things hype that’s around these days, XMPP has high potential for this kind of M2M communication.

Some remarks concerning your proposal…

  • Don’t you think that the weighted load balancing will break the semantics of the initial RFC? As far as I remember, the server can only use its own algorithm when the priorities are the same, so using the priorities for that purpose might not be compliant. Or is it ok that an XEP defines a behavior which breaks the initial semantics?
  • Apart from the round-robin mechanism there should be some kind of hash-based algorithm. In networking there are often packet-scheduling algorithms that do not require state (who the last recipient was, etc.) but instead rely on hashing some header fields (e.g. SFQ). Such algorithms can be implemented more efficiently and may work equally well (depending on your hash function and fields). Perhaps hashing the id of the message, modulo the number of active connections, might also result in a reasonable distribution among the active connections instead of relying on pure round-robin.

One thing that came to my mind is: how does the clustering of Openfire itself work? Are all connections of the same bare JID managed by the same Openfire cluster node? If yes, then that node has all the information it needs for applying the message routing algorithm. Otherwise, algorithms like round-robin would imply synchronization between these nodes. However, I haven’t had a closer look at how Openfire is clustered. Perhaps you have more information on that.

Thanks,

Ingo

IK, as you have quoted a link, your message had to be moderated (only a few users can post links without moderation, as protection from spam bots). I’m bumping this post so Flow gets a notification, because none is sent after a message has been approved.

Do you think there is a good chance that this standardization effort will be accepted?
Dunno, I plan to polish the XEP up a bit and then post it to the XEP Editors for discussion (where, IIRC, everyone can participate). The current version of the XEP can now be found at

http://geekplace.eu/xeps/xep-cmr/xep-cmr.html

the source is at

From my point of view this would be a cool feature. I think that, with the Internet-of-Things hype that’s around these days, XMPP has high potential for this kind of M2M communication.
That is exactly why I think that “Customizable Message Routing” would be helpful. BTW, did you read the sentence about XEP-254?

Don’t you think that the weighted load balancing will break the semantics of the initial RFC?
Yes, but if you really stick to the RFC, then even round-robin will break the semantics. The only specified routing algorithms of RFC 6121 are ‘all’ and ‘mostactive’.

Or is it ok that an XEP defines a behavior which breaks the initial semantics?
Usually not, but I think it also depends on whether the XMPP community considers it a worthy trade-off. Also, the changes from RFC 6121 are minimal, i.e. we only change the routing behavior of message stanzas of type ‘normal’ or ‘chat’ sent to a bare JID.

Apart from the round-robin mechanism there should be some kind of hash-based algorithm.
Sure, but I think further algorithms are out of the scope of this XEP; they can be defined later on. I’m not even sure if “5.4 Message Routing Hints” should be part of this XEP. I’d like to keep things minimal, but OTOH message routing hints are short enough, and strongly enough related to the core idea of CMR, to be a part of the XEP.

Also, to be honest, I don’t see the advantage of a hash-based algorithm. They seem to achieve the same as round-robin while requiring a more complex implementation. Could you elaborate on that?

One thing that came to my mind is: how does the clustering of Openfire itself work?

Well, the specifics of a cluster-related implementation and its problems are not relevant for the XEP. Off the top of my head, Openfire cluster nodes are aware of all sessions of a JID, although some of them may be remote sessions.

Without further analysis I cannot comment on how round-robin routing would affect cluster performance. But it’s certainly doable.

From my point of view this would be a cool feature. I think that, with the Internet-of-Things hype that’s around these days, XMPP has high potential for this kind of M2M communication.

That is exactly why I think that “Customizable Message Routing” would be helpful. BTW, did you read the sentence about XEP-254?

Yes, I did. I think XEP-254 is somewhat different, as it focuses on PubSub only. Furthermore, it requires an additional interaction by the subscriber to delete the item after it was consumed. I don’t know if this delivery guarantee is really required in all use cases.


Don’t you think that the weighted load balancing will break the semantics of the initial RFC?
Yes, but if you really stick to the RFC, then even round-robin will break the semantics. The only specified routing algorithms of RFC 6121 are ‘all’ and ‘mostactive’.

Oh I see, the rules have somehow changed since RFC 3921. In that case it is even easier to introduce other message routing strategies. It depends on the interpretation of the “most available” resource in RFC 6121. The RFC also states that

‘Allowed a server to use its own algorithm for determining the “most available” resource for the purpose of message delivery, but mentioned the recommended algorithm from RFC 3921 (based on presence priority) as one possible algorithm’

If one argues that the “most available” resource is the one (i.e., the cluster node) that had the least work in the past (which is exactly what happens in case of round-robin), then it should be fine. Going one step further, one can argue that a hash-based approach that approximates round-robin is also fine.

Concerning your question, the hash-based approach might be implemented as follows.

List<ClientSession> nonNegativePrioritySessions = getNonNegativeSessions(sessions, 0);
// Math.floorMod: packet.hashCode() may be negative, and plain % would then yield a negative index
int idx = Math.floorMod(packet.hashCode(), nonNegativePrioritySessions.size());
ClientSession sessionToDeliver = nonNegativePrioritySessions.get(idx);
sessionToDeliver.process(packet);

I don’t know if this delivery guarantee is really required in all use cases.

Yes, exactly my thinking. It also introduces a lot more complexity.

If one argues that the “most available” resource is the one (i.e., the cluster node) that had the least work in the past
Good point, will add that argumentation to the XEP.

Concerning your question, the hash-based approach might be implemented as follows.
I understand the approach. Hashing is a costly operation, and now you need to do it for every message stanza. What I don’t understand is where you see the advantage compared to simply doing round-robin.
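For comparison, the state a round-robin selector needs is tiny: one counter per bare JID. A minimal sketch (the class and method names are illustrative, not Openfire code):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal stateful round-robin selector. The only state is one atomic
// counter; Math.floorMod keeps the index valid even after the counter
// wraps around Integer.MAX_VALUE.
public class RoundRobinSelector {
    private final AtomicInteger next = new AtomicInteger();

    public <T> T select(List<T> sessions) {
        int idx = Math.floorMod(next.getAndIncrement(), sessions.size());
        return sessions.get(idx);
    }
}
```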