Stanzas arrive in the wrong order when using CM

Hi everyone,

I've observed some strange behaviour on Openfire lately. Some of my users are receiving messages in a different order from the one in which they were sent.

  • User 1 sends a first message stanza to User 2 whose content is “Hello”

  • Shortly after, User 1 sends a second message stanza to User 2 saying “How are you?”

  • User 2 receives the messages in the order “How are you?” and then “Hello”

These are the components I'm using:

  • OpenFire (OF) 3.1.1

  • 2 Connection Managers (CM) 3.1.1

I suspect a problem in the way the Connection Manager works. In the CM configuration we have specified that we want 10 connections between each CM and OF.

When a user sends a stanza through the CM, the CM receives it and “randomly” chooses one of the connections to forward it to OF.

When two stanzas are sent, they can be routed through two different connections, and OF has no way of knowing which stanza was sent first. This way we can lose ordering. I believe each client should be assigned to one and only one connection on the CM so that ordering is guaranteed.
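
To make the argument above concrete, here is a minimal simulation sketch. It is not the CM's actual code: the function names, the CRC-based pinning, and the way the server drains connections are all assumptions made for illustration. It only models the claim that order is kept within one connection but not across several.

```python
import random
import zlib

NUM_CONNECTIONS = 10  # the 10 CM-to-OF connections mentioned above


def pick_random(client_id, rng):
    """Hypothetical model of the suspected behaviour: any connection may be chosen."""
    return rng.randrange(NUM_CONNECTIONS)


def pick_pinned(client_id, rng):
    """Proposed alternative: a given client always maps to the same connection."""
    return zlib.crc32(client_id.encode("utf-8")) % NUM_CONNECTIONS


def deliver(stanzas, pick, rng):
    """Queue each stanza on a connection, then let the server read from the
    non-empty connections in an arbitrary interleaving. Order is preserved
    within a connection, but not across connections."""
    queues = [[] for _ in range(NUM_CONNECTIONS)]
    for client_id, body in stanzas:
        queues[pick(client_id, rng)].append((client_id, body))
    received = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        received.append(queue.pop(0))
    return received


def count_reordered(pick, trials=1000, seed=0):
    """Count how many trials deliver the two stanzas out of order."""
    rng = random.Random(seed)
    stanzas = [("user1", "Hello"), ("user1", "How are you?")]
    return sum(deliver(stanzas, pick, rng) != stanzas for _ in range(trials))


if __name__ == "__main__":
    print("reordered with random choice:", count_reordered(pick_random))
    print("reordered with pinned choice:", count_reordered(pick_pinned))
```

In this model the pinned strategy never reorders the two stanzas, while the random strategy does so in a noticeable fraction of the trials. Whether the real CM behaves like `pick_random` is exactly the open question of this post.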

What do you think?

Thinking about it a bit more, I believe these stanza ordering issues might also be responsible for some authentication problems.

In fact, once in a while, my client is stuck while trying to authenticate.

This is what I suspect might happen:

  1. The client opens a stream: stream:stream

  2. The CM sends a session creation notification to OF through one of its connections

  3. The CM sends stream:stream back to the client and may advertise the available stream:features

  4. The client sends an iq auth to the CM

  5. The CM routes the iq auth to OF through a different connection than the one used in step 2.

For some reason, on the OF side, the iq auth is received before the session creation notification. Since the stanza is received for a nonexistent session, the iq is discarded and the client is left waiting indefinitely for an answer.
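
The suspected race can be sketched as a toy model. This is not Openfire's code: the `Server` class, the packet names, and the silent-drop behaviour are assumptions that simply encode the hypothesis above.

```python
class Server:
    """Toy stand-in for the OF side (hypothetical, for illustration only)."""

    def __init__(self):
        self.sessions = set()
        self.replies = []

    def receive(self, packet):
        kind, session_id = packet
        if kind == "session-created":
            self.sessions.add(session_id)
        elif kind == "iq-auth":
            if session_id in self.sessions:
                self.replies.append(("iq-result", session_id))
            # Otherwise the iq is silently dropped (the assumed behaviour),
            # so the client never receives an answer.


def run(arrival_order):
    """Feed packets to a fresh server in the given order and return its replies."""
    server = Server()
    for packet in arrival_order:
        server.receive(packet)
    return server.replies


in_order = [("session-created", "s1"), ("iq-auth", "s1")]
swapped = list(reversed(in_order))

if __name__ == "__main__":
    print("in order:", run(in_order))
    print("swapped: ", run(swapped))
```

When the two packets arrive in the order they were sent, the client gets a reply. When they are swapped, as could happen if they travel over different connections, the reply list stays empty, which would match a client stuck in authentication.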

These are just some thoughts, and I have no proof that this is really what happens, but it seems to me a possible explanation.

Hope somebody can help me.

Hi,

This problem is already known as JM-835 but is unfortunately not yet fixed. Vote for it to make it a high-priority issue; with currently only one vote, it seems to be less than important for most users.

LG

I'm experiencing some trouble during authentication, and I believe stanza ordering is the cause.

Client requests supported authorization methods for user “robo”

I believe that Openfire gets the iq get before the session creation notification from the CM.

Hi Yann,

This problem may be related to the number of connections; I think five is still the default (in manager.xml). Reducing this to one could solve the problem.

I have no idea how performance will be after changing this value.

LG

I've tried reducing the number of connections between the CM and OF. It certainly makes this issue less likely, but it doesn't solve it.

Reducing the number of connections to one will solve the problem for sure. However, we've discovered some serious issues in WF-CM 3.1.1 which cause connections between CM and WF to be unavailable until the message is sent to the end users.

In our environment, lowering the number of connections between CM and WF increases the chance of observing delays in stanza delivery (up to 20 min).

We haven't reported the issue in the forums since it doesn't occur with the latest version, 3.3.0, which uses NIO. We are currently planning to move to the newest version as fast as possible.

Basically, with version 3.1.1 a single client with a bad network connection (or bad intentions) can easily block all traffic between the CM and WF and make the whole system unavailable to other users.

I can detail the process to block WF working with a pre-NIO CM if you want. Basically, I strongly discourage people from using a pre-NIO CM in their production environment, since it creates more issues than it solves.

Yann

Stanza ordering issues are really causing a lot of errors on my side.

They are the source of a lot of problems: in MUC, for typing indicators, authentication, …

Is there any plan to fix this issue?