An owner cannot unlock a MUC room

I would like to report a timing bug found in MUC. I am using Openfire 4.2.3. A user created and joined a MUC room. When the user (owner) tried to unlock the room with a data form, the IQ response would get a bad-request error and the debug log showed “Ignoring stanza received from a non-occupant of ‘myroom@conference.mydomain.com’ : …”

Here is the root cause. When the LocalMUCUser created by MultiUserChatServiceImpl.getChatUser() was calling the LocalMUCRoom.joinRoom(), before LocalMUCUser added a role to the room, a background task MultiUserChatServiceImpl.UserTimeoutTask (running every 5 minutes) removed this LocalMUCUser because the user was not in any rooms yet (checkForTimedOutUsers()):

    // If user is not present in any room then remove the user from
    // the list of users
    if (!user.isJoined()) {
      removeUser(user.getAddress());
      continue;
    }

The join room operation was completed but it did not know that the LocalMUCUser had been removed. When the client tried to unlock the room in a following IQ request, it called MultiUserChatServiceImpl.getChatUser() which created a new LocalMUCUser object without any roles, but the LocalMUCUser.process(IQ packet) would return a “bad-request” error.

Since this is a timing issue, this bug is not easy to reproduce.

Thanks, I’ve raised https://issues.igniterealtime.org/browse/OF-1649 to cover this off. Should be an easy fix, but you seem to be doing very well at finding these race conditions!

Greg

@gdt That’s the power of a single core CPU! :joy: I have a private fix, but it is not the best. What is your idea for the fix?

1 Like

It “should” just be a case of synchronising on the user - getChatUser() already does this already, but checkForTimedOutUsers() does not.

Though I’d like to improve the synchronisation to prevent doing it on the bare user, as that may be too wide.

Greg

Fix at https://github.com/igniterealtime/Openfire/pull/1241

I probably over-complicated it by introducing a new AutoCloseableReentrantLock - that’s something that I’ve been mulling at the back of my mind for a while, and this seemed to be a good opportunity to do so.

Don’t expect to see this in Openfire 4.3 as that’s fairly well down the route to being released, but you should be able to merge it quite easily to your local source.

Greg

@gdt If a participant (e.g. bot) sends two consecutive IQ’s (e.g. update its nickname and update its presence), won’t this AutoCloseableReentrantLock restrict all IQ requests executed sequentially in process(Packet)? The tricky part of this bug is how to synchronize between the joinRoom() starting at getChatUser() and removeUser() in checkForTimedOutUsers(), but not in any other cases.

Currently I have a simpler solution but not as elegant as yours. Here is my assumption: if a newly created user does not join any rooms in 30 seconds, the user can be removed. I set a timstamp in LocalMUCUser constructor for this user not joining in any rooms, in addRole() the timestamp is reset to -1, in removeRole() the timestamp will be set again if this user is not in any rooms. In the checkForTimedOutUsers(), if a user is not in any rooms and its timestamp passes a grace period (30 sec), then remove the user. Do you think that my assumption is flawed?

Yes, it does force two requests for the same user to be processed one after the other. In the example you gave that is a slow down in performance (though perhaps not on a single core, single CPU host :wink:).

However, it does also reduce the chance of a user joining then immediately leaving a room, and having the leave complete before the join. And possibly other potential situations like that (e.g. changing nickname or status twice).

Greg

Good point on the join and leave room situation.

-Vincent