Smack thread indefinite freeze when joining MUC

Discourse didn’t recognize you as a recurring user. I’ve manually bumped up your “yes-we-can-trust-this-person” score.

You guys are using Prosody, right? That takes away one of my worries that this was specific to Openfire (which is where I ran into the issue).

Yes, and over the weekend I had played with some automated testing and I can repro it after 40-60 join and leaves in the same room.

1 Like

Can you distill some minimalistic unit test from your automation, not depending on any properietary code? That’s be of interest while diving into this - if anything to verify any fixes that we might come up with.

Not sure how easy that will be … but I can give it a try …

Seems I still cannot build the project this is discussed in Similar class names, would you reconsider? - #3 by guus any hints?

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'smack-core/src/main/java/org/jivesoftware/smack/provider/IQProvider.java'
  'smack-core/src/main/java/org/jivesoftware/smack/provider/IqProvider.java'

I see there are some mocking around connection I may try it, if I manage to build the project, but don’t promise that something will come out of it …

Ugh, you’re on a Mac, I suppose? That issue is so annoying. There’s no easy work-around, I’m afraid. Lots of technical debt and not enough resouces to tackle it. Sorry for that. :-/

What’s the premise of your reproduction? Create a chatroom, join, send presence, leave (lather rinse repeat)?

Yep the flow is the following:

  1. any client first sends an iq with room name to jicofo
  2. Jicofo is the first to join and create the room (there is a prosody mode where only admins(jicofo) are allowed to create the room for that muc component)
  3. Then Jicofo responds to the client which triggers the client to join.
  4. When the last client leaves jicofo leave the room

The steps will be:

  1. Two participants in the room, when participant1 leaves, then participant2 also leaves (this is jicofo in our case)
  2. Make sure before leaving there are some self-presences received(status==110) by participant2
  3. When the next time participant2 tries to join the same room:
MultiUserChat muc = manager.getMultiUserChat(roomJid);
if (muc.getOccupants().size() > 0) {
    System.err.println("Houston, we have a problem!");
}
muc.createOrJoin(nickname);

After this, you will see the stuck thread as in the first post in this thread.

Of course, it is a sync and timing issue and in my case I was hitting it after 50-60 times executing the same test case, using the same room name.

Ok, that helps. I’m trying to dig up the details of your setup from memory. Are you still using an actual user-entity in jicofo (the ‘focus’ user) that, much like a regular user, joins the room to create it? Is that focus user still in the room when participant1 and participant2 leave?

Oh sorry, I didn’t mention it, participant2 is jicofo. I’m editing now my previous post.

1 Like

To repro without any additional infrastructure use a server on the same host to make the race easier to hit (I don’t think the server software makes a difference), join a MUC. Then send updated presence and immediately call leave(). Try joining the same MUC with the same nickname.

Here’s Jonathan’s workaround: https://github.com/jitsi/jicofo/pull/816

2 Likes

I’m having a tough time reproducing the problem. So far, I’ve created this test in the Smack Integration Test Framework, which I’m running against a server running on the same computer.

Without logging on, I can easily increase the iteration count to 10,000 without running into some kind of deadlock.

Am I correctly reproducing the scenario here?

@SmackIntegrationTest
public void mucTest() throws Exception {
    EntityBareJid mucAddress = getRandomRoom("smack-inttest-message");

    MultiUserChat mucAsSeenByOne = mucManagerOne.getMultiUserChat(mucAddress);
    Resourcepart nickname = Resourcepart.from("one-" + randomString);

    // Join a new room, configure it to be persistent.
    mucAsSeenByOne.join(nickname);
    FillableForm configForm = mucAsSeenByOne.getConfigurationForm().getFillableForm();
    configForm.setAnswer("muc#roomconfig_persistentroom", true);
    mucAsSeenByOne.sendConfigurationForm(configForm);

    for (int i=0; i<2000; i++) {
        Presence presence = mucAsSeenByOne.getXmppConnection().getStanzaFactory()
                .buildPresenceStanza()
                .ofType(Presence.Type.available)
                .setMode(Presence.Mode.away)
                .to(mucAddress)
                .build();

        mucAsSeenByOne.getXmppConnection().sendStanza(presence);
        mucAsSeenByOne.leave();
        mucAsSeenByOne.createOrJoin(nickname);
    }

    // Clean-up after test.
    tryDestroy(mucAsSeenByOne);
}

This is a snippet from the data exchange, which includes two of the many iteraterations.

<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E' id='G9S1U-21'>
  <priority>
    0
  </priority>
  <x xmlns='http://jabber.org/protocol/muc'>
  </x>
  <c xmlns='http://jabber.org/protocol/caps' hash='sha-1' node='https://igniterealtime.org/projects/smack' ver='lWpWNgqtfynY+MxupmhOr2LZqOg='/>
</presence>
15:10:19 RECV (1): 
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-21" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <priority>
    0
  </priority>
  <c xmlns="http://jabber.org/protocol/caps" hash="sha-1" node="https://igniterealtime.org/projects/smack" ver="lWpWNgqtfynY+MxupmhOr2LZqOg=">
  </c>
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="moderator"/>
    <status code="110"/>
    <status code="100"/>
    <status code="170"/>
  </x>
</presence>
<message from="smack-inttest-message-hmf6w-t18zpg@conference.example.org" type="groupchat" id="36554a5b-63c5-4049-b03f-09604c7b656f" to="smack-inttest-one-hmf6w@example.org/one-hmf6w">
  <subject/>
</message>
15:10:19 SENT (1): 
<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org' id='G9S1U-22'>
  <priority>
    0
  </priority>
  <show>
    away
  </show>
  <c xmlns='http://jabber.org/protocol/caps' hash='sha-1' node='https://igniterealtime.org/projects/smack' ver='lWpWNgqtfynY+MxupmhOr2LZqOg='/>
</presence>
<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E' id='G9S1U-23' type='unavailable'/>
15:10:19 RECV (1): 
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-22" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <priority>
    0
  </priority>
  <show>
    away
  </show>
  <c xmlns="http://jabber.org/protocol/caps" hash="sha-1" node="https://igniterealtime.org/projects/smack" ver="lWpWNgqtfynY+MxupmhOr2LZqOg=">
  </c>
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="moderator"/>
    <status code="110"/>
  </x>
</presence>
15:10:19 RECV (1): 
<r xmlns='urn:xmpp:sm:3' />
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-23" type="unavailable" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="none"/>
    <status code="110"/>
  </x>
</presence>
15:10:19 SENT (1): 
<a xmlns='urn:xmpp:sm:3' h='30'/>
15:10:19 SENT (1): 
<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E' id='G9S1U-24'>
  <priority>
    0
  </priority>
  <x xmlns='http://jabber.org/protocol/muc'>
  </x>
  <c xmlns='http://jabber.org/protocol/caps' hash='sha-1' node='https://igniterealtime.org/projects/smack' ver='lWpWNgqtfynY+MxupmhOr2LZqOg='/>
</presence>
<r xmlns='urn:xmpp:sm:3'/>
15:10:19 RECV (1): 
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-24" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <priority>
    0
  </priority>
  <c xmlns="http://jabber.org/protocol/caps" hash="sha-1" node="https://igniterealtime.org/projects/smack" ver="lWpWNgqtfynY+MxupmhOr2LZqOg=">
  </c>
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="moderator"/>
    <status code="110"/>
    <status code="100"/>
    <status code="170"/>
  </x>
</presence>
<message from="smack-inttest-message-hmf6w-t18zpg@conference.example.org" type="groupchat" id="6c1721f0-27c7-4dc3-a898-ebba16571dd0" to="smack-inttest-one-hmf6w@example.org/one-hmf6w">
  <subject/>
</message>
15:10:19 RECV (1): 
<a xmlns='urn:xmpp:sm:3' h='30' />
15:10:19 SENT (1): 
<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org' id='G9S1U-25'>
  <priority>
    0
  </priority>
  <show>
    away
  </show>
  <c xmlns='http://jabber.org/protocol/caps' hash='sha-1' node='https://igniterealtime.org/projects/smack' ver='lWpWNgqtfynY+MxupmhOr2LZqOg='/>
</presence>
<presence to='smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E' id='G9S1U-26' type='unavailable'/>
15:10:19 RECV (1): 
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-25" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <priority>
    0
  </priority>
  <show>
    away
  </show>
  <c xmlns="http://jabber.org/protocol/caps" hash="sha-1" node="https://igniterealtime.org/projects/smack" ver="lWpWNgqtfynY+MxupmhOr2LZqOg=">
  </c>
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="moderator"/>
    <status code="110"/>
  </x>
</presence>
15:10:19 RECV (1): 
<presence to="smack-inttest-one-hmf6w@example.org/one-hmf6w" id="G9S1U-26" type="unavailable" from="smack-inttest-message-hmf6w-t18zpg@conference.example.org/one-2N492E">
  <x xmlns="http://jabber.org/protocol/muc#user">
    <item jid="smack-inttest-one-hmf6w@example.org/one-hmf6w" affiliation="owner" role="none"/>
    <status code="110"/>
  </x>
</presence>

Hum, can you enable milliseconds in your message logs?

At first glance, I think it is the same … @Boris_Grozev wdyt?

I don’t know actually. This is the default logging that comes with the framework. :confused:

Oo one difference, we leave the room in a different thread.

So can you modify your test that way, and do the join a little bit after that … maybe 1-2 seconds later.

Another difference is that we always lookup the cache for the room mucManagerOne.getMultiUserChat before joining, not sure whether that makes any difference.

Thanks for your very detailed situation report. I finally had some time to look into it, and I found that what you described is plausible. I am not yet sure about the best solution. Assuming that you correctly identified, I wondered if simply moving the if (mucUser.getStatus().contains(MUCUser.Status.PRESENCE_TO_SELF_110)) at the first position would already help.

I have some untested code for this at

While I do hope to find time to look deeper into this, I am happy to hear your opinion about this approach.

I am able to reproduce with your code (adapted to run in my env, connected to a local prosody server). Removing the code to make the room persistent makes it easier to repro: I ran 4 times with your code and it got stuck twice and passed 2000 iterations without getting stuck twice. After commenting out the config form code it got stuck 6 times out of 6 (with at most 446 iterations).

This was with Smack 4.4.3, but I don’t think master would be different.

Thanks for jumping on it and making a fix so quickly!

@Boris_Grozev @damencho

I’ve updated PR #494 to target the 4.4 branch and plan to include this fix in the upcoming Smack 4.4.4 release. While I am also testing the changes, it would be great to hear from you if PR #494 fixes the issue for you. I can’t give an exact ETA of the 4.4.4 release, but probably within the next 14 days.

I also created SMACK-914 to track the issue.

1 Like