Smack-omemo rework #177 - ignores newly invited muc member if it does not have any previously established omemo session

I observed the following while doing muc testing on aTalk.

  1. While the muc conference is on going between 3 members i.e.
    a. swordfish@atalk.org (the monitoring device - aTalk on Note08)
    b. leopard@atalk.org (aTalk)
    b. hawk@atalk.org (Conversations)

  2. Then swordfish invites swan@atalk.org into muc. swan is a client installed on gajim on ubuntu.
    swan does not have any previously established omemo session with swordfish. Therefore the identity table contains no active record of swan. Hence it does not get included in the omemo message recipients list, since getDevicesOf(swan@atalk.org) returns zero active device.

    public OmemoMessage.Sent encrypt(Set<BareJid> recipients, String message)
            throws CryptoFailedException, UndecidedOmemoIdentityException,
            InterruptedException, SmackException.NotConnectedException,
            SmackException.NoResponseException, SmackException.NotLoggedInException
    {
        synchronized (LOCK) {
            LoggedInOmemoManager guard = new LoggedInOmemoManager(this);
            Set<OmemoDevice> devices = getDevicesOf(getOwnJid());
            for (BareJid recipient : recipients) {
                devices.addAll(getDevicesOf(recipient));
            }
            return service.createOmemoMessage(guard, devices, message);
        }
    }
  1. swordfish muc chat indicates swan has joined the conference when swan accept the invitation.

  2. swordfish send a new omemo message and all the 3 memebers in #1 receive the message, decrypt and display correctly.

  3. swan receives the encrypted message via server relay omem message. However the omemo message does not contact the rid key for swan, so display only the omemo hint text i.e. I sent you an OMEMO encrypted message but your client doesn’t seem to support that.

I am not sure if the problem is due to Gajim problem. As later I found that the omemo is not working on Gajim until I re-install omemo plugin. However after few repeated tests, i am now unable to get gajim omemo working with aTalk again. I also observed with Gajim is that it always use the same omemoDevice even I re-installed the plugin. I would have thought it should refresh a new omemoDevice on fresh installation. Concern this may have problem on aTalk if gajim keep the deviceID but changes its identity.

Does the omemo protocol design take care the above scenario? If so appreciate if you can give a brief process flow how it takes care the above.

Please comment.

If I understand you correctly, swordfish has not subscribed to swans presence, is that correct?
In that case swordfish cannot receive swans deviceList.

The MUC related code of smack-omemo is highly untested, so I really appreciate this kind of feedback :slight_smile:

I should probably add a check, whether the user has subscribed to all MUC members, although this will hopefully change with Daniel Gultsch’s publish-options fix which I’ll implement soonish (Allow all users to access OMEMO related PubSub nodes - regardless of subscription status)

Both swordfish and swan are included as members of the server defined shared roster group i.e.atalk-member. So they are automatically have subscription to either one presence.

Actually I am not sure if the observation is due to Gajim omemo problem. After seeing the problem, I try to setup a omemo chat between swordfish (aTalk) and swan (gajim) and found that swan omemo is not sending any omemo messages i.e. not working. Then I re-intalled the omemo plug in for swan, it works for a while. But later omemo messaging failed again, and I was unable to make it works again by re-installed omemo plugin.

Then I proceed to perform similar test with only aTalk and Conversation clients after clearing all pubsub item data on server and aTalk clients. I observed there are toast messages indicating missing identities and loosing some omemo messages. Unlike the previous case, the muc omemo messaging is able to recover and work again eventually, upon verification of the missing identity fingerprints by all parties. The whole process of recovering seems logically to me. I cannot recall the full test setup. May be I will retest at later time and give you a full picture of what exactly happen.

Note: aTalk currently does not take care and attempt to resend omemo message if UndecidedOmemoIdentityException occurs. The reason why the missing the omemo messages in the above test.

I never used shared roster groups, so I’m not sure how they affect pubsub access. I’ll see if I can find out about that.

Actually the shared roster is only to automatically grouping members of same interest e.g. employee department, without the need for individual user to add each other to his/her roster.
Otherwise, individual user needs to manually add the contacts and request for subscription permission from his/her contacts. With shared roster all members of the group are automatically granted the subscription to each other notification.

@Flow suggested, that shared roster groups may not grant you access to pubsub nodes. You have to manually subscribe to a contacts presence in order to grant them access to your PEP/PubSub.

Do you know why Flow think so. Is it because how Smack is designed today.

With your input, I just created a new account (kingfisher@atalk.org) and added to the shared roster on server.
From the aTalk log, I see that ejabberd server did sent all the pubsub#event of the users whom are currently online. OmemoManager is also able to retrieve all the omemDevice info from the server for the 3 users and added to the identities table.

Is there something I misunderstood?
The pre-defined shared roster groups are common practice for company who divide their employees in groups with shared roster so individual do not have to manually subscribed to each others. If Flow statement is true, it is really a big problem in this case.

aTalk has conducted system test with shared roster since the development starts. Not sure if I have missed out something during the system testing.

01-29 23:25:38.026 D/SMACK: RECV (0): <message to='kingfisher@atalk.org/atalk' from='hawk@atalk.org' type='headline'><event xmlns='http://jabber.org/protocol/pubsub#event'><items node='eu.siacs.conversations.axolotl.devicelist'><item id='5ED31FEDFF98'><list xmlns='eu.siacs.conversations.axolotl'><device id='1796289951'/></list></item></items></event><delay from='hawk@atalk.org/phone' stamp='2018-01-29T05:46:21.069274Z' xmlns='urn:xmpp:delay'/><addresses xmlns='http://jabber.org/protocol/address'><address jid='hawk@atalk.org/phone' type='replyto'/></addresses></message>
01-29 23:25:38.051 D/SMACK: RECV (0): <message to='kingfisher@atalk.org/atalk' from='leopard@atalk.org' type='headline'><event xmlns='http://jabber.org/protocol/pubsub#event'><items node='eu.siacs.conversations.axolotl.devicelist'><item id='5ED31C2198BDF'><list xmlns='eu.siacs.conversations.axolotl'><device id='531601090'/></list></item></items></event><delay from='leopard@atalk.org/atalk' stamp='2018-01-29T05:30:09.630932Z' xmlns='urn:xmpp:delay'/><addresses xmlns='http://jabber.org/protocol/address'><address jid='leopard@atalk.org/atalk' type='replyto'/></addresses></message>
01-29 23:25:38.056 D/SMACK: RECV (0): <message to='kingfisher@atalk.org/atalk' from='swordfish@atalk.org' type='headline'><event xmlns='http://jabber.org/protocol/pubsub#event'><items node='eu.siacs.conversations.axolotl.devicelist'><item id='5ED31C586D076'><list xmlns='eu.siacs.conversations.axolotl'><device id='1155293484'/></list></item></items></event><delay from='swordfish@atalk.org/atalk' stamp='2018-01-29T05:31:04.448315Z' xmlns='urn:xmpp:delay'/><addresses xmlns='http://jabber.org/protocol/address'><address jid='swordfish@atalk.org/atalk' type='replyto'/></addresses></message>

01-29 23:25:47.241 I/aTalk: [10] org.atalk.persistance.DatabaseBackend.storeCachedDeviceList().1327 Identities table updated for activeDevice: kingfisher@atalk.org:601605057
01-29 23:25:47.251 I/aTalk: [10] org.atalk.persistance.DatabaseBackend.storeCachedDeviceList().1327 Identities table updated for activeDevice: kingfisher@atalk.org:601605057
01-29 23:25:47.256 I/aTalk: [10] org.atalk.crypto.omemo.AndroidOmemoService.initializationFinished().121 Initialize OmemoManager successful for kingfisher@atalk.org/atalk
01-29 23:25:47.266 W/aTalk: [11] org.atalk.persistance.DatabaseBackend.storeCachedDeviceList().1321 Identities table contains no activeDevice (create new): hawk@atalk.org:1796289951
01-29 23:25:47.286 W/aTalk: [11] org.atalk.persistance.DatabaseBackend.storeCachedDeviceList().1321 Identities table contains no activeDevice (create new): leopard@atalk.org:531601090
01-29 23:25:47.301 W/aTalk: [11] org.atalk.persistance.DatabaseBackend.storeCachedDeviceList().1321 Identities table contains no activeDevice (create new): swordfish@atalk.org:1155293484

Hm, you might be right… Let me quickly investigate.

That should not happen. If swans client supports OMEMO, it MUST publish a deviceList. If that is done, the list must arrive via PEP update. Your next quote suggests, that the updates indeed arrive.

When the updates arrive, the deviceList will be merged. As a result, getDevicesOf() MUST return them.
I’m not sure, what goes wrong, but based on the logs you provided, smack-omemo should in fact work well with shared roster groups.

Maybe the deviceList updates are not reliably stored?

Attached below is a snapshot of the kingfisher@atalk.org identities table entry upon first registration on the network (with the new implementation) before any omemo chatSession is setup.

The table contains all the 3 buddies that were mentioned in the earlier discussion. These table entries get filled when storeCachedDeviceList() is being executed, but only partially filled by the available information provided.

(With reference to CryptoFragment aTalk source)
When kingfisher setup a chat session with e.g. hawk, then selects Omemo chat option; aTalk execute doHandleOmemoPressed() where the contact is checked for Trust State. hawk identity row entries are then get filled in this process. If Blind Trust is enabled, it sets trust column to TRUSTED else UNDECIDED, and prompt user for verification.

In the case of muc, where kingfisher is being invited into conference. If kingfisher tries to send an omemo message to e.g. leopard, OmemoService encrypt throws a UndecidedOmemoIdentityException, aTalk then prompts user to verify all the undecided members of the muc. If muc members include also swordfish, then both leopard and swordfish identities row get filled in this process. However the last event trigger message will not get sent since the recipients does not have keys entry at that moment.

A while ago, aTalk loop back immediately upon contact verification, trying to send the event triggered message. but failed as the table keys filling is async.

atalk_kingfisher.html (972 Bytes)

===================== aTalk old implementation =========
The process flow is slightly difference in aTalk old implementation; where it has implemented the PEPListener deviceListUpdateListener as described in:

Also initially proposed for new implementation but drops as described in:

So the identity table will get filled immediately on kingfisher first login.
aTalk also performs rebuildSessionWith() if an active identity is not found during storeCachedDeviceList.
New implementation just get partially filled as shown in the attached DB and get fille only on omemo session setup.

I believe the missing message in muc described earlier does not occur in old implementation.

The database looks fine. When an deviceList update is received, only the deviceIds are stored.

I’m not sure if it is a wise decision to set the trust state to TRUSTED inside the database. What happens, if you turn off BTBV afterwards? I’d rather modify the TrustCallback to return another trust state if BTBV is activated, but leave the database untouched.

What do you mean by that? Kingfisher should have the keys after the user decided, whether or not to trust the fingerprint.

You should in any case avoid building a session with a device if you don’t have the intention to send an encrypted message (see XEP-0384: OMEMO, §5 Business Rules)

Thanks for the input, I have added that decision to getTrust() to return trust state pending on BTBV option.

Need your advice on the following:

  1. Am I right to say that when BTBV is disabled, the trust should be set to UNDECIDED when identity table is created for first buddy device. Enabling of BTBV subsequently will only have effect on other contacts new device setup, and shall not apply to this buddy device already setup.

  2. With trust state is unset for buddy first device when BTBV enabled, state UNDECIDED is returned on getTrust for user verification; What should the trust state be set to when user do not verify the fingerprint when prompt? UNDECIDED, leave blank or UINTRUSTED. I suppose it should be UNDECIDED.

  3. Does BTBV is dynamic for first device behavior and remains true until it is manually “decided” by user?

It is the way aTalk implementation presently? The buddy fingerprints verification prompt is only get triggered during the chat session setup for omemo messaging. If new members are invited while omemo muc messaging is already enabled, the user verification is not triggered; but later on the next omemo message sending. Look like aTalk needs to fix this.

Thanks for pointing this out.

Actually I was trying to retest muc with unverified buddy giving you an overview what happen. After spending more than an hour trying to setup, but given up eventually; because smack keeps throwing various NoResponseExceptions even the reply for the sent stanza have been received within timeout period. It happens on my 3 devices i.e. Note-2, Note-3 and Note-8 making testing impossible.

Since smack-omemo checks for undecided devices every time you encrypt a message, it should prompt the user to trust undecided fingerprints as soon as the user want to send a message to the muc after a new user joined the room. Prompting the user once OMEMO is selected as encryption method is optional in my opinion.

Regarding BTBV:
I would implement it the following way.

  • When BTBV is enabled, as long as ALL devices of a contact are marked as undecided, all devices are considered trusted (that means the DB value is not set to trusted, but the TrustCallback returns the value trusted).
  • Devices that are marked as untrusted, are considered untrusted.
  • If the user decides to trust at least one device of a contact (verification), that device will get marked as trusted in the DB. Now BTBV is quasi disabled for that single contact, meaning undecided devices must trigger prompting the user to decide whether to trust/untrust a device.
  • BTBV is always disabled for the users own devices.

An implementation could look like this:

boolean btbvIsTrusted(OmemoDevice device) {
    if (isUndecided(device) {
        Set<OmemoDevices> allDevices = getDevicesOf(device.getJid());
        for (OmemoDevice d : allDevices) {
            if (isTrusted(d)) {
                // At least one "verified" device exists -> other devices MUST be decided
                return false;
            }
        }
        // No "verified" device exists -> blindly trust all devices
        return true;
    }
    // User made decision for device -> respect decision
    return isTrusted(device);
}

It might make sense to introduce an additional trust state (verified), but I think the current implementation allows a client to implement BTBV as I described above.

Agreed if the sender buddy identities table are all filled. Attached is the identities table for leopard with buddy kingfisher identities partially filled i.e. prior to any session setup.

N8_identities.html (1.8 KB)

As for your proposal, kingfisher will not receive the message that triggers the fingerprint verification. In current aTalk implementation, the kingfisher row will get filled prior to the very first omemo message being sent.

Actually this is the problem I am trying to resolve in this discussion. When a partially filled buddy get invited in the mid of conference discussion, the first message that sent will trigger his fingerprint verification. However this message will not be sent to him as the identities table entry is empty at this moment.

aTalk just implemented to listen in for new muc participant joining an omemo conference, and trigger the buddy verification to fill the missing identities entry. Just tested and is working i.e. no more missing first message.
(By the way, Conversations aborted under this case, but recovered after restart)

Do you have any good alternatives to resolve the case just described, instead of current aTalk implementation?

In the above case, actually OmemoRatcher will throws CryptoFailedException for all the last few messages (prior joining) relayed to him when the kingfisher joins the conference.

02-01 10:42:21.801 W/aTalk: [36] org.jivesoftware.smackx.omemo.OmemoService.onOmemoMessageStanzaReceived() Could not decrypt incoming message: 
                            org.jivesoftware.smackx.omemo.exceptions.CryptoFailedException: Transported key could not be decrypted, since no suitable message key was provided. Provides keys: [601605057, 1796289951, 1155293484]
                                at org.jivesoftware.smackx.omemo.OmemoRatchet.retrieveMessageKeyAndAuthTag(OmemoRatchet.java:123)
                                at org.jivesoftware.smackx.omemo.OmemoService.decryptMessage(OmemoService.java:456)
                                at org.jivesoftware.smackx.omemo.OmemoService.onOmemoMessageStanzaReceived(OmemoService.java:1228)
                                at org.jivesoftware.smackx.omemo.OmemoManager$3$1.run(OmemoManager.java:990)
                                at java.lang.Thread.run(Thread.java:841)

=============

Currently aTalk Trust implementation has the following states:

    public enum Trust {
        UNDECIDED,
        UNTRUSTED,
        TRUSTED,
        VERIFIED,
    }

TRUST is only used for BTBV, and get set for the very first buddy device created. All subsequent new devices from this contact are set to UNDECIDED and replaced with VERIFIED after user has authenticated. Unlike your case, only the first device can only be BTBV trusted. Implementation similar to Conversation. I think this is better in security. If a hacker mimic the contact at a later time, he will get auto trusted in your case if all devices are undecided.