Corrupted records in the OMEMO session table lock out OMEMO sessions

With reference to:

While testing OMEMO messaging on aTalk, I encountered the following problem: neither swordfish nor leopard is able to launch an OMEMO session. I traced the problem to corrupted session records on both the leopard and swordfish devices. In both cases the corrupted session is with swordfish, but with a different deviceID on each device.

Before an OMEMO session can be launched, aTalk checks for any UndecidedOmemoIdentity by sending a message: mOmemoManager.encrypt(bareJid, "Hi buddy!");

However, this causes SessionCipher#encrypt() to throw an UntrustedIdentityException when it executes sessionState.getSenderChainKey(), because the session state contains a zero-length key. The exception returns only a simple error message: “empty key” on swordfish and “key.length == 0” on leopard.

On swordfish, I am able to unlock the state with “purge unused identities”, which purges the inactive omemoDevices and all their associated data. It so happens that the corrupted session record belongs to an inactive device.

However, this does not work on leopard, because the corrupted session there is with swordfish and not with one of leopard’s own devices. I have tried, but it looks like there is no simple solution to this problem.

Following are some alternatives I can think of:

  1. Filter through all the session records and delete any corrupted record, but this requires user action.
  2. Have Smack’s OmemoManager automatically repair or delete the corrupted session record.
  3. Have Smack’s UntrustedIdentityException include the omemoDevice that has the corrupted record, so the app can proceed to delete it.

Do you have other suggestions for resolving the lockup problem?

=============== new update (20180816) ==============
After I created the method to filter for corrupted session records, the test results showed that the lockup was not due to a corrupted session record (in fact, all session records on leopard are intact), but to a missing session record for swordfish@atalk.org:1123277467. So the earlier proposals would not work.

swordfish@atalk.org:1123277467 is the new active omemoDevice generated for the swordfish account on the Note-8. This omemoDevice is listed as active in leopard’s identities table, but there is no record for it in leopard’s session table.
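The mismatch described above can be found with a simple set difference between the active identities and the session rows. The following is only a rough sketch: the class name, the map/set representation of the two tables, and the second device ID are made up for illustration and do not reflect the actual aTalk database schema.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical diagnostic; the table representation is an assumption, not the aTalk schema.
final class MissingSessionSketch {

    /**
     * identities: "bareJid:deviceId" -> active flag (stand-in for the identities table).
     * sessions:   "bareJid:deviceId" keys that have a row in the session table.
     * Returns the active identities that have no session row, in table order.
     */
    static List<String> activeWithoutSession(Map<String, Boolean> identities, Set<String> sessions) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : identities.entrySet()) {
            boolean active = Boolean.TRUE.equals(entry.getValue());
            if (active && !sessions.contains(entry.getKey())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Boolean> identities = new LinkedHashMap<>();
        identities.put("swordfish@atalk.org:1123277467", true); // device from this report
        identities.put("swordfish@atalk.org:1001", false);      // made-up inactive device

        Set<String> sessions = new HashSet<>();                  // no session rows at all

        System.out.println(activeWithoutSession(identities, sessions));
    }
}
```

Inactive identities are skipped on purpose, since only the active device was causing the lockup here.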

Any idea what led to this problem, and what is the correct approach to unlock this state?

When does the session table record get generated, and what is the relationship between the identity and session records? Are they interdependent?

====== New input (20180817) =========
I have tried everything, i.e. identity regeneration and purging inactive identities on both devices, but nothing unlocks them and restores proper OMEMO messaging between the two devices.

Finally, I had to purge all the OMEMO tables on both devices and rebuild the OMEMO table data from scratch. Only then did it work. However, this option is only available in my debug version. I really hope you can provide some insight into the whole problem, and a better and cleaner way to unlock the situation than purging all the OMEMO tables.


I still haven’t figured out how smack-omemo can end up in the lockup in the first place, i.e. at which point the OMEMO tables get corrupted.

This is a hint that the raw session is not written back to storage properly. Maybe this is caused by an earlier application crash?

Either way, it looks like I have to spend some more energy in implementing better data sanitation of some sort…

Is it possible to recreate the raw session if it is found missing from the session table? If so, which method should aTalk call?

The aTalk implementation returns a new, empty sessionRecord if one is missing from the table. aTalk could instead trigger the recreation of the raw session when the table returns null. At the moment, a missing raw sessionRecord locks aTalk into a state that the user cannot recover from.

    @Override
    public SessionRecord loadRawSession(OmemoDevice userDevice, OmemoDevice contactDevice)
    {
        SessionRecord session = mDB.loadSession(contactDevice);
        return (session != null) ? session : new SessionRecord();
    }

Use OmemoManager.rebuildSessionWith(OmemoDevice device).

Thanks.

I have a problem rebuilding the session during loadRawSession(), as an instance of OmemoManager is not available in that class. This would be the ideal place, as no user involvement is required.

An alternative place is in “Purge unused identities”, checking for a missing sessionRecord for each active identity in the table. It is not ideal, but this is the best alternative place I can think of. Any proposals?

  1. By the way, must the contact be online during the session rebuild process?
  2. Must there be a one-one session record match for each active identity? What is the prerequisite if this is not the case?

I’d rather spend energy determining how the session breaks in the first place.

Nope, the contact does not need to be online; this is one of the key features of OMEMO.

I’m not sure what you mean by one-one session record. In OMEMO the user has a session with each active device of the contact. If no session exists, a new one will be created. If no session can be built, the client will be informed by an exception.

one-one session Record match:

When I look at the aTalk database for swordfish, there are a total of 24 active identities, but only 10 session records in the DB. If I need to rebuild missing session records, do I have to build them for the 14 identities that have no session record? Or are there other criteria besides the active state flag, e.g. identity trust state, last activation, etc.?

“If no session exists, a new one will be created. If no session can be built, the client will be informed by an exception.”

When is “no session exists” determined? The leopard lockup is due to a missing session for contact swordfish’s active omemoDevice. However, no matter what I try, including regeneration, deleting inactive devices, etc., this missing session does not get rebuilt. Maybe something is lacking in the aTalk implementation that prevents OMEMO from detecting the missing session. In the end I had to recreate the whole OMEMO DB for leopard to continue testing.

“I’d rather spend energy determining how the session breaks in the first place.”

Yes, I strongly agree with you. The problem is that I am not very familiar with the whole OMEMO process flow, so I have some difficulty determining the root cause.

Actually, the problem started while I was trying to fix a field crash report (FCR) from the Android Play Store, captured on an aTalk user’s device.

From the FCR, I figured out that the user might have performed an OmemoDevice regeneration and then started an OMEMO session, which led to the aTalk crash. My final conclusion is that aTalk crashes because trustCallback is not static, and needs to be initialized for each instance of omemoManager, i.e. for every newly created OmemoDevice. During testing and verification, the fix worked initially. However, after I performed OmemoDevice regeneration multiple times on the swordfish device, the lockup problem started to appear, even on leopard, which had not undergone any regeneration.

As I do not have a full picture of the OMEMO process flow during regeneration, I am unable to figure out the possible cause.

============== field crash report ==============

Aug 11, 6:07 AM on app version 1045
Huawei P20 lite (HWANE), Android 8.0
Report 1 of 1
java.lang.IllegalStateException: 
  at org.jivesoftware.smackx.omemo.OmemoManager.isTrustedOmemoIdentity (OmemoManager.java:475)
  at org.atalk.crypto.CryptoFragment.doHandleOmemoPressed (CryptoFragment.java:331)
  at org.atalk.crypto.CryptoFragment.onOptionsItemSelected (CryptoFragment.java:218)
  at android.support.v4.app.Fragment.performOptionsItemSelected (Fragment.java:2476)
  at android.support.v4.app.FragmentManagerImpl.dispatchOptionsItemSelected (FragmentManager.java:3343)
  at android.support.v4.app.FragmentController.dispatchOptionsItemSelected (FragmentController.java:347)
  at android.support.v4.app.FragmentActivity.onMenuItemSelected (FragmentActivity.java:413)
  at com.android.internal.policy.PhoneWindow.onMenuItemSelected (PhoneWindow.java:1353)
  at com.android.internal.view.menu.MenuBuilder.dispatchMenuItemSelected (MenuBuilder.java:761)
  at com.android.internal.view.menu.SubMenuBuilder.dispatchMenuItemSelected (SubMenuBuilder.java:82)
  at com.android.internal.view.menu.MenuItemImpl.invoke (MenuItemImpl.java:178)
  at com.android.internal.view.menu.MenuBuilder.performItemAction (MenuBuilder.java:908)
  at com.android.internal.view.menu.MenuBuilder.performItemAction (MenuBuilder.java:898)
  at com.android.internal.view.menu.MenuPopup.onItemClick (MenuPopup.java:129)
  at android.widget.AdapterView.performItemClick (AdapterView.java:321)
  at android.widget.AbsListView.performItemClick (AbsListView.java:1217)
  at android.widget.AbsListView$PerformClick.run (AbsListView.java:3203)
  at android.widget.AbsListView$3.run (AbsListView.java:4151)
  at android.os.Handler.handleCallback (Handler.java:808)
  at android.os.Handler.dispatchMessage (Handler.java:101)
  at android.os.Looper.loop (Looper.java:166)
  at android.app.ActivityThread.main (ActivityThread.java:7425)
  at java.lang.reflect.Method.invoke (Native Method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run (Zygote.java:245)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:921)

The field crash report is probably caused by aTalk not having an OmemoTrustCallback set. This has nothing to do with sessions.

You should register an OmemoTrustCallback after you first obtain an instance of the OmemoManager.
The trust callback is used to determine whether a user trusts an OMEMO identity or not.
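To illustrate why the callback has to be registered on every manager instance, here is a minimal sketch. It assumes, as the stack trace above suggests, that the trust check throws an IllegalStateException when no callback has been set. All types below (SketchOmemoManager, SketchTrustCallback, SketchTrustState, InMemoryTrustCallback) are simplified placeholders written for this example, not the Smack classes; the real signatures should be checked against the Smack version in use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Smack API.
enum SketchTrustState { UNDECIDED, UNTRUSTED, TRUSTED }

interface SketchTrustCallback {
    SketchTrustState getTrust(String device, String fingerprint);
    void setTrust(String device, String fingerprint, SketchTrustState state);
}

// Keeps trust decisions in memory; a real client would persist them.
final class InMemoryTrustCallback implements SketchTrustCallback {
    private final Map<String, SketchTrustState> decisions = new HashMap<>();

    @Override
    public SketchTrustState getTrust(String device, String fingerprint) {
        return decisions.getOrDefault(device + "/" + fingerprint, SketchTrustState.UNDECIDED);
    }

    @Override
    public void setTrust(String device, String fingerprint, SketchTrustState state) {
        decisions.put(device + "/" + fingerprint, state);
    }
}

final class SketchOmemoManager {
    // Per-instance field: a manager created for a regenerated device starts without a callback.
    private SketchTrustCallback trustCallback;

    void setTrustCallback(SketchTrustCallback callback) {
        this.trustCallback = callback;
    }

    boolean isTrusted(String device, String fingerprint) {
        if (trustCallback == null) {
            throw new IllegalStateException("No trust callback set");
        }
        return trustCallback.getTrust(device, fingerprint) == SketchTrustState.TRUSTED;
    }
}

final class TrustCallbackSketch {
    // Hypothetical helper: every place that obtains a manager goes through here,
    // so the callback is never forgotten after a device regeneration.
    static SketchOmemoManager newManager(SketchTrustCallback shared) {
        SketchOmemoManager manager = new SketchOmemoManager();
        manager.setTrustCallback(shared);
        return manager;
    }

    public static void main(String[] args) {
        SketchTrustCallback callback = new InMemoryTrustCallback();
        SketchOmemoManager manager = newManager(callback);
        callback.setTrust("leopard@atalk.org:42", "fingerprint", SketchTrustState.TRUSTED);
        System.out.println(manager.isTrusted("leopard@atalk.org:42", "fingerprint"));
    }
}
```

Sharing one callback object between manager instances is one way for a client to “reuse” earlier trust decisions after a regeneration; whether that is desirable is up to the client.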

“As I do not have a full picture of the OMEMO process flow during regeneration, I am unable to figure out the possible cause.”

The process is basically the following:

  • The user decides to regenerate their identity
  • The OmemoManager creates a new random deviceID
  • The OmemoManager creates new keys for that ID
  • The keys are published to PubSub
  • …
  • Contacts see the user’s new device → if they send messages first, their clients build new sessions and the contacts have to trust the user’s device
  • If the user tries to send a message to a contact first, the OmemoManager has to build new sessions and the user has to trust the contact’s devices.

Regenerating the identity basically means that the OmemoManager starts over from the beginning, ditching all old keys and sessions. I intentionally decided to outsource trust decisions to the client dev by introducing the OmemoTrustCallback, as the client dev can now decide whether to “reuse” trust decisions made before the regeneration, or whether the user has to make all trust decisions again after a regeneration.

Many thanks for the explanation. Based on your description and on tracing through your source, I have found another incorrect implementation in aTalk, i.e.

SessionRecord loadRawSession(OmemoDevice userDevice, OmemoDevice contactDevice)

Currently aTalk returns an empty record instead of null when no record is found, whereas OmemoService uses null to detect “hasSession”. This is the main reason why leopard was locked in an unrecoverable state in my previous test. I have proceeded to fix the problem in aTalk.
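A minimal sketch of the difference between the two behaviours, for anyone hitting the same problem. The types here (SketchDevice, SketchSessionRecord, SketchSessionStore) are simplified placeholders invented for this example and are not the Smack, libsignal or aTalk classes; the null check in needsFreshSession() is an assumption modelled on the “hasSession” behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Smack/libsignal types.
final class SketchDevice {
    final String jid;
    final int deviceId;

    SketchDevice(String jid, int deviceId) {
        this.jid = jid;
        this.deviceId = deviceId;
    }

    String key() {
        return jid + ":" + deviceId;
    }
}

final class SketchSessionRecord { }

final class SketchSessionStore {
    // Stand-in for the session table, keyed by "bareJid:deviceId".
    private final Map<String, SketchSessionRecord> table = new HashMap<>();

    void storeRawSession(SketchDevice contact, SketchSessionRecord record) {
        table.put(contact.key(), record);
    }

    // Old behaviour: substitutes an empty record, so a missing row is invisible to the caller.
    SketchSessionRecord loadRawSessionOld(SketchDevice contact) {
        SketchSessionRecord session = table.get(contact.key());
        return (session != null) ? session : new SketchSessionRecord();
    }

    // Fixed behaviour: returns null when no row exists for the contact device.
    SketchSessionRecord loadRawSession(SketchDevice contact) {
        return table.get(contact.key());
    }
}

final class SessionContractSketch {
    // Assumed caller-side check: null means "no session yet, build a fresh one".
    static boolean needsFreshSession(SketchSessionRecord loaded) {
        return loaded == null;
    }

    public static void main(String[] args) {
        SketchSessionStore store = new SketchSessionStore();
        SketchDevice swordfish = new SketchDevice("swordfish@atalk.org", 1123277467);

        // The session table has no row for this device in either case.
        System.out.println(needsFreshSession(store.loadRawSessionOld(swordfish)));
        System.out.println(needsFreshSession(store.loadRawSession(swordfish)));
    }
}
```

With the old variant the caller never sees a missing session, so no fresh session is ever built and encryption is attempted with an empty record; with the fixed variant the missing row is visible as null.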

Maybe it would be a good idea to add “should return null when no session record is found … used as a flag to build a fresh session” to the header of your loadRawSession() prototype.

Actually, I have the following header in my method implementation, which incorrectly describes the requirements. I am not sure where I got this from:

    /**
     * Returns a copy of the {@link SessionRecord} corresponding to the recipientId + deviceId
     * tuple, or a new SessionRecord if one does not currently exist.
     *
     * It is important that implementations return a copy of the current durable information. The
     * returned SessionRecord may be modified, but those changes should not have an effect on the
     * durable session state (what is returned by subsequent calls to this method) without the
     * store method being called here first.
     *
     * Load the crypto-lib specific session object of the device from storage.
     *
     * @param userDevice our OmemoDevice.
     * @param contactDevice device whose session we want to load
     * @return crypto related session
     */

There was (or is?) a parametrized JUnit test for the OmemoStore which could come in handy to check OmemoStore implementations.

Edit: Yep, there is the SignalOmemoStoreTest in smack-omemo-signal, where you could add an instance of your OmemoStore implementation after line 75.

Thanks.

I have already manually reviewed the source and verified the aTalk implementation against SignalOmemoStore and SignalFileBasedOmemoStore. Hopefully it is OK now.
