Smack OmemoService prekeys publish to server always failed on slow device

Please refer to the discussion on Smack OmemoManager.purgeDevices() throws NoResponseException prior to sending the stanza

aTalk with smack version 4.2.2 integrated still facing the same problem on slow android device e.g. Samsung SII. The reason is traced to the fact that it takes almost 90 seconds for omemoService to publish the 100 prekeys to the server on SII via PubSubManager#tryToPublishAndPossibleAutoCreate(deviceId, playLoad). On trying, I have to setReplyTimeout() to 120 seconds for the publish to work properly, but not practical in normal operation.

To overcome the problem I have included a patch in OmemoService#publishBundle() as attached i.e. to ignore the publish response timeout, so the OmemoManager can proceed to complete its omemo initialization upon user authentication. The Omemo prekeys publish process actually continues to completion in background. Not sure if the 90s long process time is because the publish process is executed in async mode after the smack timeout timer has started; as I see Smack throws response timeout even before the publish stanza is being send…

I have included this patch in the latest aTalk version 0.9.1 release. The patch is not a final solution, but at least Omemo message is now working on SII.

/**
 * Publish a fresh bundle to the server.
 *
 * @param omemoManager OmemoManager
 * @throws SmackException.NotConnectedException
 * @throws InterruptedException
 * @throws SmackException.NoResponseException
 * @throws CorruptedOmemoKeyException
 * @throws XMPPException.XMPPErrorException
 */
void publishBundle(OmemoManager omemoManager)
        throws SmackException.NotConnectedException, InterruptedException,
        SmackException.NoResponseException, CorruptedOmemoKeyException, XMPPException.XMPPErrorException {
    Date lastSignedPreKeyRenewal = getOmemoStoreBackend().getDateOfLastSignedPreKeyRenewal(omemoManager);
    if (OmemoConfiguration.getRenewOldSignedPreKeys() && lastSignedPreKeyRenewal != null) {
        if (System.currentTimeMillis() - lastSignedPreKeyRenewal.getTime()
                > 1000L * 60 * 60 * OmemoConfiguration.getRenewOldSignedPreKeysAfterHours()) {
            LOGGER.log(Level.INFO, "Renewing signedPreKey");
            getOmemoStoreBackend().changeSignedPreKey(omemoManager);
        }
    } else {
        getOmemoStoreBackend().setDateOfLastSignedPreKeyRenewal(omemoManager);
    }

    // publish

// PubSubManager.getInstance(omemoManager.getConnection(), omemoManager.getOwnJid())
// .tryToPublishAndPossibleAutoCreate(OmemoConstants.PEP_NODE_BUNDLE_FROM_DEVICE_ID(omemoManager.getDeviceId()),
// new PayloadItem<>(getOmemoStoreBackend().packOmemoBundle(omemoManager)));

    PubSubManager pubSubMgr = PubSubManager.getInstance(omemoManager.getConnection(), omemoManager.getOwnJid());
    String deviceId = OmemoConstants.PEP_NODE_BUNDLE_FROM_DEVICE_ID(omemoManager.getDeviceId());
    PayloadItem<OmemoBundleVAxolotlElement> playLoad = new PayloadItem<>(getOmemoStoreBackend().packOmemoBundle(omemoManager));

    try {
        pubSubMgr.tryToPublishAndPossibleAutoCreate(deviceId, playLoad);
    } catch (InterruptedException | SmackException.NoResponseException | SmackException.NotConnectedException | XMPPException.XMPPErrorException e) {
        LOGGER.log(Level.WARNING, "Ignore OmemoManager publish response timeout ~90S on SII"
                + e.getMessage());
    }
}
1 Like

Hi!
In my onging rework, I’ll include your patch. I’m not really sure, why the method takes longer on slow devices though. Can you confirm, that this is due to key generation (ie. does it still take the same amount of time on the second time logging in?).

Thank you very much for the feedback :slight_smile:

I modified the source as below to capture some timing information. Upon first apk installation and execute.
From below captured log;

  1. It takes ~38s for the 100 prekeys generation.
  2. Smack throws response timeout after 10s (aTalk preset default delay)
  3. The actual prekey publish stanza is sent only 21s after (1) prekey generation
  4. The publish reply stanza is received 7s after (3). Note this time will be extended on a overloaded server.

======== Samsung SII debug log (first launch) ============

12-19 23:04:29.125 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Start Prekeys preparation
12-19 23:06:07.635 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() PayLoad generated: <item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic ......

12-19 23:06:17.680 W/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Ignore OmemoManager publish response timeout ~90S on SIINo response received within reply timeout. Timeout was 10000ms (~10s). Waited for response using: IQReplyFilter: iqAndIdFilter (AndFilter: (OrFilter: (IQTypeFilter: type=error, IQTypeFilter: type=result), StanzaIdFilter: id=vLWGi-126)), : fromFilter (OrFilter: (FromMatchesFilter (full): swan@atalk.org, FromMatchesFilter (full): null)).

12-19 23:07:46.345 D/SMACK: SENT (0): <iq to='swan@atalk.org' id='vLWGi-126' type='set'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:960906819'><item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic ......

12-19 23:07:53.420 D/SMACK: RECV (0): <iq xml:lang='en' to='swan@atalk.org/atalk' from='swan@atalk.org' type='result' id='vLWGi-126'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:960906819'><item id='5E9A9F09B3DD'/></publish></pubsub></iq>

On second launch after exit:
From below captured log;

  1. It takes ~3s for the 100 prekeys fetch.
  2. The actual prekey publish stanza is sent 3s after prekey is fetched.
  3. The publish reply stanza is received 7s after (2)

======== Samsung SII debug log (second launch) ============

12-19 23:26:10.115 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Start Prekeys preparation
12-19 23:26:13.300 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() PayLoad generated: <item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic .....

12-19 23:26:16.025 D/SMACK: SENT (0): <iq to='swan@atalk.org' id='4hX41-126' type='set'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:960906819'><item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic .......

12-19 23:26:23.100 D/SMACK: RECV (0): <iq xml:lang='en' to='swan@atalk.org/atalk' from='swan@atalk.org' type='result' id='4hX41-126'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:960906819'><item id='5E9AA35EB4273'/></publish></pubsub></iq>

From the measured result, it seems to take very long in prekey generation (38s) and preparation for sending the stanza (21s) on first instance. However on the second results, it takes only 3s to send publish stanza after prekeys fetched.

There are two constrains that cause the observed results.

  1. The prekey generation does takes some time (38s) on slow device due to processing speed. Possibly half this time is due to JIT code translation (see below)
  2. If you remember our previous discussion. a second delay come from android inherited OS architecture i.e. JIT and AOT. Comparing result #1 & #2, JIT preparations take almost 18s (= 21 - 3) during stanza sending.

For comparison, I run the same code on Note-8 (AOT architure) with first launch result as below:

  1. It takes ~330ms for the 100 prekeys generation.
  2. The actual prekey publish stanza is sent 170ms after prekey is generated.
  3. The publish reply stanza is received 4.5s after (2)

Hope the information help.

====== Note 8 debug log ===============

12-20 00:16:28.844 I/aTalk: [4886] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Start Prekeys preparation
12-20 00:16:29.177 I/aTalk: [4886] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() PayLoad generated: <item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic ....

12-20 00:16:29.346 D/SMACK: SENT (0): <iq to='swordfish@atalk.org' id='IPKwd-128' type='set'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:409931947'><item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic ....

12-20 00:16:33.736 D/SMACK: RECV (0): <iq xml:lang='en' to='swordfish@atalk.org/atalk' from='swordfish@atalk.org' type='result' id='IPKwd-128'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:409931947'><item id='5E9AAF228703B'/></publish></pubsub></iq>

//======= code modified to gather timing info =======================//

    LOGGER.log(Level.INFO, "Start Prekeys preparation");
    PubSubManager pubSubMgr = PubSubManager.getInstance(omemoManager.getConnection(), omemoManager.getOwnJid());
    String deviceId = OmemoConstants.PEP_NODE_BUNDLE_FROM_DEVICE_ID(omemoManager.getDeviceId());
    PayloadItem<OmemoBundleVAxolotlElement> payLoad = new PayloadItem<>(getOmemoStoreBackend().packOmemoBundle(omemoManager));
    LOGGER.log(Level.INFO, "PayLoad generated: " + payLoad.toXML());

    try {
        pubSubMgr.tryToPublishAndPossibleAutoCreate(deviceId, payLoad);
    } catch (InterruptedException | SmackException.NoResponseException | SmackException.NotConnectedException | XMPPException.XMPPErrorException e) {
        LOGGER.log(Level.WARNING, "Ignore OmemoManager publish response timeout ~90S on SII"
                + e.getMessage());
    }

Hm, I’ll have to investigate, how we can speed that up.
Its strange to me, that it takes 21 seconds to send the publish stanza. There should only be some serializing going on behind the scenes, which shouldn’t take long at all.

Would it make sense to figure out, how Signal handles slow devices?

That smells like a IQ request/response operation being performed in a sync stanza listener. Log the full stack trace in case the timeout occurs. But note that the actually culprit control flow could be somewhere else.

The smack timeout timer starts correctly upon pubSubMgr.tryToPublishAndPossibleAutoCreate(deviceId, payLoad); execution, and it seems to follow and timeout after 10s i.e. the default smack default timeout as set in aTalk app. The strange thing is that why there is a delay before the payload prekey is actually being sent to the server; this time seems to varies from device to device. This delay time is considered as part of the smack response timer. One of the possible reason for this delay is due to android JIT code translation, but cannot accounted for all the delay seen.

From below captured log on Note 2 (JIT architecture);

  1. It takes ~3.4s for the 100 prekeys generation.
  2. The actual prekey publish stanza is sent only 3.2s after (1) prekey generation.
  3. The publish reply stanza is received 4.5s after (2). Note this time will be extended on a overloaded server.

One other observation is that the server needs time to save and response to the published prekey request. This time is ~4.5 seconds even with Note8 and my home server running ejabberd 17.x which is only lightly loaded. What I see is that the current default Smack default time of 5 seconds may encounter problem under most practical condition. aTalk increases this default time to 10s.
Note: From above data, Note2 will have response timeout (for the smack 5s default)

12-23 14:01:01.131 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Start Prekeys preparation
12-23 14:01:04.521 I/aTalk: [10] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() PayLoad generated: <item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic .....

12-23 14:01:07.736 D/SMACK: SENT (0): <iq to='eagle@atalk.org' id='ZBPtd-115' type='set'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:2031682162'><item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic .....

12-23 14:01:12.071 D/SMACK: RECV (0): <iq xml:lang='en' to='eagle@atalk.org/atalk' from='eagle@atalk.org' type='result' id='ZBPtd-115'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:2031682162'><item id='5EA22A77E045'/></publish></pubsub></iq>

Just observed another very strange behavior on Note 2, and seems to happen frequent.
On first launch of the aTalk app, the paylod prekeys is only sent after a very long delay of 133s ??? from smack response timeout. Base on other debug log messages, the response reply timeout is reported correctly after start of
pubSubMgr.tryToPublishAndPossibleAutoCreate(deviceId, payLoad);

12-23 17:18:53.456 W/aTalk: [9] org.jivesoftware.smackx.omemo.OmemoService.publishBundle() Ignore OmemoManager publish response timeout ~90S on SIINo response received within reply timeout. Timeout was 10000ms (~10s). Waited for response using: IQReplyFilter: iqAndIdFilter (AndFilter: (OrFilter: (IQTypeFilter: type=error, IQTypeFilter: type=result), StanzaIdFilter: id=c05LM-107)), : fromFilter (OrFilter: (FromMatchesFilter (full): eagle@atalk.org, FromMatchesFilter (full): null)).

12-23 17:20:40.501 D/SMACK: SENT (0): <iq to='eagle@atalk.org' id='c05LM-107' type='set'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:2031682162'><item><bundle xmlns='eu.siacs.conversations.axolotl'><signedPreKeyPublic .....

12-23 17:20:45.011 D/SMACK: RECV (0): <iq xml:lang='en' to='eagle@atalk.org/atalk' from='eagle@atalk.org' type='result' id='c05LM-107'><pubsub xmlns='http://jabber.org/protocol/pubsub'><publish node='eu.siacs.conversations.axolotl.bundles:2031682162'><item id='5EA516C3462C'/></publish></pubsub></iq>

As Flow mentioned, this sounds like race conditions to me. Currently I’m redoing most of the internal logic hopefully handling async tasks in a more sane way :slight_smile: