Openfire version: 4.7.4
Spark version: 3.0.2
Smack version: 4.4.6
When STARTTLS and SSL are enabled, our custom Smack client consistently times out while connecting to Openfire, with this error:
org.jivesoftware.smack.SmackException$NoResponseException: No response received within reply timeout.
The same timeout also seems to occur in Spark, and occurs more consistently when the Openfire server has been running for a while. Restarting Openfire typically fixes it for the first connection.
With debugging enabled, the last few frames sent/received from the client are:
20:16:32 SENT (1):
<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>
20:16:32 RECV (1):
<proceed xmlns="urn:ietf:params:xml:ns:xmpp-tls"/>
20:16:32 SENT (1):
<stream:stream xmlns='jabber:client' to='hostname' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='user@hostname' id='80fsko1mqj' xml:lang='en-US'>
While debugging this, we noticed that the connection timeout would NOT occur when we had certain breakpoints set, specifically at the point where the client sends the streamOpen nonza. In other words, it seems the Smack implementation is sending the nonzas too quickly, so Openfire never receives the streamOpen nonza and never responds to the client, which leads to the connection timeout.
Adding a small sleep of ~100 ms after the client sends the startTls nonza and before it sends the streamOpen nonza fixes this issue. We implemented this hack with a custom HostnameVerifier that sleeps for a bit before returning.
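A minimal sketch of that kind of workaround, for anyone who wants to reproduce it. The class name DelayingHostnameVerifier, the delegate parameter, and the delay value are illustrative and not taken from Smack or Openfire. It only uses the standard javax.net.ssl.HostnameVerifier interface: it sleeps for a fixed time and then hands the actual hostname check to whatever verifier you already use.

```java
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

// Hypothetical helper: delays hostname verification by a fixed amount,
// then defers the real decision to an existing verifier.
final class DelayingHostnameVerifier implements HostnameVerifier {
    private final HostnameVerifier delegate;
    private final long delayMillis;

    DelayingHostnameVerifier(HostnameVerifier delegate, long delayMillis) {
        this.delegate = delegate;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean verify(String hostname, SSLSession session) {
        try {
            // The pause is the whole workaround: it slows the client down
            // during TLS setup, before the new stream open is sent.
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fail verification rather than
            // silently continuing without the delay.
            Thread.currentThread().interrupt();
            return false;
        }
        // The delay does not replace real verification.
        return delegate.verify(hostname, session);
    }
}
```

As far as I know, a verifier like this can be passed to Smack through setHostnameVerifier(...) on the connection configuration builder, wrapping the verifier you would otherwise use. This only masks the timing problem; it does not explain why Openfire misses the streamOpen.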
Obviously this is a very hacky solution for us since we don’t really understand the root cause of this issue. I wanted to bring this to your attention since this seems to also affect the Spark client, and I’m hoping you guys might have an idea for a better solution. Thanks!