powered by Jive Software

Smack looses messages at connection shutdown/failure

Hi,

we are struggeling with several scenarios where smack (3.3.1) looses messages. Would be great if you could take a look at that.

**Issue A) **

XMPPConnection -> protected void shutdown(Presence unavailablePresence)

**
**

if (packetReader != null) {

packetReader.shutdown();

}

Is called immediatly after the presence send. As a result are all messages on the wire between the server recognizing that (i.e. sending no more messages) and the listenerExecutor.shutdown() in PacketReader lost.

We think that this can be solved by moving the reader shutdown behind the Thread.sleep.

Issue B)

XMPPConnection -> protected void shutdown(Presence unavailablePresence)

**
**

The Thread.sleep is pretty short for our case (embedded ARM based hardware and a pretty bad mobile connection) and we are loosing messages here as we are not able to send all messages in the queue in that hard coded timeframe. Is it possible to make that one configurable at the connection API?

Even better would be switch to a timeout based approach where you send everything left in the queue until the timeout is reached. Well, I guess thats going to be a bigger task

Issue C)

PacketWriter -> private void writePackets(Thread thisThread) {

while (!done && (writerThread == thisThread)) {

Packet packet = nextPacket();

if (packet != null) {

writer.write(packet.toXML());

if (queue.isEmpty()) {

writer.flush();

}

}

}

The problem here is the exception handling and the underlying (nextPacket) “queue.poll”. The (IO)exception handling does take care of the connection. However, the “packet” is lost in case of an IOException at writer.write. It would be great if you could peek at the quere bevore the socket write and than doing the real poll after the write.

We are aware that in case of a broken connection we will typically loose more messages to the socket itself than to smack, but at least one message loss at every socket failure can be avoided here

Thanks,

kai

A and B are all part of the problem: Smack simply discards any incoming or outgoing packets when shutdown is called. The solution would be a more graceful shutdown, i.e. one that checks that packetWriter’s packet queue is empty while simultaniously preventing new packets from being added. And also there should be some sort of, timeout secured mechanism to detect the closing of the stream on the server side, therefore ensuring that there are no more packets left from the server.

Logged as SMACK-520

C logged as SMACK-521. I’m not sure what’s the problem you are trying to solve with the peek and pool. For now I assumed that if write() encounters an exception, all packets in the queued will be lost, because of the queue.clear() after the exception.

Thanks for feedback and the ticket registration.

However, as far as I see it queue.clear is never reached.

He will jump directly to:

catch (IOException ioe) {

// The exception can be ignored if the the connection is ‘done’

// or if the it was caused because the socket got closed

if (!(done || connection.isSocketClosed())) {

done = true;

// packetReader could be set to null by an concurrent disconnect() call.

// Therefore Prevent NPE exceptions by checking packetReader.

if (connection.packetReader != null) {

connection.notifyConnectionError(ioe);

}

}

}

However, we have been doing more testing and further investigation here and it seems this issue © was a false trail. Well, it is a problem but it looks that in our cases the PacketReader always gets the hint of a socket issue first and starts the handling (XMPPConcection.notifyConnectionError) and not the writer.

Turned out that this brings us into a situation where reader/writer.*done *is set to true but *connection *is also still on true (shutdown. As a result XMPPConnection.sendPacket(Packet packet) will still accept packets (connection is true) that go directly into nirvana (done is true in Packetwriter.sendPacket). I would call that Issue D I think an IllegalStateException in PacketWriter.sendPacket is the way to go here or keeping done and XMPPConnection.connected faster in sync. Set XMPPConnection.connected to false in XMPPConnection.notifyConnectionError together with the two done(s).?

Anyway, this seems to be the primary reason for the packet loss that we have while the writer exception handling issue could not be found in our logs. We will keep an eye on this though maybe we will find a log entry for the writer case also.

We will continue to look into this and will provide you with feedback if we find something new.

Kai

However, as far as I see it queue.clear is never reached.
Correct, I overlooked that. In fact, the queue get’s reused, which, without further testing, looks like a bug. Because packetWriter may send packets from the previous connection before SASL auth is finished when reconnecting. SMACK-521

Turned out that this brings us into a situation where reader/writer.*done is set to true butconnection *is also still on true

Yes that is a problem in the current logic, ideally as soon as Packet{Reader|Writer}.done is true, Connection.connected should be false. Maybe we could move ‘connected = false’ in the first line of shutdown(), but those core changes need to be carefully evaluated and tested. SMACK-522

We are currently working on XEP-198 Stream Management, which will be the ideal solution for all those problems. Stay tuned for SMACK-333, but be aware that it might take a few more months.

Issue C (not D) is what we have experienced very often in yaxim as well - user sends a message, the message causes an IOException (or some other one) on the socket, the connection is re-established, but the message is gone forever. For that, we integrated a send failed listener into smack, please see https://github.com/ge0rg/asmack/blob/master/patch/46-send-failed-listener.patch (this is a patch against smack’s old master branch).

However, if both sides support XEP-0198, there is an explicit sync mechanism after the reconnection, so you can consider the patch as deprecated - I have no idea how it will interact with 0198.