Dedlock during Smack disconnect

Hi,

I’ve uncovered subtle and intermittent deadlock scenario that can occur (in the 3.0.4 release) when closing a XMPP connection. The scenario occurs as follows, Calling disconnect attempts to close the connections reader, (line 528) which attempts to acquire the readers lock. This reader is also held by the MXParser, which at the time of the close is blocked reading from the socket. This blocked read is holding the reader lock and thus deadlocks the close call. There is an attempt to shutdown the PacketReader (line 570), but as shutdown waits for the current operations to finish, the blocked read is not interrupted and the call to close will hang.

It could easily be mitigated by calling shutdownNow on the packet reader, (interrupting the read and allowing the reader to close) or if this is not desirable, calling shutdown, and starting a timer to force the shutdown (via shutdown now) if the reader is still open after a few seconds.

Best,

Stephen

I found the exactly same problem happening. This is not fixed in 3.1.0 beta. It looks to me in the disconnect() method one should first stop/destroy the packet reader thread (the one runs parsePackets()) to release its lock on the java.io.InputStreamReader. I hope this get fixed in 3.1.0 RC1. Here is the thread dump:

[java] Full thread dump Java HotSpot™ 64-Bit Server VM (1.5.0_07-b03 mixed mode):

[java]

[java] “Smack Packet Reader (12215)” daemon prio=1 tid=0x00002aaaf89a2f30 nid=0x3be9 runnable [0x0000000048ae0000…0x0000000048ae0d10]

[java] at java.net.SocketInputStream.socketRead0(Native Method)

[java] at java.net.SocketInputStream.read(SocketInputStream.java:129)

[java] at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)

[java] at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)

[java] at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)

[java] - locked <0x00002aaae184d920> (a java.io.InputStreamReader)

[java] at java.io.InputStreamReader.read(InputStreamReader.java:167)

[java] at java.io.BufferedReader.fill(BufferedReader.java:136)

[java] at java.io.BufferedReader.read1(BufferedReader.java:187)

[java] at java.io.BufferedReader.read(BufferedReader.java:261)

[java] - locked <0x00002aaae184d920> (a java.io.InputStreamReader)

[java] at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)

[java] at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)

[java] at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)

[java] at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)

[java] at org.jivesoftware.smack.PacketReader.parsePackets(PacketReader.java:368)

[java] at org.jivesoftware.smack.PacketReader.access$000(PacketReader.java:44)

[java] at org.jivesoftware.smack.PacketReader$1.run(PacketReader.java:76)

[java]

[java] “main” prio=1 tid=0x000000004011bb30 nid=0x72fa waiting for monitor entry [0x00007fff3fa24000…0x00007fff3fa24b10]

[java] at java.io.BufferedReader.close(BufferedReader.java:500)

[java] - waiting to lock <0x00002aaae184d920> (a java.io.InputStreamReader)

[java] at org.jivesoftware.smack.XMPPConnection.shutdown(XMPPConnection.java:583)

[java] at org.jivesoftware.smack.XMPPConnection.disconnect(XMPPConnection.java:643)

[java] at org.jivesoftware.smack.XMPPConnection.disconnect(XMPPConnection.java:618)

[java] at com.gaocan.im.server.ImConnectionPool.closeConnection(ImConnectionPool.java:114 )

so I am trying this hack, so far it seems to work:

add "

try {

socket.shutdownInput();

} catch (IOException e) {

;

}

"

in front of the

packetReader.shutdown(); call in

XMPPConnection.shutdown() method. With this hack then reader is forcefully closed, you will see these harmless messages:

[java] java.io.EOFException: no more data available - expected end tag </stream:stream> to close start tag stream:stream from line 1, parser stopped on END_TAG seen …onflict xmlns=‘urn:ietf:params:xml:ns:xmpp-stanzas’/>… @1:690

[java] at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:3015)

[java] at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)

[java] at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)

[java] at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)

[java] at org.jivesoftware.smack.PacketReader.parsePackets(PacketReader.java:368)

[java] at org.jivesoftware.smack.PacketReader.access$000(PacketReader.java:44)

[java] at org.jivesoftware.smack.PacketReader$1.run(PacketReader.java:76)

Is this the best solution anyone has found for this issue?

I pulled down the latest source from the subversion repository, but the problem still seems to appear with that code.

I tried making the modifications to the source described above, and found I needed to force the socket to close before calling packetReader.shutdown() to completely avoid the problem.

try {
socket.shutdownInput();
socket.close();
} catch (IOException e) {
;
}

Thanks for any information.

Hi,

I’m having the same problem with disconnect().

It is not fixed in version 3.1.0 and I’m wondering if anybody filed a bug or contributed the solution for this issue. Otherwise everybody having this problem will have to fix it in their own copy of smack…

Best regards,

Peter

Do you think this is related to SMACK-230 and http://www.igniterealtime.org/community/message/148571#148571 or should I create another issue?

Hallo Guenther,

thanks for your quick reaction.

These seem to be different issues since no NullPointerException occurs - execution is blocked (maybe even deadlocked) in disconnect().

Best regards,

Peter

Yes, you’re right. I created SMACK-278.

Hello,

Is this the place to submit patches?

Anyway, please find attached patch that works for us.

Cheers,

Kare
0001-SMACK-278-Fix-deadlock-during-disconnect.patch.zip (1188 Bytes)

Hi Kare,

Thanks for submitting a patch, do you have a Jira account http://issues.igniterealtime.org ? If so, I can elevate your privs there to help you submit future patches and file tickets.

daryl

Hello Daryl,

My Jira account is kare.

Cheers,

Kare

Hi Kare,

You should be all set! We are always seeking more svn committers and patch writers, so please consider submitting more patches

daryl

Attached to the issue in Jira.

Thanks for your patch Kare. I needed to put the close() in a extra try-catch block, otherwise if there is an Exception in shutdownInput() it won’t get called:

    try {

          socket.shutdownInput();

    } catch (Exception e) {

          // Ignore

    }

    try {

              reader.close();

    } catch (Exception e) {

              e.printStackTrace();

    }

This does fix the stalled threads and the memory leak when reconnecting the XMPPConnection, which is very helpfull for mobile (android) development. But the unexpected close() of the socket will throw exceptions in the PacketWriter and PacketReader class, which results in a unclean shutdown. I am not sure how big the impact is, but just doesn’t seem to be a proper long-term solution.

I have an android app that uses (a)smack in an long running service. The deadlock prevents some threads from terminating, therefore they don’t get garbage collected causing my service getting restarted by Androids ActivtyManager to reclaim memory. I am certainly looking for a solution of SMACK-278, that’s why I asked this also on stackoverflow. I will start an bounty for the answer as soon as possbile. Maybe will get some other ideas/input on how to solve this.

You can find the the so.sx question here: http://stackoverflow.com/questions/7901303/how-to-solve-deadlock-in-smack-with-b ufferedreader-on-xmppconnection-shutdown

Flow

Here is the patch I came up with. It’s based on a smack fork and therefore may not apply directly to vanilla smack, but I think that the changes are small and it sould work. At least you will get an idea of what I’ve changed.

  1. Introduced a new bool “socketClosed” in XMPPConnection to signal the PacketReader and PacketWriter threads when they should ignore an exceptions caused by a closed socket
  2. Fixed an NPE error when PacketWriter Exception handling, which caused the app to crash
  3. Moved “connected = false” to the latest possible point, because of concurrent access to this information
  4. Increased the initial heartbeat delay, also good in mobile situations
  5. Interrupt the keepAlive threadon shutdown, so that the thread can be cleaned up and doesn’t stick around for a long time. This is handy in mobile situations (android) where you want an long heartbeat delay, which causes the KeepAliveThread to be in sleep() for a long time

I did some test which where very promising. My Android App does no longer leak any threads. :slight_smile:

Suggestions and feedback welcome

Flow
SMACK-278.patch.zip (1872 Bytes)

Maybe we split the patch in two patches, one for the 4th and 5th change and one for the shutdown fixes?

What do you guys think about the idea make status flags like “socketClosed” and “done” volatile?

edit

In XMPPConnection.java, you close the socket twice, is that intended? Does it have any effect?

I will take a careful look at the patch before committing it for 3.2.2. I am very hesitant about making changes to the core this late, as it is rather risky and might have unforeseen consequences. Especially in threading related issues.

It may have to wait to get rolled into a nightly build on 3.3, where it will have more time to get used by people to see how it works before being rolled into an official release.

Hey Tim,

sure, a split makes sense. About the keepAliveThread interrupt: I forgot to add an check if keepAliveThread != null, be sure to include this commit also https://github.com/Flowdalic/smack/commit/daf4dbc7bf283e82cbccbeb27604cf7ce8c9e3 1a.

About making them volatile: It seems to be the right thing, both booleans are accessed by different threads. On the other hand, it worked suprisingly well without :slight_smile:

The second socket.close() was just layzness by me. I was happy that I fixed the damn deadlock and thought that it would harm anything. It maybe can be removed.

Sure, beeing carefull is always the right thing when it comes down to libraries. But in this case, the root of the deadlock can simply be fixed by replacing the reader.close() and writer.close() calls with a socket.close() (and adding a flag to prevent Exception handling in the Reader and Writer classes). I don’t see a big difference here.

Anyway, looking forward to see this into smack soon. Even if it’s only in a 3.3 release. :slight_smile: