powered by Jive Software

XMPPTCPConnection: internal race condition during stream resume?


#1

It looks like there are no provisions in Smack to actually wait for and process a <stream:stream> from the server. I’m seeing very very weird behavior in my implementation during stream resumption, where outgoing <stream:stream> and <resume> are sent in parallel:

12-19 07:40:33.821 D/SMACK   (13162): SENT (1): <stream:stream xmlns='jabber:client' to='yax.im' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' id='1ffda9a8-b175-4326-9bfc-de590b1bb0d8<'resume  xmlns='urn:xmpp:sm:3'xml:lang ='hen=''29686>
12-19 07:40:33.822 D/SMACK   (13162): SENT (1): ' previd='79042661-3d09-49bd-86ad-f10abc4140db'/>

I don’t understand yet how this is happening (both are triggered from loginInternal which is synchronized, so I don’t think it’s related to my threading). OTOH, both elements are put into the PacketWriter.queue so I wonder how they end up intermixed.

Any ideas?


#2

P.S: isSmResumptionPossible() doesn’t check stream:features post-auth, but only its internal streamId, which is a bit racy and not quite correct…


#3

IIRC I deliberately did not check stream:features as it is unnecessary under the assumption that a server does not suddenly lose its stream management capabilities.

Not sure what you mean “is a bit racy and not quite correct”. Happy to fix any issues, but it is hard it you don’t actually point them out.

Also this should possibly have been a new topic.


#4

Right, there are provisions to wait for the stream future annoucement of the server.


#5

Please have a look at my XML from the inital post, this is what I’m bangig my head against for some days now and where I really urgently need a second pair of eyes.

Regarding the stream-features post-auth, it looks like Smack will happily cache those from pre-auth or even pre-tls, and those might be different (e.g. not contain bind or other things). Not waiting for the updated features is what I considered as “racy” in my comment.

Then I’m missing the code where you are waiting for the stream feature announcement after authentication.


#6

I only can imagine that this could happen if two writer threads are running concurrently using the same queue and socket. But I have to think more about this, which may take some time, especially considering that the holiday seasion is starting soon.

I also experienced similar behavior when deflate compression broke because it was using the wrong dictionary. But I think you mentioned off band to me that compression is not involved here.

It is good that Smack then in-fact does wait for the updated features. Right? :slight_smile:

It’s a little bit hidden and arguably not the most elegant code (cause “organically grown”), but it’s done by the maybeCompressFeaturesReceived sync point.


#7

Yes, you are right. I’m not using compression.

Thanks, I missed that one indeed. Now that I’ve looked at the code, there is one minor little catch in it:

        if (!config.isCompressionEnabled()) {
            return;
        }
        maybeCompressFeaturesReceived.checkIfSuccessOrWait();

This checkpoint isn’t awaited if I have compression disabled. I have a feeling that reverting the order of the two above commands will work around my race condition, but I still think thins needs to be properly investigated.


#8

Good catch, that is indeed wrong and the order should be switched. I’ve created SMACK-846.

Does “my race condition” refer to your initial problem of intermixed elements in the stream? If so, it would be great if you could test if switching the order fixes the problem for you.


#9

Yes, I’m referring to the XML garbage in the initial post. I’m testing the reversed-order code now for the next days, and will have to see whether the problem happens again. It was pretty hard to reproduce in the past - you need a special kind of bad network connectivity that I only experience on business trips, and it needs a background logging mechanism on the phone or a permanent adb connection.

I still think that there is something more fundamental going bad (how can you ever end up with two XML elements being written concurrently?), but if this is going to be a viable workaround, I might stop caring.


#10

My first guess would be two writer threads using the same queue and socket.

It is not immediately clear to me how this could be caused by that. But it is the nature of concurrency problems that they are sometimes hard comprehend. So please, in order to be not biased, forget the two previous statements if you ever try to squash the bug.