Liveness problem related to PacketReader.connectionIDLock

Hi,

I found that if the packetReader reads the connection ID before the wait in the startup method is reached, then the connection thread becomes blocked until packetReplyTimeout is reached. This isn’'t a theoric liveness problem, but something that I found out testing against a local server and had to wait 5 seconds (the default timeout) for each connection, thus making my stress testing laughably useless

And here’‘s the patch… I was hoping I would get an “attach file” option after posting the first message. If the patch gets corrupted, let me know and I’'ll e-mail it.

A few remarks about the patch:

The access to connectionID is now synchronized. However, the synchronized blocks cover as much “logic” as before (see usage of localConnectionID in the packet reader loop), therefore I’'m not adding new deadlocks or liveness problems.

I noticed that Smack is hyper optimistic when dealing with input, both from the server and other users. My patch deals with the case of a malformed message without a connection ID. I think the behavior resembles the developers’’ reasoning in the rest of the class.

The wait had no guard, therefore the liveness problem. A simple if is not sufficient according to Doug Lea, as wait can be awaken by additional factors thus unexpectedly shortening the timeout. Therefore the time mangling in the new condWaitConnectionID method.

It’‘s a good idea to handle InterruptedExceptionS. However, you may want to remove the exception I’'m throwing in the catch block in startup as a similar exception should be thrown a couple lines below.

I think I tested all reasonable success/failure combinations but, as usual, no guarantees are given that this code actually does what I say it does.

Yeah, the explanation is longer than the actual patch, but locking can get confusing

diff -ur smack-dev-1.4.1.orig/source/org/jivesoftware/smack/PacketReader.java smack-dev-1.4.1/source/org/jivesoftware/smack/PacketReader.java

— smack-dev-1.4.1.orig/source/org/jivesoftware/smack/PacketReader.java 2004-1 2-24 18:29:06.000000000 -0500

+++ smack-dev-1.4.1/source/org/jivesoftware/smack/PacketReader.java 2004-12-24 19:15:21.000000000 -0500

@@ -151,11 +151,12 @@

// Wait for stream tag before returing. We’'ll wait a couple of seconds before

// giving up and throwing an error.

try {

  •        synchronized(connectionIDLock) {
    
  •            connectionIDLock.wait(SmackConfiguration.getPacketReplyTimeout());
    
  •        }
    
  •        condWaitConnectionID();
    
  •    }
    
  •    catch (InterruptedException ie) {
    
  •         Thread.currentThread().interrupt();
    
  •        throw new XMPPException("Connection aborted. Thread interrupted.");
    

}

  •    catch (InterruptedException ie) { }
    

if (connectionID == null) {

throw new XMPPException(“Connection failed. No response from server.”);

}

@@ -192,6 +193,12 @@

void notifyConnectionError(Exception e) {

done = true;

connection.close();

  •    // Wake up the conditional wait in startup, if it''s still running.
    
  •    synchronized(connectionIDLock) {
    
  •        connectionIDLock.notifyAll();
    
  •    }
    

// Notify connection listeners of the error.

ArrayList listenersCopy;

synchronized (connectionListeners) {

@@ -259,19 +266,24 @@

// Ensure the correct jabber:client namespace is being used.

if (“jabber:client”.equals(parser.getNamespace(null))) {

// Get the connection id.

  •                        String localConnectionID = null;
    

for (int i=0; i<parser.getAttributeCount(); i++) {

if (parser.getAttributeName(i).equals(“id”)) {

  •                                // Save the connectionID and notify that we''ve gotten it.
    
  •                                connectionID = parser.getAttributeValue(i);
    
  •                                synchronized(connectionIDLock) {
    
  •                                    connectionIDLock.notifyAll();
    
  •                                }
    
  •                                         localConnectionID = parser.getAttributeValue(i);
    

}

else if (parser.getAttributeName(i).equals(“from”)) {

// Use the server name that the server says that it is.

connection.host = parser.getAttributeValue(i);

}

}

  •                        if (null == localConnectionID)
    
  •                             throw new XMPPException("Missing connection ID in opening stream.");
    
  •                        // Save the connectionID and notify that we''ve gotten it.
    
  •                        synchronized(connectionIDLock) {
    
  •                            connectionID = localConnectionID;
    
  •                            connectionIDLock.notifyAll();
    
  •                        }
    

}

}

}

@@ -790,6 +802,32 @@

}

/**

  • * Waits on monitor <code>connectionIDLock</code> if connection ID
    
  • * is not set. The wait times out after n milliseconds, where n is
    
  • * the value of the <code>packetReplyTimeout</code> property in the
    
  • * <code>SmackConfiguration</code> class.
    
  • *
    
  • * @throws InterruptedException If the Thread is interrupted.
    
  • */
    
  • private void condWaitConnectionID() throws InterruptedException {

  •      int timeout = SmackConfiguration.getPacketReplyTimeout();
    
  •     long start = System.currentTimeMillis();
    
  •      long waitTime = timeout;
    
  •      synchronized(connectionIDLock) {
    
  •           while (null == connectionID && !done) {
    
  •                if (waitTime < 0)
    
  •                     break;
    
  •                connectionIDLock.wait(waitTime);
    
  •                long now = System.currentTimeMillis();
    
  •                waitTime = timeout - (now - start);
    
  •           }
    
  •      }
    
  • }
    
  • /**

  • A wrapper class to associate a packet collector with a listener.

*/

private static class ListenerWrapper {

And here’'s the patch, attached.
smack-1.4.1-jkohen-packetreader_startup.patch (4284 Bytes)

Would the attached patch do the same thing?

It doesn’‘t check for no connection ID being returned… but I’‘ve never seen a server that doesn’'t include it and the XMPP RFC is fairly explicit about requiring it :).

In the patch, the two changes are:

  1. Synchronize on the lock before setting the connectionID.

  2. After synchronizing on lock, check to see that the connectionID is null before waiting.

Thanks,

Matt
packetreader.patch.txt (1535 Bytes)

Note: tracking progress on this issue in SMACK-37.

Thanks,

Matt

Unfortunately your patch is not enough. According to Doug Lea (the authority in Java threads) wait can return before the timeout is reached, even when the monitor received no notifications and, he adds, this does happen in some platforms (I think there was something about Windows here, but I don’‘t recall the exact text). That’'s why you need a while. For the same reason, you have no guarantee that wait will honor your desired timeout.

You could call me paranoic, but I think my code is safer and adds no overhead over the code currently in place or your proposed patch.

You can stop reading here, the rest is fine.

Second, you’‘re right, I was being too careful about not synchronizing around parser.getAttribute, because I don’‘t know whether it can block waiting for I/O. After reviewing the scenario I realized that you’‘re using a lock just for that, so it’'s not that bad (unless getAttribute calls PacketReader.startup which I seriously doubt ). I was thinking about a situation where ‘‘this’’ was being locked.

Stop reading. Don’‘t say I didn’'t warn you.

Last and definitely less important, about not checking whether connection ID was passed by the server… I admit I’‘m being a bit paranoid here, but there are many things that can go wrong (maybe it’‘s a hacked server?). But this is Java, and we’'re safe from stack smashes and freeing null pointers and other things. Or are we?

Recapitulating, I would leave my version of the waiting code, including the handling of the InterruptedExceptions, and your version of the notification code.

Ok, yep. The JDK 1.5 Javadoc mentions this:

“A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied.”

However, quite a few areas in Smack rely on wait(long) working correctly. If we fix the issue here, we should most likely fix it everywhere else in the code.

Regards,

Matt

Ok, the final changelog is:

@@ -142,7 +142,21 @@      // giving up and throwing an error.
     try {
         synchronized(connectionIDLock) {
-            connectionIDLock.wait(SmackConfiguration.getPacketReplyTimeout());
+            if (connectionID == null) {
+                // A waiting thread may be woken up before the wait time or a notify
+                // (although this is a rare thing). Therefore, we continue waiting
+                // until either a connectionID has been set (and hence a notify was
+                // made) or the total wait time has elapsed.
+                long waitTime = SmackConfiguration.getPacketReplyTimeout();
+                long start = System.currentTimeMillis();
+                while (connectionID == null && !done) {
+                    if (waitTime <= 0) {
+                        break;
+                    }
+                    connectionIDLock.wait(waitTime);
+                    waitTime -= System.currentTimeMillis() - start;
+                }
+            }
         }
     }
     catch (InterruptedException ie) { } @@ -252,8 +266,8 @@              for (int i=0; i<parser.getAttributeCount(); i++) {
                 if (parser.getAttributeName(i).equals("id")) {
                     // Save the connectionID and notify that we''ve gotten it.
-                    connectionID = parser.getAttributeValue(i);
                     synchronized(connectionIDLock) {
+                       connectionID = parser.getAttributeValue(i);
                        connectionIDLock.notifyAll();
                     }
                 }

-Matt