Gaim hangs on disconnect

OK, I've done more debugging, and some learning about TLS (the GnuTLS library, specifically). Gaim is waiting for a reply from a closed socket connection. I'm not yet ready to say whose fault that is (Wildfire? Java? Gaim? GnuTLS?)

In the Gaim backtrace, we are getting hung up on the gnutls library call to gnutls_bye(…), which has the following documentation:

```c
/**
 * gnutls_bye - This function terminates the current TLS/SSL connection.
 * @session: is a &gnutls_session structure.
 * @how: is an integer
 *
 * Terminates the current TLS/SSL connection. The connection should
 * have been initiated using gnutls_handshake().
 * @how should be one of GNUTLS_SHUT_RDWR, GNUTLS_SHUT_WR.
 *
 * In case of GNUTLS_SHUT_RDWR then the TLS connection gets terminated and
 * further receives and sends will be disallowed. If the return
 * value is zero you may continue using the connection.
 * GNUTLS_SHUT_RDWR actually sends an alert containing a close request
 * and waits for the peer to reply with the same message.
 *
 * In case of GNUTLS_SHUT_WR then the TLS connection gets terminated and
 * further sends will be disallowed. In order to reuse the connection
 * you should wait for an EOF from the peer.
 * GNUTLS_SHUT_WR sends an alert containing a close request.
 *
 * This function may also return GNUTLS_E_AGAIN or GNUTLS_E_INTERRUPTED; cf.
 * gnutls_record_get_direction().
 **/
```

This seemed interesting, so I did some debugging of the session itself. Here is an ssldump of a normal SSL connection closing:

```text
1 12 14.2167 (0.0007) S>C V3.0(32) application_data
1 13 14.2167 (0.0000) S>C V3.0(32) application_data
1 14 30.6220 (16.4052) C>S V3.0(32) Alert
1 15 30.6224 (0.0004) S>C V3.0(32) Alert
1    30.6244 (0.0020) S>C TCP FIN
1    30.6250 (0.0005) C>S TCP FIN
```

Notice that an alert is sent from the client to the server, then from the server to the client, and only THEN is the connection closed.

Now, here it is with Gaim and Wildfire:

```text
1 30 1.4051 (0.0008) S>C V3.1(288) application_data
1 31 5.5092 (4.1041) C>S V3.1(320) application_data
1 32 5.5098 (0.0005) C>S V3.1(176) application_data
1 33 5.5098 (0.0000) C>S V3.1(96) Alert
```

The client sends its Alert, and then nothing: the client is waiting for a reply. Now, Gaim shouldn't wait forever; that's bad. I don't know whether that's Gaim's or GnuTLS's doing, but the client should be able to tolerate this. On the server side, however, Wildfire should be completing this exchange.

In the course of debugging this, I also noticed that if you fire up two Gaim clients, connect with one, try to disconnect (Gaim hangs), and then connect with the other, the "hung" client cleans itself up. There is no network traffic from the hung client, just the new session from the second client. It seems the second client doesn't have to be Gaim and doesn't have to use the same resource; any new connection for the same JID will do.

RFC 2246 describes the TLS protocol, and section 7.2.1 states the following:

> The client and the server MUST share knowledge that the connection is ending in order to avoid a truncation attack. Either party may initiate the exchange of closing messages.
>
> close_notify
> This message notifies the recipient that the sender will not send any more messages on this connection. The session becomes unresumable if any connection is terminated without proper close_notify messages with level equal to warning.
>
> Either party may initiate a close by sending a close_notify alert. Any data received after a closure alert is ignored.
>
> Each party is required to send a close_notify alert before closing the write side of the connection. It is required that the other party respond with a close_notify alert of its own and close down the connection immediately, discarding any pending writes. It is not required for the initiator of the close to wait for the responding close_notify alert before closing the read side of the connection.
>
> If the application protocol using TLS provides that any data may be carried over the underlying transport after the TLS connection is closed, the TLS implementation must receive the responding close_notify alert before indicating to the application layer that the TLS connection has ended. If the application protocol will not transfer any additional data, but will only close the underlying transport connection, then the implementation may choose to close the transport without waiting for the responding close_notify. No part of this standard should be taken to dictate the manner in which a usage profile for TLS manages its data transport, including when connections are opened or closed.
>
> NB: It is assumed that closing a connection reliably delivers pending data before destroying the transport.

What I'm getting at, then, is that Wildfire (or Java?) is wrong. Since Gaim sent a close_notify alert, Wildfire MUST respond with one. Gaim is not required to wait for it, though, so a workaround would be to have Gaim not block while waiting for the reply. The Gaim developers told me this is going to happen in Gaim 2.0.0b3 (not released yet). However, this is only a hack, since Wildfire/Java is still not following the protocol spec.

So, what's going on? Well, after I "disconnect" in Gaim, Java receives the close_notify alert and, according to the debug logs, thinks it sent the alert back:

```text
Client SR - 8820986, RECV TLSv1 ALERT: warning, close_notify
Client SR - 8820986, closeInboundInternal()
Client SR - 8820986, closeOutboundInternal()
Client SR - 8820986, SEND TLSv1 ALERT: warning, description = close_notify
Padded plaintext before ENCRYPTION: len = 32
0000: 01 00 27 71 53 4F DF A9 0F 1F E2 8A 81 D2 54 7F ..'qSO........T.
0010: 3E 3F 76 D4 95 69 09 09 09 09 09 09 09 09 09 09 >?v..i..........
Client SR - 8820986, WRITE: TLSv1 Alert, length = 32
Finalizer, called close()
Finalizer, called closeInternal(true)
Finalizer, called close()
Finalizer, called closeInternal(true)
Finalizer, called close()
Finalizer, called closeInternal(true)
```

(The last two lines get repeated ad infinitum.)

But netstat still shows the TCP connection as established (on both the Gaim and the Java side). The ssldump in the previous post supports that, too. I've been trying to track down where in the Wildfire (WiFi) code the socket actually gets closed; maybe a developer can take a look and see whether the logic there is missing something?

Just to follow up, here's a patch to Gaim 1.5 that fixes the hang by timing out if GnuTLS waits too long for a response:

```diff
--- gaim-1.5.0-clean/plugins/ssl/ssl-gnutls.c 2004-09-03 14:34:16.000000000 -0700
+++ gaim-1.5.0/plugins/ssl/ssl-gnutls.c 2006-03-24 15:52:50.000000000 -0800
@@ -131,6 +131,12 @@
 	if(!gnutls_data)
 		return;
+	int sd = (int) gnutls_transport_get_ptr(gnutls_data->session);
+	struct timeval tv = { 0 };
+	tv.tv_sec = 3; /* wait at most 3 seconds */
+	setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, (char *) &tv, sizeof tv);
 	gnutls_bye(gnutls_data->session, GNUTLS_SHUT_RDWR);
 	gnutls_deinit(gnutls_data->session);
```

If this is a server-side issue I would really like to fix it. I'm running Wildfire under WinXP and Fedora Core 4 using TLS, with Gaim 1.5.0 as the client, and the problem does not happen. I would appreciate it if you could give me some specific steps to reproduce the problem.

I figured out why Fedora Core 4 doesn't show the problem. Gaim used to have support for the old Netscape SSL libraries, and Fedora still compiles with them (instead of GnuTLS). The Netscape libraries came with an odd license, so Debian (and many others) won't use them. I don't know what the Windows build uses, but I'm guessing it's not GnuTLS.

I do want to reiterate that GnuTLS is doing nothing wrong here. The server side (Java? Wildfire?) is not terminating the connection properly. If any Java programmers are willing to dig into this, I can provide all sorts of details, but I'm afraid I'm just not a good enough Java programmer to do this myself.

I think the fix is going to be two-part:

  1. TLSStreamReader.read needs to check whether lastStatus is CLOSED and, if so, not attempt an underlying socket read. I've tried a variant of this and it fixes the practical problem of Wildfire and Gaim going into a read deadlock. However, according to a packet trace, Wildfire still fails to send its own close notification; it simply closes the TCP connection.

  2. To make Wildfire compliant with the TLS spec, the enclosing logic (BlockingReadingMode.run, perhaps?) needs to cause tlsEngine.closeOutbound to be invoked somehow.

Here's the simplest patch I can come up with to recognize and handle incoming TLS-level close alerts (which is the main issue here). The patch won't make it through the forum software very well, but people may be able to apply it by hand.

Index: TLSStreamReader.java

===================================================================

--- TLSStreamReader.java (revision 5505)

+++ TLSStreamReader.java (working copy)

@@ -81,6 +81,10 @@

// set the position of inAppBB to 0 to process it.

inAppBB.flip();

}

+        else if (lastStatus == TLSStatus.CLOSED) {
+            inAppBB.flip();
+            rbc.close();
+        }

else {

// Some data in inNetBB was not decrypted since it is not complete. A

// bufferunderflow was detected since the TLS packet is not complete to be