Clients disconnected after 30 minutes?

That version command has the option to send a version number as well as a version name, so in theory it could work. You'd be dependent on the client's author though, and that could get messy quickly, more so if you add more clients with more versions to the equation.

Just to confirm, setting “xmpp.client.idle” to “-1” seems to work inasmuch as clients don't get disconnected any more. The downside of that setting is that “lost connections” may not be closed by the new feature. An alternative is to set a different value for “xmpp.client.idle”, which is the timeout in milliseconds. The default is 1800000, i.e. 30 minutes, but you could set it to something longer, e.g. 12 hours = 43200000.
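For anyone who would rather derive the millisecond values than type them by hand, here is a minimal sketch using `java.util.concurrent.TimeUnit`. The class name `IdleTimeoutValues` is made up for illustration; the sketch only does the arithmetic and does not touch the server configuration.

```java
import java.util.concurrent.TimeUnit;

class IdleTimeoutValues {
    // Default idle timeout quoted above: 30 minutes, in milliseconds.
    static final long DEFAULT_IDLE_MS = TimeUnit.MINUTES.toMillis(30);
    // Longer alternative quoted above: 12 hours, in milliseconds.
    static final long TWELVE_HOURS_MS = TimeUnit.HOURS.toMillis(12);

    public static void main(String[] args) {
        System.out.println("30 minutes = " + DEFAULT_IDLE_MS + " ms");
        System.out.println("12 hours   = " + TWELVE_HOURS_MS + " ms");
    }
}
```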

It seemed quite simple to me, but bear in mind I don't know this project apart from a quick browse through the source in Notepad, and I don't know much about the protocol either. I won't let ignorance stand in my way though…

I am assuming that the server can ask the client for its version at any time. What I had in mind was this: if a client doesn't send heartbeats (no need to version check anything to know that), then instead of disconnecting it you would first ask for its version, or do something else harmless that requires a response. If it responds, the connection stays alive; otherwise it gets dropped.
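To make the idea concrete, a version request in XMPP is an IQ "get" carrying a query in the `jabber:iq:version` namespace (the Software Version extension). The sketch below only builds that stanza as a string. The class and method names, the example JID and the id value are all made up for illustration, and this is not how Wildfire itself constructs packets.

```java
class VersionProbe {
    /** Escapes the characters that are special inside an XML attribute value. */
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&apos;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds a jabber:iq:version "get" request addressed to a client's full JID.
     * A client is expected to answer with either a result or an error, and
     * either reply counts as traffic coming from the client.
     */
    static String versionRequest(String fullJid, String id) {
        return "<iq type='get' to='" + escapeAttr(fullJid) + "' id='" + escapeAttr(id) + "'>"
                + "<query xmlns='jabber:iq:version'/>"
                + "</iq>";
    }

    public static void main(String[] args) {
        System.out.println(versionRequest("juliet@example.com/balcony", "probe-1"));
    }
}
```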

And even if you send a malformed or unsupported request, the client should return an error (and reset the idle timer that way). In various classes, I've found comments stating that sending stanzas to the client doesn't reset the idle timer, so what you propose should work. I've added some code based on your idea and redeployed my Wildfire server. I'll update this post when I get some results.

Update:

Ok, it seems to work. I've edited the file src/java/org/jivesoftware/wildfire/net/SocketConnection.java by replacing the checkHealth method with the code below (I've made some formatting changes, hoping that it won't stretch the browser window in this forum).

Before closing the connection, the code sends a stanza to the to-be-closed client and waits 60 seconds for a reply (configurable through the xmpp.client.luretimeout property).

Note that I did this quick and dirty. A few things still have to be done:

  • I haven't checked if the stanza I send is actually correct (not that it matters that much: the code seems to work, so the client should at least return an error).

  • I haven't checked how this code acts if the socket is actually closed (and that is, after all, what we're trying to detect).

  • I haven't been able to run ant test yet (no JUnit).

  • Code has to be cleaned up a little (a few double calls)

  • A “good” default value for lureTimeout should be found. Sixty seconds won't work for a lot of HTTP polling clients, for example.

Note that you need a new attribute for this to work:

```java
private long lureTimestamp = 0;
```

```java
void checkHealth() {
    // Check that the sending operation is still active
    if (writeStarted > -1 && System.currentTimeMillis() - writeStarted >
            JiveGlobals.getIntProperty("xmpp.session.sending-limit", 60000)) {
        // Close the socket
        if (Log.isDebugEnabled()) {
            Log.debug("Closing connection: " + this
                    + " that started sending data at: "
                    + new Date(writeStarted));
        }
        forceClose();
    }
    else {
        // Check if the connection has been idle. A connection is
        // considered idle if no data has been received from the client
        // for a period. Sending data to the client is not considered
        // as activity.
        if (idleTimeout > -1 && socketReader != null &&
                System.currentTimeMillis() - socketReader.getLastActive() >
                idleTimeout)
        {
            // First, try to lure the client into responding. The step that
            // sends the stanza and records lureTimestamp is not reproduced here.
            if (this.lureTimestamp >= System.currentTimeMillis() -
                    JiveGlobals.getIntProperty("xmpp.client.luretimeout", 60000))
            {
                // Lured within the last lureTimeout. Wait a bit longer
                return;
            }
            else
            {
                // If the client didn't respond to the lure, close the socket
                if (Log.isDebugEnabled()) {
                    Log.debug("Closing connection that has been idle: " + this);
                }
                forceClose();
            }
        }
    }
}
```
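The lure logic can also be looked at in isolation. The sketch below is not the patch itself, only a stand-alone model of the decision it is described as making: send a lure once the idle timeout is exceeded, wait up to the lure timeout, then close. Everything here is an assumption made for illustration: the class name `LureCheck`, the `Action` enum, passing the clock in as a parameter, and the use of -1 (rather than 0 as in the attribute above) to mean "no lure outstanding".

```java
class LureCheck {
    enum Action { NONE, SEND_LURE, WAIT, CLOSE }

    private final long idleTimeoutMs;
    private final long lureTimeoutMs;
    // Time the last lure was sent, or -1 if no lure is outstanding.
    private long lureTimestamp = -1;

    LureCheck(long idleTimeoutMs, long lureTimeoutMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.lureTimeoutMs = lureTimeoutMs;
    }

    /**
     * Decides what to do for one connection.
     * now        = current time in ms
     * lastActive = time data was last received from the client, in ms
     */
    Action check(long now, long lastActive) {
        if (now - lastActive <= idleTimeoutMs) {
            // The client has been heard from recently; forget any old lure.
            lureTimestamp = -1;
            return Action.NONE;
        }
        if (lureTimestamp == -1) {
            // Idle and not lured yet: the caller should send the lure now.
            lureTimestamp = now;
            return Action.SEND_LURE;
        }
        if (now - lureTimestamp < lureTimeoutMs) {
            // Lured recently: give the client a bit longer to answer.
            return Action.WAIT;
        }
        // No reply to the lure within the lure timeout.
        return Action.CLOSE;
    }

    public static void main(String[] args) {
        LureCheck check = new LureCheck(1800000L, 60000L);
        System.out.println(check.check(1800001L, 0L)); // idle, first time
        System.out.println(check.check(1830001L, 0L)); // 30 s after the lure
        System.out.println(check.check(1860001L, 0L)); // 60 s after the lure
    }
}
```

Because a reply from the client moves `lastActive` forward, the first branch clears the lure state on the next check, so a later idle period starts with a fresh lure instead of an immediate close.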

Message was edited by:

Guus

That looks good to me. I'm not set up to build this yet, but I'll have a play with that if I get time.

Does anyone know if Gaim 2.0 (currently in beta) suffers from this problem?

I tried to test this for you, but Gaim 2.0.0beta1 crashes as soon as I try to connect to our server. I tried on two different computers (both Windows XP, though).

What version of Wildfire has the updated checkHealth method been implemented in? We are having this issue at our company and we would like to clear this up.

Thanks.

We're also interested to know what version will have the updated checkHealth method, since we've run into this problem at our company too.

As far as I'm aware this isn't implemented in a release yet, and I don't know if there are plans to do so. You can modify your own version with the code above, but you should be OK with the following workaround for now; it is working for me (copied from my message above):

Just to confirm, setting “xmpp.client.idle” to “-1” seems to work inasmuch as clients don't get disconnected any more. The downside of that setting is that “lost connections” may not be closed by the new feature. An alternative is to set a different value for “xmpp.client.idle”, which is the timeout in milliseconds. The default is 1800000, i.e. 30 minutes, but you could set it to something longer, e.g. 12 hours = 43200000.

What file(s) do we need to modify to make Guus's changes above? Is it possible to do this post-install?

The file that you need to change is a source code file, src/java/org/jivesoftware/wildfire/net/SocketConnection.java. It has to be recompiled afterwards, so you can't do this post-install.

Has anyone opened a bug against Wildfire about this? I'd really like to vote for it if so: please post it to the thread. I searched the bug database but couldn't find anything.

I'd like to propose a solution such as the one Guus posted: if we get to 1 minute (say) before the idle time expires and no message has been received, then send a request to the client. If it responds before the time expires, the counter will automatically get reset. If not, the connection is torn down. It seems to me that something like this should be implementable without much violence to the current idle timeout code. It wouldn't require any special knowledge of the client (no need to keep track of lists of clients which don't do heartbeats, etc.), and for any client that did do heartbeats the code would never fire anyway (since it would never get that close to the idle timeout).
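A rough sketch of that variant, where the request goes out shortly before the idle timeout expires rather than after it. The names (`EarlyProbe`, `decide`, `leadMs`, `probeSent`) are made up for illustration, and the caller is assumed to clear `probeSent` whenever data arrives from the client.

```java
class EarlyProbe {
    enum Action { NONE, SEND_PROBE, CLOSE }

    /**
     * now           = current time in ms
     * lastActive    = time data was last received from the client, in ms
     * idleTimeoutMs = idle timeout, e.g. 1800000 for 30 minutes
     * leadMs        = how long before expiry to send the request, e.g. 60000
     * probeSent     = whether a request was already sent in this idle period
     */
    static Action decide(long now, long lastActive, long idleTimeoutMs,
                         long leadMs, boolean probeSent) {
        long idleFor = now - lastActive;
        if (idleFor > idleTimeoutMs) {
            // The timeout expired without any traffic from the client.
            return Action.CLOSE;
        }
        if (idleFor >= idleTimeoutMs - leadMs && !probeSent) {
            // Close to expiry and nothing sent yet: ask the client something.
            return Action.SEND_PROBE;
        }
        // Either the client is active, or we are waiting for its reply.
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(1740000L, 0L, 1800000L, 60000L, false));
        System.out.println(decide(1800001L, 0L, 1800000L, 60000L, true));
    }
}
```

A client that sends heartbeats keeps `idleFor` small, so the second branch is never reached for it.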

To be honest, I don't feel this is a bug that should be opened on the Wildfire developers' side. I think this is just something that needs to be supported in GAIM. From what I've seen, GAIM 1.5 and under are the only IM clients affected by this. I've been told by the users at our company that this is not a problem with the GAIM 2.0 beta. I can't say for sure though, since I haven't properly tested this.

A couple of days ago I talked to Matt about this problem. It's being looked at by Jive Software.

If it's true that the XMPP specs require that all clients perform heartbeats, so that we can say GAIM is not complying with the XMPP standard, then I'll agree with you that it doesn't need to be fixed in Wildfire (although for the reasons below it still might be a good idea).

If, however, heartbeats by clients are not required by XMPP, then Wildfire needs to do something to support those clients, and not just disconnect them arbitrarily. The method I described, I believe, is a good compromise that is pretty straightforward to implement, doesn't need much change in the Wildfire server, and doesn't impact clients that DO support heartbeats at all. I personally don't think you can tell admins that if they anticipate any GAIM clients <2.0 connecting to their server, the only way they can avoid idle timeouts is by disabling them for the entire server.

You can say that GAIM is the only one that doesn't support it, but first, GAIM is one of the most popular XMPP clients so it can't be ignored, and second, a version of GAIM that does support it is not generally available… and for enterprise deployments that use GAIM (such as Red Hat etc.) it will be even longer before they include it, and longer still before most users are upgraded.

Cool thanks Guus. Is there a bug or enhancement request for it we can vote for?

Hey guys,

Sorry for replying too late to this thread but I was away for 2 weeks and the forum activity is very high. So one problem we had to solve was that under some rare circumstances the server may write some data on a socket and never return from the writing method. This seems to be a problem with the TCP layer that is not throwing an exception so the JVM never realizes that the socket is dead and keeps waiting for an ACK from the non-existent client. Since the ACK never arrives the TCP layer never raises an error. To fix this problem we added the #checkHealth method in SocketConnection.
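The write check described above boils down to remembering when a write started and having another thread notice when it has been pending for too long. A minimal stand-alone sketch of that bookkeeping follows. The class and method names are made up for illustration and this is not the Wildfire implementation; it only mirrors the `writeStarted > -1` convention and the 60000 ms limit seen in the checkHealth code earlier in the thread.

```java
import java.util.concurrent.atomic.AtomicLong;

class WriteWatchdog {
    // Time the current write started, or -1 when no write is in progress.
    private final AtomicLong writeStarted = new AtomicLong(-1);
    private final long sendingLimitMs;

    WriteWatchdog(long sendingLimitMs) {
        this.sendingLimitMs = sendingLimitMs;
    }

    /** Called by the writing thread just before it writes to the socket. */
    void beginWrite(long now) {
        writeStarted.set(now);
    }

    /** Called by the writing thread once the write has returned. */
    void endWrite() {
        writeStarted.set(-1);
    }

    /**
     * Called from a different thread. Returns true when a write has been
     * pending for longer than the limit, in which case the caller would
     * close the socket to release the blocked writer.
     */
    boolean isStuck(long now) {
        long started = writeStarted.get();
        return started > -1 && now - started > sendingLimitMs;
    }

    public static void main(String[] args) {
        WriteWatchdog watchdog = new WriteWatchdog(60000L);
        watchdog.beginWrite(1000L);
        System.out.println("stuck after 30 s? " + watchdog.isStuck(31000L));
        System.out.println("stuck after 61 s? " + watchdog.isStuck(62000L));
    }
}
```

The check has to run on a thread other than the one doing the write, since the writing thread is by definition the one that may never return.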

Therefore, if we add to the idle checking the logic of sending some data to a client that has been “quiet” for a while, we may potentially freeze the “health checking” thread forever. A couple of options we can think of are:

  1. Instead of having one thread that a) checks socket health and b) closes idle clients, we may have 2 threads in total (one for each activity). The thread that closes idle clients would have Guus's patch.

  2. Create a new thread that monitors the “health checking” thread. And another monitor for the monitor and another monitor for the monitor, etc. until we run out of memory… (that was a joke guys)

  3. For those installations that are using clients with no heartbeat then disable this feature.

  4. Add heartbeat support to the clients that do not support it.

Since I like to provide a solution that works for everyone and not kick the problem to someone else, I vote for discarding options 3 and 4. Option 2 looks like the beginning of a complex solution, but it may be good in case the “health checking” thread starts to have more checks. Option 1 sounds like a clean solution, though it may not scale if we ever need to add more checks besides the 2 we currently have.
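To illustrate why option 1 helps, here is a small stand-alone sketch in which the two checks run on separate single-thread schedulers. The idle check is made to block forever (a latch stands in for a socket write that never returns), and the health check keeps running regardless. All names are made up for illustration, the 10 ms period is only there to keep the demo short, and the code uses lambdas, so it needs a more recent Java than Wildfire itself targets.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SeparateCheckers {
    /**
     * Blocks the idle checker and returns how many health checks completed
     * in the meantime (waits for wantedRuns of them, at most 5 seconds).
     */
    static int healthRunsWhileIdleCheckBlocked(int wantedRuns) {
        ScheduledExecutorService healthChecker = Executors.newSingleThreadScheduledExecutor();
        ScheduledExecutorService idleChecker = Executors.newSingleThreadScheduledExecutor();
        // Never counted down: stands in for a write that never returns.
        CountDownLatch neverReleased = new CountDownLatch(1);
        CountDownLatch healthRuns = new CountDownLatch(wantedRuns);
        AtomicInteger completed = new AtomicInteger();
        try {
            idleChecker.execute(() -> {
                try {
                    neverReleased.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            healthChecker.scheduleAtFixedRate(() -> {
                completed.incrementAndGet();
                healthRuns.countDown();
            }, 0, 10, TimeUnit.MILLISECONDS);
            healthRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            healthChecker.shutdownNow();
            idleChecker.shutdownNow(); // interrupts the blocked idle task
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("health checks completed: " + healthRunsWhileIdleCheckBlocked(3));
    }
}
```

With a single shared thread, the blocked idle task would have kept the health check from ever running, which is the freeze described above.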

So what are your ideas? Any other option? Any other idea or feedback?

IMHO, option 1 is the right choice based on the current requirements.

Regards,

– Gato

Gato,

One point to add is that it might be worth seeing how many clients are affected by this. If it's a very small number (really only GAIM), then it might be worth contacting the GAIM devs to see if they can easily patch their code.

-Matt

Well, it seems to me that ultimately the problem is there's a bug in Java where it's not properly managing sockets!!

Assuming you can't get that fixed, then I agree #1 seems like the best option all around.