I think I’ve just found a bug in org.jivesoftware.openfire.ldap.LdapManager. The property ldap.readTimeout is documented at https://www.igniterealtime.org/builds/openfire/docs/latest/documentation/ldap-guide.html as “The value of this property is the string representation of an integer representing the read timeout in milliseconds for LDAP operations.”
Checking the code at http://fisheye.igniterealtime.org/browse/openfire/trunk/src/java/org/jivesoftware/openfire/ldap/LdapManager.java?r=13754 you can see this being applied to the JiveInitialLdapContext environment at line ~656 in the checkAuthentication() method.
However, it is not applied at all in the getContext() method that starts at line 480.
This means that any LDAP operation that uses getContext() (most of them?) has no read timeout specified. The LDAP Guide linked above describes that case as “no read timeout is specified which is equivalent to waiting for the response infinitely until it is received”.
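To illustrate what I mean, here is a minimal sketch of passing a read timeout into a JNDI environment. This is not the actual Openfire code: the class name LdapEnvSketch and the method buildEnv are hypothetical, and I am assuming that ldap.readTimeout is meant to end up in the standard JNDI LDAP provider property com.sun.jndi.ldap.read.timeout.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper, for illustration only (not part of Openfire).
class LdapEnvSketch {
    // Standard JNDI LDAP provider property: read timeout in milliseconds, as a string.
    static final String READ_TIMEOUT_KEY = "com.sun.jndi.ldap.read.timeout";

    // Builds the environment that would be handed to an InitialLdapContext.
    // A readTimeoutMillis of 0 or less means "do not set a timeout".
    static Hashtable<String, Object> buildEnv(String providerUrl, int readTimeoutMillis) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        if (readTimeoutMillis > 0) {
            // Without this entry the provider waits for a reply with no time limit.
            env.put(READ_TIMEOUT_KEY, String.valueOf(readTimeoutMillis));
        }
        return env;
    }
}
```

If getContext() built its environment along these lines, a stalled server should produce an exception after the timeout instead of a thread that never returns.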
We’re seeing a problem where our Active Directory LDAP server sometimes fails to respond, and once that happens everything locks up indefinitely, waiting for a reply (*). My suspicion is that the missing timeout is the cause of the lock-up (the root cause is, I guess, AD not responding, but I have no control over that).
(*) A thread dump when this happens shows one thread at com.sun.jndi.ldap.Connection.readReply(Connection.java:467). This thread is checking the members of a group, and holds a lock on the name of the group, taken at org.jivesoftware.openfire.group.GroupManager.getGroup(GroupManager.java:326). All the other stuck threads are waiting on that group-name lock before proceeding, and so are also blocked.
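The pile-up can be reproduced in miniature. The sketch below is a stand-alone simulation, not Openfire code: GroupLockSketch, groupLock and fakeLdapReply are hypothetical names, and a CountDownLatch stands in for the LDAP read that never returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of one thread blocking on a read while holding a lock.
class GroupLockSketch {
    // Returns true if a second thread is still blocked on the lock after waitMillis.
    static boolean waiterStaysBlocked(long waitMillis) {
        final Object groupLock = new Object();                      // stands in for the group-name lock
        final CountDownLatch holderHasLock = new CountDownLatch(1);
        final CountDownLatch fakeLdapReply = new CountDownLatch(1); // the reply that never arrives
        final CountDownLatch waiterDone = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (groupLock) {
                holderHasLock.countDown();
                try {
                    fakeLdapReply.await(); // simulates a read with no timeout
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        Thread waiter = new Thread(() -> {
            synchronized (groupLock) { // cannot enter while holder is stuck
                waiterDone.countDown();
            }
        });

        try {
            holder.start();
            holderHasLock.await();     // make sure the holder owns the lock first
            waiter.start();
            boolean finished = waiterDone.await(waitMillis, TimeUnit.MILLISECONDS);
            fakeLdapReply.countDown(); // release the holder so both threads can exit
            holder.join();
            waiter.join();
            return !finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In the simulation the second thread only gets going once the fake reply is released. In the real case nothing ever releases it, which matches what the thread dump shows.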
Have I missed anything?