LDAP search errors

Twice in the last week, after Wildfire has been up for a few days, I've had a situation where authentication attempts against my LDAP server result in a "SEARCH RESULT tag=101 err=80" response to the SRCH request Wildfire makes on authentication. It's like the LDAP server just stops listening to Wildfire's SRCH requests. (The LDAP server in question is set up with backend=sql, and it doesn't make any DB queries before blindly returning err=80.)

I don't think this is Wildfire's fault, because the error comes from the LDAP server, the SRCH request looks great, and slapd isn't doing a DB query.

However, the LDAP server is happily handling other requests, and restarting Wildfire clears things up. I've managed to get ethereal dumps (which is why I said the SRCH requests look great), but I can't get additional debugging information without restarting Wildfire and/or LDAP, and that clears up the problem. The debug logs also generate too much data for me to leave them running for 4 days while I wait for the problem to happen again.

My current pet theory is that this has to do with connection pooling. In both instances where the LDAP server started handing out err=80s, it was early in the morning after hours of no connection attempts (something like 9 hours between OPs on the connection handle). For whatever reason, the connection is actually invalid but Java is using it anyway.

Is this a reasonable pet theory? I've turned off connection pooling for now, and if it merrily keeps authenticating for the next week or two, I'll chalk the theory up as probably right. Then I might want to re-enable pooling, but with a timeout.

I've read http://java.sun.com/products/jndi/tutorial/ldap/connect/config.html as linked from the Wildfire LDAP guide, but I'm a Java idiot (love Wildfire, but dang I hate Java). Where do I set that timeout property? In INSTALL4J_ADD_VM_PARAMS= in bin/wildfire?

So I guess this is two questions in one rambly post:

  1. Could connection pooling with no timeouts be causing odd LDAP errors after long idle periods between SRCH requests?

  2. If so, where does the java idiot go to set the timeout parameter?
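
For reference, here is a minimal Java sketch of the two settings being asked about, as I understand them from the JNDI tutorial linked above. The pool timeout `com.sun.jndi.ldap.connect.pool.timeout` is a JVM-wide system property, in milliseconds, while `com.sun.jndi.ldap.connect.pool` is a per-context environment property. The class name `LdapPoolSettings`, the host `ldap.example.com`, and the 5-minute value are made up for illustration, and nothing here is taken from Wildfire's own code.

```java
import java.util.Hashtable;
import javax.naming.Context;

class LdapPoolSettings {
    // Hypothetical value: close pooled connections idle for more than 5 minutes.
    static final String IDLE_TIMEOUT_MS = "300000";

    // The pool timeout is a system property, not a per-context setting.
    // It should be in place before the JVM opens its first pooled LDAP connection.
    static void configurePoolTimeout() {
        System.setProperty("com.sun.jndi.ldap.connect.pool.timeout", IDLE_TIMEOUT_MS);
    }

    // Builds a JNDI environment that asks for a pooled connection.
    // No connection is opened here; this only assembles the settings.
    static Hashtable<String, String> buildEnvironment(String ldapUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        env.put("com.sun.jndi.ldap.connect.pool", "true");
        return env;
    }

    public static void main(String[] args) {
        configurePoolTimeout();
        Hashtable<String, String> env = buildEnvironment("ldap://ldap.example.com:389");
        System.out.println("pooling requested: " + env.get("com.sun.jndi.ldap.connect.pool"));
        System.out.println("idle timeout (ms): "
                + System.getProperty("com.sun.jndi.ldap.connect.pool.timeout"));
    }
}
```

Since an application like Wildfire builds its own JNDI environment, only the system property should need setting from outside, which is the same as passing `-Dcom.sun.jndi.ldap.connect.pool.timeout=300000` on the java command line. Whether INSTALL4J_ADD_VM_PARAMS in bin/wildfire is the right place for that flag is still the open question.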