A corrupt user?

I use the embedded database and authenticate against LDAP. I have one user who was previously able to log in and no longer can. Interestingly, if I try to bring up the user details URL for him, like:

https://wildfireserver:9091/user-properties.jsp?username=joeuser

The page never loads. If I do the same with another user, it comes right up; with an invalid user I get the message that the user was not found.

I've tested the LDAP connection for this user with ldapsearch. The account my Wildfire server uses can bind and find this user with no problem. I set up another Wildfire server on my desktop with the same LDAP configuration, and that server can find the user. So now I'm wondering if this user is perhaps corrupt in the database somehow. What would be a good way to approach correcting this so the user can log in again?

EDIT: another data point. I also noticed this user appears in the session list with a Status of "Closed" and a Presence of "Online". If I delete his connection and then refresh, he is still there with the same status. If I add him to my contact list, he appears as Online to me via Spark as well.

Message was edited by: St0nkingByte

Well, I rebooted my Wildfire server and this cleared up the problem for the one user. Unfortunately, now with only 6 days of uptime, I've got another user who is in the same state.

Hi,

do you see errors in the log files when loading the user page hangs?

You could try to clear the User and the vCard cache on :9090/system-cache.jsp.

LG

I tried clearing the User and vCard caches, to no avail. Then I cleared all the caches; still no joy. I turned on debug logging and clicked through to the status of the user's session. It stays "loading" forever.

With debug logging on, I'm not seeing anything obvious sticking out for this user. Unfortunately I have ~750 concurrent users right now, so it's a little tricky to pick out what might be happening for this specific admin console action.

Hi,

it may help to get a thread dump (see http://www.adaptj.com/root/main/tracehowtos ) while few users are connected and while you try to load the page.

Wildfire's HSQLDB logs its activity in the embedded-db/wildfire.log file. Do you see anything unusual there?

grep username embedded-db/wildfire.script   # may return one line; not sure it does, as you're using LDAP

grep username embedded-db/wildfire.log   # may return n lines?

LG

I can't use that particular thread dump utility on the server in question, as it is RedHat Linux with no X server or any kind of desktop (headless in a datacenter). Is there another way to dump some useful information from a running server that isn't already in debug mode?

I checked wildfire.script and wildfire.log for things mentioning this user. There is a small delta between the two (48 vs 40 lines), but the system has been restarted since this user existed, so I suspect that is normal. Otherwise I can't spot anything odd about this user vs everyone else, other than that he's perpetually connected but "closed" and can't log in again since this started. Obviously restarting the service will solve the problem, but with so many users I don't want to be restarting all the time to fix randomly broken users.

I've generated a bunch of Java thread dumps, which can be found here:

http://st0nkingbyte.whizy.com/files/wildfire_dumps.gz

In each case the dump happens 10 seconds after I click the user session info link from the list of sessions. For the last one I waited 30 seconds.

Interesting lock: 0x76020178

"SunJsseListener1-98" prio=1 tid=0x54d06750 nid=0x2589 waiting for monitor entry
    at org.jivesoftware.wildfire.user.UserManager.getUser(UserManager.java:183)
    - waiting to lock <0x76020178> (a java.lang.String)
    at org.jivesoftware.wildfire.user.UserManager.isRegisteredUser(UserManager.java:329)
    at org.jivesoftware.wildfire.admin.session_002ddetails_jsp._jspService(session_002ddetails_jsp.java:105)

"pool-7-thread-131" prio=1 tid=0x5fd25af0 nid=0x6dbe waiting for monitor entry
    at org.jivesoftware.wildfire.privacy.PrivacyListManager.getDefaultPrivacyList(PrivacyListManager.java:102)
    - waiting to lock <0x76020178> (a java.lang.String)
    at org.jivesoftware.wildfire.roster.Roster.broadcastPresence(Roster.java:572)

"pool-6-thread-125" prio=1 tid=0x6f606c30 nid=0x673c in Object.wait()
    at java.lang.Object.wait(Native Method)
    at com.sun.jndi.ldap.Connection.readReply(Unknown Source)
    - locked <0x7a1946f0> (a com.sun.jndi.ldap.LdapRequest)
    at com.sun.jndi.ldap.LdapClient.getSearchReply(Unknown Source)
    at org.jivesoftware.wildfire.ldap.LdapUserProvider.loadUser(LdapUserProvider.java:77)
    at org.jivesoftware.wildfire.user.UserManager.getUser(UserManager.java:186)
    - locked <0x76020178> (a java.lang.String)
    at org.jivesoftware.wildfire.user.UserManager.isRegisteredUser(UserManager.java:309)
    at org.jivesoftware.wildfire.spi.PresenceManagerImpl.userAvailable(PresenceManagerImpl.java:145)

if (user == null) {
    synchronized (username.intern()) { // *183* locked <0x76020178> (a java.lang.String)
        user = userCache.get(username);
        if (user == null) {
            user = provider.loadUser(username); // *186* UserManager.java:186
            userCache.put(username, user);
        }
    }
}

So it seems that loading one user never returns, and thus you get a "deadlock": roster updates and everything else that needs the same lock must wait as well, so they fail too. I assume one could change the synchronized block to something like

synchronized (username.intern()) { userCache.put(username, user); };

without breaking anything but Gato should have a much better solution.

The problem that the LDAP connection never returns must also be fixed.
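To make the effect of the lock visible, here is a minimal, self-contained sketch. It is not Wildfire code: the class UserLockSketch, the Provider interface and the helper secondCallerState are hypothetical names made up for this illustration, and the "hung LDAP call" is simulated with a latch. It compares the pattern from UserManager (load while holding the monitor on the interned username) with the variant suggested above (load first, lock only for the cache update).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class UserLockSketch {
    /** Hypothetical stand-in for the user provider; not Wildfire's real interface. */
    interface Provider {
        String loadUser(String username);
    }

    private final Map<String, String> userCache = new ConcurrentHashMap<>();
    private final Provider provider;

    UserLockSketch(Provider provider) {
        this.provider = provider;
    }

    /** Same shape as the quoted snippet: the load runs while the monitor is held. */
    String getUserLoadInsideLock(String username) {
        String user = userCache.get(username);
        if (user == null) {
            synchronized (username.intern()) {
                user = userCache.get(username);
                if (user == null) {
                    user = provider.loadUser(username); // a hang here keeps the monitor
                    userCache.put(username, user);
                }
            }
        }
        return user;
    }

    /** Variant suggested in the thread: load first, lock only for the cache update. */
    String getUserLoadOutsideLock(String username) {
        String user = userCache.get(username);
        if (user == null) {
            user = provider.loadUser(username); // may run twice for concurrent callers
            synchronized (username.intern()) {
                userCache.put(username, user);
            }
        }
        return user;
    }

    /**
     * Starts one caller whose load hangs, then a second caller for the same
     * username, and reports the state the second caller reaches while the
     * first one is still stuck.
     */
    static Thread.State secondCallerState(boolean loadInsideLock) {
        CountDownLatch hang = new CountDownLatch(1);        // simulates the stuck LDAP read
        CountDownLatch firstInLoad = new CountDownLatch(1); // signals "first caller is loading"
        AtomicBoolean first = new AtomicBoolean(true);
        UserLockSketch mgr = new UserLockSketch(name -> {
            if (first.compareAndSet(true, false)) {         // only the first load hangs
                firstInLoad.countDown();
                try {
                    hang.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            return "user:".concat(name);
        });
        Runnable call = () -> {
            if (loadInsideLock) {
                mgr.getUserLoadInsideLock("joeuser");
            } else {
                mgr.getUserLoadOutsideLock("joeuser");
            }
        };
        Thread t1 = new Thread(call);
        Thread t2 = new Thread(call);
        try {
            t1.start();
            firstInLoad.await();                            // t1 is now stuck in loadUser
            t2.start();
            Thread.State state = t2.getState();
            for (int i = 0; i < 500 && state != Thread.State.BLOCKED
                    && state != Thread.State.TERMINATED; i++) {
                Thread.sleep(10);                           // poll until t2 settles
                state = t2.getState();
            }
            hang.countDown();                               // release the simulated hang
            t1.join();
            t2.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("load inside lock:  " + secondCallerState(true));
        System.out.println("load outside lock: " + secondCallerState(false));
    }
}
```

With the load inside the lock, the second caller should end up BLOCKED on the String monitor, which matches the "waiting to lock <0x76020178>" threads in the dump; with the load outside the lock it should run to completion. The trade-off is that two callers may load the same user twice, and a hung load still ties up its own thread, so a read timeout on the LDAP side is needed either way.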

LG

http://java.sun.com/docs/books/tutorial/jndi/newstuff/readtimeout.html - with JRE 1.6 one can specify com.sun.jndi.ldap.read.timeout. So if you want to modify LdapManager.java (line 428, env.put("com.sun.jndi.ldap.read.timeout", "5000"); ), compile Wildfire with Java 1.5 and then run it with 1.6, you may get an early solution. Wildfire 3.2 (estimated for 31.1.2007) should support this setting.
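For anyone who wants to see the property in context, here is a small sketch of a JNDI environment with the read timeout set. It is only an illustration, not the code in LdapManager.java: the class LdapTimeoutEnv, the method buildEnv and the URL ldap://ldap.example.com:389 are assumptions made up for this example. The property name and the 5000 value come from the post above, and the value is passed as a string of milliseconds.

```java
import java.util.Hashtable;
import javax.naming.Context;

public class LdapTimeoutEnv {
    /**
     * Builds a JNDI environment for Sun's LDAP provider with a read timeout.
     * Hypothetical helper; Wildfire assembles its environment elsewhere.
     */
    static Hashtable<String, String> buildEnv(String providerUrl, int readTimeoutMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, providerUrl);
        // Give up waiting for an LDAP reply after this many milliseconds.
        env.put("com.sun.jndi.ldap.read.timeout", Integer.toString(readTimeoutMillis));
        return env;
    }

    public static void main(String[] args) {
        // The host below is a placeholder, not a real server.
        Hashtable<String, String> env = buildEnv("ldap://ldap.example.com:389", 5000);
        System.out.println(env.get("com.sun.jndi.ldap.read.timeout"));
    }
}
```

The resulting table would be handed to new InitialDirContext(env). As far as I understand the tutorial linked above, a read that exceeds the timeout is then aborted with a NamingException instead of waiting forever, so the thread holding the user lock would get released.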

LG

In Wildfire 3.2 beta, can I enable the LDAP read timeout by simply adding a com.sun.jndi.ldap.read.timeout property with a value of 5000 on the System Properties page?

Hey St0nkingByte,

I filed JM-941 and checked in a fix. You can find it in the next nightly build version.

Regards,

– Gato

Hi,

I assume that -Dcom.sun.jndi.ldap.read.timeout=5000 along with the -Xms and -Xmx values is the better place.

LG

Actually, I added a new LDAP property so that all LDAP settings are grouped in one place. The new LDAP property is ldap.readTimeout.

Thanks,

– Gato

Hi Gato,

what did you do to fix it? I really like the issues without any Subversion commit … I'd expect that you changed a documentation.html page and something in a Java class.

LG

Edited: I should lock the thread while posting so Gato cannot post something while I do (;

Hey LG,

This is what I did: http://www.jivesoftware.org/fisheye/changelog/svn-org/?cs=6748

I did change the documentation and a Java class. However, I'm not using a JVM parameter to pass the value; I'm reading it from the place where we store the LDAP properties (aka conf/wildfire.xml).

Thanks,

– Gato

Sweet, I'm looking forward to it.

Thanks!