Wildfire with LDAP hangs + presence updates still not working

Hi,

We're having some major problems with Wildfire here.

We're currently running version 2.5.1 under Suse SLES 4.1.1 with LDAP.

(The LDAP server is OpenLDAP under Solaris 2.6 with about 500 users.)

After starting, everything works more or less fine for some time.

At that point there are about 30 Wildfire Java processes.

However, we still have the problem that presence updates do not work at all, even with 2.5.1.

After Wildfire has been running for some time, we noticed that our LDAP server was hardly responding anymore, and that this was caused by Wildfire.

At that point, connecting to Wildfire with any messenger and logging in to the web interface also just stall.

At that time there are also about 90 Wildfire Java processes, and it is no longer even possible to stop Wildfire with './wildfire stop'.

The only suspicious thing in the log files is the following:

(though I cannot say whether Wildfire stalls right after that happens)

2006.03.13 11:30:13 org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:141) Internal server error. Triggered by packet:
(Roster.java:86)
at org.jivesoftware.wildfire.roster.RosterManager.getRoster(RosterManager.java:90)
at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.broadcastUpdate(PresenceUpdateHandler.java:257)
at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:96)
at org.jivesoftware.wildfire.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:153)
at org.jivesoftware.wildfire.PresenceRouter.handle(PresenceRouter.java:92)
at org.jivesoftware.wildfire.PresenceRouter.route(PresenceRouter.java:61)
at org.jivesoftware.wildfire.PacketRouter.route(PacketRouter.java:73)
at org.jivesoftware.wildfire.net.SocketReader.processPresence(SocketReader.java:445)
at org.jivesoftware.wildfire.net.ClientSocketReader.processPresence(ClientSocketReader.java:56)
at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:242)
at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:119)
at java.lang.Thread.run(Unknown Source)

Hey Horst:

It would help in troubleshooting the problem if you could set the property:

/code

Could you also post your current LDAP setup?

Thanks,

Alex

Ah, OK, I hadn't thought of debug mode until now.

It's turned on now, and I'm watching …

Here's my LDAP config already:

[/code]

Hey Horst,

One of the potential issues I see with your configuration is your groupNameField… are you sure it is “ou”? In my implementation it is “cn”, which is short for “common name”. Could you post some example group records so we can troubleshoot and see whether your fields are correct?

Alex

Hi,

Yes, that is correct.

Initially we had no groups in our LDAP, or at least each user's group was stored only in one specific attribute on the user record.

User records are at top level in the form:

“cn=Full Name”

As Wildfire was not able to retrieve the groups from that attribute on each user, we had to create a new top-level record for each group, where the attributes of the group list its members.

To separate the user records from the groups, we added the groups as:

“ou=groupname”.

This works as intended: we have each group as a roster, listing the appropriate members.

And is there anything in your debug logs yet?

Alex

When I checked Wildfire this morning, it had hung again (11:30 am).

(Neither connection via IM nor via the web interface was possible.)

The problem is that I can't determine when it stopped responding.

I didn't see any additional log messages from the debug output, except in stderror.log, where it looks like it dumped all LDAP communication.

The last output in that file was from half an hour before I tried to connect.

debug.log is empty.

Last entry in warn.log:

2006.03.15 09:04:31 Cache: vcardCache – object with key username is too large to fit in cache. Size is 1537721

Last entry in error.log was from one day ago, and the same message as I posted above (different username though).

Is there any other way to get more debug output?
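One way to pin down when the server stops responding might be a small probe that runs next to Wildfire and logs every change in reachability. The sketch below is only an illustration and not part of Wildfire: it assumes the server listens on the default XMPP client port 5222, and the names server_answers and watch are made up for this example. It opens a TCP connection, sends an XMPP stream header, and counts the server as responsive only if some bytes come back before the timeout, since a hung process may still accept connections.

```python
import socket
import time
from datetime import datetime


def server_answers(host, port=5222, timeout=10.0):
    """Return True if the server sends any reply to an XMPP stream header.

    Assumes the default client port 5222; adjust if your setup differs.
    """
    header = (
        f"<stream:stream to='{host}' xmlns='jabber:client' "
        "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(header.encode("utf-8"))
            # An empty read means the peer closed without answering.
            return len(conn.recv(1024)) > 0
    except OSError:
        # Covers refused connections, timeouts, and resets.
        return False


def watch(host, port=5222, interval=60.0, checks=None):
    """Print a timestamped line whenever the responsiveness state changes.

    Runs forever when checks is None, otherwise stops after that many probes.
    """
    last = None
    done = 0
    while checks is None or done < checks:
        state = server_answers(host, port)
        if state != last:
            stamp = datetime.now().strftime("%Y.%m.%d %H:%M:%S")
            label = "responding" if state else "NOT responding"
            print(f"{stamp} {host}:{port} {label}")
            last = state
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
```

Calling watch("server.domain.org") from a separate shell would then give a timestamp, to within one interval, that can be compared with the last lines in stderror.log and error.log.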

Can you post some more of that LDAP output that was dumped to stderror.log? There might be clues as to where the problem lies. Also, when you get a chance, can you start using the latest nightly builds? We fixed some LDAP bugs in there, and it will help us narrow down where the problem is occurring.

Thanks,

Alex

One day after your post, I installed the nightly build 20060315, and the server has basically been running without problems until now.

So much longer than the normal version.

However, when I checked a few things again today, I noticed I wasn't using the search plugin on that nightly build because it wasn't included.

Maybe that was the reason, so I'll now test the normal 2.5.1 release without the search plugin again.

Thanks for your help.

BTW: The problem with status/presence updates was still there.

Concerning that, I found a very interesting post in the “status/presence with LDAP rosters” thread and posted the things I noticed there.

After using 2.5.1 without the search plugin for one week, it looked good, but just some hours later it hung again.

It did last quite a lot longer than before, though.

In the changelog for version 2.6.0 I found several bugfixes which could be related to the cause of that problem.

Sadly, I can't try it, because groups/rosters don't work with 2.6.0 anymore.

As I wrote in the following thread:

http://www.jivesoftware.org/community/thread.jspa?threadID=16969

Group member names have their last character cut off, so the actual users aren't found in the groups.

Hey Horst,

The error you are seeing really is quite unusual. What are the complete contents of your error logs for 2.6.0? Also, did you have LDAP debugging enabled on 2.6.0 when you were experiencing these problems? If not, could you enable it?

Thanks,

Alex

Horst, I'm a little suspicious of your LDAP setup… can you give LDIF output of an example user and group? You can get that using the ldapsearch command:

ldapsearch -h servername -x "(cn=username)"

ldapsearch -h servername -x "(ou=groupname)"

Typically OUs (organizational units) are used for containers and not leaf objects. There is nothing preventing someone from using them for multiple things, of course, but the fact that it's non-typical makes me wonder if there is something else odd in there…

Hey,

Actually, that sounds like a really good idea, as it's a bit tricky to describe the LDAP structure.

All of the following output refers to:

Username: jdoe00

Fullname: John Doe

Group: GRP

ldapsearch -x -D xxx -h xxx -b dc=domain,dc=org "(uid=jdoe00)"

# Doe John, domain.org
dn: cn=Doe John,dc=domain,dc=org
uid: jdoe00
cn: Doe John
ou: GRP
objectClass: person

ldapsearch -x -D xxx -h xxx -b dc=domain,dc=org "(ou=GRP)"

# GRP, domain.org
dn: ou=GRP,dc=domain,dc=org
objectClass: top
objectClass: groupOfUniqueNames
ou: GRP
cn: GRP
uniqueMember: uid=jdoe00

LDAP group configuration:

[/code]

With versions up to 2.5.1, Wildfire populates all groups with the appropriate users; 2.6.0 cuts off the last character:

2006.04.11 00:02:58 [org.jivesoftware.wildfire.roster.Roster.(Roster.java:148)] Groups () include non-existent username (jdoe0)

That error message occurs as soon as someone connects to the server with an IM client, and no groups are displayed.

It is repeated for every username.

It can also be seen in the web interface.

When selecting any group, it displays the members as (the non-existent):

jdoe0@server.domain.org *

Thanks in advance,

Horst

I think I see the problem. Your “uniqueMember” field is neither “posixMode” nor “non-posixMode” according to Wildfire's definitions. Basically, the uniqueMember field needs to be either:

a) posixMode: only the username field (jdoe00 in your case)

b) not posixMode: the full dn of the user (cn=Doe John,dc=domain,dc=org)
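The two accepted shapes can be told apart with a rough check like the following. This is only an illustrative sketch, not Wildfire's own code: classify_member is a made-up helper, and the heuristic (no "=" means a bare username, an "=" plus at least one comma means a full DN) is an assumption that fits the entries posted in this thread rather than a complete DN parser.

```python
def classify_member(value):
    """Roughly classify a group member value.

    Returns "posix" for a bare username, "dn" for something that looks
    like a full distinguished name, and "unsupported" for anything else,
    such as a lone "attribute=value" pair.
    """
    value = value.strip()
    if not value:
        return "unsupported"
    if "=" not in value:
        return "posix"  # e.g. jdoe00
    if "," in value:
        return "dn"  # e.g. cn=Doe John,dc=domain,dc=org
    return "unsupported"  # e.g. uid=jdoe00


for member in ("jdoe00", "cn=Doe John,dc=domain,dc=org", "uid=jdoe00"):
    print(f"{member!r}: {classify_member(member)}")
```

Run against the values from this thread, the member value "uid=jdoe00" from the group record falls into the third category, which is the case described above.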

Anything else requires modifying the Wildfire code. Since what you have is very non-typical, I would say the chances of it getting accepted into the main branch are slim. Any chance you can change the way you organize things in LDAP?

Yeah, that's something like what I suspected.

It's another department here that manages the LDAP server.

I hope I can get them to change the attribute to the plain username; it would make things a lot easier.

Still, I wonder what was changed from 2.5.1 to 2.6.0 and why, as this setup (not using posixMode) still worked with 2.5.1 but not with 2.6.0.

Anyway, I'll stick with 2.5.1 for now.

One other note:

Just yesterday, while playing around with 2.6.0 and the LDAP configuration, I managed to fix the presence/status updates!

I'm not 100% sure, but I think the cause was the following:

I had not set any search filter for the groups.

As I used for groups:

The documentation says that if there's no filter, it selects all objects which have an attribute 'ou'.

But what I didn't realize: as you can see from my post above, not only the actual group objects have an attribute 'ou'; the users have this attribute too.

In practice nothing looked wrong: only the real groups with their members were listed.

But this must be what led Wildfire to give all users just: “Subscription: to”

and therefore no presence updates were sent to anyone.

After I created a group search filter which now selects only the real groups by (objectClass=groupOfUniqueNames), all users now have: “Subscription: both” and the updates are sent correctly and working.

Maybe this can help others who have these problems too.
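The difference between the two filters can be illustrated without an LDAP server. The sketch below is an assumption-laden model, not Wildfire's implementation: it represents the two example records from earlier in this thread as plain dictionaries, treats the no-filter default as a presence test on the 'ou' attribute (as the documentation is described above), and compares that with an objectClass match. The helper names are made up for this example.

```python
# The two example records posted earlier in the thread, as plain dicts.
ENTRIES = [
    {
        "dn": "cn=Doe John,dc=domain,dc=org",
        "uid": ["jdoe00"],
        "cn": ["Doe John"],
        "ou": ["GRP"],
        "objectClass": ["person"],
    },
    {
        "dn": "ou=GRP,dc=domain,dc=org",
        "objectClass": ["top", "groupOfUniqueNames"],
        "ou": ["GRP"],
        "cn": ["GRP"],
        "uniqueMember": ["uid=jdoe00"],
    },
]


def has_attribute(entry, attr):
    """Model of a presence filter such as (ou=*)."""
    return bool(entry.get(attr))


def has_object_class(entry, name):
    """Model of (objectClass=name); object class names compare case-insensitively."""
    return name.lower() in (v.lower() for v in entry.get("objectClass", []))


without_filter = [e["dn"] for e in ENTRIES if has_attribute(e, "ou")]
with_filter = [e["dn"] for e in ENTRIES if has_object_class(e, "groupOfUniqueNames")]

print("matched by attribute 'ou':     ", without_filter)
print("matched by groupOfUniqueNames: ", with_filter)
```

With these two records, the presence test matches both the user and the group, while the objectClass test matches only the group record.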