
Rant about UserCollection

The user collection is supposed to provide a lazy-fetching container for user objects. Lazy fetching usually makes sense when we first fetch a collection of "keys" in one query, and the client then iterates over the keys and requests only some of the values. The assumption is that the number of values actually used is much smaller than the total number of entries.

In the case of org.jivesoftware.wildfire.user.UserCollection we have a collection of user names which fetches the user details on request. After that, the user details are stored in UserManager.userCache, so they can be retrieved later without going back to the UserProvider. The user cache is never invalidated and there is no way to force a cache purge short of restarting the server, which means that Wildfire assumes sole ownership of the database (something not uncommon for a Java application).

If we own the database and do such aggressive (not to say naive) caching, we could at least assume that user queries are quite fast? …and we would be terribly wrong. UserCollection does not define any methods for random access by key, which means that we have to iterate over the collection to find the user we are looking for.

This lazy iterator sort of makes sense when you process a streaming response and throw away the data (e.g. an aggregation query). The purpose is to reduce the memory footprint and make the application scale to huge amounts of data. It also makes sense when you have a connection-based protocol and you don't want to bother pulling and storing the data until you need it. It definitely doesn't make sense when you have a connectionless protocol and you are going to cache all the data at the end of the first iteration.

LDAP is a connectionless protocol. The connection is fast, but it's not free. Then you have the network round-trip latencies. Add the processing time and multiply it all by 4000.

Some might argue that the idea is that you don't iterate over this collection and query only the users that you need. Surprise - there's no way to do this. UserCollection implements the Collection interface and nothing else. This means that the first time I do a user search, I have to wait for 4000 connect/query/response cycles, which takes ~3 minutes.
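To make the cost concrete, here is a minimal sketch of a lazily loading collection that does one backend lookup per element. This is not Wildfire's actual code: `LazyCollection`, `LazyFetchDemo`, `fetchOne` and `scan` are hypothetical names, and the "backend" is just a counter standing in for an LDAP round trip.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a lazily loading collection (not the real UserCollection).
class LazyCollection<T> extends AbstractCollection<T> {
    private final List<String> keys;
    private final Function<String, T> loader; // assumed: one backend round trip per call

    LazyCollection(List<String> keys, Function<String, T> loader) {
        this.keys = keys;
        this.loader = loader;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<String> it = keys.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }

            @Override
            public T next() { return loader.apply(it.next()); } // value is loaded here
        };
    }

    @Override
    public int size() { return keys.size(); }
}

class LazyFetchDemo {
    // Counts simulated backend round trips.
    static int roundTrips = 0;

    // Simulated per-user lookup; a real provider would query the directory here.
    static String fetchOne(String name) {
        roundTrips++;
        return name + "@example.com";
    }

    // A search with no random access has to walk the whole collection.
    static int scan(Collection<String> users, String prefix) {
        int matches = 0;
        for (String details : users) {
            if (details.startsWith(prefix)) {
                matches++;
            }
        }
        return matches;
    }
}
```

Scanning a `LazyCollection` built over 4000 names calls `fetchOne` 4000 times, no matter how few entries match. That is the behaviour I am complaining about.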

As a bonus, if you change a user's details, the changes won't take effect until you restart the server. Brilliant.

PS. I've implemented a workaround for this issue, loading the full user list as soon as the server starts up. This doesn't fix the inefficiency though. What I'd like to do is to have the LdapUserProvider return an actual collection of users instead of a UserCollection. I can submit a patch if the maintainers agree.

I don't quite understand… why are you iterating over the Collection to get a particular user? Why not call UserManager.getUser(String)?



The code in question is part of a custom search plugin. The search criterion is a pattern string, which is converted to a regular expression.

LDAP search does not support regexp filters, which is why we need to do the filtering on the Jabber server.
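Roughly, the server-side filtering looks like the sketch below. This is not the plugin's actual code: the pattern syntax (`*` for any run of characters, `?` for exactly one character) is an assumption for illustration, and `UserPatternFilter`, `toRegex` and `filter` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; the wildcard syntax below is assumed, not taken from the plugin.
class UserPatternFilter {

    // Converts a wildcard pattern to a regex: '*' -> ".*", '?' -> ".",
    // everything else is quoted so it matches literally.
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Keeps only the names that match the whole pattern.
    static List<String> filter(Iterable<String> names, String glob) {
        Pattern pattern = toRegex(glob);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The filter has to see every candidate name, so whatever collection it runs over gets iterated from start to end.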

The query fetching all user data in one go takes ~7 seconds (compared to 4 minutes with the current approach). I was tempted to fork the LdapUserProvider, but finally we decided that it was not worth it as the slowdown is experienced only on server startup.

The problem with being unable to purge the caches still remains.

Btw, my point was not that there is no way to get a single user from the UserCollection, but that there is no point in lazy loading the users. For comparison, consider an SQL ResultSet which runs a new query for every row.