The user collection is supposed to provide a lazy-fetching container for user objects. Lazy fetching usually makes sense when we first retrieve a collection of "keys" in one query, and the client then iterates over the keys and requests only some of the values. The assumption is that the number of values actually used is much smaller than the total number of entries.
In the case of org.jivesoftware.wildfire.user.UserCollection we have a collection of user names that fetch the user details on request. After that, the user details are stored in UserManager.userCache, so they can be retrieved later without going back to the UserProvider. The user cache is never invalidated, and there is no way to force a cache purge short of restarting the server, which means that Wildfire assumes sole ownership of the database (something not uncommon for a Java application).
If we own the database and do such aggressive (not to say naive) caching, we could at least assume that user queries are quite fast? ...and we would be terribly wrong. UserCollection does not define any methods for random access by key, which means that we have to iterate over the collection to find the user we are looking for.
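To make the cost concrete, here is a minimal sketch of the pattern, not the actual Wildfire code. LazyUserCollection, LazyScanDemo and the loader function are hypothetical stand-ins for UserCollection and a UserProvider lookup, and user details are reduced to plain strings. The only thing it shows is that, without keyed access, finding one user costs one back-end lookup per element visited.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for UserCollection: it holds only user names and
// loads the details of each user when the iterator reaches that name.
class LazyUserCollection extends AbstractCollection<String> {
    private final List<String> usernames;
    private final Function<String, String> loader; // stands in for a provider lookup
    int loads = 0; // number of back-end lookups triggered so far

    LazyUserCollection(List<String> usernames, Function<String, String> loader) {
        this.usernames = usernames;
        this.loader = loader;
    }

    @Override
    public Iterator<String> iterator() {
        final Iterator<String> names = usernames.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return names.hasNext();
            }

            @Override
            public String next() {
                loads++; // every step of the iteration costs one lookup
                return loader.apply(names.next());
            }
        };
    }

    @Override
    public int size() {
        return usernames.size();
    }
}

class LazyScanDemo {
    // Returns how many lookups were needed to find a single user by scanning.
    static int lookupsToFind(int totalUsers, String wanted) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < totalUsers; i++) {
            names.add("user" + i);
        }
        LazyUserCollection users =
                new LazyUserCollection(names, name -> "details of " + name);
        for (String details : users) {
            if (details.equals("details of " + wanted)) {
                break; // found it, but only after loading everything before it
            }
        }
        return users.loads;
    }
}
```

With 4000 users, LazyScanDemo.lookupsToFind(4000, "user3999") triggers 4000 lookups, where a keyed lookup would need one.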
This lazy iterator sort of makes sense when you process a streaming response and throw away the data (e.g. an aggregation query). The purpose is to reduce the memory footprint and let the application scale to huge amounts of data. It also makes sense when you have a connection-based protocol and you don't want to bother pulling and storing the data until you need it. It definitely doesn't make sense when you have a connectionless protocol and you are going to cache all the data by the end of the first iteration anyway.
LDAP is used here as a connectionless protocol. The connection is fast, but it's not free. Then you have the network round-trip latencies. Add the processing time and multiply it all by 4000.
Some might argue that the idea is not to iterate over this collection but to query only the users you need. Surprise: there's no way to do this. UserCollection implements the Collection interface and nothing else. This means that the first time I do a user search, I have to wait for 4000 connect/query/response cycles, which takes about 3 minutes.
As a bonus, if you change a user's details, the changes won't take effect until you restart the server. Brilliant.
PS. I've implemented a workaround for this issue that loads the full user list as soon as the server starts up. This doesn't fix the inefficiency, though. What I'd like to do is have LdapUserProvider return an actual collection of users instead of a UserCollection. I can submit a patch if the maintainers agree.
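For illustration only, this is roughly the shape of what I have in mind. It is a sketch and not the patch itself. Directory, EagerUserProvider and EagerDemo are hypothetical names, user details are again plain strings, and it assumes the back end can return every entry in a single query.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical back end. Assumption: one call returns every entry
// (user name -> details) in a single round trip.
interface Directory {
    Map<String, String> fetchAll();
}

// Hypothetical provider that returns an actual collection of users
// and also keeps them indexed by user name.
class EagerUserProvider {
    private final Map<String, String> byName;

    EagerUserProvider(Directory directory) {
        // One bulk query up front instead of one query per user.
        this.byName = new LinkedHashMap<>(directory.fetchAll());
    }

    // A real, fully populated collection: iterating it costs no lookups.
    Collection<String> getUsers() {
        return Collections.unmodifiableCollection(byName.values());
    }

    // Random access by key, which the plain Collection interface lacks.
    String getUser(String username) {
        return byName.get(username);
    }
}

class EagerDemo {
    static int roundTrips = 0; // how many times the back end was queried

    static EagerUserProvider build(int totalUsers) {
        roundTrips = 0;
        Directory directory = () -> {
            roundTrips++;
            Map<String, String> all = new LinkedHashMap<>();
            for (int i = 0; i < totalUsers; i++) {
                all.put("user" + i, "details of user" + i);
            }
            return all;
        };
        return new EagerUserProvider(directory);
    }
}
```

The trade-off is memory: everything is held at once. Since all users end up in the cache after the first iteration anyway, that costs nothing extra here.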