I was thinking today about scaling a jabber server in general. The question came up on the jadmin list. This is similar to what I posted there:
What if we just kept an “activeConnection” table with a list of currently connected clients and the IP address/port/etc. where to find them? Implementation would be fairly simple: every clustered server would first check whether the packet’s destination is local. If it is, the server would check in memory whether the client is connected to the same machine. If not, it would check the active connection list in the database to see whether the client is connected at all. If the client is connected at a different IP, the server would just relay the packet through s2s like normal. It really wouldn’t be very difficult to implement. I’m no expert on the architecture, so there may be some issues I am unaware of. I don’t think performance would be especially poor with intelligent caching, either.
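A rough sketch of that lookup order, in Python. Everything here is made up for illustration: the `ClusterNode` and `Location` names, the string results, and the plain dict standing in for the activeConnection table in the database. It is not how jabberd does anything; it only shows the order of the checks.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Location:
    """Where a client can be reached (hypothetical record layout)."""
    host: str
    port: int


class ClusterNode:
    """One server in the cluster (illustrative only)."""

    def __init__(self, me: Location, active_connections: Dict[str, Location]):
        self.me = me
        # Clients connected to this machine, kept in memory.
        self.local_sessions: Set[str] = set()
        # Assumption: a shared dict stands in for the database table.
        self.active_connections = active_connections

    def connect(self, jid: str) -> None:
        """Record a new client session locally and in the shared table."""
        self.local_sessions.add(jid)
        # The most recent connection overwrites any older record.
        self.active_connections[jid] = self.me

    def route(self, jid: str) -> str:
        """Decide what to do with a packet addressed to jid."""
        # 1. Check memory: is the client connected to this machine?
        if jid in self.local_sessions:
            return "deliver-local"
        # 2. Check the shared table: is the client connected at all?
        location = self.active_connections.get(jid)
        if location is None or location == self.me:
            # No record, or a stale record pointing back at this machine.
            return "offline"
        # 3. Connected elsewhere: relay over an s2s-style link.
        return f"relay-to {location.host}:{location.port}"
```

With two nodes sharing one table, a client connected to the first node is delivered locally there and relayed from the second; a client with no record is treated as offline.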
Some issues:
- If you don’t want to allow connections from other jabber servers and disable s2s communication, this would not work.
- If a client disconnects and is sent to a different server, cached information could prove problematic.
- Database access for every packet may lead to poor performance.
- If a cluster server goes down, there is the potential for incorrect records in the database.
If, instead of plain “s2s” in jabberd terminology, we used something slightly different, say “cluster2s”, we could add functionality to inform the other servers in the cluster when a user logs off, so they could reliably invalidate their cached information. With this functionality, servers could be located anywhere, making for a very fault-tolerant system. If the connection to a cluster node went down, all cached information about that node would be invalidated. I think we would have to include a way to ask a cluster node whether it currently has the client connected, since the database information may not be reliable. Then again, maybe not, since the most recent connection would overwrite any stale ones in the activeConnection table.
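To make the invalidation idea a bit more concrete, here is a minimal Python sketch of the per-server cache. The `LocationCache` class and its `on_logoff` and `on_node_down` hooks are hypothetical names for the two “cluster2s” events described above; no such protocol exists in jabberd as far as I know.

```python
from typing import Dict, Optional


class LocationCache:
    """Per-server cache of which cluster node holds each client (illustrative)."""

    def __init__(self) -> None:
        # Maps a client's JID to the name of the node it is connected to.
        self._node_by_jid: Dict[str, str] = {}

    def remember(self, jid: str, node: str) -> None:
        """Cache the result of a database lookup."""
        self._node_by_jid[jid] = node

    def lookup(self, jid: str) -> Optional[str]:
        """Return the cached node, or None if the database must be consulted."""
        return self._node_by_jid.get(jid)

    def on_logoff(self, jid: str) -> None:
        """A peer announced that this user logged off: drop the entry."""
        self._node_by_jid.pop(jid, None)

    def on_node_down(self, node: str) -> int:
        """The link to a node went down: drop everything cached about it.

        Returns the number of entries that were invalidated.
        """
        stale = [jid for jid, n in self._node_by_jid.items() if n == node]
        for jid in stale:
            del self._node_by_jid[jid]
        return len(stale)
```

A cache miss after either event just falls back to the activeConnection table, so the worst case is one extra database read rather than a packet sent to the wrong place.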
Just my 2 cents on clustering.