
Has anyone considered an easy implementation for scalability?

I was thinking today about scaling a Jabber server in general, since the question came up on the jadmin list. This is similar to what I posted there:

What if we just kept an “activeConnection” table listing the currently connected clients and the IP address, port, etc. where each can be found? The implementation would be fairly simple. When a clustered server gets a packet, it first checks in memory whether the destination client is connected to the same machine. If not, it checks the active connection list in the database to see whether the client is connected at all. If the client is connected at a different IP, the server just relays the packet through s2s like normal. It really wouldn’t be very difficult to implement. I’m no expert on the architecture, so there may be some issues I am unaware of. I don’t think performance would be especially poor with intelligent caching, either.
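To make the lookup order concrete, here is a rough Java sketch. Everything in it is hypothetical: the class name, the method names, and the use of a shared map as a stand-in for the “activeConnection” database table are my own assumptions, not anything from the actual server code.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed lookup order; not real server code. */
class ActiveConnectionRouter {
    enum Route { LOCAL, RELAY, OFFLINE }

    private final String localNode; // this server's address, e.g. "10.0.0.1:5269"
    private final Set<String> localSessions = ConcurrentHashMap.newKeySet();
    // Stand-in for the shared "activeConnection" table: JID -> node address.
    private final Map<String, String> activeConnections;

    ActiveConnectionRouter(String localNode, Map<String, String> activeConnections) {
        this.localNode = localNode;
        this.activeConnections = activeConnections;
    }

    void clientConnected(String jid) {
        localSessions.add(jid);
        // The most recent connection overwrites any stale row for this JID.
        activeConnections.put(jid, localNode);
    }

    void clientDisconnected(String jid) {
        localSessions.remove(jid);
        // Only delete the row if it still points at this node.
        activeConnections.remove(jid, localNode);
    }

    /** Step 1: check memory. Step 2: check the shared table. Step 3: relay or give up. */
    Route route(String jid) {
        if (localSessions.contains(jid)) {
            return Route.LOCAL;
        }
        String node = activeConnections.get(jid);
        if (node == null || node.equals(localNode)) {
            // No row, or a stale row pointing at us: treat the client as offline.
            return Route.OFFLINE;
        }
        return Route.RELAY; // hand the packet to the other node, s2s style
    }

    /** Address of the node the table currently lists for this JID, or null. */
    String nodeFor(String jid) {
        return activeConnections.get(jid);
    }
}
```

With two routers sharing one table, a client connected to the second node routes as RELAY from the first, and as OFFLINE once it disconnects. A real version would replace the map with database access plus whatever caching turns out to be sensible.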

Some issues:

  • if you don’t want to allow connections from other jabber servers and disable s2s communication, this would not work

  • if a client disconnects and then reconnects to a different server, cached information could prove problematic

  • database access for every packet may lead to poor performance

  • if a cluster server goes down there is the potential for incorrect records in the database

If instead we used something that wasn’t exactly “s2s” in jabberd terminology but something like “cluster2s”, we could add functionality to inform the other servers in the cluster when a user logs off, so they could reliably invalidate their cached information. With this functionality, servers could be located anywhere, making for a very fault-tolerant system. If the connection to a cluster node went down, all cached information about that node would be invalidated. I think we would have to include a way to ask a cluster node whether it currently has the client connected, since database information may not be reliable. Then again, maybe not, since the most recent connection would overwrite any stale ones in the activeConnection table.
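A minimal sketch of the two invalidation rules described above, again in Java and again purely illustrative. The class and method names are made up, and how the “cluster2s” link would actually deliver the log-off and link-down events is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-node cache of where remote clients are connected. */
class RemoteLocationCache {
    // JID -> address of the cluster node believed to hold the connection.
    private final Map<String, String> jidToNode = new ConcurrentHashMap<>();

    /** Cache a location learned from the activeConnection table. */
    void remember(String jid, String node) {
        jidToNode.put(jid, node);
    }

    /** Returns the cached node for this JID, or null if nothing is cached. */
    String lookup(String jid) {
        return jidToNode.get(jid);
    }

    /** Rule 1: a node tells us a user logged off, so drop that one entry. */
    void onUserLoggedOff(String jid, String node) {
        // Two-argument remove: ignore the notice if we already cached a newer location.
        jidToNode.remove(jid, node);
    }

    /** Rule 2: the link to a node went down, so drop everything cached about it. */
    void onNodeLinkDown(String node) {
        jidToNode.values().removeIf(node::equals);
    }

    int size() {
        return jidToNode.size();
    }
}
```

The two-argument remove in onUserLoggedOff is there so that a late log-off notice from an old node does not wipe out a newer entry for the same JID.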

Just my 2 cents on clustering.

kzantow,

Yep, what you’re proposing is similar to what we’ve been thinking about for a clustering architecture. However, if you use a specific clustering technology, it can solve most of the challenges you mention with the database method you describe (as well as perform much better). For example, Coherence deals with all of the fault-tolerance and communication issues (http://www.tangosol.com).

Regards,

Matt

If you use a specific clustering technology you could hurt yourself down the road if the product becomes unsupported or the company goes out of business… I know there was talk of JGroups before, but I didn’t find anything in this forum about it. Wouldn’t the best option be an interface that allows for pluggable clustering technology? That way Jive could sell a commercial version with Coherence and those of us who can’t afford it could still benefit by clustering through JGroups.
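Something along these lines is what I mean by a pluggable interface. This is only a sketch: ClusterTransport and InMemoryTransport are names I invented, the payload is a plain String for simplicity, and a JGroups- or Coherence-backed implementation would have to be written against those products' real APIs, which I am not showing here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical seam between the server and whatever clustering product is plugged in. */
interface ClusterTransport {
    /** Send a message to every node in the cluster. */
    void broadcast(String payload);

    /** Register a callback for messages arriving from the cluster. */
    void onReceive(Consumer<String> listener);
}

/** In-process stand-in, useful for tests; delivers each broadcast to all listeners. */
class InMemoryTransport implements ClusterTransport {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    @Override
    public void broadcast(String payload) {
        for (Consumer<String> listener : listeners) {
            listener.accept(payload);
        }
    }

    @Override
    public void onReceive(Consumer<String> listener) {
        listeners.add(listener);
    }
}
```

The server code would only ever see ClusterTransport, so swapping one clustering product for another would mean writing one new implementation class rather than touching the routing logic.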

I looked at the JGroups API to see what it offers, and from my brief examination it seems as though it may be robust enough to work for JM. A number of classes that do not currently implement the Serializable interface would need to. There would also have to be another layer for message delivery to make use of the technology.

It seems that using a clustering technology like JGroups (and I assume Coherence, too) would probably mean a lot more architectural changes than the database method I mentioned above.

Out of curiosity, have you ever looked at Spring (http://www.springframework.org) to help separate JM into a more easily pluggable architecture? It has a lot of nice features that could prove useful and would probably get rid of some of the things in that XML configuration file nicely.

Don’t get me wrong, I think you guys have done great work so far. These are simply suggestions that may be helpful at some point.

I am a bit confused by the first post. Can a user’s JID be associated with more than one server?