Openfire Clustering Technologies with Gato

Learn from Gato about the clustering technologies and approach used in upcoming versions of Openfire Enterprise.

Or download the QuickTime movie version (caution: 376MB download)

Hey, that’s a great overview of the internals of the new clustering features!

You said that stateful plugins have to provide their own clustering support. Will it be possible to reuse the clustering infrastructure used by Openfire for this?

Hey Stefan,

Sure. However, not all stateful plugins need to be cluster aware. Plugins whose internal state is quickly persisted to the database, or whose state is not critical if lost, may not need to be cluster aware. At the other extreme, components that need to gather information from several cluster nodes are the hardest ones. An example of the latter is chat archiving in Openfire Enterprise: the component needs to keep track of conversations between users hosted by different nodes, and at the same time an archived conversation must not live in a single place until it is persisted in the DB. Anyway, internal components can use the clustering framework (i.e. a high-level API built by us) to be cluster-ready.
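To illustrate why conversations spanning several nodes are tricky, here is a minimal sketch, assuming a map shared by all nodes as a stand-in for a cluster-wide cache. ConversationTracker and its methods are hypothetical and are not the Openfire Enterprise archiving API. The idea it shows: keying a conversation on the sorted pair of participants lets a message seen on one node and the reply seen on another node land in the same shared entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: conversation state kept in a map that every node shares.
 * In a real cluster this map would be a replicated cluster cache; here it is a
 * plain ConcurrentHashMap so the example runs on its own.
 */
public class ConversationTracker {
    // Stand-in for a cluster-wide cache (assumption, not a real Openfire API).
    private final Map<String, List<String>> conversations;

    public ConversationTracker(Map<String, List<String>> sharedCache) {
        this.conversations = sharedCache;
    }

    // Order-independent key: the same for (alice, bob) and (bob, alice).
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    public void record(String from, String to, String body) {
        // Copy-modify-put: with many replicated caches, mutating a value in place
        // is typically not propagated, so the updated list is written back.
        conversations.compute(key(from, to), (k, old) -> {
            List<String> updated = (old == null) ? new ArrayList<>() : new ArrayList<>(old);
            updated.add(from + ": " + body);
            return updated;
        });
    }

    public List<String> get(String a, String b) {
        return conversations.getOrDefault(key(a, b), List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> shared = new ConcurrentHashMap<>();
        ConversationTracker nodeA = new ConversationTracker(shared); // node hosting alice
        ConversationTracker nodeB = new ConversationTracker(shared); // node hosting bob
        nodeA.record("alice@example.com", "bob@example.com", "hi");
        nodeB.record("bob@example.com", "alice@example.com", "hello");
        // prints "[alice@example.com: hi, bob@example.com: hello]"
        System.out.println(nodeA.get("alice@example.com", "bob@example.com"));
    }
}
```

Persisting the finished conversation to the DB, and deciding which node does it, is left out of this sketch.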

– Gato

Thanks for the insights.

I am currently thinking about how this translates to Asterisk-IM.

Configuration data can easily be shared between nodes through the database, though we have to be careful about cache synchronization or invalidation. A cluster-wide cache flush in the event of a config change would do it, given that config changes do not occur frequently.
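The flush-on-change idea can be sketched in a few lines, assuming each node keeps a local cache of config values loaded from the shared database. The cluster is simulated in-process; Cluster, Node, broadcastFlush and the key "asterisk.host" are hypothetical names for illustration only. The writing node persists to the database first and then clears every node's cache, so the next read anywhere reloads the new value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a cluster-wide cache flush on config change.
 * The "cluster" is simulated in one JVM: a shared database map plus a list of nodes.
 */
public class ConfigCacheDemo {

    /** Simulated cluster: the shared database and a broadcast to all nodes. */
    static class Cluster {
        final Map<String, String> database = new ConcurrentHashMap<>();
        final List<Node> nodes = new ArrayList<>();

        void broadcastFlush() {
            for (Node n : nodes) n.localCache.clear();
        }
    }

    static class Node {
        final Cluster cluster;
        final Map<String, String> localCache = new ConcurrentHashMap<>();
        int dbReads = 0; // counts cache misses that went to the database

        Node(Cluster cluster) {
            this.cluster = cluster;
            cluster.nodes.add(this);
        }

        String get(String key) {
            String cached = localCache.get(key);
            if (cached != null) return cached;
            dbReads++;
            String value = cluster.database.get(key);
            if (value != null) localCache.put(key, value);
            return value;
        }

        void set(String key, String value) {
            cluster.database.put(key, value); // persist first...
            cluster.broadcastFlush();         // ...then flush every node's cache
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        Node a = new Node(cluster);
        Node b = new Node(cluster);
        cluster.database.put("asterisk.host", "pbx1");
        System.out.println(b.get("asterisk.host")); // pbx1, now cached on b
        a.set("asterisk.host", "pbx2");
        System.out.println(b.get("asterisk.host")); // pbx2, because b's cache was flushed
    }
}
```

Flushing the whole cache is coarse, but as noted above it is cheap when config changes are rare.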

Determining which node should trigger presence changes and update the queue status of a user is a bit trickier as they should occur on exactly one node in the cluster. This sounds a bit like the challenge with the archiving plugin where exactly one node should persist a conversation. Do you already have a solution for that?

Hey Stefan,

I’m not totally familiar with Asterisk-IM, but this is my shot. Caches will be automatically replicated across the cluster nodes, so no worries about that. The plugin uses packet interceptors to trap presences and cache them if the user is on the phone. Those packet interceptors need to be running on all cluster nodes, and the trapped presences need to be replicated. I would suggest keeping them in a cache with unlimited size and no expiration. Then there is the part that connects to Asterisk. I would suggest having only one cluster node connect to the Asterisk server: the senior cluster member. You can track when a node becomes the senior cluster member and connect that node to the Asterisk server.
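A minimal sketch of the presence-trapping part, assuming two shared collections stand in for cluster-wide caches with unlimited size and no expiration. PresenceTrap, intercept, callStarted and callEnded are hypothetical names, not the Asterisk-IM or Openfire interceptor API, and releasing the last trapped presence when the call ends is an assumption about the desired behavior. Because the state is shared, it does not matter which node's interceptor sees the presence.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of presence trapping shared across cluster nodes.
 * The set and map stand in for replicated cluster caches (unlimited size,
 * no expiration); here they are plain concurrent collections.
 */
public class PresenceTrap {
    private final Set<String> usersOnPhone;           // users currently in a call
    private final Map<String, String> trappedPresences; // latest trapped presence per user

    public PresenceTrap(Set<String> usersOnPhone, Map<String, String> trappedPresences) {
        this.usersOnPhone = usersOnPhone;
        this.trappedPresences = trappedPresences;
    }

    /** Called by the interceptor on any node; returns true if the presence is trapped. */
    public boolean intercept(String user, String presence) {
        if (!usersOnPhone.contains(user)) return false;
        trappedPresences.put(user, presence); // keep only the most recent one
        return true;
    }

    public void callStarted(String user) {
        usersOnPhone.add(user);
    }

    /** Assumption: when the call ends, the last trapped presence is handed back to be re-sent. */
    public Optional<String> callEnded(String user) {
        usersOnPhone.remove(user);
        return Optional.ofNullable(trappedPresences.remove(user));
    }

    public static void main(String[] args) {
        Set<String> onPhone = ConcurrentHashMap.newKeySet();
        Map<String, String> trapped = new ConcurrentHashMap<>();
        PresenceTrap node1 = new PresenceTrap(onPhone, trapped);
        PresenceTrap node2 = new PresenceTrap(onPhone, trapped);
        node1.callStarted("alice");                         // e.g. reported via the Asterisk connection
        System.out.println(node2.intercept("alice", "away")); // true: trapped on another node
        System.out.println(node1.callEnded("alice"));         // Optional[away]
    }
}
```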

– Gato

Gato,

the concept of a senior cluster member that is responsible for the Asterisk connection sounds very promising.

Any pointers on where to look to get started?

– Stefan

Hey Stefan,

Check out the ClusterEventListener interface that you may want to implement.
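Here is a rough sketch of how such a listener could keep the Asterisk connection on the senior member only. To stay self-contained it declares a local stand-in interface whose callback names mirror Openfire's org.jivesoftware.openfire.cluster.ClusterEventListener as we understand it; check the real interface in your Openfire version, implement it instead, and register it with the cluster manager (we believe via ClusterManager.addListener, but treat that as an assumption). AsteriskLink and AsteriskConnectionOwner are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: keeps the Asterisk connection on the senior cluster member.
 * ClusterEventListener below is a local stand-in, not the real Openfire type.
 */
public class SeniorOnlyAsteriskDemo {

    /** Stand-in mirroring the callback names of Openfire's ClusterEventListener (assumption). */
    interface ClusterEventListener {
        void joinedCluster();
        void joinedCluster(byte[] nodeID);
        void leftCluster();
        void leftCluster(byte[] nodeID);
        void markedAsSeniorClusterMember();
    }

    /** Hypothetical wrapper around the plugin's connection to the Asterisk server. */
    static class AsteriskLink {
        boolean connected = false;
        final List<String> log = new ArrayList<>();
        void connect()    { if (!connected) { connected = true;  log.add("connect"); } }
        void disconnect() { if (connected)  { connected = false; log.add("disconnect"); } }
    }

    static class AsteriskConnectionOwner implements ClusterEventListener {
        final AsteriskLink link;
        AsteriskConnectionOwner(AsteriskLink link) { this.link = link; }

        // This node became the senior member, so it now owns the Asterisk connection.
        @Override public void markedAsSeniorClusterMember() { link.connect(); }

        // This node joined a cluster; another node may already own the connection,
        // so drop ours until we are told we are senior.
        @Override public void joinedCluster() { link.disconnect(); }

        // Assumption: after leaving the cluster this node runs standalone and
        // should own the connection again.
        @Override public void leftCluster() { link.connect(); }

        // Other nodes joining or leaving does not change ownership by itself; if the
        // senior leaves, the new senior gets markedAsSeniorClusterMember().
        @Override public void joinedCluster(byte[] nodeID) { }
        @Override public void leftCluster(byte[] nodeID) { }
    }

    public static void main(String[] args) {
        AsteriskLink link = new AsteriskLink();
        AsteriskConnectionOwner owner = new AsteriskConnectionOwner(link);
        link.connect();                      // standalone node owns the connection
        owner.joinedCluster();               // joins a cluster as a non-senior member
        owner.markedAsSeniorClusterMember(); // the old senior left; this node takes over
        System.out.println(link.log);        // [connect, disconnect, connect]
    }
}
```

Making connect and disconnect idempotent keeps repeated or out-of-order callbacks from opening a second connection.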

– Gato

Hi Gato,

Thanks for the opportunity to look at the features of the next version.

Are you also going to publish a performance report for this clustering feature?

Hey Gal,

Yes, we are still running more regression tests and load tests. As part of the load tests we are collecting statistics, and we are going to make them public. BTW, in a single JVM (i.e. cluster node) we reached 112K concurrent users and stopped there because the load-test client ran out of users.

– Gato

Wow.

It would be great if you could also specify which DB you used during these tests (external, such as MySQL, or embedded) and what the system (hardware) setup was (e.g. clients on an external PC or on the same server, etc.).

Good luck.

Hey Gal,

That is a good point that I forgot to mention. When running in cluster mode you cannot use the embedded DB, since it won’t be shared by all cluster nodes. You have to use an external database when running in cluster mode.

For our tests we used MySQL, but since we were not pushing the DB hard (which is usually the case with XMPP servers), you will be fine using any good, well-known DB (e.g. Oracle, MySQL, PostgreSQL).

– Gato

Thanks for this insightful video.

In the scenario of a stateful plug-in, let’s assume it needs to hold some large set of data about many users and that using a SQL database would be too slow, would you recommend replicating the data across the cluster nodes or “outsourcing” the shared data mechanism to an external server(s), and communicating using RMI or something similar?

Thanks.

Gato,

Will there be a clustered connection manager included with this? For example, if we’re connected to AOL (aim-xmpp), which is a stateful connection, with Openfire Cluster, will the connection be maintained by the connection managers?

Gato,

Another question in a similar vein: if we’re using a hardware Layer 7 load balancer to maintain connections for aim-xmpp and the connection manager is clustered, and one connection manager instance fails and another takes over, will the new instance be able to take over the already open connection?