Clustering

I was just investigating clustering and I found this webpage: http://community.igniterealtime.org/docs/DOC-1471

It states that I need an enterprise.jar file, but the plugins page it links to does not contain this file.

Also, I’ve read that clustering requires Oracle Coherence, but this doc doesn’t mention installing it. Is this doc out of date?

Does anybody know what license Coherence is distributed under?

The doc also states that if you are clustering you must use an external database. Is it possible to use PostgreSQL as the DB?

What is the internal database? I’m curious whether there is a performance benefit from running an external database on another server rather than the internal database.

The document is out of date. The clustering plugin is available for free, while the required Coherence Enterprise license must be obtained from Oracle - and it is quite expensive.

It should not matter which database is used.

The internal database is an HSQLDB (http://hsqldb.org/). Of course it is usually faster to use an internal database, and for small installations it may be fine. But it does not scale, and it keeps the data in the memory of the Openfire JVM.

Thanks!

Are there any options for high availability other than the clustering plugin? For example, can multiple servers use the same database?

I’m mainly concerned with the server crashing and losing service for a period of time.

What is stored in the database?

What is stored in the database?

This depends. If you don’t use monitoring plugins and are not interested in MUC history, then the user and vCard information is stored in it, along with some other settings, but nothing really important as far as I know. A daily backup is enough to restore functionality.

Can multiple servers use the same database?

The servers should not be running together. If one server fails the other one should be started manually (or automatically if this is possible).

This is really helpful thanks.

If a message is received and there is no client listening for the recipient user, is the message queued up for the user? If so, is it stored on disk? If the server crashes, would this message be lost?

Offline messages may be stored in the database; this depends on the server setting (/offline-messages.jsp).

If the offline messages are stored, users may be confused if you ever need to restore a database backup, as current offline messages are lost and old offline messages are sent again. Anyhow, the chance of a database or hardware failure is very small.

I hope that you don’t have requirements for PITR or Master-Slave even though managing HA databases is a nice job.

We’ve run Openfire a few different ways for better availability. In all cases we’ve used an external Oracle database, mostly because that is the database system we are most familiar with, and we are required to retain IM history long term.

  1. Run standalone Openfire with Pacemaker on Linux - This is an ‘active/passive’ method which restarts Openfire on a different box and moves an IP around to make sure the clients connect to the right place

  2. Clustered Openfire in front of a load balancer - Active/Active, more complex configuration.

In both cases we have connection managers running to support external users through the DMZ. Unless you have some crazy requirements, trying to get #2 working isn’t worth it. In general, you can detect a failure and restart Openfire somewhere else within 10s, which is enough for most people.

Today we’re running #2 with an Oracle RAC cluster behind it. Probably overkill, but we already had the RAC infrastructure in place, so it was easy to implement.

I hope that you don’t have requirements for PITR or Master-Slave even though managing HA databases is a nice job.

What do you mean?

Can you explain active/active vs. active/passive? Or point me towards the docs?

Also, you mention connection managers. Are they part of XMPP or specific to Openfire?

Pacemaker sounds interesting. I will look into it. Does it run on the same host as Openfire? If it doesn’t, I guess it doesn’t help when the system crashes or loses connectivity.

Active/Passive is where you have two or more systems, but only one is actually running the application at any point in time. The other sits there as a hot spare in the event something fails. Active/Active is when the app is running on both (or more) systems. This is the type of configuration you get when you use Coherence.

Connection Managers, at least in this context, are specific to Openfire. They connect on a non-standard port, and don’t work with other XMPP implementations. We use them in our DMZs as an XMPP proxy so we don’t have to directly expose Openfire to the Internet.

Pacemaker is a cluster resource manager, so it runs on all systems which are part of the cluster. It stops/starts Openfire for you and monitors nodes in the cluster.
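To make the active/passive idea concrete, here is a minimal sketch in Python of the decision a failover monitor makes. This is not how Pacemaker is implemented or configured; the class name `FailoverMonitor`, the helper `tcp_probe`, and the threshold of three failed checks are all assumptions for illustration. The only fact borrowed from XMPP is that 5222 is the standard client port.

```python
import socket


def tcp_probe(host, port=5222, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    5222 is the standard XMPP client port; adjust if your server differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Hypothetical active/passive monitor: only one node is active at a time."""

    def __init__(self, nodes, probe=tcp_probe, max_failures=3):
        self.nodes = list(nodes)
        self.probe = probe                # callable(node) -> bool
        self.max_failures = max_failures  # consecutive failures before failover
        self.active = self.nodes[0]
        self.failures = 0

    def tick(self):
        """Run one health check and return the node that should be active."""
        if self.probe(self.active):
            self.failures = 0
            return self.active
        self.failures += 1
        if self.failures >= self.max_failures:
            # Promote the next node in the list; a real cluster manager would
            # also start the service there and move the shared IP address.
            idx = self.nodes.index(self.active)
            self.active = self.nodes[(idx + 1) % len(self.nodes)]
            self.failures = 0
        return self.active
```

Calling `tick()` on a timer (say, every few seconds) gives the "restart somewhere else within seconds" behaviour described above. Requiring several consecutive failures avoids failing over on a single dropped probe.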

Thanks. This is really helpful. Is there good documentation available for Connection Managers?

I think Pacemaker may be the solution we need; running in Active/Passive mode, we’d never be down for more than a few seconds. We’d lose any messages that are in flight when the first server crashes.

We may also have the luxury of only allowing our own clients to connect to the system (I was thinking of writing a packet interceptor plugin, but maybe a connection manager is a better approach). If we do that, we can build fault tolerance/retry into our clients so that we get guaranteed delivery. This would also allow us to enforce encryption.
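A rough sketch of what client-side retry could look like, in Python. Everything here is hypothetical: `send_with_retry` and the `send` callable stand in for whatever your XMPP client library provides, and `send` is assumed to return True only once the message has been acknowledged. Note that retrying gives at-least-once delivery, so the same message ID is reused on every attempt to let the receiver discard duplicates.

```python
import time
import uuid


def send_with_retry(send, payload, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Try to deliver payload, backing off exponentially between attempts.

    send: hypothetical callable(message_id, payload) -> bool (True = acknowledged).
    Returns the message ID on success, raises RuntimeError if all attempts fail.
    """
    message_id = str(uuid.uuid4())  # same ID on every retry, for de-duplication
    for attempt in range(attempts):
        try:
            if send(message_id, payload):
                return message_id
        except OSError:
            # Connection refused/reset, e.g. while the standby server takes over.
            pass
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"message {message_id} not acknowledged after {attempts} attempts")
```

With the defaults the client waits 0.5s, 1s, 2s and 4s between attempts, which comfortably covers a failover that completes in a few seconds.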

http://community.igniterealtime.org/blogs/ignite/2012/09/23/introducing-hazelcast-a-new-way-to-cluster-openfire may be interesting for you.