Failover Possibilities?

Bruce_Peck · October 5, 2008, 3:18am

We’ve been piloting Openfire and Spark in our corporate environment for a possible company-wide implementation. So far so good, but our current setup is a simple one using the embedded database and authentication against our AD environment. For a production implementation we really would like to have some redundancy on the server side. I’ve searched through the threads here and found good information, but I’m still a bit confused about what our options are.

From what I read here it seems clustering is not a viable option unless there is some change in licensing (someone please correct me if I am wrong on that). I would be happy to have an active/passive server situation behind hardware load balancers that we already use on our network, but the two posts I note below make it unclear whether or not we could accomplish that.

I was hopeful when I read this text in this thread:

*"To have fail over I would recommend having 2 Openfire servers where one is active and the other one is just sitting there. You may put a load balancer in front or something that will detect when a server goes down and then redirect the traffic to the other server. Note that in XMPP the connections are long lived. That means that if an XMPP server goes down then clients will need to reconnect. This means that when a server goes down clients will notice it. Smart clients may reconnect and hide that fact to users but still there is a reconnection since the TCP connection went down."*

We would accept a situation where, if the primary server failed, all connections would drop and users would have to reconnect and re-authenticate (presumably to the secondary server, to which the load balancers would now direct the clients). My assumption was that we could use a remote database for each server (MSSQL in our case, as we have an HA MSSQL “farm” used for many other applications) and replicate data between the databases.

But then I read this:

*"Every Openfire server still needs its local database which stores the server name (east.example.com and west.example.com) and all other things like MUC information."*

Assuming the failover situation described above with synchronized remote databases, I am unclear whether the information available to users on the primary server will still be available to them once they connect to the secondary server. It appears that some information will be in the remote SQL database but some will still be stored locally on the server. Is that right?

Can someone help me get a clear understanding of this? I would love to hear from a corporate type who has done something like this, especially with layer-seven load-balancing/failover devices.

Bruce_Peck · October 5, 2008, 12:36pm

Hmmm... after a good night’s sleep I realize that I may not have been clear in my question. I wrote the original post late at night after a long, hard day, so pardon me if I didn’t get to the point.

My assumption is that I can set up two Openfire servers, each pointing to an external MSSQL database, with the databases being synchronized (we do this frequently with other applications). We would front-end the servers with our Nortel Alteon layer-seven switches, which in this case would be configured to put the servers in an active/passive arrangement and present a virtual IP (and related DNS entry) to the clients. The Alteons would be set up to run various “health checks” on the primary server and, when a problem is detected, would direct the clients to the real IP of the secondary server. When the primary server fails, all connections would be severed and users would have to log back in, at which point they would be on the secondary server.
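To make the health-check idea concrete, here is a minimal Python sketch of active/passive selection. It is only an illustration of the logic, not how the Alteons implement it: the function names and the host names in the usage note are hypothetical, and using a plain TCP connect to port 5222 (the standard XMPP client port) as the health check is an assumption.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_health_check(host: str, port: int = 5222, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Assumption: an open client port is treated as "healthy". A real load
    balancer would typically run deeper, protocol-level checks as well.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Active/passive selection: return the first healthy backend.

    `backends` is ordered by priority (primary first, then secondary).
    Returns None when no backend passes the health check.
    """
    for host in backends:
        if is_healthy(host):
            return host
    return None
```

With hypothetical host names, `pick_backend(["of-primary.example.com", "of-secondary.example.com"], tcp_health_check)` would return the primary while it accepts connections and fall through to the secondary once it stops. Clients already connected to the failed server would still have to reconnect, as described above.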

So the question is about the database information and the related functions and how that would affect users in the failover. I looked but don’t see a document that maps out the data written to the database environment - a “data dictionary” if you will.

A) If an external database is used, what data is written to the external database and what data is always maintained on the server itself?

B) Assuming there is information that is always maintained on the server, what user functions might not work properly in the above failover situation, since that information could differ between the two servers? If we know what these are, we might be able to live with them in a short-term failover situation and handle them with user education and announcements when such a situation occurs. As long as basic chat functions are maintained with user rosters intact, etc., I think we would be fine.

I hope that clarifies my question a bit. Thanks for any help provided.