Openfire Geo-Redundancy

Jimmy802 · May 28, 2019, 1:45pm

We want to add geo-redundancy to our Openfire cluster using F5.

This means separating the cluster into 2 cities with a 15ms one-way latency.

Will Hazelcast work with such latency? Is there some way to keep the nodes in the cluster in loose sync?

I have read: https://docs.bmc.com/docs/smartit13/deploying-openfire-nodes-in-a-cluster-614647678.html

Thanks!

gdt · May 28, 2019, 3:08pm

You’ve really got three options:

1. Use Hazelcast WAN replication. The community edition of Hazelcast IMDG doesn’t support WAN replication, so neither Openfire nor the Openfire Hazelcast plugin is designed or tested with that in mind. However, it might work if you pay for the enterprise edition and configure it appropriately. https://docs.hazelcast.org/docs/3.12/manual/html-single/index.html#wan
2. Use regular LAN-based replication over the WAN. Given you’ve got relatively low latency, you may find you are OK, but I wouldn’t be surprised if there were some performance issues.
3. Have two distinct clusters, one in each location, and use XMPP federation (server-to-server) to link the two. This is probably your best bet, and the way the system is designed to work.

Note that the clustering docs you’ve found are a bit out of date; I’d suggest starting at https://www.igniterealtime.org/projects/openfire/plugins/2.4.2/hazelcast/readme.html

Greg

Jimmy802 · May 28, 2019, 7:23pm

Thank you so much!!

But in a server-to-server federation, how can a user be automatically switched to the backup server, with the same rooms etc. present?

gdt · May 29, 2019, 9:01am

“Server to server” could also be thought of as “cluster-to-cluster”. So you’d have a cluster in site-A that users in site-A would log in to (user1@xmpp.site-a.example.com), and a cluster in site-B that users in site-B would use (user2@xmpp.site-b.example.com). Site-A users could communicate with site-B users via the federated link. As each site has a cluster, you’re protected from a single failure. However, if you have a disaster-recovery scenario and site-A is wiped out somehow, your users at site-A would need to manually switch to logging in to site-B.

If you want truly seamless (to the users) geographic redundancy you’ll need a WAN based cluster, i.e. one of the first two options I offered.

Greg

Jimmy802 · May 29, 2019, 3:10pm

Thanks Greg!

Is there no middle ground between the 2 extremes of Hazelcast memory sync and 2 separate clusters?

Can we have loose sync between the 2 sides of a geo-redundant cluster (e.g. SQL sync)? If one half fails, I don’t expect all chat logs etc. to be maintained, just rooms with the same name etc.

gdt · May 29, 2019, 4:15pm

No, no happy medium unfortunately.

FWIW, we do have a customer who went for option 2: two geographically separate nodes in a single cluster. It mostly works, but they do complain from time to time of a split-brain when the WAN has problems, at which point we remind them that we recommended against doing what they are doing.

That said, the most recent Hazelcast plugin should recover from split-brain better than before, so you may find option 2 is “good enough” for you, depending on your exact use-case. You will have to ensure that you disable multicast cluster discovery and enable TCP/IP-based cluster config, with appropriate additional configuration, as multicast will not work across the WAN.
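
As a rough illustration of that discovery change (a sketch only, not taken from a tested WAN deployment), the join section of the plugin's Hazelcast configuration file might look something like the fragment below. The file is hazelcast-cache-config.xml in the plugin versions I'm aware of, so check the readme for your version. The two hostnames are placeholders I've made up, and the "appropriate additional configuration" (timeouts, split-brain handling and so on) is not shown.

```xml
<!-- Fragment of the Hazelcast <network> section; the rest of the file is omitted. -->
<network>
  <join>
    <!-- Multicast discovery will not cross the WAN, so switch it off. -->
    <multicast enabled="false"/>
    <!-- List every cluster node explicitly. These hostnames are hypothetical. -->
    <tcp-ip enabled="true">
      <member>of-node-a.site-a.example.com</member>
      <member>of-node-b.site-b.example.com</member>
    </tcp-ip>
  </join>
</network>
```

Each node would need the same member list, and the Hazelcast port(s) would have to be reachable between the two sites.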

Greg

speedy · May 29, 2019, 6:47pm

How is your WAN connected? You may be able to minimize your latency by implementing QoS between the nodes…

Jimmy802 · August 27, 2020, 1:06pm

Has Openfire been tested with the Enterprise edition of Hazelcast? Thx!