Hazelcast Clustering and 2 Nodes

I think those heartbeats are only useful when the whole NLB node goes offline (Windows crash, etc.). If only the Openfire application fails while the NLB host itself keeps running, the node will still send heartbeats.

Here’s the problem with NLB, from the FAQ:

**Q.** Can NLB Balance Load Based on CPU/Memory Usage?

A. No, NLB does not respond to changes in the server load (such as CPU usage or memory utilization) or the health of an application.

If it can’t respond to the health of an application, it’s probably not of much use here. So you either need to have Openfire running on both nodes all the time and accept that NLB will only reroute users when a whole node is down, or look for another solution.

@All,

I understand that server health is not being monitored; I already do that with another program. The redundant link is what I am looking for, and NLB seems to provide it. We just need to figure out how to use Openfire with it.

So, does this need to be configured in Openfire or in Hazelcast? I would think that something similar to what I attempted before, where Hazelcast looks at the cluster and its associated IPs, would do this. However, I understand that Openfire might need to be configured to use multiple hosts, etc.

Any thoughts?

Thanks for the help,

Johnathan

There is nothing in Openfire or Hazelcast that should, or can, be configured to support IP clustering. Once Hazelcast is configured so that Openfire on both nodes can communicate internally, that is the extent of what Openfire will do. Openfire normally listens on port 5222 (plus others) on every interface/IP on the box, so no additional configuration is needed; netstat should confirm that it is listening on all addresses (shown as *:5222 or 0.0.0.0:5222, depending on the platform).

You would normally use some sort of load balancer or IP clustering to manage the XMPP connections to the application. It sounds like NLB sort of does this, but not in a way that makes it useful when set up with Openfire. The short answer is that because NLB doesn’t know whether Openfire is up or down, it may route XMPP connections to an active NLB node that does not have Openfire running. I think you experienced that earlier, when you said Openfire was down on one node but connections didn’t get routed to the other box properly.

What does ‘redundant link’ mean?

@David,

It makes sense that NLB wouldn’t know whether the Openfire service is down, since it only monitors the IP via heartbeat, not the service.

So the only case in which NLB would help is if the entire server stops responding.

When I said ‘redundant link’, I meant a link from both Openfire servers to the client.

Thanks for the help; it looks like we can’t do what I’m wanting. Still, having multiple servers and letting the client reconnect when one goes down doesn’t sound like an unreasonable request.

One question, though: when you click the Advanced button before logging in, there is an auto-detect option. What is that?

Thanks,

Johnathan

Most clients (Spark included) will automatically reconnect after 30s when they are disconnected. We’ve found that a few minutes for all clients to reconnect is better than having to reboot a server or have an extended downtime period.

Autodetect uses DNS SRV records to figure out where to connect: the client looks up _xmpp-client._tcp.domain.com (where domain.com is the domain part of the login) and gets back a hostname and port, along with a priority for connecting. You could maybe put both servers in the SRV response and let failover work that way, but I’m not sure how long some clients might take to time out when trying to connect to a server that is dead.

http://wiki.xmpp.org/web/SRV_Records#XMPP_SRV_records

@David,

Where would I modify that?

Thanks,

Johnathan

On your DNS server: add an SRV record for _xmpp-client._tcp.domain.com for each Openfire node.