Openfire 3.9.1, Ubuntu & Hazelcast - Cannot see cluster members & admin console login problem

We have set up two Openfire v3.9.1 servers under Ubuntu 12.04 LTS, both using an external MySQL DB backend. As separate entities the servers work fine. We wanted to cluster them in order to add High Availability functionality to our platform. After reading many posts and instructions, we installed the Hazelcast plugin, to no effect!

When Openfire is started on both servers and we visit the cluster information tab, each server sees only itself as a cluster member! That is, server A sees only server A as a cluster member, whereas server B sees only server B. We have disabled multicast as a cluster transport due to network incompatibility, and we use the TCP/IP method of interconnecting the cluster nodes.

In addition, we are using a load balancer to balance the XMPP connections and the HTTP connections to the admin interface. However, when the servers are accessed via the load-balanced address (e.g. balance.mycompany.com), we cannot log in to the admin console, but if we access the admin console directly on either serverA.mycompany.com or serverB.mycompany.com, we can log in perfectly!

Any ideas or help regarding the two issues?? The cluster as-is is unusable!

The cluster setup is the following:

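Server A (IP 192.168.0.1)

<group>
  <name>openfire</name>
  <password>openfire</password>
</group>
<network>
  <port>5701</port>
  <join>
    <multicast enabled="false">
      <multicast-group>224.2.2.3</multicast-group>
      <multicast-port>54327</multicast-port>
    </multicast>
    <tcp-ip enabled="true">
      <hostname>server1.mycompany.com</hostname>
      <hostname>server2.mycompany.com</hostname>
    </tcp-ip>
  </join>
  <interfaces enabled="true">
    <interface>192.168.0.1</interface>
  </interfaces>
</network>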

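Server B (IP 192.168.0.2)

<group>
  <name>openfire</name>
  <password>openfire</password>
</group>
<network>
  <port>5701</port>
  <join>
    <multicast enabled="false">
      <multicast-group>224.2.2.3</multicast-group>
      <multicast-port>54327</multicast-port>
    </multicast>
    <tcp-ip enabled="true">
      <hostname>server1.mycompany.com</hostname>
      <hostname>server2.mycompany.com</hostname>
    </tcp-ip>
  </join>
  <interfaces enabled="true">
    <interface>192.168.0.2</interface>
  </interfaces>
</network>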

Hi John -

This could be any number of network topology-related issues, but here are some things to try to troubleshoot the problem you are having:

  • From each cluster member, make sure you can resolve (or ping) the DNS name of the peer server(s). If you cannot resolve the DNS name, you can try specifying the IP address instead.
  • If you have a lot of latency in your network, you might need to extend the default connection timeout of five seconds using the “connection-timeout-seconds” attribute of the <tcp-ip> element.
  • Unless you have a good reason to do so, I would recommend that you use the default value for the <interfaces> configuration element (enabled=“false”).
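
As a rough sketch of the last two suggestions, the network part of the plugin's Hazelcast configuration might look like the fragment below. The 10-second timeout is an arbitrary example value, the IP addresses are the ones from the original post, and the exact layout is an assumption that should be checked against the Hazelcast documentation for the version bundled with the plugin.

```xml
<!-- Sketch only, not a verified configuration for this setup -->
<network>
  <join>
    <multicast enabled="false"/>
    <!-- Timeout raised from the 5-second default; 10 is an example value -->
    <tcp-ip enabled="true" connection-timeout-seconds="10">
      <!-- IP addresses used instead of DNS names, as suggested above -->
      <member>192.168.0.1</member>
      <member>192.168.0.2</member>
    </tcp-ip>
  </join>
  <!-- Interface binding left at its default (disabled); the child entry is ignored -->
  <interfaces enabled="false">
    <interface>192.168.0.1</interface>
  </interfaces>
</network>
```

The same fragment would go on every node, and Openfire needs a restart on each node before the change takes effect.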

If it’s any consolation, I know the clustering feature will work if you can get it configured correctly for your network, as I have set up and am actively running several Openfire clusters. You can find more information about configuring Hazelcast network settings in the Hazelcast documentation.

Good luck, and feel free to report back your findings here.

Tom

Hi Tom,

All nodes are pingable and DNS resolvable. As far as latency is concerned, the two nodes are interconnected by means of a single Gigabit switch, much like a direct NIC-to-NIC connection. No traffic is throttled or blocked between the two hosts.

I also reverted the configuration for “interfaces” to false, to no avail…

We are seriously baffled by this behavior and we are desperately looking for a solution. Would posting any logs help?

I have the same exact issue.

Hazelcast plugin 1.2.0, Openfire 3.9.1

I have two servers on Amazon EC2 that share an RDS MySQL database.

Each node sees itself, but not the other.

The hard part is that there are no exceptions in error.log and nothing useful in warn.log either.

Each host can ping the other fine. Nmap shows port 5701 open on both hosts.

root@ip-172-31-45-245:/usr/share/openfire/plugins/hazelcast/classes# ping ip-172-31-43-108.us-west-2.compute.internal
PING ip-172-31-43-108.us-west-2.compute.internal (172.31.43.108) 56(84) bytes of data.
64 bytes from ip-172-31-43-108.us-west-2.compute.internal (172.31.43.108): icmp_req=1 ttl=64 time=0.776 ms

Each host has the same config.

root@ip-172-31-43-108:/usr/share/openfire/plugins/hazelcast/classes# ls

hazelcast-cache-config.xml

…

5701 ip-172-31-43-108.us-west-2.compute.internal:5701 ip-172-31-45-245.us-west-2.compute.internal:5701

…

Adding to the local interface does not solve the issue. Using IP addresses instead of host names also does not.

Hi, I'm using Ubuntu 12.x.x and Openfire 3.9.1 with connection managers and Hazelcast clustering.

My setup is:

4 connection manager servers (with round-robin DNS), 2 XMPP servers running Openfire with Hazelcast clustering, and 3 database servers (1 load balancer + 2 databases in master-to-master replication).

So far the clustering on the 2 XMPP servers is working great.

My config is:

On server 1:

<port>5701</port>
<member>[ip-address-xmpp server-1]:5701</member>
<member>[ip-address-xmpp server-2]:5701</member>
<interface>[ip-address-xmpp server-1]</interface>

On server 2:

<port>5701</port>
<member>[ip-address-xmpp server-2]:5701</member>
<member>[ip-address-xmpp server-1]:5701</member>
<interface>[ip-address-xmpp server-2]</interface>

The reason I use only the servers' IP addresses for joining the cluster is that in my round-robin DNS I have a subdomain (subdomain.myserver.com) that points to the IP addresses of both XMPP server 1 and XMPP server 2.

Sorry for my bad English. If you have more questions, please contact me; maybe I can help.

Kuncen

Using your configuration worked fine.

Thanks for helping me out.

I don't know why it's different, but maybe it's the order in which the members are specified, or replacing <hostname> with <member>.

Zubair

I tried with the same settings as above (i.e. using “member” and not “hostname”) in the configuration, also added the :5701 port notation… TO NO EFFECT!

Port 5701 is listening on both servers; however, under the cluster administration page each server/cluster member recognises only itself and not the other!

Are there any logs I can dig into regarding this behavior?

I have done some more testing with the TCP-based join configuration (using CentOS and Windows, but not Ubuntu), and I can confirm that the child elements should be <member> and not <hostname>. This is a change that the Hazelcast implementation made a few versions ago. As I mentioned earlier, you can refer to the documentation for more information - it appears they have recently updated the documentation as well.
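
To illustrate the difference, here is a minimal sketch of the TCP/IP join section using <member> child elements. The addresses are the placeholders from the original post, and the surrounding elements are assumed to follow the standard Hazelcast 3.x layout rather than being taken from any poster's actual file.

```xml
<!-- Sketch only: addresses are placeholders from the original post -->
<join>
  <multicast enabled="false"/>
  <tcp-ip enabled="true">
    <!-- One <member> per cluster node; a host name or an IP address, optionally with a port -->
    <member>192.168.0.1:5701</member>
    <member>192.168.0.2:5701</member>
    <!-- Older form reported in this thread as not joining:
         <hostname>server1.mycompany.com</hostname> -->
  </tcp-ip>
</join>
```

Both nodes would carry the same member list, so each one knows where to look for its peer at startup.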

As an aside, it appears that the Hazelcast team has pulled the 3.1.5 release from their list of “prior releases”, possibly due to issues with that version. I am in the process of updating the plugin to use the latest 3.1.x release (currently 3.1.7). While I do not know if this will have an effect for your particular situation, it might be worth a shot. You can update your own copy in-place, or wait for the next nightly build to pick up the changes from the repository.

Zubair,

Great to know it is working on your servers. In my case, if you use the same domain for users on both XMPP servers (e.g. user@domain.com), the cluster will find your hosts from the XMPP server, and because of that we need the IP address of each server rather than its hostname, especially when working with round-robin DNS.

Also, I’m sure you’ve checked this already, but just for the sake of completeness you should confirm that both hosts have the iptables/UFW firewall disabled, or that you have added the correct ports to the firewall configuration. I have not used the Ubuntu Linux distribution, but it appears that UFW may be enabled by default, and if so it will need to be configured to allow connections on the Hazelcast ports.

iptables is enabled, but with allow-all rules in all chains. The same behavior also appears with iptables completely disabled. Port 5701 is also reachable via telnet from each server to the other, hence the port is not blocked.

When you complete the new build based on 3.1.7, let us know so we can try it out. I also haven’t managed to locate any log files that record Hazelcast’s behavior… Any help?

Hello Guys,

I had the same issue, and replacing <hostname> with <member> fixed the clustering part. Now, a couple of other issues:

I have 2 servers clustered in AWS using Hazelcast. All servers are in the same zone.

server-a.openfire.domain.com and server-b.openfire.domain.com are the instance names.

The ELB is named chat.domain.com in DNS via a CNAME.

  • I’m able to access the admin console via the server-a or server-b URL directly, but not via chat.domain.com.

  • Users are connected to either server-a or server-b, and their names display the backend Openfire server names instead of the ELB name.

  • When adding users, username@chat.domain fails but username@server-a or server-b works.

Is there a “clean” way to do this and make server-a and server-b aware of chat.domain.com, so that users are not confused by the backend servers under the ELB?

Anand

I figured this out by reading a bit more on Google and the forums here. Here is an answer to my questions –

There are 2 things to configure correctly –

  1. Server Name: Every server in the Hazelcast cluster should have the same server name; in my earlier example, chat.domain.com. This is also the name of my ELB in DNS, and it is the name the users will use.

  2. Hostname: These are the backend server names; in my example, server-a.domain.com and server-b.domain.com.

So, as long as all Openfire servers have the same “Server Name”, you can address the cluster by this name as if it were a single Openfire instance. Adding users as user@chat.domain.com works too. No matter whether users are connected to server-a or server-b, they always appear as user@chat.domain.com.

Anand

Hello Anand,

(trying to help)

Your infrastructure right now is (correct me if I'm wrong):

Server-a.openfire.domain.com : with sample ip : 1.1.1.2

Server-b.openfire.domain.com : with sample ip : 1.1.1.3

Server-a.openfire.domain.com CNAME to chat.domain.com

Server-b.openfire.domain.com CNAME to chat.domain.com

Your user address is user@chat.domain.com

Try my setup instead:

ip address : 1.1.1.2 (server-a) : point chat.domain.com to it (please do not use a CNAME; use round-robin DNS)

ip address : 1.1.1.3 (server-b) : point chat.domain.com to it (please do not use a CNAME; use round-robin DNS)

Then if you or anyone else runs nslookup on chat.domain.com, the result will look like:

chat.domain.com resolves to 1.1.1.2

chat.domain.com resolves to 1.1.1.3

And if you don't want to use IP addresses for administrator access, you can point another subdomain to the IP address of each XMPP server:

ip address : 1.1.1.2 : point to server-a.openfire.domain.com

ip address : 1.1.1.3 : point to server-b.openfire.domain.com