Openfire Cluster creation - several unexpected problems

wskibum · May 10, 2018, 3:50am

Openfire = 4.2.3
Hazelcast plugin = 2.3.0
Can’t go into POC production until these issues are addressed

HAproxy = 1.8.8
CentOS = 7.4
Nodes = 3
Living in AWS

After enabling clustering, the hostname on all 3 nodes changed to the same as node-1, and now messages don’t travel between nodes.
I’ve since changed the hostname to that of the HAproxy node, which is now the hostname on all three cluster nodes. That seems to work, but it seems unlikely to be the proper way to manage this.

User chat messages are not replicated across all nodes. If a user logs in on another node, none of his chat messages have replicated over from the previous node.

gdt · May 10, 2018, 7:38am

By hostname, do you really mean the machine hostname? In which case you’ve probably got things set up not quite right - there should be no need to change a hostname.

I’d suggest
a) Install and configure a single node, with the correct XMPP domain for your cluster. Get that working correctly, enable clustering.
b) Install a second node. Copy the openfire/conf and openfire/resources/security folders from the first node to this node, start it up. Check it joins the cluster correctly.
c) Repeat for your third node.

You don’t specify what your DB is; just to be clear, clustering won’t work with the embedded DB.

Greg

dbh · May 10, 2018, 2:07pm

I found that settings stored in the common database, such as the FQDN, should be something that is true for all Openfire cluster nodes. In this case, using the FQDN of the load balancer or proxy makes sense.

Are you using multicast, or explicit addresses for the Hazelcast config? I’m using a series of four clusters (one per environment), each referencing three static IPs.

Do you see Openfire log entries when a node joins or leaves the cluster?

wskibum · May 10, 2018, 2:15pm

Hi Greg,
Thanks for the reply. The DB is MariaDB in RDS.
Your suggested build is very close to what I did. So we have a domain name and a host name in OF instance

Build 1
domain = example.domain
OF host name = server1.example.domain
Install cluster

Build 2
domain = example.domain
node name = server2.example.domain
install cluster
Now the server 2 OF hostname has changed to server1.example.domain

Build 3
domain = example.domain
node name = server3.example.domain
install cluster
Now the server 3 OF hostname has changed to server1.example.domain

I also have the monitoring service installed and configured to keep messages 365 days

wskibum · May 10, 2018, 2:39pm

Hi dbh,
Thanks for your reply. Using explicit addresses in hazelcast. Haven’t checked logs for nodes restarting or going down, but thanks for the suggestion. I’ll take a look at that.

wskibum · May 10, 2018, 2:47pm

Since I don’t appear to be the only one having to use a single hostname, let’s take a look at my other issue with chat messages.

I’ve got 3 nodes, HAproxy is set for balance leastconn so users will be moving around to different nodes each time they connect depending on server connections.

A user, Bob, is logged in to server 1 and has a nice long chat with user Fred, then goes to bed.
In the morning Bob logs in again and is connected to server 2. He goes into his chat window to review last night’s chat with Fred and finds that none of the previous conversation is there.
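
For anyone following along, here is a minimal sketch of why Bob can land on a different node each time he connects. It models only the "fewest active connections" rule behind balance leastconn. The connection counts are made up for illustration, and real HAProxy behaviour (tie-breaking, stick tables, health checks) is not modelled.

```python
def pick_leastconn(active):
    """Return the server with the fewest active connections.

    Ties go to the first server listed, which is a simplification.
    """
    return min(active, key=active.get)


# Hypothetical connection counts per node (not taken from the thread).
active = {"server1": 12, "server2": 15, "server3": 14}

# Evening: Bob connects and is sent to the least-loaded node.
bob_evening = pick_leastconn(active)
active[bob_evening] += 1

# Overnight: Bob disconnects and the load shifts between nodes.
active[bob_evening] -= 1
active["server1"] += 5
active["server2"] -= 6

# Morning: Bob reconnects and may be sent to a different node.
bob_morning = pick_leastconn(active)

print(bob_evening, bob_morning)
```

With these made-up numbers Bob is sent to server1 in the evening and to server2 in the morning, which matches the scenario described above.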

wskibum · May 10, 2018, 2:56pm

I did have to spend a fair amount of time on getting my HAproxy and cluster config files working. Could not find anything on Google that gave me the answers I needed, so I am posting my configs here to help folks who are just starting out.

hazelcast-local-config.xml

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.*.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
        <name>your-cluster-group</name>
        <password>password</password>
    </group>
    <network>
        <port auto-increment="true" port-count="100">5707</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member-list>
                    <member>xxx.xxx.xxx.1</member>
                    <member>xxx.xxx.xxx.2</member>
                    <member>xxx.xxx.xxx.3</member>
                </member-list>
            </tcp-ip>
            <aws enabled="false"/>
        </join>
        <interfaces enabled="true">
            <interface>xxx.xxx.xxx.1</interface>
        </interfaces>
    </network>
</hazelcast>
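
If you want to sanity-check a file like this before restarting a node, something along these lines may help. It is only a sketch: the function name and the 10.0.0.x addresses are placeholders, and the rule that the interface address should appear in the member list is an assumption based on the config above, not a Hazelcast requirement. It also assumes plain addresses with no wildcards.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Hazelcast config file shown above.
NS = {"hz": "http://www.hazelcast.com/schema/config"}

# Placeholder config for illustration; addresses are not from the thread.
SAMPLE = """
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <network>
        <port auto-increment="true" port-count="100">5707</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member-list>
                    <member>10.0.0.1</member>
                    <member>10.0.0.2</member>
                    <member>10.0.0.3</member>
                </member-list>
            </tcp-ip>
        </join>
        <interfaces enabled="true">
            <interface>10.0.0.1</interface>
        </interfaces>
    </network>
</hazelcast>
"""


def check_join_config(xml_text):
    """Return a list of human-readable problems found in the config."""
    root = ET.fromstring(xml_text)
    problems = []

    multicast = root.find("hz:network/hz:join/hz:multicast", NS)
    if multicast is not None and multicast.get("enabled") == "true":
        problems.append("multicast is enabled")

    tcp_ip = root.find("hz:network/hz:join/hz:tcp-ip", NS)
    if tcp_ip is None or tcp_ip.get("enabled") != "true":
        problems.append("tcp-ip join is not enabled")

    members = [
        m.text.strip()
        for m in root.findall(
            "hz:network/hz:join/hz:tcp-ip/hz:member-list/hz:member", NS
        )
        if m.text
    ]
    interfaces = [
        i.text.strip()
        for i in root.findall("hz:network/hz:interfaces/hz:interface", NS)
        if i.text
    ]
    # Assumed rule: each node binds to an address that is also a member.
    for addr in interfaces:
        if addr not in members:
            problems.append("interface %s is not in the member list" % addr)

    return problems


print(check_join_config(SAMPLE))
```

An empty list means none of these particular checks failed; it does not prove the cluster will form.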

haproxy.conf

global
    daemon
    maxconn 256
    user haproxy
    group haproxy
    log 127.0.0.1 local0 debug
    log-tag haproxy

defaults
    mode tcp
    timeout connect 5000
    timeout client 10000
    timeout server 10000

frontend http-in
    bind *:7443
    bind *:8443
    bind *:5222
    option tcplog
    mode tcp
    default_backend servers

backend servers
    mode tcp
    option ssl-hello-chk
    balance leastconn
    stick-table type integer size 1k expire 12h
    stick on dst_port
    server server1 xxx.xxx.xxx.1 check
    server server2 xxx.xxx.xxx.2 check
    server server3 xxx.xxx.xxx.3 check

gdt · May 10, 2018, 2:59pm

Your “hostname” issue is a non-issue.

Your operating system’s host name is unchanged (server1/2/3.example.domain).

You have a single XMPP domain name (server1.example.com). As you have a cluster, all hosts in that cluster must have the same XMPP domain name. If you get the opportunity to re-install, when you configure the first node, change the XMPP Domain Name from the default value of the host name (server1.example.com) to something like xmpp.example.com.

Greg

wskibum · May 10, 2018, 3:07pm

Thanks Greg, glad to know my hostname issue isn’t really an issue

dbh · May 10, 2018, 5:22pm

@wskibum, that looks very similar to my Hazelcast config. The join -> tcp-ip -> member-list section is set up exactly the way I did it, and mine is working.

I suspect the part throwing you off is either the cluster nodes not joining and seeing each other, or the domain settings: the (XMPP) Domain setting should be something like “chat.example.com”, and the FQDN should be that of your external load balancer / proxy, which will route the requests to the individual Openfire instances by their own FQDN or IP.

Agree with @gdt on his post.
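
One quick way to rule out the "nodes not seeing each other" case is to check that each member's Hazelcast port accepts TCP connections from the other nodes (in AWS, the security group has to allow this). The sketch below is a rough check, not a cluster health test. It assumes the base port 5707 from the config posted above and ignores the auto-increment range; the addresses in the usage comment are placeholders.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_members(members, port=5707):
    """Map each member address to whether its port is reachable."""
    return {member: can_connect(member, port) for member in members}


# Example (placeholder addresses, run from each cluster node in turn):
# print(check_members(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

A reachable port only shows that the network path is open; whether the node actually joined still has to be confirmed in the Openfire admin console or logs.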

wskibum · May 10, 2018, 5:26pm

cluster nodes all see each other ok
the domain is set for mydomain.net, should I change that to chat.mydomain.net?

wskibum · May 10, 2018, 6:04pm

While I expect the majority of users to use the web interface for chatting, I might have a few that want to use Spark on their desktop. I just tested it on mine: it connects fine but then reconnects about every 15 seconds.
It complains that it was disconnected and then starts the reconnect.

Looking at my HAproxy setting above, do you see what the issue might be?

wskibum · May 10, 2018, 8:40pm

changed the domain name, restarted all the nodes
messages are still not replicating across the nodes

gdt · May 11, 2018, 7:24am

I’m not familiar with HA Proxy, but my reading of your config suggests you may have configured port 5222 as an HTTP port. It’s not HTTP, it’s a raw socket (from the proxy’s point of view). This may explain why Spark is losing its connection every 15 seconds.
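
If you want to confirm that the proxy is passing port 5222 through as a plain TCP stream, a small probe like this can help. It is a sketch, not a full XMPP client: it sends an opening stream header (the format comes from RFC 6120) and checks whether the reply looks like an XMPP stream rather than, say, an HTTP error. The function names are made up for this example, and the host and domain in the usage comment are placeholders.

```python
import socket


def stream_header(domain):
    """Build the opening XMPP stream header for a client connection."""
    return (
        "<?xml version='1.0'?>"
        "<stream:stream to='{0}' version='1.0' "
        "xmlns='jabber:client' "
        "xmlns:stream='http://etherx.jabber.org/streams'>"
    ).format(domain)


def looks_like_xmpp(reply):
    """Return True if the reply contains an XMPP stream opening tag."""
    return "<stream:stream" in reply


def probe_xmpp(host, port, domain, timeout=5.0):
    """Send a stream header and return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(stream_header(domain).encode("utf-8"))
        return sock.recv(4096).decode("utf-8", errors="replace")


# Example (placeholder names, run against your own proxy):
# reply = probe_xmpp("proxy.example.com", 5222, "example.com")
# print(looks_like_xmpp(reply))
```

This only checks the first reply; it says nothing about connections being dropped later, so the proxy timeouts are still worth a look if clients keep reconnecting.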

Greg

wskibum · May 11, 2018, 3:55pm

That was it, removed the http and replaced with tcp and it works great

wskibum · May 11, 2018, 3:59pm

Greg, I am down to one request.
User-to-user messages are not replicating across the nodes; they only live on the node they were written on.
Is there a way to make them persistent like the group rooms?

gdt · May 11, 2018, 4:04pm

Sorry, you’ve exhausted my knowledge. That’s not a feature we use, so have never looked at it.

Greg

wskibum · May 11, 2018, 4:05pm

So this is normal behavior?

Greg, thank you so much for all your help!