Openfire Cluster Setup Issue On EC2

Hello there, I am stuck setting up an Openfire cluster on two EC2 instances.

The screenshot below shows my hazelcast-local-config.xml. I am using the public DNS names of the EC2 cluster members.

For the XMPP domain, I have set compute.amazonaws.com. I have also allowed traffic on port 5701 in the security groups, but I am unable to start the cluster.
The error I am getting is: "Failed to start or join an existing cluster. Check the error log for more information."
No errors can be found in the log files.

Can someone please post the steps I need to follow? This would be a great help.

[Screenshot: hazelcast-local-config.xml]

Start from the basics:

a) Can you resolve both ec2-*.amazonaws.com hosts from both nodes?
b) Is a firewall blocking port 5701? Try telnet ec2...amazonaws.com 5701 from each node to both nodes.
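
If telnet isn't installed on the instances, the same two checks can be sketched in Java. The sketch resolves each host name and then attempts a TCP connect to port 5701. The class name, method names and the 3-second timeout are illustrative choices, not anything from Openfire or Hazelcast.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Rough equivalent of "resolve the host name, then telnet <host> 5701" (hypothetical helper).
final class ClusterPortCheck {

    // True if the name resolves to at least one address.
    static boolean canResolve(String host) {
        try {
            return InetAddress.getAllByName(host).length > 0;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // True if a TCP connection to host:port is accepted within timeoutMs.
    // A refused connection and a timeout (e.g. traffic dropped by a security group) both return false.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        final int hazelcastPort = 5701; // the port opened in the security groups in this thread
        for (String host : args) {
            boolean resolves = canResolve(host);
            boolean reachable = resolves && canConnect(host, hazelcastPort, 3000);
            System.out.printf("%s: resolves=%b, port %d reachable=%b%n",
                    host, resolves, hazelcastPort, reachable);
        }
    }
}
```

Run it on each node with both public DNS names as arguments, e.g. java ClusterPortCheck <node-a-dns> <node-b-dns>. Bear in mind that nothing listens on 5701 until Hazelcast has started on the target node. The connect check therefore only tells you something about the firewall while clustering is running there.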

Greg

I can resolve both nodes from each other, and I have also opened port 5701 in the security group configuration.

Telnet on port 5701 doesn’t work. I think port 5701 will only start listening once clustering has been successfully enabled; is that right?

Am I using the correct domain name in the server property ‘xmpp.domain’?

I have attached the error log file here.
error.txt (29.2 KB)

The command ‘netstat -plnt’ doesn’t show port 5701.

There’s not enough in there to go on, really.

First, enable trace and debug logging: go to Server -> System Properties and set both log.debug.enabled and log.trace.enabled to true, adding them if necessary.

Then shut down both nodes, delete all existing log files, and start one node. Then post the contents of all.log somewhere like https://gist.github.com/, but you might want to sanitise addresses/host names etc.
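
If you'd rather not mask addresses by hand, here is a hypothetical helper that rewrites IPv4 addresses to a.b.c.d and EC2 public DNS names to [ec2-host]. The patterns and placeholders are assumptions for illustration. It only knows these two patterns, so still check the output for anything else sensitive (IPv6 addresses, the XMPP domain, user JIDs) before posting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: masks IPv4 addresses and EC2 public DNS names before a log is shared.
final class LogSanitiser {

    // EC2 public DNS names, e.g. ec2-1-2-3-4.<region>.compute.amazonaws.com
    private static final Pattern EC2_HOST =
            Pattern.compile("\\bec2-[0-9-]+(?:\\.[a-z0-9-]+)*\\.amazonaws\\.com\\b");

    // Any dotted quad (including 0.0.0.0); octets are not range-checked and IPv6 is not handled.
    private static final Pattern IPV4 =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    static String sanitise(String line) {
        String masked = EC2_HOST.matcher(line).replaceAll("[ec2-host]");
        return IPV4.matcher(masked).replaceAll("a.b.c.d");
    }

    // Reads a log on stdin and writes the masked version to stdout.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(sanitise(line));
        }
    }
}
```

Usage: java LogSanitiser < all.log > all-sanitised.log. Timestamps such as 2019.02.27 and version strings such as 3.11.1 have fewer than four dotted parts, so they are left alone.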

FWIW, your xmpp.domain will probably need changing (unless you want to offer XMPP service to everyone on compute.amazonaws.com), but it’s not the issue at hand, so leave it as is for now.

Greg

I enabled the debug and trace log levels, stopped the Openfire service on both EC2 nodes, removed the log files, started the Openfire service on only one node, and then enabled clustering. I get the same problem again, and there is still nothing about it in the log file.

Attaching all.log here.
all.log (74.3 KB)

Greg, it would be really great if you could suggest a way forward here.

There’s clearly something wrong; there’s simply insufficient information in your log files. For comparison, this is what I see in mine.

2019.02.27 13:32:50 INFO  [Jetty-QTP-AdminConsole-34]: system-clustering.jsp - Enabling clustering
2019.02.27 13:32:50 INFO  [TaskEngine-pool-2]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Starting hazelcast clustering
2019.02.27 13:32:50 DEBUG [TaskEngine-pool-2]: org.jivesoftware.openfire.plugin.util.cache.ClusterClassLoader - Adding conf folder C:\Greg\openfire-4.3\conf
2019.02.27 13:32:51 INFO  [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [a.b.c.d]
2019.02.27 13:32:51 INFO  [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
2019.02.27 13:32:51 WARN  [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'lo': isUp=true, isVirtual=false, isLoopback=true
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth0': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net0': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'wlan0': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net1': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net2': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth1': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth2': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth3': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:51 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net4': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'wlan1': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth4': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net5': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'ppp0': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth5': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'net6': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Skipping NetworkInterface 'eth6': isUp=false, isVirtual=false, isLoopback=false
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Trying to bind inet socket address: 0.0.0.0/0.0.0.0:5701
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Bind successful to inet socket address: /0:0:0:0:0:0:0:0:5701
2019.02.27 13:32:52 INFO  [TaskEngine-pool-2]: com.hazelcast.instance.AddressPicker - [LOCAL] [openfire] [3.11.1] Picked [10.221.17.69]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
2019.02.27 13:32:52 INFO  [TaskEngine-pool-2]: com.hazelcast.system - [a.b.c.d]:5701 [openfire] [3.11.1] Hazelcast 3.11.1 (20181218 - d294f31) starting at [a.b.c.d]:5701
2019.02.27 13:32:52 INFO  [TaskEngine-pool-2]: com.hazelcast.system - [a.b.c.d]:5701 [openfire] [3.11.1] Copyright (c) 2008-2018, Hazelcast, Inc. All Rights Reserved.
2019.02.27 13:32:52 DEBUG [TaskEngine-pool-2]: com.hazelcast.system - [a.b.c.d]:5701 [openfire] [3.11.1] Configured Hazelcast Serialization version: 1

(loads more clustering related stuff)

2019.02.27 13:33:06 INFO  [Jetty-QTP-AdminConsole-34]: system-clustering.jsp - Clustering enabled
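
To see how far your own node gets, a small Java sketch can scan all.log for a few of the messages from the excerpt above, in order, and report the first one that never appears. The message fragments are copied from those log lines. The class name and the idea of treating them as ordered milestones are assumptions for illustration, and other Openfire or Hazelcast versions may word the messages differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Reports how far clustering start-up got, using message text from a working node's all.log.
final class ClusterLogCheck {

    // Fragments taken from the successful start-up excerpt, in the order they appeared there.
    static final List<String> MILESTONES = Arrays.asList(
            "Enabling clustering",
            "Starting hazelcast clustering",
            "Bind successful to inet socket address",
            "Clustering enabled");

    // Returns the first milestone not found (searching in order), or null if all were found.
    static String firstMissing(List<String> lines) {
        int next = 0;
        for (String line : lines) {
            if (next < MILESTONES.size() && line.contains(MILESTONES.get(next))) {
                next++;
            }
        }
        return next < MILESTONES.size() ? MILESTONES.get(next) : null;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 decodes any byte sequence, so odd characters in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        String missing = firstMissing(lines);
        System.out.println(missing == null
                ? "All clustering milestones found"
                : "First missing milestone: " + missing);
    }
}
```

A log that stops before "Starting hazelcast clustering" would suggest the plugin never got as far as reading its config. That matches the situation here, where clustering fails with almost nothing logged.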

Can you post your full hazelcast-local-config.xml here? I wonder if an XML syntax error is throwing off Hazelcast somehow.

Greg

PS If you’re not using the latest Hazelcast plugin, I suggest updating.

I am using Openfire 4.3.1 with Hazelcast plugin version 2.4.0.

Attaching hazelcast-local-config.xml here.
hazelcast-local-config.xml (2.4 KB)

Loaded that up in IntelliJ and it told me straight away what the problem was 🙂

Your <join> section should look like the following …

        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member>ec2-13-211-32-137.ap-southeast-2.compute.amazonaws.com</member>
                <member>ec2-13-54-0-115.ap-southeast-2.compute.amazonaws.com</member>
            </tcp-ip>
        </join>

No idea where you got the <member-list> tags from!
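
As a quick local check before restarting Openfire, the following sketch parses hazelcast-local-config.xml with the JDK's built-in DOM parser. It is a hypothetical helper, not part of Openfire or the Hazelcast plugin. It reports a syntax error if there is one, shows the multicast and tcp-ip enabled flags, and lists the <member> entries sitting directly under <tcp-ip>. It also flags any <member-list> element, which is what tripped this config up. It does not validate against the Hazelcast schema the way IntelliJ did.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sanity-checks the <join> section of a Hazelcast XML config (hypothetical helper).
final class JoinConfigCheck {

    // Parsing alone catches malformed XML: parse() throws a SAXParseException with a line number.
    static Document parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    // Value of the "enabled" attribute on the first element with this tag, or "" if absent.
    static String enabledAttr(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : ((Element) nodes.item(0)).getAttribute("enabled");
    }

    // Text of the <member> elements that sit directly under the first <tcp-ip>.
    static List<String> tcpIpMembers(Document doc) {
        List<String> members = new ArrayList<>();
        NodeList tcpIp = doc.getElementsByTagName("tcp-ip");
        if (tcpIp.getLength() == 0) {
            return members;
        }
        NodeList children = tcpIp.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "member".equals(child.getNodeName())) {
                members.add(child.getTextContent().trim());
            }
        }
        return members;
    }

    static boolean usesMemberList(Document doc) {
        return doc.getElementsByTagName("member-list").getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(new InputSource(new File(args[0]).toURI().toString()));
        System.out.println("multicast enabled = " + enabledAttr(doc, "multicast"));
        System.out.println("tcp-ip enabled    = " + enabledAttr(doc, "tcp-ip"));
        System.out.println("tcp-ip members    = " + tcpIpMembers(doc));
        if (usesMemberList(doc)) {
            System.out.println("Found <member-list>: list members as direct <member> children of <tcp-ip>");
        }
    }
}
```

Run it as java JoinConfigCheck <path to hazelcast-local-config.xml> on each node. The member list printed should contain both nodes' addresses.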

Greg

“member-list” tags, Greg means. 🙂

Thanks, now corrected 🙂

Thanks, Greg, for your input here; I really appreciate it. I took those tags from one of the threads on the forum. Earlier I tried the default tags that hazelcast-local-config.xml contains, but I was unable to start the cluster.

Right now the cluster has started on both nodes, but it shows only the current node in the cluster.

I am attaching the Hazelcast config for both nodes and screenshots of the Clustering section.

hazelcast-local-config_Node-A.xml (2.6 KB)
hazelcast-local-config_Node-B.xml (2.6 KB)

@gdt Did you get a chance to look into the problem here?

Somehow I managed to start the cluster successfully. Then I installed two plugins:

  1. Monitoring Service
  2. OfChat, uploaded as a JAR (for REST API SSE)

Now when I start the cluster, it fails with the following error in the logs:
Caused by: com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: org.jivesoftware.openfire.reporting.stats.GetStatistics
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]

error.log (6.9 KB)

The issue seems to be with the OfChat (REST API) plugin. I tried plugin version 0.4.9, Release 8, but I am still unable to pin down the issue.

The OfChat (REST API) plugin is not cluster-aware yet. Please DO NOT USE it in a cluster setup.

OK, is there any workaround or alternative I can follow? Can we make it cluster-aware?

Is there any other plugin with the same functionality/set of APIs that is cluster-aware?

Thanks for your inputs.

I am not aware of any.

None. The work is yet to be done.