Clustering not starting, trying to bind to incorrect address

License Information

Enterprise License Details

License Type: Commercial

License Created:

Maximum Users: 50

License Expires:

Cluster Members: 2

Whenever I try to enable clustering, I get:

Failed to start or join an existing cluster. Check the error log for more information.

Server is running Openfire Enterprise version 3.4.4, Enterprise plugin 3.4.4, Search plugin 1.4.1.

Server-to-Server is enabled, close idle after 10mins, “Anyone” accessible.

DB-backend is MySQL.

From the log:

2008.01.30 11:41:33 com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.startCluster(CoherenceClusteredCacheFactory.java:117) Unable to start clustering - continuing in local mode

(Wrapped: UnicastUdpSocket{State=STATE_INITIAL, address:port=a.b.0.250:8088}) java.net.BindException: Cannot assign requested address

at java.net.PlainDatagramSocketImpl.bind0(Native Method)

at java.net.PlainDatagramSocketImpl.bind(Unknown Source)

at java.net.DatagramSocket.bind(Unknown Source)

at java.net.DatagramSocket.<init>(Unknown Source)

at java.net.DatagramSocket.<init>(Unknown Source)

at com.tangosol.coherence.component.net.socket.UdpSocket.instantiateDatagramSocket(UdpSocket.CDB:20)

at com.tangosol.coherence.component.net.socket.UdpSocket.open(UdpSocket.CDB:8)

at com.tangosol.coherence.component.net.Cluster$SocketManager$UnicastUdpSocket.open(Cluster.CDB:6)

at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:73)

at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)

at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)

at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:5)

at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)

at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)

at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:951)

at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.startCluster(CoherenceClusteredCacheFactory.java:73)

at org.jivesoftware.util.cache.CacheFactory.startClustering(CacheFactory.java:541)

at org.jivesoftware.openfire.cluster.ClusterManager.startup(ClusterManager.java:258)

at com.jivesoftware.openfire.enterprise.EnterprisePlugin.initializePlugin(EnterprisePlugin.java:269)

at org.jivesoftware.openfire.container.PluginManager.loadPlugin(PluginManager.java:447)

at org.jivesoftware.openfire.container.PluginManager.access$300(PluginManager.java:46)

at org.jivesoftware.openfire.container.PluginManager$PluginMonitor.run(PluginManager.java:1013)

at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)

I don’t know why it’s trying to bind to 10.100.0.250:8088, the server’s IP is a.b.0.2. I cannot find (grep) 8088 from any file in the openfire directory.

Where can I change the BIND-to address?

Thanks,

-John

Hey John,

This error occurs when the port that Openfire is attempting to use is already bound by another application. What other applications do you have running on your server? I believe port 8088 is the port that Openfire uses to communicate information to the cluster, and I will have to check on whether or not that can be changed, and how.

~Sean

Any idea why it’s trying to bind to (OctA).(OctB).0.250 instead of the server’s actual eth0 interface (OctA).(OctB).0.3?

Netstat doesn’t report anything listening on 8088, there are no firewall rules blocking that port for IP.

Log text attached.
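For anyone hitting the same trace: a quick way to tell "port already in use" apart from "address not on this machine" is to attempt the same kind of UDP bind outside Openfire. The sketch below is only an illustration; the class name `UdpBindCheck` and the helper `tryBind` are made up for this post and are not part of Openfire or Coherence. It assumes you pass the address and port from the log as arguments.

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketException;

// Hypothetical diagnostic helper, not part of Openfire or Coherence.
class UdpBindCheck {

    /** Tries to bind a UDP socket; returns "OK" or the exception type and message. */
    static String tryBind(InetAddress address, int port) {
        try (DatagramSocket socket = new DatagramSocket(new InetSocketAddress(address, port))) {
            return "OK";
        } catch (SocketException e) {
            // A busy port and a non-local address both surface as BindException;
            // the message text is what distinguishes them.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults are placeholders; pass the address and port from your own log.
        InetAddress address = InetAddress.getByName(args.length > 0 ? args[0] : "127.0.0.1");
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8088;
        System.out.println(address.getHostAddress() + ":" + port + " -> " + tryBind(address, port));
    }
}
```

If the result reports "Cannot assign requested address", as in the log above, the problem is most likely the address rather than a port conflict, which matches netstat showing nothing on 8088.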

Oh, that is odd. I must have misread; I didn’t realize it was trying to bind to the entirely wrong IP. Does your server have 2 NICs, or is it on any sort of VM?

No, only one non-local interface configured (eth0).

I’ve run out of ideas, bar perhaps an SSL or non-obvious configuration issue or conflict stopping clustering from initialising correctly.

Could the license file have anything to do with it?

There was a previous license on the machine, one which was missing clustering, and which was since replaced with the current one (after an upgrade to 3.4.4).

Does the IP it’s trying to bind to mean anything to you? That is, do you recognize 10.100.0.250? If the license did not have clustering, you would not be allowed to turn it on. By this point you’ve enabled clustering, but it looks like the node is having trouble binding to the address.

Checked /etc/hosts, and found a rogue definition of the 250 address where the .3 should’ve been. Hands met foreheads.
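In case it helps someone else spot this faster: a stale /etc/hosts entry can be caught by asking the JVM what the machine's own host name resolves to and whether that address belongs to any local interface. This is a minimal sketch under the assumption that the clustering layer picks its bind address from the local host name, which is what the log above suggests; `HostnameCheck` and `isAssignedLocally` are names invented for this example.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical diagnostic helper, not part of Openfire or Coherence.
class HostnameCheck {

    /** Returns true if some local network interface carries the given address. */
    static boolean isAssignedLocally(InetAddress address) throws SocketException {
        // getByInetAddress returns null when no interface has this address.
        return NetworkInterface.getByInetAddress(address) != null;
    }

    public static void main(String[] args) throws Exception {
        // Resolves the machine's own host name (on Linux, /etc/hosts is normally consulted).
        InetAddress resolved = InetAddress.getLocalHost();
        System.out.println("Host name:   " + resolved.getHostName());
        System.out.println("Resolves to: " + resolved.getHostAddress());
        if (isAssignedLocally(resolved)) {
            System.out.println("That address is assigned to a local interface.");
        } else {
            System.out.println("That address is NOT on any local interface; check /etc/hosts.");
        }
    }
}
```

On a box with the rogue entry, this should print the wrong address and flag it as not local, which is the same condition that makes the bind fail.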

Going from http://www.igniterealtime.org/community/docs/DOC-1260 we’re currently trying to enable unicast cluster member definitions on both openfire servers as multicast doesn’t seem to be working.

Problem solved.

Thanks for your help

Awesome, glad you got it working!