Messages do not reach the recipient in a cluster

After some testing, I started using our cluster in production. About 500 users are distributed across three nodes, and it works. But some users don't see other users online in their roster, and messages are not delivered to the recipient. I think this is because the users are connected to different nodes of the cluster, and the nodes are not communicating properly with each other.

Is there some way to see which node is currently serving a user's session?

Environment: 3 Openfire 3.8.2 nodes, an HAProxy node, and a MySQL database node. All of them run the CentOS 5.9 operating system, with 1.7.0_17 Oracle Corporation – Java HotSpot™ 64-Bit Server VM as the Java VM for Openfire, all in Hyper-V on different hosts.

Are you running Coherence or Hazelcast for clustering? Does the Server->Server Manager->Clustering page show the same nodes present on all three instances (e.g. log into each of them one at a time)?

I don’t believe there is an easy way within OF to identify which node a user is connected to; however, if you look in the session list and get their IP address, you can run 'netstat -nt | grep <IP address>' on each node to find the TCP connection.
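If you want to script that check rather than eyeball the grep output, something like the following might do. It is only a rough sketch: the function name connections_for_ip is made up for illustration, the sample addresses are invented, and it assumes the usual Linux column layout of netstat -nt (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). It matches the remote host exactly, so searching for 10.0.0.5 does not also pick up 10.0.0.50 the way a plain grep would.

```python
def connections_for_ip(netstat_output, ip):
    """Return (local, remote, state) tuples for TCP rows whose remote host equals ip."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout of a data row: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the banner and the column header line
        local, remote, state = fields[3], fields[4], fields[5]
        # Cut the port off at the last colon so IPv4-mapped addresses also work
        host = remote.rsplit(":", 1)[0]
        if host == ip or host == "::ffff:" + ip:
            matches.append((local, remote, state))
    return matches


# Invented sample output; 5222 is the standard XMPP client port
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 10.0.0.11:5222              10.0.0.5:41812              ESTABLISHED
tcp        0      0 10.0.0.11:5222              10.0.0.50:41999             ESTABLISHED
tcp        0      0 ::ffff:10.0.0.11:5222       ::ffff:10.0.0.5:41813       ESTABLISHED
"""

if __name__ == "__main__":
    for row in connections_for_ip(SAMPLE, "10.0.0.5"):
        print(row)
```

On a real node you would feed it the actual output, for example from subprocess.check_output(["netstat", "-nt"]) decoded to text. Keep in mind that behind a load balancer the remote address seen on the Openfire node may be the balancer's rather than the client's.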

Are you running Coherence or Hazelcast for clustering? Does the Server->Server Manager->Clustering page show the same nodes present on all three instances (e.g. log into each of them one at a time)?

Hazelcast. All nodes are present and working.

I don’t believe there is an easy way within OF to identify which node a user is connected to; however, if you look in the session list and get their IP address, you can run 'netstat -nt | grep <IP address>' on each node to find the TCP connection.

On the Openfire node it shows the Balancer <=> Node connection. On HAProxy it shows the Balancer <=> Clients connections. Maybe there is some way to view both sides of an HAProxy connection, but I don't know it.

So yesterday I shut down all the cluster nodes and the balancer, copied one node to a new virtual machine, started it there, and disabled clustering altogether.

Today it seems to work better. But some messages reach the recipient only after a delay of several minutes.

As you can see, I wrote at 10:07, and the message was received only at 10:20. It took 13 minutes! But it came…

(10:07) xxx: there is no such email address

(10:19) xxx: it definitely didn't arrive?

(10:19) yyy: yes

(10:20) xxx: if messages aren't arriving, that's bad, after all

(10:20) yyy: (10:20) xxx: there is no such email address

(10:20) yyy: there, it arrived)))

(10:20) yyy: just now

(10:20) yyy: it got stuck

(10:21) yyy: or did you only just write it?

(10:21) xxx: (10:07) xxx: there is no such email address

Also, our configuration includes Active Directory integration and ~500 users online at the same time with a shared roster.

I took a look at this and determined there is not a simple way to identify the cluster node for a given session in the admin console (this information is private to the routing component). However, I believe we can update the session list to identify which sessions are “local” and which are “remote” relative to the Openfire instance that is running the admin console. This would be of some assistance when troubleshooting message delivery issues in a cluster configuration.

I have opened OF-660 to track this item.