Clustering and Pubsub

yann3 · March 20, 2013, 3:56pm

I am trying to use the Hazelcast Clustering Plugin 1.0.4 to run two Openfire 3.8.1 servers against a single Oracle database.

For load balancing I use HAProxy.

When I publish items to a leaf node (using the Smack API), my user fails to receive the notification about half of the time.

Depending on how the publisher and the recipient connect (see the cases below), the user receives either every publication or none at all.

IP Configuration:

case1: server1 on the default port 5222

case2: server2 on the default port 5222

case3: server1 on port 52220, used for load balancing (HAProxy)

publisher (case3) - recipient (case3) => received 1 time out of 2

publisher (case1) - recipient (case3) => all received

publisher (case1) - recipient (case1) => all received

publisher (case2) - recipient (case1) => nothing received

publisher (case3) - recipient (case2) => received 1 time out of 2

If you have any idea that could help me, I would appreciate it.

PS: sorry for my poor English.


Tom_Evans1 · March 25, 2013, 4:31pm

You should start by verifying that both servers (case1 and case2) are joined to a single cluster. The easiest way to confirm this is to use the “Clustering” page in the admin console. You should see two servers listed on this page. You can also see this information in the system log file (nohup.out) of each server.

If the servers are not joined as a single cluster, you may need to modify the default Hazelcast configuration file to use TCP (unicast) rather than UDP (multicast) for the discovery protocol (depending on your network configuration).
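When switching to TCP discovery, it can help to first confirm that each node can actually open a TCP connection to the other node's Hazelcast port. The following is only a minimal sketch using plain `java.net`; the class name `ClusterPortCheck`, the default port 5701 (taken from the member listing later in this thread) and the 2-second timeout are assumptions, not part of Openfire or the plugin.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Openfire or the Hazelcast plugin.
final class ClusterPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java ClusterPortCheck <peer-host> [port]");
            return;
        }
        // 5701 is assumed here because it is the port shown in the thread's log output.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5701;
        boolean ok = canConnect(args[0], port, 2000);
        System.out.println(args[0] + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it on each server with the other server's host name as the argument. A successful connection only shows that the port is open; it does not prove that the nodes have joined the same cluster.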

If the servers are properly joined in a single cluster, you should check the error and warning log files to determine whether you have some other type of configuration problem. Feel free to post further details here.

yann3 · March 25, 2013, 8:04pm

Thanks for your reply.

I have done all the checks.

The clustering seems to be fine.

At startup I get this in the log:

Members [2] {
    Member [server178.noe.xxx.fr]:5701 this
    Member [server185.noe.xxx.fr]:5701
}
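A member listing like the one above can also be checked mechanically on each node, for example to confirm that both servers report the same two members. This is only a rough sketch: the class name `MemberListParser` is hypothetical, and the regular expression assumes the log lines look exactly like the ones quoted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the member listing printed at startup.
final class MemberListParser {

    // Matches lines such as "Member [server178.noe.xxx.fr]:5701 this".
    // The "Members [2] {" header does not match because of the trailing "s".
    private static final Pattern MEMBER =
            Pattern.compile("Member \\[([^\\]]+)\\]:(\\d+)");

    /** Returns every member found in the log text as "host:port". */
    public static List<String> parseMembers(String logText) {
        List<String> members = new ArrayList<>();
        Matcher m = MEMBER.matcher(logText);
        while (m.find()) {
            members.add(m.group(1) + ":" + m.group(2));
        }
        return members;
    }

    public static void main(String[] args) {
        String log = "Members [2] {\n"
                + "    Member [server178.noe.xxx.fr]:5701 this\n"
                + "    Member [server185.noe.xxx.fr]:5701\n"
                + "}\n";
        // prints [server178.noe.xxx.fr:5701, server185.noe.xxx.fr:5701]
        System.out.println(parseMembers(log));
    }
}
```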

In my hazelcast-cache-config.xml I use TCP:

<tcp-ip enabled="true">
  <hostname>server178.noe.xxx.fr:5701</hostname>
  <hostname>server185.noe.xxx.fr:5701</hostname>
</tcp-ip>

Unfortunately there are no errors in the log files.

I wrote both my publisher tool and the recipient with the Smack API 3.2.2.

My Openfire is 3.8.1.

Do you think it could be a version problem?

Or maybe some parameter in hazelcast-cache-config.xml?

Philippe_CEROU · April 4, 2013, 7:37am

Hi Yann,

have you managed to get your cluster working?

We have exactly the same problem here (2 nodes with Openfire 3.8.1, Oracle database 11.2, Hazelcast plugin 1.0.4 with a TCP/IP configuration).

All the users are distributed across the 2 nodes and chat works perfectly, but pubsub items are received live only by users on the same node as the publisher. Users on the second node receive nothing, and when these users disconnect and reconnect to the same “bad” node, they receive the last 2 missed messages.

It seems that Hazelcast does transmit the items to the second node (we can see it in the console under the “Clustering” item), but the second node does not seem to “send” the signal to the “pubsub engine” to handle them as if they had just been published…
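The symptom described above (delivery only to subscribers on the publisher's node) might also explain the "1 time out of 2" results in the first post, if one assumes that HAProxy spreads connections evenly over the two nodes. The sketch below is only a model of that hypothesis, not of Openfire itself; the class name `SameNodeDeliveryModel` and the even split are assumptions.

```java
// Hypothetical model: a notification arrives only when publisher and
// recipient are connected to the same cluster node.
final class SameNodeDeliveryModel {

    /** Marker for a client that the load balancer may place on either node. */
    public static final int BALANCED = -1;

    public static boolean delivered(int publisherNode, int recipientNode) {
        return publisherNode == recipientNode;
    }

    /** Fraction of equally likely node placements for which the item is delivered. */
    public static double deliveryRate(int publisher, int recipient) {
        int[] pubNodes = publisher == BALANCED ? new int[] {1, 2} : new int[] {publisher};
        int[] recNodes = recipient == BALANCED ? new int[] {1, 2} : new int[] {recipient};
        int hits = 0;
        int total = 0;
        for (int p : pubNodes) {
            for (int r : recNodes) {
                total++;
                if (delivered(p, r)) {
                    hits++;
                }
            }
        }
        return (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println("balanced -> balanced: " + deliveryRate(BALANCED, BALANCED));
        System.out.println("node 1   -> node 1:   " + deliveryRate(1, 1));
        System.out.println("node 2   -> node 1:   " + deliveryRate(2, 1));
        System.out.println("balanced -> node 2:   " + deliveryRate(BALANCED, 2));
    }
}
```

Under this model, the balanced cases come out at one half, same-node cases at 1 and cross-node cases at 0, which matches most of the reported results. It does not account for the case where a publisher on server1 and a recipient behind the load balancer received everything, unless that recipient happened to land on server1 for the whole test.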

Robin_Collier · April 7, 2013, 1:00pm

Philippe CEROU wrote:

It seems that Hazelcast does transmit the items to the second node (we can see it in the console under the “Clustering” item), but the second node does not seem to “send” the signal to the “pubsub engine” to handle them as if they had just been published…

It isn’t supposed to. The second node is simply supposed to deliver the message to the user.

This obviously doesn’t fix your problem, but I thought I would clarify that for you. The node which receives the publish does all the processing and routes to all the subscribers. The Hazelcast messages you see would be that routing to users logged in to the second node. The second node in this case would not need to do any pubsub processing, just message delivery, which in your case doesn’t appear to be happening.

Sorry, but that is about all I can contribute.

Philippe_CEROU · June 4, 2013, 3:24pm

Hi everybody,

we have just retried the same configuration with Openfire 3.8.2 and the Hazelcast plugin 1.0.6, and we still have the same problem (if we publish on node #1, the clients on node #2 receive no messages).

For information, there are no errors in any of the log files, and Hazelcast sees the cluster correctly.

Has anybody tried Openfire clustering with pubsub items?