Issues with pubsub while clustering

We started with Openfire 3.6.4 with clustering enabled (using Coherence) and Oracle as the DB. After a server restart, a disco#items request sent to the pubsub module returned a 500 internal server error.

Steps to reproduce


  1. Client logs in to the server

  2. Client creates a node on the pubsub system

  3. Client sends a disco#items request to the pubsub system

  4. Client receives the list of nodes in the pubsub system

  5. Server is restarted

  6. Client reconnects

  7. Client sends a disco#items request to the pubsub system

  8. A 500 internal server error is returned
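For reference, the discovery queries in steps 3 and 7 can be sketched roughly as below. This is only an illustration of the XEP-0030 stanza shape, not the client code we actually run; the service address `pubsub.example.com`, the stanza ids and the node name are hypothetical placeholders.

```java
// Sketch: builds XEP-0030 discovery queries as plain strings.
// All addresses, ids and node names used here are hypothetical placeholders.
// Inputs are assumed not to contain characters that need XML escaping.
final class DiscoStanzas {
    static final String ITEMS_NS = "http://jabber.org/protocol/disco#items";
    static final String INFO_NS = "http://jabber.org/protocol/disco#info";

    // Builds an IQ "get" carrying a disco query.
    // Pass node == null to address the service itself rather than a specific node.
    static String query(String ns, String id, String to, String node) {
        StringBuilder sb = new StringBuilder();
        sb.append("<iq type='get' id='").append(id)
          .append("' to='").append(to).append("'>");
        sb.append("<query xmlns='").append(ns).append("'");
        if (node != null) {
            sb.append(" node='").append(node).append("'");
        }
        sb.append("/></iq>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // disco#items sent to the pubsub service (steps 3 and 7)
        System.out.println(query(ITEMS_NS, "items1", "pubsub.example.com", null));
        // disco#info on one existing node
        System.out.println(query(INFO_NS, "info1", "pubsub.example.com", "my-node"));
    }
}
```

Sending the first stanza before and after the restart is enough to reproduce the difference in behaviour.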

To work around this, we built the 3.7.0 beta release from the SVN trunk. This fixed the problem above but introduced new ones.

After a server restart, a disco#info request on any existing node (its entry is present in the DB) returns a 404 item-not-found error.

Similarly, after a server restart, a disco#items request on a root node, which should return the list of leaf nodes, also returns a 404 item-not-found error.

All of these worked perfectly before the server was restarted.

Everything also works after a restart if clustering is disabled from the admin console.

From the debugging I have done so far, the “nodes” ConcurrentHashMap in the PubSubModule class is empty when these disco requests are handled, even though the map was loaded with data from the DB during server startup. It looks like the Coherence clustering mechanism is causing this. Any thoughts on how to solve it would be highly appreciated.
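One way to narrow this down might be to log the map's identity and size both when it is loaded and when a disco request is handled, to see whether the handler is looking at the same map instance that was populated at startup. The sketch below is a standalone illustration of that check, not Openfire code; the class `NodeMapProbe` and its methods are hypothetical, and the "second instance" scenario it simulates is only a guess I have not confirmed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical diagnostic helper; it does not touch any Openfire or Coherence API.
final class NodeMapProbe {
    // Describes a map by identity hash and size so two log lines can be compared.
    static String describe(String label, Map<?, ?> map) {
        return label + ": instance="
                + Integer.toHexString(System.identityHashCode(map))
                + " size=" + map.size();
    }

    // True only if both references point at the very same map object.
    static boolean sameInstance(Map<?, ?> a, Map<?, ?> b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Stand-in for the map populated from the DB during startup.
        Map<String, String> loadedAtStartup = new ConcurrentHashMap<>();
        loadedAtStartup.put("my-node", "leaf");

        // Stand-in for what the disco handler sees in the unconfirmed scenario
        // where a different, empty map instance ends up being consulted.
        Map<String, String> seenByHandler = new ConcurrentHashMap<>();

        System.out.println(describe("startup", loadedAtStartup));
        System.out.println(describe("disco handler", seenByHandler));
        System.out.println("same instance: " + sameInstance(loadedAtStartup, seenByHandler));
    }
}
```

If the two log lines show different instances, the map is being replaced or a second module instance is in use; if they show the same instance with size 0, something is clearing it after startup.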

Thanks,

Ardi