I've run into a strange problem in a clustered MUC environment: when a remote node handles the OccupantAddedEvent, it does not always invoke the event's run() method, even though deserialization completed successfully. I am using Openfire 4.2.3 with Hazelcast Plugin 2.3.1 (which uses the Hazelcast 3.9.2 library). If I put a 25 ms delay in the deserialization method (readExternal()), run() is always called, but I am not convinced that adding a delay is a real solution. I am trying to locate the code in the Hazelcast Plugin that receives the OccupantAddedEvent object from the Hazelcast library, deserializes it (i.e. readExternal()) and then executes it (i.e. run()). Can someone shed some light on which class(es) in the Hazelcast Plugin do this work?
For deserialisation have a look at
org.jivesoftware.openfire.plugin.util.cache.ClusterClassLoader, which finds the right class loader, but most of it is done by the Hazelcast IMap implementation itself. However, do take notice of the warning in the comment at the top of ClusterClassLoader (I do have an idea to ease this restriction, but it's nothing more than an idea at this stage).
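For context, this is roughly what "finding the right class loader" means during Java deserialisation. It is a minimal sketch using only the JDK: LoaderAwareInputStream and roundTrip are made-up names, and this is not the code in ClusterClassLoader or in Hazelcast, which has its own serialisation layer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

public class LoaderAwareStreamSketch {

    /** Hypothetical stream that resolves classes through a caller-supplied loader. */
    public static class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        public LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                // Try the supplied loader first (e.g. one that can see plugin classes).
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default lookup, which also handles primitive types.
                return super.resolveClass(desc);
            }
        }
    }

    /** Serialises a value and reads it back through the loader-aware stream. */
    public static Object roundTrip(Serializable value, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new LoaderAwareInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), loader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("alice", "bob"));
        Object copy = roundTrip(original, LoaderAwareStreamSketch.class.getClassLoader());
        System.out.println("Round trip preserved the list: " + original.equals(copy));
    }
}
```

If the receiving side cannot see a class through the loader it was given, deserialisation fails with ClassNotFoundException before any task code runs.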
The OccupantAddedEvent is actually a cluster task - so have a look at
org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - specifically
public void doClusterTask(final ClusterTask<?> task) which submits the task to the other members in the cluster and
private static class CallableTask<V> which is the code that is run on the receiving node.
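To make that flow concrete: deserialisation and execution are two separate steps, and run() only happens when the wrapper on the receiving side is actually called. The sketch below illustrates this with plain JDK serialisation standing in for the Hazelcast transport. DemoTask and DemoCallable are hypothetical stand-ins, not the plugin's OccupantAddedEvent or CallableTask, whose real code may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class ClusterTaskSketch {

    /** Hypothetical stand-in for a cluster task such as OccupantAddedEvent. */
    public static class DemoTask implements Externalizable, Runnable {
        private String nickname;
        private boolean executed; // deliberately not written to the stream

        public DemoTask() { }     // public no-arg constructor required by Externalizable

        public DemoTask(String nickname) { this.nickname = nickname; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(nickname);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            nickname = in.readUTF(); // only restores state; does not execute anything
        }

        @Override
        public void run() { executed = true; }

        public String getNickname() { return nickname; }
        public boolean wasExecuted() { return executed; }
    }

    /** Hypothetical wrapper, playing a role similar to the one described for CallableTask. */
    public static class DemoCallable implements Callable<DemoTask>, Serializable {
        private static final long serialVersionUID = 1L;
        private final DemoTask task;

        public DemoCallable(DemoTask task) { this.task = task; }

        public DemoTask getTask() { return task; }

        @Override
        public DemoTask call() {
            task.run(); // execution happens here, after deserialisation has finished
            return task;
        }
    }

    /** Simulates sending an object to another node: serialise, then deserialise. */
    public static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        DemoCallable received = roundTrip(new DemoCallable(new DemoTask("alice")));
        System.out.println("executed after readExternal: " + received.getTask().wasExecuted());
        received.call();
        System.out.println("executed after call: " + received.getTask().wasExecuted());
    }
}
```

When run() seems to be skipped, logging at the start of both readExternal() and the wrapper's call() on the receiving node is one way to tell whether the task never reached the executor or whether run() returned early.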
Also, I wonder if https://issues.igniterealtime.org/browse/OF-1261 could be the issue?
Thank you so much! Your information is very useful. It exposes a lot of timing issues caused by the application I am using.