Unable to remove a remote component route

I would like to report a bug in Openfire 4.2.3: in a clustered environment, an external component always gets a “conflict” error when it reconnects to the second node. My setup is a cluster of two nodes behind a load balancer, plus an external component that reconnects automatically. The external component connects to the load balancer, which picks the least loaded node. When that node is shut down, the load balancer picks the other node, and the external component always gets a “conflict” error after reconnecting.

The bug is in leftCluster(byte[] nodeID) and removeComponentRoute(JID route) in src/java/org/jivesoftware/openfire/spi/RoutingTableImpl.java. The leftCluster method doesn’t remove the departed node’s remote component routes correctly, because removeComponentRoute only removes the route for a local component (it only ever refers to server.getNodeID()).
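To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Openfire code: the class name, the node IDs, and the example domain are hypothetical, a plain HashMap stands in for the clustered components cache, and plain strings stand in for NodeID. It only shows that a removal hard-wired to the local node leaves a departed node's entry behind, which is presumably what makes the component's address look taken on reconnect.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model: component domain -> IDs of the nodes hosting it.
class StaleRouteDemo {
    static final String LOCAL_NODE = "node-2";   // the surviving node (assumed ID)
    static final String REMOTE_NODE = "node-1";  // the node that was shut down (assumed ID)

    final Map<String, Set<String>> componentsCache = new HashMap<>();

    void addRoute(String domain, String nodeId) {
        componentsCache.computeIfAbsent(domain, k -> new HashSet<>()).add(nodeId);
    }

    // Mirrors the behaviour described above: only the local node's entry is ever removed.
    boolean removeLocalOnly(String domain) {
        Set<String> nodes = componentsCache.get(domain);
        if (nodes == null) {
            return false;
        }
        boolean removed = nodes.remove(LOCAL_NODE);
        if (nodes.isEmpty()) {
            componentsCache.remove(domain);
        }
        return removed;
    }

    boolean hasRoute(String domain) {
        return componentsCache.containsKey(domain);
    }

    public static void main(String[] args) {
        StaleRouteDemo table = new StaleRouteDemo();
        // The component was connected to the node that has now left the cluster.
        table.addRoute("search.example.org", REMOTE_NODE);
        boolean removed = table.removeLocalOnly("search.example.org");
        // Prints: removed=false, stale=true
        System.out.println("removed=" + removed + ", stale=" + table.hasRoute("search.example.org"));
    }
}
```

In this model the cleanup call on the surviving node is a no-op for the departed node's component, so the entry stays in the cache indefinitely.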

Here is a suggested fix for removeComponentRoute():

    @Override
    public boolean removeComponentRoute(JID route) {
        return removeComponentRoute(route, server.getNodeID());
    }

    // New method: removes the component route held by the given node, local or remote.
    private boolean removeComponentRoute(JID route, NodeID nodeID) {
        String address = route.getDomain();
        boolean removed = false;
        Lock lock = CacheFactory.getLock(address, componentsCache);
        // Acquire the lock before entering try, so finally never unlocks a lock that was not acquired.
        lock.lock();
        try {
            Set<NodeID> nodes = componentsCache.get(address);
            if (nodes != null) {
                removed = nodes.remove(nodeID);
                if (nodes.isEmpty()) {
                    componentsCache.remove(address);
                }
                else {
                    componentsCache.put(address, nodes);
                }
            }
        } finally {
            lock.unlock();
        }
        localRoutingTable.removeRoute(new DomainPair("", address));
        return removed;
    }

Here is a suggested fix in leftCluster():

        ...
        // remove component routes for the defunct node
        Lock componentLock = CacheFactory.getLock(nodeID, componentsCache);
        // Acquire the lock before entering try, so finally never unlocks a lock that was not acquired.
        componentLock.lock();
        try {
            Map<String, NodeID> remoteComponents = new HashMap<>();
            NodeID nodeIDInstance = NodeID.getInstance( nodeID );
            for (Map.Entry<String, Set<NodeID>> entry : componentsCache.entrySet()) {
                if (entry.getValue().contains(nodeIDInstance)) {
                    remoteComponents.put(entry.getKey(), nodeIDInstance);
                }
            }
            for (Map.Entry<String, NodeID> entry : remoteComponents.entrySet()) {
                removeComponentRoute(new JID(entry.getKey()), entry.getValue());
            }
        }
        finally {
            componentLock.unlock();
        }
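For anyone who wants to check the logic outside a cluster, here is a rough standalone sketch of the same two-pass idea: first collect the domains that the departed node hosted, then remove that node's entry from each. Everything in it is an assumption made for illustration (the class and method names, string node IDs, a plain HashMap instead of the clustered cache, no locking), so it models the approach rather than reproducing the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the cleanup performed when a node leaves the cluster.
class LeftClusterSketch {
    final Map<String, Set<String>> componentsCache = new HashMap<>();

    void addRoute(String domain, String nodeId) {
        componentsCache.computeIfAbsent(domain, k -> new HashSet<>()).add(nodeId);
    }

    // Removes the entry for one specific node, local or remote.
    boolean removeRoute(String domain, String nodeId) {
        Set<String> nodes = componentsCache.get(domain);
        if (nodes == null) {
            return false;
        }
        boolean removed = nodes.remove(nodeId);
        if (nodes.isEmpty()) {
            componentsCache.remove(domain);
        }
        return removed;
    }

    // Returns the number of component routes removed for the departed node.
    int leftCluster(String departedNode) {
        // Pass 1: snapshot the affected domains, so the map is not modified while iterating over it.
        List<String> affected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : componentsCache.entrySet()) {
            if (entry.getValue().contains(departedNode)) {
                affected.add(entry.getKey());
            }
        }
        // Pass 2: remove the departed node's entry from each affected domain.
        int removedCount = 0;
        for (String domain : affected) {
            if (removeRoute(domain, departedNode)) {
                removedCount++;
            }
        }
        return removedCount;
    }
}
```

A component that is also connected to a surviving node keeps its route there; only the departed node's entry is dropped.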

Thanks! Could you kindly make a pull request against our Openfire repository on GitHub with your suggested changes?

I have logged this as https://issues.igniterealtime.org/browse/OF-1617

PR here: https://github.com/igniterealtime/Openfire/pull/1164/files

Oops, I missed your e-mail earlier. Thanks for incorporating the change to the OF repo.

Are you able to test with the current master/nightly builds of Openfire to confirm that this issue is now resolved?

I synced my local workspace with the repo and compared the file in my local branch with the one on the master branch; they are identical. I didn’t build Openfire from the master branch because I have some private changes in my local branch. Thanks for integrating the change into the OF master branch.

-Vincent