NPE - LocalMUCRoom#joinRoom

We’re running Openfire v4.5.2 with the Hazelcast clustering plugin in production. We’re seeing the NPE below in our error.log files, so I investigated. We expect the same NPE in v4.6.2 because the relevant code hasn’t changed significantly.

org.jivesoftware.openfire.muc.spi.MultiUserChatServiceImpl - Internal server error
java.lang.NullPointerException: null
        at org.jivesoftware.openfire.muc.spi.LocalMUCRoom.joinRoom(LocalMUCRoom.java:676) ~[xmppserver-4.5.2.jar:4.5.2]
        at org.jivesoftware.openfire.muc.spi.LocalMUCUser.process(LocalMUCUser.java:485) ~[xmppserver-4.5.2.jar:4.5.2]
        at org.jivesoftware.openfire.muc.spi.LocalMUCUser.process(LocalMUCUser.java:180) ~[xmppserver-4.5.2.jar:4.5.2]

If I’m looking at the right source, then “occupantsByFullJID.get(user.getAddress())” apparently returned “null”, which caused the NPE at line 676 on branch v4.5.2. I’m not sure what caused that. It would be nice to have an error-level log message that exposed the values of the method parameters when the exception was thrown, or a debug-level log message that showed why null was returned, but we don’t have that.
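To illustrate the kind of logging I mean, here is a minimal sketch, not Openfire code. The class `OccupantLookupSketch` and the method `requireOccupant` are hypothetical, the map is a plain `HashMap` of strings standing in for occupantsByFullJID, and it uses java.util.logging only to stay self-contained (Openfire itself logs through SLF4J).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the lookup done in joinRoom; not the real Openfire class.
class OccupantLookupSketch {
    private static final Logger LOG = Logger.getLogger(OccupantLookupSketch.class.getName());

    // Stand-in for occupantsByFullJID: full JID -> occupant (a String here for brevity).
    final Map<String, String> occupantsByFullJID = new HashMap<>();

    // Returns the occupant for the JID, or logs the inputs and fails with a descriptive exception.
    String requireOccupant(String roomName, String nickname, String fullJid) {
        String occupant = occupantsByFullJID.get(fullJid);
        if (occupant == null) {
            String msg = "No occupant registered for JID '" + fullJid
                    + "' (room='" + roomName + "', nickname='" + nickname
                    + "'); known JIDs: " + occupantsByFullJID.keySet();
            LOG.log(Level.SEVERE, msg);
            throw new IllegalStateException(msg);
        }
        return occupant;
    }
}
```

With something like this, the log entry would name the room, nickname, and JID involved instead of showing only a bare NullPointerException.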

Anyways, the point of this report/post is to discuss handling the NPE.

The try/finally block acknowledges that an exception may be thrown, but the #joinRoom code after the finally block assumes that everything above succeeded. Would it make more sense to set a flag when an exception is thrown and skip some of the code after the finally block? I’m referring to both v4.5.2 and v4.6.2.
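Roughly, the shape I have in mind is sketched below. This is an assumption-laden illustration rather than the actual joinRoom code: `JoinFlowSketch`, `registerOccupant`, and the `sideEffects` list are all made up for the example. Note that in plain Java a try/finally with no catch lets the exception propagate, so a flag like this only matters if the failure is caught (as here) or shows up as a null result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of "flag the failure, then skip the follow-up work".
class JoinFlowSketch {
    private final Lock lock = new ReentrantLock();

    // Records what the method did, so the behaviour can be inspected.
    final List<String> sideEffects = new ArrayList<>();

    // Stand-in for the part of joinRoom that runs under the lock.
    private String registerOccupant(String fullJid) {
        if (fullJid == null) {
            throw new NullPointerException("fullJid");
        }
        return "occupant:" + fullJid;
    }

    // Returns true only if the occupant was registered and the follow-up steps ran.
    boolean join(String fullJid) {
        String occupant = null;
        boolean joined = false;
        lock.lock();
        try {
            occupant = registerOccupant(fullJid);
            joined = true; // reached only when nothing above threw
        } catch (RuntimeException e) {
            sideEffects.add("join failed: " + e);
        } finally {
            lock.unlock(); // always release the lock
        }
        if (!joined) {
            return false; // skip history and presence for a join that never happened
        }
        sideEffects.add("sent history to " + occupant);
        sideEffects.add("broadcast presence of " + occupant);
        return true;
    }
}
```

Whether the follow-up steps should be skipped entirely or only partly depends on where in joinRoom the null is produced, which I have not established.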

Thanks for any feedback!

This issue seems to be a direct result of a larger problem that is specific to the combination of ‘clustering’ and ‘MUC’. To resolve this problem, a partial rewrite will be performed (this issue is being tracked in OF-2219). The solution is scheduled to be available in Openfire 4.7.0.