java.lang.NullPointerException when a user logs in to one of two clustered servers

nosferatum · January 15, 2015, 2:29pm

Hello.

I set up Openfire 3.9.3 with hazelcast plugin 1.2.1 (built from the Openfire 3.9.3 sources) and put two Openfire servers (say server1 and server2) behind HAProxy, configured for round-robin balancing with equal weights.

So the user is sent to server1 and server2 alternately.

When a user logs in and is directed to server1, an exception like the following appears on the server2 console:

Exception in thread "hz.openfire.cached.thread-2" java.lang.NullPointerException
    at com.jivesoftware.util.cache.ClusterListener$DirectedPresenceListener.getHandlers(ClusterListener.java:474)
    at com.jivesoftware.util.cache.ClusterListener$DirectedPresenceListener.entryAdded(ClusterListener.java:413)
    at com.hazelcast.map.MapService.dispatchEvent(MapService.java:706)
    at com.hazelcast.map.MapService.dispatchEvent(MapService.java:69)
    at com.hazelcast.spi.impl.EventServiceImpl$EventPacketProcessor.process(EventServiceImpl.java:488)
    at com.hazelcast.spi.impl.EventServiceImpl$RemoteEventPacketProcessor.run(EventServiceImpl.java:514)
    at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:142)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)

When I log out and log in again, the user is directed to server2, the same exception is thrown on server1, and so on.

Line 474 of ClusterListener.java is:

for (DirectedPresence directedPresence : (Collection<DirectedPresence>)value) {

and value is null in this case.
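A minimal stand-alone illustration of why that line fails: the enhanced for loop calls iterator() on the collection before running its body, so a null value throws a NullPointerException immediately. The class and method names below are hypothetical and only mimic the shape of the loop at ClusterListener.java:474.

```java
import java.util.Collection;
import java.util.List;

public class ForEachOverNull {

    // Mimics the loop at ClusterListener.java:474: an Object cast to a Collection and iterated.
    @SuppressWarnings("unchecked")
    static int countItems(Object value) {
        int count = 0;
        // The enhanced for loop calls value.iterator() first, so a null value throws here.
        for (String item : (Collection<String>) value) {
            count++;
        }
        return count;
    }

    // Returns true if iterating over the given value throws a NullPointerException.
    static boolean throwsNpe(Object value) {
        try {
            countItems(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(List.of("a", "b"))); // false
        System.out.println(throwsNpe(null));              // true
    }
}
```

Note that casting null to a Collection is legal in Java; the failure comes only from the implicit iterator() call.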

The case is much the same as the one described here: hazelcast plugin issue in version 1.2.1 | Ignite Realtime

Will this error be fixed, and is checking value for null sufficient here?

Tom_Evans1 · January 16, 2015, 10:01pm

I believe this NPE can be fixed easily, but I’m not sure that will address the root cause here. Multiple connections for a single user session are currently expected to terminate on the same host, rather than switching among members within the cluster. We are planning to address this limitation at some point (for example, see OF-689), but it is not supported at this time.

As a workaround, we recommend that you configure your load balancer with a persistence rule that will select one of the cluster members based on the client source IP address (or subnet) in the inbound request. In the event that the selected member goes offline, the LB should be able to re-balance the client session using one of the remaining active nodes.
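A rough sketch of the source-IP persistence idea, assuming a simple hash-modulo scheme (HAProxy's `balance source` algorithm hashes the source address in a similar way). The class name, the member names and the selection logic are hypothetical illustrations, not HAProxy or Openfire code: the same client IP always maps to the same member while the set of online members is unchanged, and clients are re-balanced over the remaining members when one goes offline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SourceIpPersistence {

    private final List<String> members; // cluster members in a fixed order, e.g. "server1", "server2"

    public SourceIpPersistence(List<String> members) {
        this.members = new ArrayList<>(members);
    }

    // Picks a member for the given client IP, considering only members that are online.
    // The same IP maps to the same member as long as the set of online members is unchanged.
    public String select(String clientIp, Set<String> online) {
        List<String> active = new ArrayList<>();
        for (String m : members) {
            if (online.contains(m)) {
                active.add(m);
            }
        }
        if (active.isEmpty()) {
            throw new IllegalStateException("no cluster member is online");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(clientIp.hashCode(), active.size());
        return active.get(index);
    }

    public static void main(String[] args) {
        SourceIpPersistence lb = new SourceIpPersistence(List.of("server1", "server2"));
        Set<String> both = Set.of("server1", "server2");
        String first = lb.select("192.0.2.10", both);
        String again = lb.select("192.0.2.10", both);
        System.out.println(first.equals(again)); // true: same client IP, same member
        // If server1 goes offline, the client is re-balanced onto server2.
        System.out.println(lb.select("192.0.2.10", Set.of("server2"))); // server2
    }
}
```

A plain modulo over the active members is the simplest choice; its drawback is that a membership change can move clients whose member is still online, which consistent hashing would avoid.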

nosferatum · January 19, 2015, 8:54am

Hello, Tom, and thank you for your answer.

However, the suggested workaround using the load balancer settings will not solve the problem, as the NPE is thrown anyway.

I tried to log in with two different users (say user1 and user2) from different IPs, with two Openfire servers working in a cluster. The NPE is thrown on the server the user does NOT log in to in all of the following cases:

Very first login of user1 (logs in to server1, NPE thrown on server2).
Second login, by user2 (logs in to server2, NPE thrown on server1).
LB set to use only server1. Very first login of user1 (logs in to server1, NPE thrown on server2).
LB set to use only server1. Second login, by user2 (logs in to server1, NPE thrown on server2).

So the NPE is always thrown on the server other than the one the user logged in to, even when users are directed to a single server and therefore never alternate between servers.

Based on this, I added null checks to the ClusterListener.DirectedPresenceListener class and rebuilt the hazelcast plugin (from the Openfire 3.9.3 sources):

In method ClusterListener.DirectedPresenceListener#getHandlers:

if (value != null) { // fix to prevent NPE on other servers when a user logs in on any server in the cluster
    for (DirectedPresence directedPresence : (Collection<DirectedPresence>) value) {
        answer.add(directedPresence.getHandler());
    }
} // fix

In method ClusterListener.DirectedPresenceListener#getReceivers:

if (value != null) { // fix to prevent NPE on other servers when a user logs in on any server in the cluster
    for (DirectedPresence directedPresence : (Collection<DirectedPresence>) value) {
        if (directedPresence.getHandler().equals(handler)) {
            return directedPresence.getReceivers();
        }
    }
} // fix
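To try the null-check pattern outside the plugin, here is a self-contained sketch. DirectedPresence is a hypothetical stand-in that models only the getHandler() and getReceivers() accessors used in the patch, with String handlers and a Set of String receivers assumed for simplicity; the real plugin types and method signatures may differ. Returning an empty set from getReceivers when nothing matches is also an assumption, since the original fallback is not shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class NullSafeDirectedPresence {

    // Hypothetical stand-in for the plugin's DirectedPresence class.
    static final class DirectedPresence {
        private final String handler;
        private final Set<String> receivers;

        DirectedPresence(String handler, Set<String> receivers) {
            this.handler = handler;
            this.receivers = receivers;
        }

        String getHandler() { return handler; }
        Set<String> getReceivers() { return receivers; }
    }

    // Same shape as the patched getHandlers: a null value yields an empty list instead of an NPE.
    @SuppressWarnings("unchecked")
    static List<String> getHandlers(Object value) {
        List<String> answer = new ArrayList<>();
        if (value != null) {
            for (DirectedPresence directedPresence : (Collection<DirectedPresence>) value) {
                answer.add(directedPresence.getHandler());
            }
        }
        return answer;
    }

    // Same shape as the patched getReceivers; the empty-set fallback is an assumption.
    @SuppressWarnings("unchecked")
    static Set<String> getReceivers(Object value, String handler) {
        if (value != null) {
            for (DirectedPresence directedPresence : (Collection<DirectedPresence>) value) {
                if (directedPresence.getHandler().equals(handler)) {
                    return directedPresence.getReceivers();
                }
            }
        }
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        List<DirectedPresence> value = List.of(
                new DirectedPresence("handler-1", Set.of("user2@example.com")));
        System.out.println(getHandlers(value));               // [handler-1]
        System.out.println(getHandlers(null));                // []
        System.out.println(getReceivers(value, "handler-1")); // [user2@example.com]
        System.out.println(getReceivers(null, "handler-1"));  // []
    }
}
```

The null check only stops the listener thread from failing; it does not explain why the event carries a null value.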

After adding these checks, the NPEs no longer occur.

I suggest adding these small fixes to the hazelcast plugin sources so that I can use the standard distribution instead of a manually modified and rebuilt plugin.

Javier_Deferrari · July 27, 2015, 9:07pm

I am having this same issue when using 1.2.2.

Has this been fixed in the latest version? We get a lot of these errors because we have a load balancer calling the RestAPI with basic auth: every time the RestAPI authenticates a user, we get the error.

Thanks

rinomasaya · April 9, 2017, 7:27am

NullPointerExceptions are exceptions that occur when you try to use a reference that points to no location in memory (null) as though it were referencing an object. Calling a method on a null reference or trying to access a field of a null reference will trigger a NullPointerException.

Rino

CSH · April 9, 2017, 9:55am

Troll