Introducing Hazelcast ... a new way to cluster Openfire!

By Tom Evans

A few of you more intrepid Openfire fans may have noticed a bit of recent activity in one of the branches of the Openfire SVN repository. Well, some of your fellow developers have been working behind the scenes to provide clustering support for PubSub, perhaps one of the lesser-known modules of our beloved real-time collaboration server. PubSub is an implementation of the XEP-0060 specification which extends the XMPP standard to add publish-subscribe functionality to the XMPP Core. However, if you have ever tried to use this module in Openfire, you may have been disappointed to discover that it was not designed to work in a clustered deployment. In fact, PubSub was forcibly disabled when deployed in a cluster! The main focus of the development effort was to address OF-205 and implement clustering support for the PubSub module. This work is now complete and the PubSub module is cluster-enabled and ready for action.

My Kingdom for a Cluster!

However, during the course of this development effort, the team also took a critical look at the current clustering implementation itself (the “clustering” plugin). This solution is currently the only way to run Openfire in a clustered configuration (where multiple servers share the load). Unfortunately this plugin is inextricably tied to Oracle Coherence, an enterprise class (and enterprise priced) middleware component. A recent quote from Oracle put the price of Coherence (EE) at well over $300K for a smallish deployment … clearly an untenable solution and incompatible partnership with the Openfire project.

We looked around for clustering alternatives that would have better affinity with Openfire, and landed with Hazelcast (Community Edition). Hazelcast is an open source clustering and highly scalable data distribution platform for Java. It enjoys a large deployment base and is licensed under the community-friendly Apache 2.0 license. There are also commercial licensing options available for deployments where professional support and enterprise security (among other features) are must-haves. This looked like a perfect fit for our needs, and likely for the Openfire community as well.

Where Two or Three are Gathered…

We are pleased to announce the immediate availability of a new Hazelcast-based clustering plugin for Openfire. Starting today in the trunk of the Openfire SVN repository you will find the new plugin (/src/plugins/hazelcast/). Note that you will also need to set up the latest version of the Openfire core (currently 3.7.2-Beta) to use the new plugin.

We are looking for a few brave Openfire aficionados who can take the latest build and give it a whirl in a variety of deployment scenarios. When you report back, please tell us:

  • How many users and/or cluster member nodes do you have?
  • Which modules/components of Openfire are you using?
  • What is your typical JVM configuration? Preferred OS? Network topology (load balancer, LAN/WAN, etc.)?

Your feedback is very important and will help ensure that this new clustering solution will be a robust and stable component in the next Openfire release.

Those who have wrestled with the existing clustering plugin will hopefully find the new solution to be much simpler to configure and deploy … and certainly much lower cost! There is a README file included with the new Hazelcast plugin that documents the basic steps for setting up an Openfire cluster, including links to the supporting Hazelcast documentation (if needed).

Testing … Testing … Is this thing on?

Please take the new build for a spin and report your feedback here. We will be posting an article to the main community page before long, but would love to have some initial feedback from the core developers before engaging a wider audience. No doubt there will be some bugs and configuration glitches … can you help us find and fix them?

Thanks in advance for your consideration and assistance.

Cross posted from http://community.igniterealtime.org/message/224947#224947

–UPDATE–

I have added a slightly modified version of the Hazelcast plugin that is backwards compatible with Openfire 3.6.4 and Openfire 3.7.1. Unzip and copy the clustering.jar file to your plugins folder.

Please test and post your results at http://community.igniterealtime.org/message/224947#224947
hazelcast-364-1.0.0.zip (1680594 Bytes)

I’m one of those who have noticed the high activity in the SVN. Good work! I can’t test it, though, as I don’t use PubSub and don’t need clusters.

Congratulations Tom and team, this is excellent stuff and something we’ll be giving a whirl soon as!

amazing work ! thanks

Very nice! Glad that openfire is gaining traction again.

I might have a look at it again… If I don’t run into the memory issues I had the last time I tried, it might be a good alternative.

I just deployed a second server for my domain (forumanalogue.fr). I had already tried the previous plugin, but it wasn’t simple and didn’t work very well.

This new plugin is easy to install and works immediately, as long as you don’t forget to use the same database for the two nodes ^^

The two servers run Openfire 3.7.1 on Ubuntu 12.04 LTS with a MySQL database. Both are on my local network but are open servers (free registration, cf. http://xmpp.net/). I’ve got over 460 users.

Edit: It doesn’t work that well afterwards, cf. http://community.igniterealtime.org/message/225398#225398

I ran into a critical bug while using the Hazelcast-based cluster.

I always get the following exception, after which the Openfire server hangs: the process still exists but it no longer works.

I use Openfire 3.7.2 beta with Hazelcast 2.3.1 (Hazelcast 2.4 still has this error).

2012.10.29 14:57:47 org.jivesoftware.util.cache.CacheFactory - Hazelcast Instance is not active!
java.lang.IllegalStateException: Hazelcast Instance is not active!
at com.hazelcast.impl.FactoryImpl.initialChecks(FactoryImpl.java:711)
at com.hazelcast.impl.MProxyImpl.beforeCall(MProxyImpl.java:102)
at com.hazelcast.impl.MProxyImpl.access$000(MProxyImpl.java:49)
at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:64)
at $Proxy0.getLocalMapStats(Unknown Source)
at com.hazelcast.impl.MProxyImpl.getLocalMapStats(MProxyImpl.java:258)
at com.jivesoftware.util.cache.ClusteredCache.getCacheSize(ClusteredCache.java:140)
at org.jivesoftware.util.cache.CacheWrapper.getCacheSize(CacheWrapper.java:73)
at com.jivesoftware.util.cache.ClusteredCacheFactory.updateCacheStats(ClusteredCacheFactory.java:344)
at org.jivesoftware.util.cache.CacheFactory$1.run(CacheFactory.java:636)

Can anybody give me some advice?

I tried this on AWS EC2. Everything is good, but when I place these instances behind an AWS load balancer, I am unable to connect to the nodes via the load balancer. I opened all the necessary ports on the load balancer. If anyone has worked with AWS instances and a load balancer, please help me.

I have not used the AWS load balancer, but here are a few things to keep in mind for load balancing in general:

  • Depending on which protocol you are using, you will need to configure the load balancer to use TCP (5222), HTTP (7070), or HTTPS (7443) to allow XMPP clients to connect, plus additional port(s) for the admin console (9090/9091) or S2S connections (5269), etc. as needed for your particular deployment.
  • You will need to have a valid health check for each clustered member. In our case we use a simple index.html on the BOSH port for this purpose (served from the {openfire_home}/resources/spank directory). Without a valid health check all the members will be marked as unavailable.
  • Configure your application to send traffic to the Openfire cluster via the DNS name assigned by AWS to your load balancer.
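
As a rough illustration of the points above, the following Python sketch probes a cluster member on the client ports and fetches a health-check page over HTTP. It is only a sketch under stated assumptions: the function names are made up for this example, and the /index.html path on port 7070 is a placeholder for whatever health-check page your deployment actually serves.

```python
import http.client
import socket
import urllib.error
import urllib.request

# Client-facing ports from the list above; trim to the ones your deployment uses.
CLIENT_PORTS = {"xmpp": 5222, "bosh-http": 7070, "bosh-https": 7443}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_health_ok(url, timeout=3.0):
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def check_member(host):
    """Probe one cluster member and return a dict of check name -> bool."""
    results = {name: tcp_port_open(host, port) for name, port in CLIENT_PORTS.items()}
    # Assumed health-check URL: a static page served on the BOSH port.
    results["health"] = http_health_ok(f"http://{host}:7070/index.html")
    return results
```

Calling check_member() with the address of each node directly, and then with the DNS name of the load balancer, should show whether a failure lies with the node itself or with the load balancer configuration.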

Hope that helps … let us know how it works out.

We have two Openfire 3.7.1 (base version) servers running, and both are configured with the domain xmppserver. Both servers are on Windows Server and we have modified the hosts file (C:\windows\System32\drivers\etc\hosts) to include 127.0.0.1 xmppserver. We have clustered the two servers using Hazelcast and their IP addresses, and they are working fine. We have our own custom-developed components, like conference, which attach to the domain, and they all work well because they only provide custom entries like search which do not depend on the members. However, when we try to run the group chat, which uses conference.xmppserver, we encounter many issues, such as "not authorised" errors. We will post the actual details later.

Issue -

We think that there are two conference servers running and that each server is still managing its own conference. Is there some configuration that we have done wrongly, or should we upgrade to Openfire 3.7.2 Beta? Is this something that is known and fixed? We would appreciate it if anyone could point us in the right direction.

Thanks for any ideas. Happy new year guys.

Below is the error - if we uncluster the servers it starts to work again -
Below are the three scenarios we get very consistently.

When we restart the openfire server - we get this

SEND at 2:26:58 PM :

SEND at 2:26:58 PM :

RECV at 2:27:01 PM :This">ituser1@xmppserver/ Messenger">This room is now unlocked.

SEND at 2:27:06 PM :01101000011xmppserverAdmins1text

RECV at 2:27:06 PM :

01101000011xmppserverAdmins1textroom2@conference.xmppserver

However, once we have created a room after a restart, from the second attempt onwards we get this:


Trying to create room2 again

SEND at 2:20:55 PM :

SEND at 2:20:55 PM :

RECV at 2:20:57 PM :

When we tried to create another room1 by ituser1 - we got an error on presence and then the error on room creation


SEND at 2:21:06 PM :

SEND at 2:21:06 PM :

RECV at 2:21:06 PM :

RECV at 2:21:06 PM :

Also we encountered an error with the openfire admin when looking at the client session -

Exception:

java.lang.IllegalStateException: Requested node [B@7916bd not found in cluster

at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.doSynchronousClusterTask(CoherenceClusteredCacheFactory.java:325)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:538)
at com.jivesoftware.openfire.session.RemoteSession.doSynchronousClusterTask(RemoteSession.java:171)
at com.jivesoftware.openfire.session.RemoteSession.isSecure(RemoteSession.java:130)
at org.jivesoftware.openfire.admin.session_002dsummary_jsp._jspService(session_002dsummary_jsp.java:362)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:530)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1216)
at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:165)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:425)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:494)
at org.eclipse.jetty.server.session.SessionHandler.handle(SessionHandler.java:182)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:933)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:362)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
at org.eclipse.jetty.server.Server.handle(Server.java:334)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:559)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:992)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:541)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:203)
at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:406)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:462)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
at java.lang.Thread.run(Unknown Source)

Based on the messages listed, it appears you are getting a system error (500). In this case, the best source of information would be the error log from the corresponding cluster member. You would expect to see a stack trace detailing the specific error - this would be helpful for isolating the root cause.

Also, you can confirm whether the cluster members have correctly identified one another in two ways. First, using the “Clustering” tab in the admin console, you will see a list of all the members known to be a part of the cluster. If you only see one server listed, the cluster is not properly configured. The second/alternative approach is to view the system console log (typically nohup.out) where you can see the system messages emitted by the clustering component as members join/leave the cluster.

The error you observed via the “Sessions” page in the admin console is a known issue (fixed in the 3.7.2 nightly build) that may occur if a cluster member has recently been restarted. A possible workaround is to view the client sessions from the admin console running on the other cluster member.

Hi Tom,

Thank you for the great plugin. I am using the plugin version for Openfire 3.7.1 in production. The cluster works well, but I have this error in the logs:

2013.01.08 06:32:56 org.jivesoftware.openfire.interceptor.InterceptorManager - Error in interceptor: org.jivesoftware.openfire.plugin.BroadcastingPlugin@7dbd9d76 while intercepting:

java.lang.NullPointerException

Any idea what might cause this?

Thanks in advance !

Hi Don -

Best bet for troubleshooting an NPE would be to look at the stack trace that follows the given message in your error log. I’m not too familiar with the BroadcastingPlugin, but at first glance it appears to expect the “to” attribute, which is missing in the given inbound presence packet.
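
To illustrate the kind of failure suspected here, this is a small Python sketch (not the plugin's actual code, which is Java) of handling a presence stanza whose “to” attribute may be absent. The function names and the sample stanzas are hypothetical; the point is only that broadcast presence legitimately has no “to”, so code that dereferences it unconditionally will fail.

```python
import xml.etree.ElementTree as ET


def presence_recipient(stanza_xml):
    """Return the 'to' address of a presence stanza, or None if it is absent."""
    element = ET.fromstring(stanza_xml)
    # Element.get returns None rather than raising when the attribute is missing.
    return element.get("to")


def describe(stanza_xml):
    """Handle directed and broadcast presence without assuming 'to' exists."""
    recipient = presence_recipient(stanza_xml)
    if recipient is None:
        return "broadcast presence (no 'to' attribute)"
    return f"directed presence to {recipient}"
```

An interceptor written along these lines would skip or pass through packets without a recipient instead of throwing.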

You can also try disabling/removing the broadcasting plugin (if possible) as a workaround.

Cheers,

Tom

Thank you very much for answering. Indeed removing broadcasting plugin eliminated the error. But now I am facing another issue. Conferencing isn’t working on my cluster.

sending: 4

async recv:

Hi

I just tried out the hazelcast plugin this afternoon.

I’ve already added the “hazelcast.config.xml.filename” system property. I also edited the file and added the specific addresses to the known host list.

I am getting an error that some of you might be familiar with. Can someone give me some advice on what I am doing wrong?

com.jivesoftware.util.cache.ClusteredCacheFactory - Unable to start clustering - continuing in local mode.
java.lang.NullPointerException
at com.jivesoftware.util.cache.ClusterClassLoader.getResource(ClusterClassLoader.java:79)
at java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1159)
at com.hazelcast.config.ClasspathXmlConfig.<init>(ClasspathXmlConfig.java:39)
at com.hazelcast.config.ClasspathXmlConfig.<init>(ClasspathXmlConfig.java:33)
at com.jivesoftware.util.cache.ClusteredCacheFactory.startCluster(ClusteredCacheFactory.java:121)
at org.jivesoftware.util.cache.CacheFactory.startClustering(CacheFactory.java:622)
at org.jivesoftware.openfire.cluster.ClusterManager.startup(ClusterManager.java:285)
at org.jivesoftware.openfire.cluster.ClusterManager$1.xmlPropertySet(ClusterManager.java:65)
at org.jivesoftware.util.PropertyEventDispatcher.dispatchEvent(PropertyEventDispatcher.java:98)
at org.jivesoftware.util.XMLProperties.setProperty(XMLProperties.java:460)
at org.jivesoftware.util.JiveGlobals.setXMLProperty(JiveGlobals.java:435)
at org.jivesoftware.openfire.cluster.ClusterManager.setClusteringEnabled(ClusterManager.java:324)
at org.jivesoftware.openfire.admin.system_002dclustering_jsp._jspService(system_002dclustering_jsp.java:103)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1359)
at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:164)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:478)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:520)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:936)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:801)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:224)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:51)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:586)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:44)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
at java.lang.Thread.run(Thread.java:636)

Does it work if you remove the hazelcast.config.xml.filename system property and just edit the file in plugins/hazelcast/classes/?

I’ve not tried using the system property - Will try it today and make sure it works.

Hi David,

I originally tried it without the property and got this same error. I only added the property after reading about it in this thread, and it didn’t seem to help.

Best Regards,

Stevenson Lee

Hello,

I would like to request that the plugin ship with hazelcast-cloud.jar for AWS integration. I’ve tested with the 2.5.1 hazelcast.jar and hazelcast-cloud.jar under Openfire 3.8.2 and it seems to work well enough.

http://www.hazelcast.com/docs/1.9.4/manual/multi_html/ch11s02.html