powered by Jive Software

Openfire 4.0.2 starts dropping connections all of a sudden


#1

this has started happening very recently. server gets to almost 18k online users. it works great for hours, suddenly it starts dropping connections.

there are many error entries in the error.log.

“IQ must be of type ‘set’ or ‘get’” has started coming after disabling PEP through system property ‘xmpp.pep.enabled’ to ‘false’. this error comes throughout the day but it works normally.

“closing connection due to error while processing the message” is the real one. when it starts coming it comes in numbers, and connected users falls to below 1k!

it takes hours to get back to normal. users keep trying to connect, it takes lot of attempts to get through.

Server details:

Openfire v4.0.2

Java version: 1.7.0_95

Memory: 6g out of 8g.

User and roster cache: 50mb(working very fine)

Plugins: REST API, Search, Subscription(for auto accepting)

Any help will be really appreciated. Thanks.

below is the part of error.log (when it just happened recently)

2016.08.02 12:01:27 org.jivesoftware.openfire.handler.IQHandler - Internal server error
java.lang.IllegalArgumentException: IQ must be of type 'set' or 'get'. Original IQ: <iq type="result" id="41-5438968" from="dr837052579760263@example.com" to="dv162171057539782@example.com"><pubsub xmlns="http://jabber.org/protocol/pubsub"><subscription jid="dv162171057539782@example.com" subscription="subscribed"><subscribe-options/></subscription></pubsub></iq>
  at org.xmpp.packet.IQ.createResultIQ(IQ.java:384)
  at org.jivesoftware.openfire.pep.IQPEPHandler.handleIQ(IQPEPHandler.java:318)
  at org.jivesoftware.openfire.handler.IQHandler.process(IQHandler.java:66)
  at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:372)
  at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:115)
  at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:78)
  at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:69)
  at org.jivesoftware.openfire.pep.PEPService.send(PEPService.java:325)
  at org.jivesoftware.openfire.pubsub.NodeSubscription.sendSubscriptionState(NodeSubscription.java:781)
  at org.jivesoftware.openfire.pubsub.Node.createSubscription(Node.java:2100)
  at org.jivesoftware.openfire.pubsub.PubSubEngine.subscribeNode(PubSubEngine.java:629)
  at org.jivesoftware.openfire.pubsub.PubSubEngine.process(PubSubEngine.java:104)
  at org.jivesoftware.openfire.pep.PEPServiceManager.process(PEPServiceManager.java:215)
  at org.jivesoftware.openfire.pep.IQPEPHandler.createSubscriptionToPEPService(IQPEPHandler.java:494)
  at org.jivesoftware.openfire.pep.IQPEPHandler.subscribedToPresence(IQPEPHandler.java:557)
  at org.jivesoftware.openfire.user.PresenceEventDispatcher.subscribedToPresence(PresenceEventDispatcher.java:130)
  at org.jivesoftware.openfire.handler.PresenceSubscribeHandler.process(PresenceSubscribeHandler.java:215)
  at org.jivesoftware.openfire.PresenceRouter.handle(PresenceRouter.java:176)
  at org.jivesoftware.openfire.PresenceRouter.route(PresenceRouter.java:80)
  at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:88)
  at org.jivesoftware.openfire.net.StanzaHandler.processPresence(StanzaHandler.java:360)
  at org.jivesoftware.openfire.net.ClientStanzaHandler.processPresence(ClientStanzaHandler.java:106)
  at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:277)
  at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:199)
  at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407)
  at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.core.filterchain.IoFilterEvent.fire(IoFilterEvent.java:74)
  at org.apache.mina.core.session.IoEvent.run(IoEvent.java:63)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTask(OrderedThreadPoolExecutor.java:769)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTasks(OrderedThreadPoolExecutor.java:761)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.run(OrderedThreadPoolExecutor.java:703)
  at java.lang.Thread.run(Thread.java:745)
2016.08.02 12:01:47 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <iq from="dv910366765752478@example.com/mobile" id="5nbeptqbif" type="get"><ping xmlns="urn:xmpp:ping"/></iq>
java.lang.NullPointerException
2016.08.02 12:01:54 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <presence from="dv1830022707230444@example.com" to="ctjbbymiguxs7mjk2rwcjg@example.com" type="subscribe"/>
java.lang.NullPointerException
2016.08.02 12:01:54 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <presence from="dv1830022707230444@example.com" to="ctjbbymiguxs7mjk2rwcjg@example.com" type="subscribed"/>
java.lang.NullPointerException
2016.08.02 12:01:57 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <presence from="dvtpkcjoi0eciqa4jdv@example.com" to="dv10206155956797886@example.com" type="subscribe"/>
java.lang.NullPointerException
2016.08.02 12:01:57 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <presence from="dvtpkcjoi0eciqa4jdv@example.com" to="dv10206155956797886@example.com" type="subscribed"/>
java.lang.NullPointerException
2016.08.02 12:03:14 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to error while processing message: <message from="dv621319528038075@example.com/mobile" id="501d1e7a-498d-47c0-92f2-88d5194e86fb" type="chat" to="dvsupport@example.com"><body>Rahim</body><markable xmlns="urn:xmpp:chat-markers:0"/></message>
java.lang.NullPointerException
  at org.jivesoftware.openfire.net.ClientStanzaHandler.processMessage(ClientStanzaHandler.java:112)
  at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:232)
  at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:199)
  at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$TailFilter.messageReceived(DefaultIoFilterChain.java:690)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolDecoderOutputImpl.flush(ProtocolCodecFilter.java:407)
  at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:236)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
  at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
  at org.apache.mina.core.filterchain.IoFilterEvent.fire(IoFilterEvent.java:74)
  at org.apache.mina.core.session.IoEvent.run(IoEvent.java:63)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTask(OrderedThreadPoolExecutor.java:769)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.runTasks(OrderedThreadPoolExecutor.java:761)
  at org.apache.mina.filter.executor.OrderedThreadPoolExecutor$Worker.run(OrderedThreadPoolExecutor.java:703)
  at java.lang.Thread.run(Thread.java:745)

#2

Just upgraded to Java 1.8.0_101 Oracle Corporation – OpenJDK 64-Bit Server VM. Now am seeing many connections getting dropped, see warn.log snippet below:

2016.08.03 07:09:08 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x0004D6D0: nio socket, server, /111.111.111.111:12897 => /222.222.222.222:5222)
javax.net.ssl.SSLHandshakeException: SSL handshake failed.
at org.apache.mina.filter.ssl.SslFilter.messageReceived(SslFilter.java:487)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.access$1200(DefaultIoFilterChain.java:47)
at org.apache.mina.core.filterchain.DefaultIoFilterChain$EntryImpl$1.messageReceived(DefaultIoFilterChain.java:765)
at org.apache.mina.core.filterchain.IoFilterAdapter.messageReceived(IoFilterAdapter.java:109)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.callNextMessageReceived(DefaultIoFilterChain.java:417)
at org.apache.mina.core.filterchain.DefaultIoFilterChain.fireMessageReceived(DefaultIoFilterChain.java:410)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:710)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLException: Received fatal alert: internal_error
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634)
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1800)
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1083)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at org.apache.mina.filter.ssl.SslHandler.unwrap(SslHandler.java:728)
at org.apache.mina.filter.ssl.SslHandler.unwrapHandshake(SslHandler.java:666)
at org.apache.mina.filter.ssl.SslHandler.handshake(SslHandler.java:552)
at org.apache.mina.filter.ssl.SslHandler.messageReceived(SslHandler.java:351)
at org.apache.mina.filter.ssl.SslFilter.messageReceived(SslFilter.java:468)
... 15 more

#3

adding “-Dcom.sun.net.ssl.enableECC=false” in OPENFIRE_OPTS solved the SSL handshake error.


#4

Thanks for reporting your success. Do you know if there is some issue reported at MINA regarding this?


#5

Thanks for replying Daryl. Is there a separate log for MINA? In Openfire logs I could see above error only, which are related to MINA of course.

Edit: nothing I could find on MINA tracker. Found this([JDK-7016078] javax.net.ssl.SSLException: Received fatal alert: internal_error starting JDK 7 b126 - Java Bug System ) on openjdk tracker.

any comment on connection issues mentioned in the original post?


#6

Java 8 didn’t help. all of a sudden, same errors show up(referring to original post) and connections drop to below 1k.


#7

Some update:

we observed below query was taking lot of time to execute:

SELECT serviceID, nodeID, maxItems FROM ofPubsubNode WHERE leaf=1 AND persistItems=1 AND maxItems > 0

since we’ve disabled pubsub through system property, don’t know why this was still executing. it was taking more than 100secs. table was having more than 500k entries.

we truncated the table and it looks stable now. error “java.lang.IllegalArgumentException: IQ must be of type ‘set’ or ‘get’” mentioned in the original post seems to disappeared now. don’t know the relation between these two though.

will continue to monitor and update this post. cheers.


#8

Hi, Sandeep

You’ve got 18k online users. That’s great. Is these users from ldap?

I’ve got 1k users with 600+ online from ldap, I found that users logon very slow in the morning, and some messages delayed more than 4 hour.

Do you have any suggestion for optimize the speed of openfire and spark ?

Thanks a lot!


#9

no we don’t have LDAP users.

no suggestions as such, you can read through expert’s posts mentioning heap and cache size. though those would be little significant in your case.


#10

I am also facing the same issue. To enable clustering we had introduced aerospike for caching.
Routingusercache is being served by aerospike, not jvm.
Currently, we are running this on only 1 instance.
All of a sudden, once in a day, this happens, and it recovers automatically.

We have close to 12K users online.
Memory being used to run Openfire : 30GB
Plugins : RestAPI, Search, Custom Plugin
version : 4.2.3

Few points with which I can relate this to are -

  1. Execution of full gc.
  2. Introduction of aerospike

Would appreciate any help on this.


#11

Resolved this error with the help of a better data resource. Now, the system is faster than before. Using the resource pool management in a better manner helped me.