Hello,
I upgraded my instance first to 4.6.5 on 15 Dec, then to 4.6.6 on 17 Dec.
Today I got reports that the system is not working. Users could not log in, and those logged in did not receive messages.
I restarted the service and everything was back to normal.
This is the second time this has happened. The first was after updating to version 4.6.5. Before that upgrade, there was no such problem.
Unfortunately, the logs were overwritten after restarting the service. I only have error.log, info.log and warn.log.
Can I ask for help with where to look for the problem?
The most obvious place to look for issues would be the log files. Sadly, little other information that is useful for debugging persists anywhere.
It would also help if you could find a way to reproduce the problem.
Since upgrading to 4.6.6, I have been seeing disconnections with some XMPP clients such as Pidgin and Spark. This was not observed with Openfire 4.6.3 and earlier. Once it happens, it is impossible to reconnect with Pidgin or Spark unless Openfire is restarted.
As a result I need to restart OF, which fixes it, but the issue has occurred 4 times in 2 days and I have no idea what the root cause could be.
When this happens, OF remains up and running, but Pidgin or Spark cannot connect to the server unless I restart OF. Could this be related to the xmpp-client SRV records I use to redirect port 5222 to the server port?
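One way to separate the SRV question from the server itself is to probe the client port directly, bypassing DNS SRV lookup. Below is a minimal sketch, not an official tool: the function name `probe_xmpp`, the status strings and the timeout are my own assumptions. It opens a plain TCP connection, sends an XMPP stream header and waits for any reply. If the direct probe reports "no-response" while the process is still up, the SRV records are unlikely to be the cause.

```python
import socket


def probe_xmpp(host, port=5222, domain=None, timeout=5.0):
    """Probe an XMPP client port directly (hypothetical helper).

    Returns 'ok' if the server answers the stream header,
    'no-response' if it accepts the connection but stays silent
    (or replies with something else), and 'unreachable' if the
    TCP connection itself fails.
    """
    domain = domain or host
    header = (
        "<?xml version='1.0'?>"
        f"<stream:stream to='{domain}' xmlns='jabber:client' "
        "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(header.encode("utf-8"))
            try:
                reply = sock.recv(4096)
            except socket.timeout:
                # Connected, but the server never answered the stream header.
                return "no-response"
            return "ok" if b"stream" in reply else "no-response"
    except OSError:
        # Refused, timed out while connecting, or failed while sending.
        return "unreachable"
```

Run it against the server's real address and port (not the SRV name), for example `probe_xmpp("192.0.2.10", 5222, domain="example.org")`, where both values are placeholders.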
This behaviour has clearly been present since 4.6.6. I ran many instances up to version 4.6.3 that never had such a problem. It looks similar to yours.
Hope this helps. The issue is weird and annoying because it happens randomly.
Thanks a lot in advance if you find anything on this one.
You are probably right. It looks very similar to that issue. However, it is very difficult to reproduce and to find a root cause. I have 2 client Docker instances with the same problem, and both use Pidgin. It happens once a day, but so far it has been very difficult to isolate.
I am running a Pidgin client (same with Spark). When it freezes, I know we have hit this issue. I will try to collect logs at that moment, before restarting, and share them.
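Since the logs were reported as overwritten after a restart, copying them aside first is the safe order of operations. A minimal sketch of that step follows; the function name `snapshot_logs` and the paths in the usage note are assumptions, so adjust them to wherever your Openfire (or Docker volume) keeps its logs.

```python
import shutil
import time
from pathlib import Path


def snapshot_logs(log_dir, dest_dir):
    """Copy everything in log_dir into a new timestamped folder under dest_dir.

    Hypothetical helper: run it while the server is frozen, before restarting.
    Returns the path of the new folder.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_dir) / f"openfire-logs-{stamp}"
    # copytree creates the target folder and fails if it already exists,
    # so an earlier snapshot is never silently overwritten.
    shutil.copytree(log_dir, target)
    return target
```

For example, `snapshot_logs("/opt/openfire/logs", "/tmp")`, assuming that is the log directory on your install.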
Hi guus, I had the problem again tonight, for one Docker instance only. After a restart, everything was back to normal. I captured the Openfire logs before restarting my Docker instance. No clue what is causing this.
The problem occurred at 4:30 PM PST.
Today it happened on only one instance. It looks random, and only the Pidgin and Spark clients freeze.
Let me know where I can send the OF logs, if you can check them.
Thanks in advance for your help
The latest messages in error.log (unsure if they are related) are:
2021.12.22 16:30:42 org.jivesoftware.openfire.spi.RoutingTableImpl - Primary packet routing failed
org.jivesoftware.openfire.PacketException: Cannot route packet of type IQ or Presence to bare JID: <iq type="error" id="107-3343" to="im.360bcgroup.com" from="ios13push.monal.im"><pubsub xmlns="http://jabber.org/protocol/pubsub"><publish node="D2F16C41-C45C-4F78-B309-24EB7381BC20"><item><notification xmlns="urn:xmpp:push:0"><x xmlns="jabber:x:data" type="form"><field var="FORM_TYPE" type="hidden"><value>urn:xmpp:push:summary</value></field><field var="message-count" type="text-single"><value>1</value></field><field var="last-message-sender" type="text-single"/><field var="last-message-body" type="text-single"><value>New Message</value></field></x></notification></item></publish><publish-options><x xmlns="jabber:x:data" type="submit"><field var="FORM_TYPE"><value>http://jabber.org/protocol/pubsub#publish-options</value></field><field var="secret"><value>bd115e680889e431a013a597dec670c5409478ce477058a6beac87849654550d</value></field></x></publish-options></pubsub><error code="404" type="cancel"><remote-server-not-found xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/></error></iq>
at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:329) ~[xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:262) [xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.returnErrorToSender(OutgoingSessionPromise.java:346) [xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:245) [xmppserver-4.6.6.jar:4.6.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
2021.12.22 16:30:42 org.jivesoftware.openfire.interceptor.InterceptorManager - Error in interceptor: org.jivesoftware.openfire.plugin.ofmeet.BookmarkInterceptor@24f7003c while intercepting:
<iq type="error" id="864-3345" to="im.360bcgroup.com" from="ios13push.monal.im">
<pubsub xmlns="http://jabber.org/protocol/pubsub">
<publish node="D2F16C41-C45C-4F78-B309-24EB7381BC20">
<item>
<notification xmlns="urn:xmpp:push:0">
<x xmlns="jabber:x:data" type="form">
<field var="FORM_TYPE" type="hidden">
<value>urn:xmpp:push:summary</value>
</field>
<field var="message-count" type="text-single">
<value>1</value>
</field>
<field var="last-message-sender" type="text-single"/>
<field var="last-message-body" type="text-single">
<value>New Message</value>
</field>
</x>
</notification>
</item>
</publish>
<publish-options>
<x xmlns="jabber:x:data" type="submit">
<field var="FORM_TYPE">
<value>http://jabber.org/protocol/pubsub#publish-options</value>
</field>
<field var="secret">
<value>bd115e680889e431a013a597dec670c5409478ce477058a6beac87849654550d</value>
</field>
</x>
</publish-options>
</pubsub>
<error code="404" type="cancel">
<remote-server-not-found xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/>
</error>
</iq>
java.lang.NullPointerException: null
2021.12.22 16:30:42 org.jivesoftware.openfire.spi.RoutingTableImpl - Primary packet routing failed
org.jivesoftware.openfire.PacketException: Cannot route packet of type IQ or Presence to bare JID: <iq type="error" id="864-3345" to="im.360bcgroup.com" from="ios13push.monal.im"><pubsub xmlns="http://jabber.org/protocol/pubsub"><publish node="D2F16C41-C45C-4F78-B309-24EB7381BC20"><item><notification xmlns="urn:xmpp:push:0"><x xmlns="jabber:x:data" type="form"><field var="FORM_TYPE" type="hidden"><value>urn:xmpp:push:summary</value></field><field var="message-count" type="text-single"><value>1</value></field><field var="last-message-sender" type="text-single"/><field var="last-message-body" type="text-single"><value>New Message</value></field></x></notification></item></publish><publish-options><x xmlns="jabber:x:data" type="submit"><field var="FORM_TYPE"><value>http://jabber.org/protocol/pubsub#publish-options</value></field><field var="secret"><value>bd115e680889e431a013a597dec670c5409478ce477058a6beac87849654550d</value></field></x></publish-options></pubsub><error code="404" type="cancel"><remote-server-not-found xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/></error></iq>
at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:329) ~[xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:262) [xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.returnErrorToSender(OutgoingSessionPromise.java:346) [xmppserver-4.6.6.jar:4.6.6]
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:232) [xmppserver-4.6.6.jar:4.6.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
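To see whether entries like the ones above cluster around the time of a freeze, the log can be summarised per logger. This is only a rough sketch that assumes every entry starts with a line shaped like the excerpt above (`date time logger - message`); a different log pattern would need a different regular expression. The names `ENTRY` and `count_by_logger` are my own.

```python
import re
from collections import Counter

# Matches entry header lines such as:
# 2021.12.22 16:30:42 org.jivesoftware.openfire.spi.RoutingTableImpl - Primary packet routing failed
ENTRY = re.compile(r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) (\S+) - (.*)$")


def count_by_logger(lines):
    """Count log entries per logger name, ignoring stack-trace and XML lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Usage would be something like `count_by_logger(open("error.log", encoding="utf-8"))`, with the file name taken from your own snapshot.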
@guus I have rolled back 2 customer instances to 4.6.3 and there are no more connection problems. On 4.6.6, Pidgin and Spark disconnect and it is impossible to reconnect unless OF is restarted. I never observed this with 4.6.3. I hope this helps to understand the problem.
@guus My old post seems to be gaining in importance. Pay attention to the link to Stack Overflow.
So the issue could have something to do with SSL/TLS in combination with MINA…
Meanwhile I have implemented a small workaround plugin which checks for invalid sessions and
restarts the connection listener for plain connections (5222, for client sessions).
Exception:
java.lang.NullPointerException
at org.jivesoftware.openfire.plugin.sessioncheck.sessioncheck_jsp._jspService(sessioncheck_jsp.java:146)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.jivesoftware.openfire.container.PluginServlet.handleJSP(PluginServlet.java:432)
at org.jivesoftware.openfire.container.PluginServlet.service(PluginServlet.java:122)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1459)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:226)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:234)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129)
at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
…
If you confirm that it works with 4.6.5, I can roll back to 4.6.6/7 and try it.