For a few days now, I have been seeing an extremely high count of XMPP packets sent by Openfire in my statistics. There is no abnormal traffic on the network device and nothing in the logs, only 100% CPU load.
Today I wrote a small plugin that counts packets per user over time. I found out that Openfire sends 7000-8000 packets per second from a JID that is offline to another unavailable JID.
The first JID is a normal user account, which is offline. I’m sure about this because it’s my own account, and I’m online with another resource.
The second JID is chat.yahoo.jabber.rwth-aachen.de, which belongs to our PyYIMt transport. This JID is also unavailable, because it’s the Yahoo MUC, which does not work.
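The plugin itself isn’t shown in this thread. As a rough illustration of the counting part only, the hypothetical PacketRateCounter class below tallies packets per sender/recipient pair in fixed time windows. It is a plain Java 8 sketch with no Openfire dependency; in a real plugin, record(...) would presumably be called from a packet interceptor (Openfire’s PacketInterceptor, registered via InterceptorManager) with the packet’s from and to addresses. The window size and the "from -> to" key format are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: counts packets per "from -> to" pair in fixed windows.
 * Not Openfire code; a plugin would call record(...) for each routed packet.
 */
class PacketRateCounter {
    private final long windowMillis;
    private final Map<String, Long> counts = new HashMap<>();
    private long windowStart = Long.MIN_VALUE; // no window started yet

    PacketRateCounter(long windowMillis) {
        if (windowMillis <= 0) throw new IllegalArgumentException("windowMillis must be > 0");
        this.windowMillis = windowMillis;
    }

    /** Records one packet; the timestamp is a parameter so the class is easy to test. */
    synchronized void record(String from, String to, long nowMillis) {
        roll(nowMillis);
        counts.merge(from + " -> " + to, 1L, Long::sum);
    }

    /** Packets seen for this pair in the window that contains nowMillis. */
    synchronized long count(String from, String to, long nowMillis) {
        roll(nowMillis);
        return counts.getOrDefault(from + " -> " + to, 0L);
    }

    /** Starts a fresh, empty window when nowMillis falls into a different window. */
    private void roll(long nowMillis) {
        long start = nowMillis - Math.floorMod(nowMillis, windowMillis);
        if (start != windowStart) {
            counts.clear();
            windowStart = start;
        }
    }
}
```

With a 1000 ms window and System.currentTimeMillis() as the timestamp, count(...) gives packets per second for a pair, which is the kind of figure (7000-8000/s) reported above.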
The server is running stably, so I will wait until tomorrow morning (08:00 CEST) before I try a restart.
I will try to investigate this weird bug further.
Hrm. Does the packet appear to be dancing back and forth between the two entities (even though they aren’t available)? I’ve seen this occur before, but I was never able to track down a way to reproduce it; it vanished and never showed up for me again. Are you able to easily reproduce it?
Does the packet appear to be dancing back and forth between the two entities?
As far as I can tell, no. Only in this one direction.
Are you able to easily reproduce it?
I haven’t been able to stop it so far.
It’s possible that this has something to do with unstable network connections. On the day it started, I was online with that account over an unstable WLAN connection (the connection dropped every 10 seconds).
I will now try to drop the packets explicitly using a filter rule.
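What such a filter rule has to match can be sketched as follows. This is a hypothetical DropRule class (Java 8+), not the configuration format of any existing filter plugin: it compares the sender’s bare JID (resource stripped) and the recipient’s domain, using plain lowercasing instead of full JID stringprep. In an Openfire plugin, a packet interceptor could, for example, reject a matching packet by throwing PacketRejectedException; that wiring is assumed and not shown.

```java
import java.util.Locale;

/** Hypothetical drop rule: matches packets from one bare JID to one domain. */
class DropRule {
    private final String fromBareJid;
    private final String toDomain;

    DropRule(String fromBareJid, String toDomain) {
        this.fromBareJid = fromBareJid.toLowerCase(Locale.ROOT);
        this.toDomain = toDomain.toLowerCase(Locale.ROOT);
    }

    /** Strips the "/resource" part of a JID, if present (simplified, no stringprep). */
    static String bare(String jid) {
        int slash = jid.indexOf('/');
        return (slash < 0 ? jid : jid.substring(0, slash)).toLowerCase(Locale.ROOT);
    }

    /** Domain part: everything after '@' (if any) in the bare JID. */
    static String domain(String jid) {
        String b = bare(jid);
        int at = b.indexOf('@');
        return at < 0 ? b : b.substring(at + 1);
    }

    /** True if the packet should be dropped, regardless of the sender's resource. */
    boolean shouldDrop(String from, String to) {
        return bare(from).equals(fromBareJid) && domain(to).equals(toDomain);
    }
}
```

Matching on the bare JID means the rule keeps working even if the looping packets carry a different resource than expected.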
I don’t want to restart the server. It’s my production server, with 170+ users currently online. I will wait until tomorrow morning, when far fewer are connected.
So it tries to route the packet, fails, tries again, fails, over and over again?
Ok, I identified the TID of the thread. But what does
Somewhere in the javacore you should find a line with “… nid=0x207A …”; this should be the looping thread.
mean? Do I need this Stacktrace tool for this? According to the screenshots, the tool seems to need an X server. I have no X server installed, since this is a production server, not a desktop system.
You need a stack trace; kill -3 PID should produce one (the JVM writes the thread dump to its standard output). Windows users may want to use the Stacktrace tool. See also http://www.adaptj.com/main/tracehowtos#ht1
"SocketAcceptorIoProcessor-1.0" prio=10 tid=0x09d6c800 nid=0x446b runnable [0x9f815000..0x9f8160b0]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0xa2e9a200> (a sun.nio.ch.Util$1)
- locked <0xa2e9a1f0> (a java.util.Collections$UnmodifiableSet)
- locked <0xa2e9a0a8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:480)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
"SocketAcceptorIoProcessor-2.0" prio=10 tid=0x0993d000 nid=0x447a runnable [0x9f37f000..0x9f37fe30]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0xa2f0e8a0> (a sun.nio.ch.Util$1)
- locked <0xa2f0e8b0> (a java.util.Collections$UnmodifiableSet)
- locked <0xa2f0e860> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:480)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
"Finalizer" daemon prio=10 tid=0x0961d000 nid=0x443e in Object.wait() [0xa08fe000..0xa08fefb0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xa2a708e0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
- locked <0xa2a708e0> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
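To connect a TID to the nid values in a dump like this one: on Linux, top -H -p PID lists per-thread IDs in decimal, while the HotSpot thread dump prints the same native ID in hexadecimal in the nid= field. The hypothetical NidFinder helper below (Java 8+) does the conversion and picks the matching thread header out of a dump saved as text; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for matching a decimal thread id to a dump's nid= field. */
class NidFinder {
    private static final Pattern NID = Pattern.compile("\\bnid=0x([0-9a-fA-F]+)\\b");

    /** Converts a decimal native thread id (as shown by top -H) to the "0x..." nid form. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Returns the thread header lines whose nid equals tid (hex case is ignored). */
    static List<String> findThreads(String dump, long tid) {
        List<String> hits = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            Matcher m = NID.matcher(line);
            if (m.find() && Long.parseLong(m.group(1), 16) == tid) {
                hits.add(line.trim());
            }
        }
        return hits;
    }
}
```

For example, the first SocketAcceptorIoProcessor thread above (nid=0x446b) corresponds to TID 17515 in top -H, and the 0x207A from the earlier reply corresponds to TID 8314. Comparing numerically rather than as strings avoids the uppercase/lowercase mismatch between “0x207A” and HotSpot’s lowercase output.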
Do you write a GC log? I wonder whether one thread is allocating and releasing objects in a loop, so that the garbage collector runs every second or even more often. Or whether your PermSize is too small, which would also cause a lot of useless garbage collections.
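Besides a GC log (on a HotSpot JVM of that era, e.g. -verbose:gc -XX:+PrintGCDetails -Xloggc:<file>), the collection rate can also be read through the standard java.lang.management API. The sketch below (Java 8+) is a swapped-in alternative to a GC log, not something suggested in this thread: the hypothetical GcRateSampler.snapshot() reads each collector’s collection count, and rates(...) turns two snapshots into collections per second. It only sees the JVM it runs in, so it would have to run inside Openfire’s JVM (for example from a plugin) or be adapted to a remote JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: estimates GC collections per second per collector. */
class GcRateSampler {
    /** Current collection count per collector name (-1 means the JVM doesn't report it). */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    /** Collections per second between two snapshots taken elapsedMillis apart. */
    static Map<String, Double> rates(Map<String, Long> before, Map<String, Long> after, long elapsedMillis) {
        if (elapsedMillis <= 0) throw new IllegalArgumentException("elapsedMillis must be > 0");
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            // skip collectors missing from the first snapshot or with undefined (-1) counts
            if (start == null || start < 0 || e.getValue() < 0) continue;
            result.put(e.getKey(), (e.getValue() - start) * 1000.0 / elapsedMillis);
        }
        return result;
    }
}
```

Usage: take snapshot(), wait for example 10 seconds, take a second snapshot(), and pass both together with the elapsed milliseconds to rates(...). A collector reporting one or more collections per second would fit the suspicion above.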