
Spark clients intermittently disconnect from Openfire

We have been running Openfire (using a MySQL database) and Spark at our organization since 2011-2012. We have roughly 400 users very actively using it at any given time.

Openfire version: 4.2.2
Spark version: 2.8.3

We have recently been having an issue with random users where their Spark clients disconnect from Openfire. Often they can’t log back in for a while when this happens. This was on Openfire 4.2 and Spark 2.6.3.

After performing our usual troubleshooting steps, we decided to migrate our existing MySQL DB to a new CentOS 7 server with 64-bit Openfire and Java. We also upgraded all of our clients from Spark 2.6.3 to 2.8.3. Neither of these upgrades resolved the issue.

One possible cause could be our hourly quiesced snapshots of the VM as part of our DR Plan; however, the disconnects don’t line up with the timing of these snapshots. We have been performing these since 2014 without an issue, but I am wondering if it is causing a delay that sometimes triggers some clients to think they’ve lost their connection.

Beyond that, I am at a loss to explain the disconnects and am nearly ready to throw my hands up and look at Skype for Business. Chat using Openfire/Spark is the primary way our staff communicate with each other. I’d be grateful for any help people can offer to clear up this issue and eliminate the frustration placed on me and our staff.

Skype for Business is totally an option, but something tells me it really isn’t! :wink: Anyway, when did you start having this issue? After upgrading to 4.2, or before? If before, what was the last version you ran that did not have the issue? Are you running the snapshots through the hypervisor or through storage? Are you by chance running VMware ESX 6.5, and did you notice the issue after moving to 6.5?


You can try disabling snapshots for a few days and see if there is a difference. I know that at least with Windows Hyper-V it does pause the VM for a bit when doing snapshots, and connections might be lost during that time. I have also noticed that doing a backup of a running VM with Backup Exec (and probably other software) also makes a temporary snapshot, and the event log fills with all sorts of connection loss errors (DCOM, etc.).
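
If you want to see such pauses from inside the guest, one crude option is to sleep for a second in a loop and flag any iteration that took much longer than it should; a quiesce freeze shows up as a gap in guest time. This is just a sketch: the 3-second threshold and the iteration counts are arbitrary assumptions, not values from any vendor documentation.

```shell
#!/bin/sh
# Crude snapshot-stun detector (sketch): sleep 1s repeatedly and report
# any iteration that took noticeably longer, which is roughly what a
# quiesce freeze looks like from inside the guest. The 3s threshold is
# an arbitrary assumption; tune it to your environment.
detect_stun() {
    iterations=$1
    i=0
    while [ "$i" -lt "$iterations" ]; do
        before=$(date +%s)
        sleep 1
        after=$(date +%s)
        gap=$((after - before))
        if [ "$gap" -gt 3 ]; then
            echo "$(date '+%F %T') guest appears to have paused for ~${gap}s"
        fi
        i=$((i + 1))
    done
}

# In practice you would leave this running across a snapshot window
# (e.g. detect_stun 3600) and redirect the output to a file.
pauses=$(detect_stun 3)
echo "${pauses:-no pauses detected}"
```

Correlating any pauses it reports with the snapshot schedule (or the lack of pauses) would confirm or rule out the stun theory directly.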

Off-topic: it is not that Skype for Business is problem free (we moved from Openfire/Spark to SfB for video conferencing and easier setup for outside communication, but mainly because we have Office 365 licenses). Sometimes messages don’t go through, although both parties are online. Sometimes you get a warning that a message won’t be delivered, though it should at least reach the receiver’s inbox as a missed conversation. There are some other issues. But you don’t have forums where you can discuss it with developers. You have a dumb support line which can only answer questions in their FAQ… :slight_smile: Openfire is not problem free either, and you can’t always get attention since there are only a few volunteer developers, but at least it is free :smile: Oh, and MS has plans to replace SfB with Teams, which is still not ready for prime time…


Thanks for offering your time and assistance. The issue started at the end of January. Prior to that, everything had been going smoothly. We had been performing hourly snapshots and nightly backups (also quiesced, using Veeam) for years without issue.

These snapshots are initiated by the storage, but because they are quiesced they involve a VMware hypervisor snapshot as well. We are currently still on 5.5 but will be moving to 6.5 this year.

This all started at the end of January. Around that time we had to reinstall VMware Tools on our Openfire (CentOS) VM in order to resolve an issue with nightly backup snapshots failing. Shortly thereafter, reports of random disconnects started coming in from our staff. I tried replacing the tools with the open source open-vm-tools, but the issue persisted. Finally, I built a whole new VM, migrated the DB, and deployed a new Spark client to everyone, and here we are.

It is odd to me that snapshots would cause the issue as they occur every hour on the hour for this VM and issues don’t always line up with that timing. Here are some examples of the warnings (names replaced with generic info):

2018.03.05 09:04:49 org.jivesoftware.openfire.IQRouter - User tried to authenticate with this server using an unknown receipient: <iq to="username@servername.domainname.net" id="Pz1cZ-95280" type="get" from="servername.domainname.net/35fbh407kv"><query xmlns="jabber:iq:last"></query></iq>  
2018.03.05 09:04:55 org.jivesoftware.openfire.IQRouter - User tried to authenticate with this server using an unknown receipient: <iq to="username@servername.domainname.net" id="Pz1cZ-95282" type="get" from="servername.domainname.net/35fbh407kv"><query xmlns="jabber:iq:last"></query></iq>  
2018.03.05 09:04:56 org.jivesoftware.openfire.IQRouter - User tried to authenticate with this server using an unknown receipient: <iq to="username@servername.domainname.net" id="Pz1cZ-95284" type="get" from="servername.domainname.net/35fbh407kv"><query xmlns="jabber:iq:last"></query></iq>  
2018.03.05 09:13:58 org.jivesoftware.openfire.IQRouter - User tried to authenticate with this server using an unknown receipient: <iq to="username@servername.domainname.net" id="JuN1n-36842" type="get" from="servername.domainname.net/2bvaot5ow1"><query xmlns="jabber:iq:last"></query></iq>  
2018.03.05 09:17:55 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00000131: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)  
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)  
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)  
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)  
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)  
	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)  
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  
	at java.lang.Thread.run(Thread.java:748)  
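
If the hourly snapshots were the trigger, these warnings should cluster around the same minute of each hour. One quick check is to tally the minute-of-hour of every "Closing connection" warning. This is a sketch: the default log path below is an assumption about your install, and the awk assumes the timestamp format shown in the lines above ("2018.03.05 09:17:55 …").

```shell
#!/bin/sh
# Sketch: histogram of the minute-of-hour each disconnect warning lands on.
# If hourly snapshots were the trigger, one minute bucket should dominate.
# The default log location is an assumption; pass your warn.log path as $1.
LOG=${1:-/opt/openfire/logs/warn.log}

minute_histogram() {
    grep 'Closing connection due to exception' "$1" |
        awk '{ split($2, t, ":"); count[t[2]]++ }
             END { for (m in count) printf "minute %s: %d\n", m, count[m] }' |
        sort -k2 -n
}

if [ -r "$LOG" ]; then
    minute_histogram "$LOG"
fi
```

A flat spread across minutes would argue against the snapshots; a spike at one minute would point back at them.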

My final step is going to be moving this VM to a once-a-day snapshot schedule. We will risk losing more chat data in a disaster, but performance must come first.

We’ve had a few other issues with 4.2.x, but nothing we couldn’t overcome. These have primarily been issues with AD users having their name changed or being disabled prior to being removed from the “local” groups we’ve set up within Openfire. This causes an error and makes the group inaccessible on the admin page for a period of time.

Off-topic: Thanks for the tip regarding Skype for Business/Teams. Since we already pay for Office 365 and are considering Exchange Online to replace our on-prem Exchange, we’re only a step away from walking the plank and diving into the whole package. I am just concerned that Skype for Business archiving won’t be as good as Openfire’s. We have to pull a lot of chat logs, unfortunately. I will do some reading up on Teams as well.

Sounds like a network-related issue… are you by chance using the e1000 driver for your guest OS? If so, try switching to VMXNET3 to see if that helps.
Also, what does your /etc/sysctl.conf look like? You may need to tweak it a bit.

Thanks, I appreciate the help.

The VM is configured with the VMXNET3 adapter. I believe VMware chose this as the default for a CentOS 7 guest.

The /etc/sysctl.conf file is empty except for this default info:

# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).

Right now I am not seeing the specific user authentication errors, but I do see this repeating:

2018.03.05 10:36:17 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x000002B4: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)  
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)  
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)  
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)  
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)  
	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)  
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  
	at java.lang.Thread.run(Thread.java:748)  
2018.03.05 10:36:36 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x0000029F: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	[stack trace identical to the one above]  
2018.03.05 10:41:14 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x0000029E: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	[stack trace identical to the one above]  
2018.03.05 10:42:33 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x000001D1: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	[stack trace identical to the one above]  
2018.03.05 10:51:32 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00000126: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	[stack trace identical to the one above]  

Here’s a look at top if it helps any (restarted VM yesterday for patching):

top - 10:53:34 up 1 day, 36 min,  1 user,  load average: 1.42, 1.26, 1.25
Tasks: 100 total,   1 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu(s): 31.4 us, 16.9 sy,  0.0 ni, 47.3 id,  0.0 wa,  0.0 hi,  4.3 si,  0.0 st
KiB Mem :  8010788 total,  1665840 free,   973564 used,  5371384 buff/cache
KiB Swap:  8257532 total,  8257532 free,        0 used.  6752796 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1105 daemon    20   0 4690884 730824  16784 S  60.0  9.1  98:28.78 java
 1422 mysql     20   0 1629716  99964   9128 S  44.3  1.2  77:03.11 mysqld
  650 root      20   0  305028   6216   4764 S   0.3  0.1   1:40.02 vmtoolsd
    1 root      20   0  128168   6828   4060 S   0.0  0.1   0:02.00 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.27 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration/0
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S   0.0  0.0   0:03.58 rcu_sched
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.29 watchdog/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.35 watchdog/1
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.23 ksoftirqd/1
   15 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   17 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs
   18 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.04 khungtaskd
   20 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
   21 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd
   22 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
   23 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd
   24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 md
   30 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kswapd0
   31 root      25   5       0      0      0 S   0.0  0.0   0:00.00 ksmd
   32 root      39  19       0      0      0 S   0.0  0.0   0:00.43 khugepaged
   33 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 crypto
   41 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kthrotld
   43 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kmpath_rdacd
   45 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kpsmoused
   46 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 ipv6_addrconf
   66 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 deferwq
   98 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kauditd

I’m going to try moving this VM to a storage-replicated volume that does not use VMware snapshots. From what I’ve read, MySQL doesn’t quiesce properly anyway without pre- and post-thaw scripts, so if we ever need to rely on the replicated snapshot to recover, we probably aren’t introducing any issues we wouldn’t already have had. If the VMware snapshots are the cause, this should clear up the issue.

I’m not a Linux expert, but you might want to check some of the values for the max number of open files; if I remember correctly, each TCP connection consumes a file descriptor… or something like that… anyway, that could be causing your issue. You might also take a look at net.core.somaxconn, net.core.netdev_max_backlog, and net.ipv4.tcp_max_syn_backlog.
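
For reference, those values can be read straight out of /proc. This is a sketch: the /proc paths are the standard Linux locations, but the `pidof java` lookup assumes Openfire is the only Java process on the box.

```shell
#!/bin/sh
# Sketch: read the kernel backlog limits mentioned above, plus the
# open-file situation for the Openfire JVM. The /proc/sys paths are
# standard Linux; "pidof java" assumes a single java process is running
# (we take the first PID if there are several).
cat /proc/sys/net/core/somaxconn           # accept-queue ceiling per listening socket
cat /proc/sys/net/core/netdev_max_backlog  # NIC ingress queue length per CPU
cat /proc/sys/net/ipv4/tcp_max_syn_backlog # half-open (SYN) connection queue

PID=$(pidof java 2>/dev/null || true)
PID=${PID%% *}
if [ -n "$PID" ]; then
    grep 'Max open files' "/proc/$PID/limits"  # descriptor limit for the JVM
    ls "/proc/$PID/fd" | wc -l                 # descriptors currently in use
fi
```

If the in-use descriptor count is creeping toward the per-process limit, that would fit the "each connection is a file" theory.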

Thanks, speedy. I’m not familiar with those settings either, but I’ll look into them. We have a lot of staff who sign in from remote sites, and a large portion of them also switch from PC to PC throughout the day, logging into Spark each time. I don’t know if that could be creating a lot of open files or connections, as you suggest.

Eliminating the quiesced snapshots doesn’t seem to have resolved the issue, unfortunately. It was worth checking, but I think they are a red herring in this situation.

I’m a bit at a loss at this point. I am still seeing the “Connection reset by peer” warnings in the logs. Currently we have 360 active users, 285 conversations, and 13,000 packets per minute, with a high of around 14,000. I assume these statistics aren’t especially high compared to other companies using Openfire and Spark.

Is it possible this issue is linked to OF-1481 (resolved in 4.2.3) or to OF-1497? I just updated our server to 4.2.3, but perhaps I should also create the stream.management.active property and set it to false?

EDIT: Tried adding the stream.management.active property set to false and rebooted. Spark would no longer log in at all. I changed it back, restarted, and was able to log in. Hopefully the update to 4.2.3 alone helps with the issues we are having.

It’s weird that Spark was not able to log in with that setting disabled. Spark doesn’t support Stream Management, but maybe when Stream Management is disabled on the server they can’t communicate for some reason. My first thought is that these Openfire issues are not related to your problem. I tried setting that property to false on my test server and Spark was still able to connect (both 2.8.3 and 2.9.0). Also, the SM issues are not fixed in Openfire 4.2.3.

Yeah, Stream Management shouldn’t have had any effect on Spark; disabling it on the server definitely should not have had any impact.

Hmm, odd. Thanks for the info. The reason I am looking at those two issues is that in reviewing OF-1481, @guus’s comment included a stack trace that looked very similar to the “Connection reset by peer” error I provided.

My error:


2018.03.05 10:36:17 org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x000002B4: nio socket, server, null => 0.0.0.0/0.0.0.0:5222)  
java.io.IOException: Connection reset by peer  
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)  
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)  
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)  
	at sun.nio.ch.IOUtil.read(IOUtil.java:197)  
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273)  
	at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)  
	at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)  
	at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)  
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  
	at java.lang.Thread.run(Thread.java:748)

The comment on OF-1481:

2018.02.13 19:40:54 WARN  [socket_c2s-thread-4]: org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x00000045: nio socket, server, /x.x.x.x:37637 => 
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:273)
        at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:44)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:690)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:664)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:653)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:67)
        at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1124)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:748)

These errors are nearly identical. Was this fixed in 4.2.3, or is this issue part of OF-1497? Isn’t it possible that this is causing our issue? We never had these problems in several years of using Openfire and Spark until the past few months.

It’s part of OF-1497, and the core issue with SM still exists. The errors look similar, but in your case there is no “[socket_c2s-thread-4]” and the error code is not 0x00000045, though I’m not sure it should be the same. There are probably many ways for a connection to be reset. It might still be related, since Spark uses the Smack library, which supports SM, although Spark itself hasn’t been modified to make use of it. Maybe at the library level it is still somehow operating.