Bitdefender Randomly Drops Clients

I’ve been following this thread and have also started to observe Spark’s activity from my end. I am running Openfire 4.6.0 Beta on CentOS 7, and Spark 2.8.3 on Windows 8.
I left my system running for 32 hours and Spark reported no issues.
Another thing: I also used Pidgin, which did not report any issues either.

I am not sure this issue is related only to Spark itself; there may be something else you need to look into.

Update from another system running the same version of Windows with Spark 2.9.4, connected to the same Openfire server: no issues found so far.

Thanks for your input. Glad to hear it’s all working for you.

The issue does not affect every client. We have 78 users connected to Openfire Server 4.6.1 with Spark 2.9.4. Suddenly, one user stops receiving messages. Was it because he walked away and his client timed out and then did not reconnect properly? Were they actively working and were disconnected due to a network blip and the client did not reconnect correctly? It all seems to be related to a disconnect/reconnect issue; however, it is difficult to get users to tell us when they last received messages correctly and when they were disconnected. There are 50-some users who say they never get disconnected or see the issue. We have tried a fresh install of both the server and the clients.

I suggest running a test with a different XMPP client on the same computer (instead of Spark) and monitoring whether it gets disconnected.

I ran Pidgin, Jitsi (Desktop), and Spark on the same computer and left it to go idle. After logging back in, I found all three still online, so I could not reproduce the issue you are facing.

One thing I should mention, though I'm not sure it will help: I did not use Wi-Fi.

In reply to your questions:

Was it because he walked away and his client timed out and then did not reconnect properly?
Could you please monitor the idle user's session from the Openfire admin console? Ask the person to let you know before they log back in to their computer.
Watch the client's session while the computer is idle, then check whether a new session starts when the person logs in.
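One way to tell an idle-but-connected client apart from a dead session is an XMPP ping as defined in XEP-0199: an IQ of type "get" containing an empty <ping xmlns="urn:xmpp:ping"/> element, to which a live client replies with an IQ of type "result" carrying the same id. The sketch below only builds and checks such stanzas with Python's standard library; it does not open a connection, and the helper names (build_ping, is_pong) and the example JIDs are hypothetical.

```python
import xml.etree.ElementTree as ET

PING_NS = "urn:xmpp:ping"  # namespace defined by XEP-0199 (XMPP Ping)

def build_ping(from_jid: str, to_jid: str, stanza_id: str) -> str:
    """Build <iq type='get'><ping xmlns='urn:xmpp:ping'/></iq> as a string."""
    iq = ET.Element("iq", {"type": "get", "from": from_jid, "to": to_jid, "id": stanza_id})
    # Set xmlns as a plain attribute so it serializes as xmlns="urn:xmpp:ping".
    ET.SubElement(iq, "ping", {"xmlns": PING_NS})
    return ET.tostring(iq, encoding="unicode")

def is_pong(reply_xml: str, stanza_id: str) -> bool:
    """True if the reply is the IQ 'result' matching our ping id.

    A <service-unavailable/> error would also show the client is alive but does
    not support ping; this sketch deliberately accepts only 'result'.
    """
    reply = ET.fromstring(reply_xml)  # parsed outside a stream, so no jabber:client namespace
    return reply.tag == "iq" and reply.get("type") == "result" and reply.get("id") == stanza_id
```

For example, build_ping("example.com", "alice@example.com/Spark", "ping-1") produces the request, and is_pong('<iq type="result" id="ping-1"/>', "ping-1") returns True.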

Were they actively working and were disconnected due to a network blip and the client did not reconnect correctly?

  • This could be the issue. This is why I asked you to run the test above.

Also check which users are on a wired connection (not Wi-Fi). I'm still not sure, but it may help. Sorry, I could not test over Wi-Fi because my router is not functioning properly; I noticed latency.

All are using wired connections. We do not allow WiFi within our network.

More information:
This morning, Spark 2.9.4 force-closed on me several times while I was actively using it. I immediately turned on debug logging. Within 15 minutes I had the following errors, after which the Spark client force-closed.

2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.muc.spi.LocalMUCUser - Request from '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>' rejected: request did not specify a nickname
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.RoutingTableImpl - Failed to route packet to JID: <My username>@<XMPP Domain Name>/<My WorkstationName> packet: <presence type="error" from="<Room ID>@conference.<XMPP Domain Name>" to="<My username>@<XMPP Domain Name>/<My WorkstationName>"><error code="400" type="modify"><jid-malformed xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">A nickname (resource-part) is required in order to join a room.</text></error></presence>
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.PresenceRouter - Presence sent to unreachable address: <presence type="error" from="<Room ID>@conference.<XMPP Domain Name>" to="<My username>@<XMPP Domain Name>/<My WorkstationName>"><error code="400" type="modify"><jid-malformed xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">A nickname (resource-part) is required in order to join a room.</text></error></presence>
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.PresenceManagerImpl - Recording 'last activity' for user '<My username>'.
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.RoutingTableImpl - Removing client route <My username>@<XMPP Domain Name>/<My WorkstationName>
2021.02.15 10:27:20 DEBUG [Thread-16663]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Broadcasting presence update in room <Room ID> for occupant <Room ID>@conference.<XMPP Domain Name>/<My Full Name>

I then seem to reconnect just fine:

2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - User '<My username>@<XMPP Domain Name>/<My WorkstationName>' attempts to join room '<Room ID>@conference.<XMPP Domain Name>' using nickname '<My Full Name>'.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - User '<My username>@<XMPP Domain Name>/<My WorkstationName>' role and affiliation in room '<Room ID>@conference.<XMPP Domain Name> are determined to be: participant, member
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Checking all preconditions for user '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>@conference.<XMPP Domain Name>'.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - All preconditions for user '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>@conference.<XMPP Domain Name>' have been met. User can join the room.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Adding user '<My username>@<XMPP Domain Name>/<My WorkstationName>' as an occupant of room '<Room ID>@conference.<XMPP Domain Name>' using nickname '<My Full Name>'.
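The rejected and the successful join attempts differ in one detail: the first request specified no nickname, while the second joined "using nickname '<My Full Name>'". In XEP-0045 (Multi-User Chat), a client joins a room by sending presence to room@service/nickname, with the nickname as the resource part of the JID, which is exactly what the 400 jid-malformed error text says. The standard-library sketch below builds such a join presence and checks whether a room JID carries a nickname; the helper names and the room, service, and nickname values are hypothetical, and it does not explain why Spark omitted the nickname.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"  # XEP-0045 namespace for the join <x/> element

def has_nickname(room_jid: str) -> bool:
    """A room JID addresses an occupant only if it has a non-empty resource part after '/'."""
    _, sep, resource = room_jid.partition("/")
    return bool(sep) and bool(resource)

def build_muc_join(room: str, service: str, nickname: str) -> str:
    """Build <presence to='room@service/nick'><x xmlns='http://jabber.org/protocol/muc'/></presence>."""
    if not nickname:
        # A join without a nickname is what Openfire rejected in the first log excerpt.
        raise ValueError("a nickname (resource part) is required to join a room")
    presence = ET.Element("presence", {"to": f"{room}@{service}/{nickname}"})
    ET.SubElement(presence, "x", {"xmlns": MUC_NS})
    return ET.tostring(presence, encoding="unicode")
```

For example, build_muc_join("room", "conference.example.com", "Alice") yields a presence addressed to room@conference.example.com/Alice, while has_nickname("room@conference.example.com") is False.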

I then get kicked for a different reason.

2021.02.15 10:33:20 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.spi.RoutingTableImpl - Failed to route packet to JID: <My username>@<XMPP Domain Name>/<My WorkstationName> packet: <message to="<My username>@<XMPP Domain Name>/<My WorkstationName>" id="hXNir-1412" type="chat" from="<Responding username>@<XMPP Domain Name>/<Responding user's WorkstationName>"><thread>IACtpY</thread><paused xmlns="http://jabber.org/protocol/chatstates"></paused></message>
2021.02.15 10:33:20 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.MessageRouter - Message sent to unreachable address: <message to="<My username>@<XMPP Domain Name>/<My WorkstationName>" id="hXNir-1412" type="chat" from="<Responding username>@<XMPP Domain Name>/<Responding user's WorkstationName>"><thread>IACtpY</thread><paused xmlns="http://jabber.org/protocol/chatstates"></paused></message>

I am curious why I am seeing http://jabber.org/protocol/chatstates in my logs.
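For what it's worth, http://jabber.org/protocol/chatstates is not a web address that anything fetches; it is the XML namespace of Chat State Notifications (XEP-0085). Clients send elements such as <composing/> and <paused/> in that namespace to signal typing activity, so the logged message appears to be a "paused" (stopped typing) notification from the other user that Openfire could not route because your client's route was already gone. A small standard-library sketch that pulls the chat state out of a logged stanza; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CHATSTATES_NS = "http://jabber.org/protocol/chatstates"  # XEP-0085 namespace
STATES = {"active", "composing", "paused", "inactive", "gone"}  # the five XEP-0085 states

def chat_state(message_xml: str) -> Optional[str]:
    """Return the XEP-0085 chat state carried by a <message/>, or None if there is none."""
    message = ET.fromstring(message_xml)
    for child in message:
        # ElementTree stores namespaced tags as "{namespace}localname".
        if child.tag.startswith("{"):
            ns, _, local = child.tag[1:].partition("}")
        else:
            ns, local = "", child.tag
        if ns == CHATSTATES_NS and local in STATES:
            return local
    return None
```

Feeding it the <message …>…</message> part of the log line above (with the placeholders replaced by real JIDs) returns "paused".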

One of my sites has 6 users who leave their Windows 10 computers locked when they go home for the day, so Spark remains running and their presence shows as “away”.

I’m not certain that switching to another third-party XMPP client would prove anything. I say this because it’s not like I have one client here and there getting disconnected at random times. Sometimes 4 of the 6 clients get disconnected at once or within a minute of each other. So I believe the problem originates centrally from the Openfire server and not at each individual client workstation. But out of curiosity, I will indeed set up a test workstation running Pidgin or Jitsi to see if it ever gets dropped.

Here is a portion of the Openfire warn.log showing 4 clients getting dropped early in the morning. The first 2 clients were dropped within 30 seconds of each other, and the next 2 within 15 seconds of each other. Notice the times of occurrence. Keep in mind that these clients had been running for about 2 weeks without any drops until early yesterday morning. So for a true test, you would need to leave a client running for at least a couple of weeks before a drop actually occurs.
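Since the drops seem to come in bursts, one way to check this over several days of logs is to group disconnect timestamps that fall within a short window of each other. The sketch below assumes timestamps in the "yyyy.MM.dd HH:mm:ss" form seen at the start of the Openfire log lines in this thread, and that you have already extracted the timestamps of the disconnect entries you care about (for example with grep). The function name, the 60-second window, and the sample times are illustrative assumptions, not values taken from the warn.log excerpt.

```python
from datetime import datetime, timedelta
from typing import List

TS_FORMAT = "%Y.%m.%d %H:%M:%S"  # matches log prefixes such as "2021.02.15 10:27:20"

def cluster_events(timestamps: List[str], window_seconds: int = 60) -> List[List[datetime]]:
    """Group events so that each event is within window_seconds of the previous one in its group."""
    times = sorted(datetime.strptime(t, TS_FORMAT) for t in timestamps)
    window = timedelta(seconds=window_seconds)
    clusters: List[List[datetime]] = []
    for t in times:
        if clusters and t - clusters[-1][-1] <= window:
            clusters[-1].append(t)  # close to the previous drop: same burst
        else:
            clusters.append([t])    # gap too large: start a new burst
    return clusters

# Made-up timestamps, for illustration only.
drops = ["2021.02.15 03:12:05", "2021.02.15 03:12:31", "2021.02.15 04:40:10", "2021.02.15 04:40:22"]
for burst in cluster_events(drops):
    print(len(burst), burst[0].strftime(TS_FORMAT), "->", burst[-1].strftime(TS_FORMAT))
```

Bursts containing several clients point toward something shared (server, switch, firewall), while isolated single drops point back toward individual workstations.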

We don’t have any wireless connected devices at this site. All workstations are connected with Cat6 Ethernet cable. Do you have any other ideas?

We’re currently on Openfire 4.6.2 and Spark 2.9.4

Hmmm… not sure why you’re seeing those log entries. We use Spark as a simple instant messaging system without setting up multi-user chat/conference rooms.

When you became disconnected, was your Spark client still running, or had it crashed/exited completely?

Mine fully crashed and exited. I do not know if anyone else is seeing behavior like that.

Yes, here too. When the Spark clients are dropped, they also crash/abort/exit and are no longer running on those workstations. Spark has to be started again to reconnect to the server and continue functioning.

I set up a test machine and installed Pidgin. Will keep an eye on it to see how long it stays connected.

By the way, according to your logs, do your clients also get dropped within seconds of each other? See the sample log entries I posted earlier today displaying this very behavior.

Thanks for sharing the screenshot. I can see the message and have also marked it in red.
I am curious why it says “Forcibly closed by remote host”.

That makes two of us. :thinking:

Before I comment, I need the following information:

Are all the clients that get disconnected now and then connected to the same network (LAN, router)?
Are those computers running the same OS version and the same version of the Spark client?

Since the Openfire log shows that message, it could be something closing the client session from the client end (as some of you noticed, Spark crashed).
OR
something sitting between the client and the XMPP server is behind this (this would only be the case if all those clients are connected to the same LAN/router).

There could be some truth in this - it could be as trivial as some kind of buffer overflow in a low-level networking device like a switch. That said, Spark really should be able to survive such a hiccup (or at least seamlessly reconnect).
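For reference, "seamlessly reconnect" usually means retrying with an increasing delay rather than giving up (or crashing) after the first failure. Below is a generic sketch of capped exponential backoff with jitter; connect is a hypothetical callable standing in for whatever opens the XMPP session, and this is not a description of Spark's own reconnection logic.

```python
import random
import time
from typing import Callable, Optional

def reconnect_with_backoff(
    connect: Callable[[], bool],
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call connect() until it returns True; return the successful attempt number, or None."""
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            # Double the delay after each failure, cap it, and add jitter so that many
            # clients dropped at the same moment do not all retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    return None
```

The jitter matters in a setup like the one above, where several clients tend to drop within seconds of each other.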

Just curious: would it cause any issues if something in the network drops traffic on, or conflicts with, ports 9090/9091?

I have never tested this, but I saw a similar issue while working on VoIP: ports and traffic were being blocked by the network, so clients appeared online but were not able to send or receive any data.
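A quick TCP reachability check from an affected workstation can rule out blocked ports. In a default Openfire install, 9090 and 9091 are the admin console ports (HTTP and HTTPS); clients such as Spark connect on 5222 (and 5223 for legacy direct TLS), so those are the ports that matter for client traffic. A minimal standard-library sketch; the function name is hypothetical. A successful connect only shows the port is open at that moment, not that a long-lived connection will survive.

```python
import socket
from typing import Dict, Iterable

def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a plain TCP connect to each port; True means the TCP handshake completed."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, DNS failure, ...
            results[port] = False
    return results
```

For example, check_ports("openfire.example.local", [5222, 5223, 9090, 9091]), where the host name is a placeholder for your Openfire server, returns a port-to-reachable map.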

I can say that the problem lies either in incorrect traffic routing or in incorrectly configured network equipment.
I would launch Wireshark and watch the packet movement.

Here’s an example: my users don’t turn off Spark for months.