Bitdefender Randomly Drops Clients

Yes, I believe we had this issue with some previous versions of the Spark client as well. This was also one of the reasons we upgraded to the latest version, 2.9.4, but the problem still exists. We also use the Client Control plugin (I contributed to some of its development, but cannot correlate anything in that code with the disconnections).

I disabled Stream Management and that doesn’t seem to make any difference. Clients get dropped whether it’s on or off. Even if it’s a stream management issue, would that make the client exit or crash so that it’s no longer running in memory? I’m not sure why that happens.

Sometimes the problem happens once in several weeks, while other times it can happen once every couple of days.

We’ve been running Openfire/Spark for almost a decade and only recently started dealing with this nasty issue. I’m not sure exactly at what point it all began. I hope they’re able to figure it out. In the meantime, I’ll check the event logs on each disconnected client.

By the way, when you installed Spark 2.9.4, did you install it as an upgrade to a previous version or did you try doing a fresh install? I wonder if a fresh install would make any difference.

We have tried uninstalling, cleaning up directories and reinstalling, and it “worked” for a while, but we have had user desktops that never had the client before; even with a fresh Spark install, the disconnects happen all the time for them.

Stream Management is actually a “Keep Alive” functionality meant to have connections survive blips in the network. Stream Management was disabled programmatically in Spark 2.9.0, so there is no way to change it in the client. Disabling it in Openfire will not really do anything unless you use another client which still has Stream Management enabled, in which case I would not disable it in Openfire.
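
For anyone who would rather flip that server-side switch programmatically (for example from a plugin) instead of via the Admin Console’s System Properties page, a minimal sketch could look like the following. Note that the property name "stream.management.active" is my assumption here and should be verified against your Openfire version:

    import org.jivesoftware.util.JiveGlobals;

    // Minimal sketch (assumes it runs inside an Openfire plugin, where JiveGlobals is available).
    // The property name "stream.management.active" is an assumption; verify it for your version.
    public class StreamManagementToggle {

        public static void disableStreamManagement() {
            // When false, Openfire should stop advertising XEP-0198 to connecting clients
            JiveGlobals.setProperty("stream.management.active", "false");
        }

        public static boolean isStreamManagementEnabled() {
            // The second argument is the default used when the property was never set
            return JiveGlobals.getBooleanProperty("stream.management.active", true);
        }
    }

Either way, as noted above, this only matters for clients that still negotiate Stream Management; Spark 2.9.x does not.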

Do you believe that the problem can be resolved if they disable Stream Management on the Spark client? Also, have you tried using an entirely different XMPP client instead of Spark?

I checked all the client’s Windows event logs and found absolutely nothing out of the ordinary on those clients at the time they got dropped. I also checked their Spark logs and nothing was recorded at that time.

To be clear, the version tag doesn’t mean it is actually planned. Nothing is really planned with Spark, as we usually have nobody working on it consistently. Contributors come and go. I mark some tickets with future version tags so as not to lose sight of them; that way they are grouped and I can get the list just by opening a tag.

Also, I don’t remember there being disconnection problems because of SM in Spark 2.9.x. There was a problem that once Spark loses its network connection, a limbo session is left on the server. That’s why SM was disabled in Spark: it doesn’t know how to properly reconnect to an SM session, so its reconnect logic must be tuned for that first.

Stream Management is disabled and cannot be enabled in Spark 2.9.0+.
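
For context, this is roughly what “disabled programmatically” amounts to in Smack terms; a sketch assuming a Smack 4.4-style API, not the actual Spark source:

    import org.jivesoftware.smack.tcp.XMPPTCPConnection;
    import org.jivesoftware.smack.tcp.XMPPTCPConnectionConfiguration;

    // Sketch only: how a Smack-based client can opt out of XEP-0198.
    // This mirrors the intent of the Spark change but is not the actual Spark code.
    public class NoStreamManagementExample {

        public static XMPPTCPConnection connect(String user, String password, String domain) throws Exception {
            // Disable Stream Management (and resumption) for all connections created afterwards
            XMPPTCPConnection.setUseStreamManagementDefault(false);
            XMPPTCPConnection.setUseStreamManagementResumptionDefault(false);

            XMPPTCPConnectionConfiguration config = XMPPTCPConnectionConfiguration.builder()
                    .setUsernameAndPassword(user, password)
                    .setXmppDomain(domain)
                    .build();

            XMPPTCPConnection connection = new XMPPTCPConnection(config);
            // Alternatively, disable it per connection instead of globally
            connection.setUseStreamManagement(false);
            connection.connect().login();
            return connection;
        }
    }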

This forum post hints at Stream Management being an issue with Spark.
In SPARK-2140 you and Guus mention disabling Stream Management for now, on 05-AUG-2020.

So “Disable Smacks support for Stream Management (for now) by guusdk · Pull Request #502 · igniterealtime/Spark · GitHub” disables SM for now, until the reconnection logic is adjusted?

As explained in XEP-0198: Stream Management

“This specification defines an XMPP protocol extension for active management of an XML stream between two XMPP entities, including features for stanza acknowledgements and stream resumption…is a feature that allows a client to ‘survive’ a network hiccup, without having to fully redo the authentication cycle.”

It is the stream resumption that seems to be at fault here. We have clients that suddenly cannot receive messages until they exit and reopen their Spark clients, which is a “redo of the authentication cycle.”
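
To make that concrete, here is a sketch (assuming a Smack 4.4-style API, where the other ConnectionListener callbacks have default no-op implementations) of how a client can tell a resumed stream apart from a full re-login:

    import org.jivesoftware.smack.ConnectionListener;
    import org.jivesoftware.smack.XMPPConnection;

    // Sketch: distinguishing a resumed XEP-0198 stream from a full re-authentication.
    // Not Spark code; just an illustration of the difference described above.
    public class ResumptionAwareListener implements ConnectionListener {

        @Override
        public void authenticated(XMPPConnection connection, boolean resumed) {
            if (resumed) {
                // The previous stream was resumed: no new login, unacknowledged stanzas are replayed
                System.out.println("Stream resumed without redoing the authentication cycle");
            } else {
                // A brand-new session: the full authentication cycle was redone
                System.out.println("Fresh login; previous session state is gone");
            }
        }

        @Override
        public void connectionClosedOnError(Exception e) {
            System.out.println("Connection dropped: " + e);
        }
    }
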
As for the version tags, they were giving me hope. Someone was actually making progress on the client. Dates were being set and met up to version 2.9.4, and now we’re just anxiously waiting.

Just curious, what is your environment and how often do the disconnections occur? Are your users actively using their computers at the time of disconnection? Or do you notice the disconnections while a workstation is in a “locked” state for several hours or days?

We have Openfire 4.6.1 (Red Hat Linux) and Spark 2.9.4 (Windows 10). No, the times are random. The users have reported that they were actively using the machines at the time.

Thanks for your reply. I was trying to compare notes to see if I can find anything in common between our environments. I recently updated to Openfire 4.6.2 at both sites, which run on Windows Server 2019. Our clients are also on Spark 2.9.4. Sometimes the drops don’t occur for a couple of weeks; other times, they occur within 2 or 3 days of each other.

Another thing that bothers me is that if this was a common issue, why aren’t more people complaining about it? Why are we the only ones with this problem?

For now, I’ll leave it alone and maybe try to do a fresh install from scratch in a few months when I have more downtime. I’ll keep you posted.

I’ve been following this thread and have also started to observe Spark’s activity on my end. I am running Openfire 4.6.0 Beta on CentOS 7 and Spark 2.8.3 on Windows 8.
I left my system running for 32 hours and Spark reported no issues.
Another thing: I also used Pidgin, which did not show any issues either.

I am not sure this issue is related only to Spark itself; maybe there’s something else you need to look into.

An update from another system running the same version of Windows and Spark 2.9.4, connected to the same Openfire server: no issues found so far.

Thanks for your input. Glad to hear it’s all working for you.

The issue does not affect every client. We have 78 users connected to the Openfire 4.6.1 server with Spark 2.9.4. Suddenly one user stops receiving messages. Was it because he walked away, his client timed out, and it then did not reconnect properly? Were they actively working and got disconnected by a network blip, after which the client did not reconnect correctly? It all seems to be related to a disconnect/reconnect issue; however, getting users to explain the situation of the last time they correctly received messages and when they disconnected is difficult. There are 50-some users who claim to never get disconnected or have the issue. We have tried a fresh install of both server and clients.

I would suggest you run a test using a different XMPP client on the same computer (instead of Spark) and monitor whether it gets disconnected or not.

I myself ran Pidgin, Jitsi (Desktop) and Spark on the same computer and left it to go into idle mode. After re-login I found all three still online, so I could not reproduce the issue you are facing.

Something I should mention, not sure if it will help you: I did not use Wi-Fi.

In reply to your questions:

Was it because he walked away and his client timed out and then did not reconnect properly?
Could you please monitor the user’s session from the Openfire Admin portal, for a user who is idle? Just ask the person to inform you before he/she logs in to the computer.
Monitor it while the computer is idle, check the client’s session, and check whether the session starts when the person logs in.

Were they actively working and were disconnected due to a network blip and the client did not reconnect correctly?

  • This could be the issue. This is the reason I asked you to run the test above.

Also check those users who are using a wired connection (not Wi-Fi). I am still not sure, but it may help you. Sorry, I could not test over Wi-Fi as my router is not functioning properly; I noticed latency.

All are using wired connections. We do not allow WiFi within our network.

More information:
I was experiencing force closes while actively using Spark 2.9.4 this morning. I immediately turned on debugging. Within 15 minutes I had the following errors, with the Spark client force closing afterward.

2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.muc.spi.LocalMUCUser - Request from '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>' rejected: request did not specify a nickname
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.RoutingTableImpl - Failed to route packet to JID: <My username>@<XMPP Domain Name>/<My WorkstationName> packet: <presence type="error" from="<Room ID>@conference.<XMPP Domain Name>" to="<My username>@<XMPP Domain Name>/<My WorkstationName>"><error code="400" type="modify"><jid-malformed xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">A nickname (resource-part) is required in order to join a room.</text></error></presence>
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.PresenceRouter - Presence sent to unreachable address: <presence type="error" from="<Room ID>@conference.<XMPP Domain Name>" to="<My username>@<XMPP Domain Name>/<My WorkstationName>"><error code="400" type="modify"><jid-malformed xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">A nickname (resource-part) is required in order to join a room.</text></error></presence>
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.PresenceManagerImpl - Recording 'last activity' for user '<My username>'.
2021.02.15 10:27:20 DEBUG [socket_c2s-thread-24]: org.jivesoftware.openfire.spi.RoutingTableImpl - Removing client route <My username>@<XMPP Domain Name>/<My WorkstationName>
2021.02.15 10:27:20 DEBUG [Thread-16663]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Broadcasting presence update in room <Room ID> for occupant <Room ID>@conference.<XMPP Domain Name>/<My Full Name>
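
For reference, that "nickname (resource-part) is required" rejection means the join presence was addressed to the bare room JID without a nickname. A sketch of a well-formed join in Smack terms (the room address and nickname are placeholders, not our actual values):

    import org.jivesoftware.smack.XMPPConnection;
    import org.jivesoftware.smackx.muc.MultiUserChat;
    import org.jivesoftware.smackx.muc.MultiUserChatManager;
    import org.jxmpp.jid.EntityBareJid;
    import org.jxmpp.jid.impl.JidCreate;
    import org.jxmpp.jid.parts.Resourcepart;

    // Sketch: joining a MUC room with an explicit nickname (the resource-part the error refers to).
    // The room address and nickname below are placeholders.
    public class MucJoinExample {

        public static void joinRoom(XMPPConnection connection) throws Exception {
            EntityBareJid roomJid = JidCreate.entityBareFrom("roomid@conference.example.com");
            MultiUserChat muc = MultiUserChatManager.getInstanceFor(connection).getMultiUserChat(roomJid);
            // Omitting the nickname is what produces the jid-malformed rejection seen in the log
            muc.join(Resourcepart.from("My Full Name"));
        }
    }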

I then seem to connect back just fine:

2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - User '<My username>@<XMPP Domain Name>/<My WorkstationName>' attempts to join room '<Room ID>@conference.<XMPP Domain Name>' using nickname '<My Full Name>'.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - User '<My username>@<XMPP Domain Name>/<My WorkstationName>' role and affiliation in room '<Room ID>@conference.<XMPP Domain Name> are determined to be: participant, member
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Checking all preconditions for user '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>@conference.<XMPP Domain Name>'.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - All preconditions for user '<My username>@<XMPP Domain Name>/<My WorkstationName>' to join room '<Room ID>@conference.<XMPP Domain Name>' have been met. User can join the room.
2021.02.15 10:31:18 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.muc.spi.LocalMUCRoom - Adding user '<My username>@<XMPP Domain Name>/<My WorkstationName>' as an occupant of room '<Room ID>@conference.<XMPP Domain Name>' using nickname '<My Full Name>'.

I then get kicked for a different reason.

2021.02.15 10:33:20 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.spi.RoutingTableImpl - Failed to route packet to JID: <My username>@<XMPP Domain Name>/<My WorkstationName> packet: <message to="<My username>@<XMPP Domain Name>/<My WorkstationName>" id="hXNir-1412" type="chat" from="<Responding username>@<XMPP Domain Name>/<Responding user's WorkstationName>"><thread>IACtpY</thread><paused xmlns="http://jabber.org/protocol/chatstates"></paused></message>
2021.02.15 10:33:20 DEBUG [socket_c2s-thread-25]: org.jivesoftware.openfire.MessageRouter - Message sent to unreachable address: <message to="<My username>@<XMPP Domain Name>/<My WorkstationName>" id="hXNir-1412" type="chat" from="<Responding username>@<XMPP Domain Name>/<Responding user's WorkstationName>"><thread>IACtpY</thread><paused xmlns="http://jabber.org/protocol/chatstates"></paused></message>

I am curious why I am seeing http://jabber.org/protocol/chatstates in my logs.

One of my sites has 6 users that leave their Windows 10 computers locked after they leave for the day. So Spark remains running and presence shows as “away”.

I’m not certain that switching to another third-party XMPP client would prove anything. I say this because it’s not as if I have one client here and there getting disconnected at random times. Sometimes 4 out of the 6 clients get disconnected at once, or within a minute of each other. So I believe that the problem originates centrally from the Openfire server and not at each individual client workstation. But out of curiosity, I will indeed set up a test workstation running Pidgin or Jitsi to see if it ever gets dropped.

Here is a portion of the Openfire warn.log as it shows 4 clients getting dropped early in the morning. The first 2 clients get dropped within 30 seconds of each other, while the next 2 clients get dropped within 15 seconds of each other. Notice the times of occurrence. Keep in mind that these clients have been running now for about 2 weeks without any drops until early yesterday morning. So for a true test, you would need to leave a client running for at least a couple of weeks before seeing a drop actually occur.

We don’t have any wireless connected devices at this site. All workstations are connected with Cat6 Ethernet cable. Do you have any other ideas?

We’re currently on Openfire 4.6.2 and Spark 2.9.4.

Hmmm… not sure about why you’re seeing those log entries. We use Spark as a simple instant messaging system without setting up multi-user chat/conference rooms.

Did you notice, when you became disconnected, whether your Spark client was still running or had totally crashed/exited?