Users report continuous drops and reconnects to OpenFire

OpenFire 3.6.4

This installation had been running for a month with no issues. Now users report that their clients (e.g. Pidgin or Spark) are continuously dropped from a chat room and then reconnect.

What messages or logs should I look at on the server?

------------Update below.

After reading the update below, please provide feedback on:

  • If this is a known bug, what is the workaround or the time frame for a fix?

  • Whether you need further information.

  • How can we prevent this in the future?

Here is an update to our problem:

After analyzing the client logs and working with the folks who support Pidgin, they believe the problem we have is a known bug in Openfire.

Bug: The Openfire server forwards malformed XML 1.0 messages to clients. In this case, the malformed message came from someone in the TSN-FL room and contained the following invalid XML character: �
This caused the Pidgin client to report a number of errors and disconnect from the server (all group chat rooms and unicast IMs were disconnected).
In our environment this problem is specific to group chat. It should self-correct for unicast IMs.
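To illustrate why one stray character breaks the whole connection: XML 1.0 only allows tab, LF, CR and the code-point ranges in its `Char` production, so a conforming parser (such as the one Pidgin uses) must reject a stanza containing anything else. This is not Openfire or Pidgin code, just a minimal Python sketch. It assumes the pasted e-mail text carried an ESC (0x1B) control character; the actual character is not recoverable from the post. The names `is_xml10_char`, `strip_invalid_xml_chars`, `is_well_formed` and `message_stanza` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow."""
    return "".join(ch for ch in text if is_xml10_char(ch))

def is_well_formed(xml_text: str) -> bool:
    """Try to parse the text; a conforming parser rejects disallowed characters."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def message_stanza(body: str) -> str:
    """Wrap a (hypothetical) chat body in a minimal <message>, escaping & < >."""
    return f"<message><body>{escape(body)}</body></message>"

# Assumed example: pasted e-mail text carrying an ESC (0x1B) control character.
pasted = "Ticket updated\x1b[0m by Utopia"
print(is_well_formed(message_stanza(pasted)))                           # False
print(is_well_formed(message_stanza(strip_invalid_xml_chars(pasted))))  # True
```

Note that escaping `&`, `<` and `>` does not help here; the disallowed character has to be removed (or the stanza rejected) before it is forwarded.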

Through the Pidgin debug logs, the problem was traced to a message coming from the ‘TSN-FL’ room. The offending message appears to be a cut & paste of a Utopia e-mail notification message.

Further (and part of our issue), every time a user enters the ‘TSN-FL’ room, the last 25 messages in the room ‘history’ are streamed out, including the message with the bad XML. My Pidgin client was configured to auto-join a number of rooms, so every time I (re)started Pidgin, the problem repeated when the client auto-entered the TSN-FL room. Pidgin also tries to log in again at regular intervals after the initial ‘XML Parse error’, which is why we see the repeating ‘enter/leave’ status messages.
In hindsight, had I typed 25 messages in the ‘TSN-FL’ room, I could have cleared out the offending message. More messages in the FL room from other users would eventually have accomplished the same thing.
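The replay loop described above can be modeled with a fixed-size queue: the room keeps only its most recent messages, so a bad message is re-sent to every joining client until enough newer messages push it out. This is a toy Python model, not Openfire's MUC implementation; the class name `RoomHistory` is hypothetical and the window of 25 is taken from the post.

```python
from collections import deque

HISTORY_SIZE = 25  # room history length reported above (assumed fixed)

class RoomHistory:
    """Toy model of a chat room that replays its last N messages to each joiner."""

    def __init__(self, size: int = HISTORY_SIZE):
        # A deque with maxlen silently discards the oldest entry when full.
        self._messages = deque(maxlen=size)

    def post(self, message: str) -> None:
        self._messages.append(message)

    def replay_on_join(self) -> list:
        # Everything still in the window is sent to a client that (re)joins.
        return list(self._messages)

room = RoomHistory()
room.post("BAD")  # stands in for the message carrying the invalid character
for i in range(24):
    room.post(f"msg {i}")
print("BAD" in room.replay_on_join())  # True: still inside the 25-message window
room.post("msg 24")
print("BAD" in room.replay_on_join())  # False: pushed out by 25 newer messages
```

Separately from the server side, XEP-0045 lets a joining client ask for less history (for example a `<history maxstanzas='0'/>` child in the join presence), which would also avoid the replay if the client exposes that option.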

FYI: Pidgin support ticket:

http://developer.pidgin.im/ticket/9378

You can check the error log (error.log in Openfire's logs directory).
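A quick way to narrow the logs down is to filter them for connection- and XML-related entries. This is a minimal Python sketch, assuming the error log lives at `/opt/openfire/logs/error.log` (a typical location for a Linux package install; adjust for yours) and that lines mentioning these keywords are the interesting ones. The keyword list and the names `interesting_lines` and `scan_log` are hypothetical.

```python
from typing import Iterable, Iterator, List

# Assumed location; adjust to wherever your Openfire install writes its logs.
DEFAULT_ERROR_LOG = "/opt/openfire/logs/error.log"

# Illustrative keywords only; extend them with whatever your logs actually contain.
KEYWORDS = ("xml", "parse", "connection", "session", "timeout")

def interesting_lines(lines: Iterable[str], keywords=KEYWORDS) -> Iterator[str]:
    """Yield lines (without trailing newline) that mention any keyword, case-insensitively."""
    for line in lines:
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            yield line.rstrip("\n")

def scan_log(path: str = DEFAULT_ERROR_LOG) -> List[str]:
    """Return the matching lines from one log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(interesting_lines(f))
```

Usage: `for line in scan_log(): print(line)`, or pass the path of warn.log or another log file to `scan_log`.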

I think this might be a separate issue. I am running the latest Pidgin, 2.5.7, connecting to Openfire 3.6.4, and the error message Pidgin shows is ‘Ping Timeout’. So to me it looks like an issue with how pings are being handled.

It’s very odd: users randomly disconnect and reconnect several times. Restarting Openfire is the only thing that seems to settle the clients down.

You can try setting the system property xmpp.idle to -1 (create it if it does not exist).

See if this changes anything. This should stop Openfire from disconnecting idle sessions.
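The effect of `xmpp.idle` can be pictured as a periodic sweep that closes sessions which have been silent for longer than the configured timeout, with a negative value switching the sweep off. This Python sketch models that behavior and is not Openfire's code: it assumes the property is in milliseconds and uses an assumed 6-minute default; `Session` and `sessions_to_close` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: xmpp.idle is a timeout in milliseconds, and a negative value
# disables the idle check (which is what setting it to -1 is meant to do).
DEFAULT_IDLE_MS = 6 * 60 * 1000  # assumed default for illustration

@dataclass
class Session:
    user: str
    last_activity_ms: int  # time the last stanza or keepalive arrived

def sessions_to_close(sessions: Iterable[Session], now_ms: int,
                      idle_ms: int = DEFAULT_IDLE_MS) -> List[str]:
    """Return the users whose sessions have been silent longer than idle_ms."""
    if idle_ms < 0:
        return []  # idle checking disabled: never drop a session for inactivity
    return [s.user for s in sessions if now_ms - s.last_activity_ms > idle_ms]

sessions = [Session("alice", 0), Session("bob", 300_000)]
print(sessions_to_close(sessions, now_ms=400_000))              # ['alice']
print(sessions_to_close(sessions, now_ms=400_000, idle_ms=-1))  # []
```

This also shows why -1 is a blunt fix: sessions from clients that have genuinely gone away are never reaped either.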

wroot wrote:

You can try setting system property (create it if there is none) xmpp.idle to -1

Thanks. This also helps with Miranda 0.8.x builds.

But why does Miranda 0.7.x work fine without this property?

So if other clients are reconnecting too, is there a bug in Openfire?

Discussion at Miranda bugtracker:

http://bugs.miranda-im.org/view.php?id=887

Something has probably changed between 0.7 and 0.8 in the heartbeat functionality, i.e. how the client sends small packets to the server to show that it is active (“alive”). I can’t say for sure whether this is a bug or a feature. This system property makes Openfire ignore heartbeats entirely and keep all clients connected all the time.
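On the client side, a heartbeat only helps if it reaches the server more often than the server's idle timeout. One common form is an XEP-0199 ping IQ; another is a single whitespace character sent on the stream. The sketch below builds such a ping and checks a heartbeat interval against a timeout. It assumes the server answers XEP-0199 pings and says nothing about what Miranda 0.7 or 0.8 actually send; `build_ping` and `keepalive_is_safe` are hypothetical names.

```python
import itertools
import xml.etree.ElementTree as ET

PING_NS = "urn:xmpp:ping"  # XEP-0199 namespace
_ids = itertools.count(1)  # simple source of unique stanza ids

def build_ping(server: str) -> str:
    """Build a XEP-0199 client-to-server ping IQ as a string."""
    iq = ET.Element("iq", {"type": "get", "id": f"ping{next(_ids)}", "to": server})
    ET.SubElement(iq, "ping", {"xmlns": PING_NS})
    return ET.tostring(iq, encoding="unicode")

def keepalive_is_safe(heartbeat_s: float, server_idle_s: float) -> bool:
    """A heartbeat keeps the session alive only if it fires before the idle timeout."""
    if server_idle_s < 0:
        return True  # server never drops idle sessions
    return heartbeat_s < server_idle_s

print(build_ping("example.com"))
print(keepalive_is_safe(60, 360))   # True
print(keepalive_is_safe(600, 360))  # False
```

If a client update lengthened its heartbeat interval past the server's timeout, the second case is what you would see: periodic drops that disappear once the server's idle check is disabled.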

Seems like a bug to me, though I don’t know whether it’s the client’s fault or the server’s. The issue rarely happens, roughly once every 2 or 3 weeks. I just wish I knew what caused it to behave this way. I’d rather not set it to never disconnect; it seems as though I would be disabling a feature that clients and servers should handle easily.

Actually, I filed this as a bug a while ago: JM-1533. But it is hard to reproduce. I think only the developer who wrote this code, or a community member with good Java programming skills, can find the cause. However, that developer hardly works on Openfire anymore, and the community doesn’t have enough skills yet.

Just experienced the random disconnects again, so strange. Anyway, I added xmpp.idle set to -1 to my Openfire system properties and restarted the service, but that has not corrected the issue.

After restarting the service things will run fine for a week or so. When the disconnect/reconnect begins again I just restart the service.

Openfire: 3.6.4 (running on RedHat 5.3)

Pidgin: 2.5.8

Does the xmpp.idle option affect the BOSH clients?

I have seen that this is more of a temporary fix; you still have to restart the server every couple of weeks.

Has this been fixed, or is there a better way to fix it?

Restarting a service this important to our company every 2 weeks is not the direction I want to go.

Also, is this happening at any of the larger companies running Openfire (1000+ users)? If not, do you have suggestions as to what you might have changed?

Upgrade to the latest version of Pidgin and the disconnects will stop.