Fastpath / Openfire keeps dropping end users

Hi,

We have an online text chat support service implemented on top of Fastpath and Openfire 3.6.4. During local testing everything works great. On the production site, however, users are frequently getting dropped. These users are typically outside the US. The server running the web application that connects to Openfire is in the US. The agents use the latest version of Spark.

I’ve been reading the forums and trying different things, but without much luck so far. I’m wondering whether there are settings in Openfire or the application server that might help, e.g. a setting that makes Openfire more fault tolerant on poor networks. I tried adjusting xmpp.client.idle, but it didn’t seem to help much.

I suspect the end users are either on poor networks or experience drops in network connectivity. I tried the following:

1. Set xmpp.client.idle to -1.
2. Use the tools client in the Openfire administration console.
3. Start chatting.
4. Stop the network on my PC.
5. Restart it quickly.

For a while, the client window still lets me send messages to Spark.

In Spark, the agent can still see what the client is typing.

The agent types in Spark, but the messages do not appear on the client.

After the agent tries once or twice to send messages to the client, the client receives this message:

The connection to the conversation has been lost.

Is 3.6.3 more stable? Do others see this issue?

Hi,

Have you tried the latest version of Pidgin? Do you lose connections with that version too?

In the past my clients got disconnected too. I updated to Openfire 3.6.4 and updated all clients to the latest version of Pidgin.

One thing to check, if your clients are getting disconnected, is how much traffic your server has. I sometimes see disconnections when a client sends thousands of messages in less than a second. I have only seen this behavior with one specific client that uses the Spark library the wrong way, e.g. a loop without a timer.
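To illustrate the "loop without a timer" problem, here is a minimal JavaScript sketch of sending a batch of messages through a timer instead of firing them all at once. `createThrottledSender` is a hypothetical helper, `sendFn` is a stand-in for whatever call actually delivers one message (it is not a Spark or Fastpath API), and the interval is an assumption you would tune.

```javascript
// Sends queued messages one at a time, spaced by intervalMs, instead of
// firing them all in a tight loop.
// sendFn: hypothetical function that delivers a single message.
// schedule: optional scheduler (cb, ms); defaults to setTimeout, injectable for tests.
function createThrottledSender(sendFn, intervalMs, schedule) {
  var queue = [];
  var running = false;
  var defer = schedule || function (cb, ms) { setTimeout(cb, ms); };

  function drain() {
    if (queue.length === 0) {
      running = false; // queue empty; the next enqueue restarts draining
      return;
    }
    sendFn(queue.shift());
    defer(drain, intervalMs); // wait before sending the next message
  }

  return function enqueue(message) {
    queue.push(message);
    if (!running) {
      running = true;
      drain();
    }
  };
}
```

With the real `setTimeout`, `var send = createThrottledSender(deliver, 200);` followed by three `send(...)` calls delivers the first message immediately and the other two 200 ms apart, so the server never sees a burst.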

Best regards

Clóvis

We are using Fastpath web chat. The end users use a web client over the internet, and the agents, inside the firewall, use the Spark 2.5.8 client. Right now the agents are in the Philippines, England, and Malaysia. The end users of Fastpath web chat (connecting over the internet) are the ones getting disconnected. If the end user is in the US, we basically never see them dropped. Lots of drops occur for users in Eastern Europe, China, India, Pakistan, Malaysia, etc.

The alert seems to be shown to the client by chatmain.jsp. It comes from the connectionChecker function:

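// lastChecked and chatHasEnded() are defined elsewhere in chatmain.jsp;
// lastChecked is presumably refreshed whenever the server responds.
function connectionChecker() {
    var t = new Date().getTime();
    // At least one check-in seen, but none in the last 60 seconds: treat the chat as lost.
    if (t > (lastChecked + 60000) && lastChecked != 0) {
        chatHasEnded();
        alert("The connection to the conversation has been lost. Please close the window and try again.");
        window.close();
        return; // stop rescheduling so the alert does not repeat if the window stays open
    }
    // Check again in 5 seconds (pass the function itself rather than a string to evaluate).
    setTimeout(connectionChecker, 5000);
}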
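The check itself is just a timestamp comparison. A minimal sketch, assuming you want to test it outside the browser or make the 60-second threshold configurable for high-latency users, pulls the condition into a pure function. `isConnectionLost`, `makeConnectionChecker` and the 120-second value are hypothetical and not part of Fastpath.

```javascript
// Pure version of the test inside connectionChecker(): true when at least one
// check-in has happened (lastChecked !== 0) and none for longer than timeoutMs.
function isConnectionLost(now, lastChecked, timeoutMs) {
  if (lastChecked === 0) {
    return false; // no check-in yet, which the original also ignores
  }
  return now > lastChecked + timeoutMs;
}

// Assumption: double the original 60000 ms for users on slow or lossy networks.
var CONNECTION_TIMEOUT_MS = 120000;

// Hypothetical wiring: getLastChecked reads the page's lastChecked value and
// onLost performs whatever the page does when the chat is gone.
function makeConnectionChecker(getLastChecked, onLost, timeoutMs) {
  return function check(now) {
    if (isConnectionLost(now, getLastChecked(), timeoutMs)) {
      onLost();
      return true;
    }
    return false;
  };
}
```

A longer threshold only helps if the session is merely slow to respond; if the server-side session is really gone, it just delays the alert.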

Dropped chat connections sound a lot like the issues I had. Are you connecting directly on ports 9090/9091 (SSL), or through an Apache proxy? If Apache, take a look at this doc:

http://community.igniterealtime.org/docs/DOC-1876

Recently we isolated the problem to Internet Explorer only; there are no issues with Firefox. For end users, our system looks roughly like this: internet -> Tivoli WebSEAL SSO infrastructure -> Blue Coat proxy -> IHS 6.1 -> WebSphere 6.1 -> Fastpath web EAR -> Openfire. On IHS we are not using ProxyPass. In Fastpath web chat there is a JavaScript function named connectionChecker, and for some reason, on IE, it decides the connection has been lost. We tried commenting it out, but the session really is lost, so the check appears to be needed.

We plan to try DWR 1.1.4 instead of 1.1.1 and to set the HTML headers to no-cache.
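If any of the requests the web chat makes while polling are GETs (an assumption; DWR may use POST), IE is known to cache XMLHttpRequest GET responses aggressively, so a poll can return stale data without reaching the server. Besides no-cache headers, a common client-side workaround is to append a unique query parameter. A minimal sketch; the helper name and the `_` parameter are hypothetical, not DWR or Fastpath conventions.

```javascript
// Appends a unique query parameter so each request URL differs and a cached
// response cannot be reused. now is optional and injectable for testing.
function cacheBustedUrl(url, now) {
  var stamp = (now === undefined) ? new Date().getTime() : now;
  // Keep any #fragment at the very end, after the query string.
  var hashIndex = url.indexOf("#");
  var base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  var hash = hashIndex === -1 ? "" : url.slice(hashIndex);
  var separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + "_=" + stamp + hash;
}
```

On the server side, the corresponding response headers would be `Cache-Control: no-cache` and `Pragma: no-cache`.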

Any other ideas?

Just to be clear, it always works with Firefox, or at least chats never get dropped.

With IE8, chats sometimes get dropped every few minutes. Other times it works with no issues during testing.

One way to reproduce the error:

1. Start a chat in IE8 as user A.
2. Start a chat in Firefox as user B.
3. Stop the chat as user A.
4. In Spark, the agent leaves the window and the chat with user A open.
5. User A initiates a chat again. The agent now has 3 chats tabbed in Spark, but one is no longer connected.
6. Sit idle for about 2 or 3 minutes. The IE8 chat will get disconnected with the connectionChecker alert, and the Spark agent will see that user A has left the room.

I can’t reproduce this on Openfire 3.6.4 with IE8, if that’s any help.

I’ve seen connections being dropped at the queue stage: the queue page fails to reload, the JavaScript breaks, and it doesn’t try to reload again. A faster database improved the situation (I’m not sure why), and so did adding error checking.

I would love to see FastPath webchat overhauled to include error checking for every request to the server.
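A minimal sketch of what per-request error checking could look like: wrap each call so a failure is retried a few times with a growing delay before the page gives up. `withRetry` is hypothetical, `request` is an assumed callback-style function (`request(done)` calling `done(err, result)`), and the attempt count and delays are assumptions; none of this comes from Fastpath.

```javascript
// Calls request(done) and retries on error, doubling the delay each time.
// request: function (done) where done(err, result) reports the outcome.
// callback(err, result) is invoked exactly once with the final outcome.
// schedule: optional scheduler (cb, ms); defaults to setTimeout, injectable for tests.
function withRetry(request, maxAttempts, baseDelayMs, callback, schedule) {
  var defer = schedule || function (cb, ms) { setTimeout(cb, ms); };
  var attempt = 0;

  function tryOnce() {
    attempt += 1;
    request(function (err, result) {
      if (!err) {
        callback(null, result);
      } else if (attempt >= maxAttempts) {
        callback(err); // give up and let the page report the failure
      } else {
        // Back off: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
        defer(tryOnce, baseDelayMs * Math.pow(2, attempt - 1));
      }
    });
  }

  tryOnce();
}
```

The point of the callback-on-final-failure design is that the page always learns the outcome, instead of the JavaScript silently breaking and never reloading, as described above for the queue page.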

OK, I spoke too soon. I got disconnected in IE8, but it took around 15 minutes. Could that point to a timeout setting somewhere? I’m not the server admin.
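If the roughly 15-minute drop comes from an idle timeout somewhere between the browser and Openfire (for example in one of the proxies in the chain described earlier; this is only a guess), one client-side mitigation is a periodic heartbeat request so the connection never looks idle. A minimal sketch; `startHeartbeat` is hypothetical, `sendPing` stands in for whatever lightweight request the page can make, and the interval is an assumption that should be comfortably shorter than the suspected timeout.

```javascript
// Starts a heartbeat that calls sendPing every intervalMs until stopped.
// sendPing: hypothetical function issuing a cheap request to the server.
// setTimer/clearTimer: optional, injectable for tests; default to setInterval/clearInterval.
function startHeartbeat(sendPing, intervalMs, setTimer, clearTimer) {
  var set = setTimer || function (fn, ms) { return setInterval(fn, ms); };
  var clear = clearTimer || function (id) { clearInterval(id); };
  var id = set(sendPing, intervalMs);
  // Return a stop function so the page can end the heartbeat when the chat closes.
  return function stop() {
    clear(id);
  };
}
```

For example, `var stop = startHeartbeat(sendPing, 5 * 60 * 1000);` pings every five minutes, well under a suspected 15-minute idle timeout, and `stop()` would be called when the chat ends.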

@rmscott_75077: From the IE client in question, if you access http://<your.jabber.server>:9090/webchat/ and click one of the links presented there while communicating via spark as an agent, do you still encounter issues? Have you tried enabling debug mode on the Openfire server?

If I use IE to access http://<your.jabber.server>:9090/webchat, yes, it still encounters issues. I will try enabling debug again; I did not notice anything the first time.

I also added these settings along with DWR 1.1.4, and the time before IE drops seems to go from about 3-5 minutes to about 15 minutes:

xmpp.client.idle -1

xmpp.httpbind.client.idle -1