Some users’ messages are not received

After upgrading to 3.5.2, the server runs fine for about a week, then some users (a random assortment) find they can receive messages, but when they reply, the recipient does not get the message. Signing out and back into the client (Pandion) doesn’t help; the behavior is the same. Clearing the server cache stops everyone from working until the server is rebooted. I did find that if I lock the user out and then unlock them, they start working again, but I need a solution that prevents it from happening in the first place. Any thoughts?

I really need some help with this. Even if nobody knows anything about it, is anyone aware of support I can pay for? The closest thing I can find through this site is for the Enterprise version only, which is discontinued (and which we do not have).

Any assistance would be greatly appreciated.

Greetings everyone,

Pandion looks like a nice program. However, the users I continually have this issue with are rather sporadic. (Sorry, my spelling is terrible.)

Presently I have my users on Spark 2.5.8 and it works great for the most part. However, there are times when Spark seems to freak out and either does not send the message, or sends it but it never gets to the other user.

Also, this occurs sometimes with broadcast messages.

I am running OF 3.5.2 as well.

So, yeah I am a little bit in the same boat.

cheers.

Support services:

Are you seeing any relevant errors in the logs? If you are not sure, just post them all here. Is this happening only with the Pandion client? Have you tested with other clients? Pandion has not been updated for a very long time.

Hey there David, I will check the logs again. I didn’t see anything from that point in time; however, I will look again and report back.

There might be something there now that the server has been up and working decently (not great, but decently) for about a week and one full weekend.

Thanks, and more this afternoon or tomorrow.

Cheers.

Haven’t been able to find anything in any logs that points at the problem user(s). Currently I don’t have anyone reporting that this is happening to them (though this morning it was happening to our CIO, and I had to fix that one before getting a chance to look into it). I have attached a copy of our error log. The user with issues this morning was mjs223, which I do not see in the logs. You will also notice A LOT of ‘include non-existant username’ messages. We pull our users from Active Directory, so I’m thinking that is normal behavior (it’s always done that anyway).

Thanks - If there is ANYTHING else I can provide that you might need, please feel free to ask…
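For anyone wanting to repeat that kind of log check without reading the file by eye, here is a minimal sketch in Python. It only does a plain text search: it lists the lines that mention a given username and counts Java-style exception names. The function name `summarize_log` and the command-line usage are made up for illustration; nothing here is an Openfire tool, and it assumes the log is an ordinary text file.

```python
import re
import sys
from collections import Counter

# Matches Java-style exception/error class names, with or without a
# package prefix, e.g. java.lang.ArrayIndexOutOfBoundsException.
EXCEPTION_RE = re.compile(r"\b(?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b")


def summarize_log(text, username):
    """Return (lines mentioning username, Counter of exception class names)."""
    user_lines = []
    exceptions = Counter()
    needle = username.lower()
    for number, line in enumerate(text.splitlines(), start=1):
        # Case-insensitive substring match on the username.
        if needle in line.lower():
            user_lines.append((number, line.strip()))
        # Count each exception by its simple class name (package stripped).
        for name in EXCEPTION_RE.findall(line):
            exceptions[name.rsplit(".", 1)[-1]] += 1
    return user_lines, exceptions


# Hypothetical usage: python scan_log.py <path-to-log> <username>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        hits, counts = summarize_log(handle.read(), sys.argv[2])
    for number, line in hits:
        print(f"{number}: {line}")
    for name, count in counts.most_common():
        print(f"{count:6d}  {name}")
```

If the username never shows up but the exception counts are high, that at least narrows down which exceptions are worth posting for a developer to look at.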

Well, I’m not a Java geek myself, so these logs could be useful for a developer if one jumps into this thread. I see OutOfBounds exceptions in the logs; not sure what that means. What are the specs of your server? CPU, RAM? What OS?

Server is a Dell 1955 blade: 2 @ 3GHz processors, 2GB physical RAM, running Windows Server 2003 (SP2). SQL is on a separate box. Neither this server, its SQL backend server, nor the network has shown any sign of being taxed at all. I’ve always suspected it might be some caching thing, but I’m not a Java hound (nor a SQL guy), so I wasn’t able to figure out much.

Forgot to ask: how many users do you have, and how many are online? Though it’s not only the number that counts; it depends on messaging activity too.

I honestly have never known how to tell exactly. There are 465 registered. I would guess about half that are online most of the time. As for activity, it’s not super busy. I would say the average user messages a few times an hour.

One aside: our Roster cache is the only cache that is ever very high. It usually rides around the high 80% area with ~50% effectiveness. I’ve always kind of wondered if it was filling up to 100% and truncating off active users or something.
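To put a rough number on "not super busy", here is a back-of-envelope sketch in Python. The 465 registered users and the "about half online" guess come from the post above; reading "a few times an hour" as 3 messages per user per hour is purely an assumption, so treat the result as an order-of-magnitude figure only.

```python
def estimate_load(registered, online_fraction, msgs_per_user_hour):
    """Return (messages per hour, messages per second) for the whole server."""
    online = registered * online_fraction
    per_hour = online * msgs_per_user_hour
    return per_hour, per_hour / 3600.0


# 465 registered, roughly half online; 3 msgs/user/hour is an assumed value.
per_hour, per_second = estimate_load(465, 0.5, 3)
print(f"about {per_hour:.0f} messages/hour, {per_second:.2f} messages/second")
```

Under those assumptions the server sees well under one message per second on average, which fits the observation that neither the box nor the network looks taxed.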

Kinda off the beaten path here for a second, but how does your Java RAM look on the splash page when you log into the admin console? This might not have anything at all to do with it, but if Java is running out of memory, that might be an issue. I ran into that last week and had to make some changes to a particular file in the root directory of Openfire. It fixed one of the problems I had with users not being able to log in and stay logged in.

I know this is a little off subject, but it might have a little something to do with it. It seems that Openfire runs strictly on a virtual basis with Java, which is nice, but it sucks RAM down like a bad tree dripping sap. I know I made it sound bad, but at the same time, when it’s only preconfigured for 64 megs, it’s a bore to have to try and figure out.

I am going to go back and check my roster caching and see if that is being taxed any…

More to come.

Does this look normal, or do you think I have a problem and need to look further into whether the roster caching is causing the lost messages?

I am curious: is there a way to expand the max size so the effectiveness improves? 20.5% is rough, at least in my head it is.

Suggestions always welcomed.

| Cache Name | Max Size | Current Size | Percent Used | Effectiveness* |
|------------|----------|--------------|--------------|----------------|
| Roster     | 0.25 MB  | 0.24 MB      | 94.1%        | 20.5%          |
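For reading those two percentages, a small sketch in Python. It assumes Percent Used is simply current size over max size, and that Effectiveness is the cache hit rate (hits divided by total lookups); that is my understanding of the admin console figure, not something confirmed in this thread. The hit and miss counts below are hypothetical numbers chosen only to reproduce 20.5%.

```python
def percent_used(current_mb, max_mb):
    """Share of the cache's maximum size currently occupied, in percent."""
    return 100.0 * current_mb / max_mb


def hit_rate(hits, misses):
    """Cache hit rate in percent; 0.0 when there have been no lookups."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


# Sizes from the Roster row; the rounded MB values give about 96%, so the
# console's 94.1% is presumably computed from unrounded byte counts.
print(f"used: {percent_used(0.24, 0.25):.1f}%")

# Hypothetical counts: 205 hits out of 1000 lookups.
print(f"effectiveness: {hit_rate(205, 795):.1f}%")
```

A nearly full cache combined with a low hit rate usually suggests entries are being evicted before they can be reused, which is why a larger max size is a reasonable thing to ask about.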

Java memory looks fine (65MB of 493MB, 13% used). Doesn’t look like that’s the culprit, but I’ll keep an eye on it as time goes on…

I’ve kind of suspected something with caching and the Roster. I have found that if I lock and then unlock a problem user, they work OK again. The ONLY other solution I have found is rebooting the server. Neither of these is anything like a long-term solution though; they just fix the symptom. Next, I’m going to see if a user with a problem (next time I find one) has any interesting flags in the database tables. I don’t really know what I’m looking for though (nor why they would get messed up), but I’m trying to find trends.

Sounds like a plan to look for trends - in your shoes I guess I would as well. Yikes.

Something in my gut tells me that it’s not there, but it’s a good idea to go ahead and see what happens with the SQL and look for a trend.

If there are logs from the SQL server, there might even be something logged there that tells of a communication interruption at a specific table in the database and that can be looked up later. Who knows?

I am not a SQL guru or a Java guru…

I am trying to figure out what to do next about the fact that messages are disappearing right where people are standing. Interesting idea for sure.

More to come I am sure throughout the day.

Well, after finding a problem user, I was unable to find anything special about them in the database backend. Any chance the certificates could be tied in somehow? I’m totally not a cert guy, but could there be a limited number and it just truncates off the oldest user when a new one comes in? I’m so out of ideas…

Good morning,

Well, to be honest, I am not a certificate person either; I have never been forced to do the research on those… I am kinda stuck for ya and feel bad about it.

If you are using LDAP, check the user once you duplicate it from another user. Or switch to LDAP and drop the SQL DB altogether and see what happens: restart the Openfire server and see if that user does it right off the bat. That might be something to look into. If you have specific certificates on the 2003 server side that are being used somehow, it’s possible; however, I don’t know what you have running on the back end. I am just running with my thinking as I type. Forgive me if it doesn’t make any sense.

At the same time, something tells me this is on the server side, with Openfire giving you problems with a query to the server and vice versa.

Do a test run with Spark and see if it’s the client program that is giving you fits.

So far, the past couple of days, I haven’t had any problems from the server with messages not getting there. Then again, I changed the location of the behemoth and moved it to a more stable switch.

Do your network or proxies do filtering out the wazoo? Just curious.

Basically, in my head I feel it’s something to do with the Openfire server itself, but I’m not totally sure. (I doubt myself too much anyway.)

However, when it comes to database stuff on my server, it simply gets the user information from LDAP on the Windows 2003 server! Thus it’s technically an embedded server.

Let me know what happens with Spark - all ideas are on the table, folks.

Cheers.

I would install some of the newer plugins to give you a better feel for the usage of your server. The monitoring service will provide stats even if you do not choose to log chats.

I have had issues with users not getting messages as well. Generally I would uninstall and reinstall Spark and delete their local Spark settings to fix it. This only worked for a short while. That said, I have now been trying the online version of Spark (no Java included). The users I installed this version of Spark for have not had a recurrence.

Interesting thoughts - a good thought to try.

I agree. Then again, I assumed too much - seems I do that from time to time.

I assumed that David already had this installed and had checked it.

I bet I am mistaken on this, but Todd has a great point for sure about seeing what’s going on down there.

I did install Spark (online version) on a machine where the user was having the problem (we usually use Pandion; once the problem crops up, the user has it on any machine). So:

  1. Verified Pandion was having the problem with messages not being received

  2. Installed Spark (online ver) - worked fine

  3. Stopped Spark and reran Pandion - still has the problem.

  4. For kicks, in Pandion I changed the encryption from "use TLS if avail" to "use SSL" and it worked.

Odd thing is that I have a lot of users using TLS if avail (that is the default) and most of the time they don’t have problems. I’ve yet to see if any SSL users get this issue. I won’t know for a while, as I am not sure if changing that setting simply caused the server to get its brain back with regard to that user. The thing is though, Spark worked and Pandion didn’t. I’m starting to think I have a client issue, not a server issue as I originally thought, though I don’t know what kind of encryption Spark tries to use by default.

OK, after a while I have found that Spark doesn’t work perfectly either (though it does work better).

Pandion: With Pandion, some users after a while could receive messages, but anything they sent disappeared. It picks on different users at no defined times. The only thing I found that fixes the behavior is a server reboot.

Spark: With Spark, the user can send and receive OK; however, any user that showed this problem shows up in Spark as ‘offline’ even though they do send and receive. Further, Spark closes down every 10 or 15 min.

From the Admin Console, the user does show up as online. It’s as if the server doesn’t know they are there anymore and won’t let go of that state.

I need help - CIO is coming down on me - any thoughts?!