Openfire continually crashing after upgrade to 3.10 (org.jivesoftware.openfire.nio.NIOConnection)

It was run against an existing database with the test users that we keep for experiments like this :)

Actually, my first suspect is an improper C2S channel close, so if the pool is large enough this issue may not appear, I guess. How many connections does the C2S pool have on that open server?

Ignite’s server uses the default settings. You should not be seeing this issue with 50 clients regardless. We were seeing reports of this prior to the 3.10.2 release, but the hope was that the downgrade of MINA fixed it.

I have created accounts tester0 through tester50, with the same password as username, please try that.

Thanks, I ran it several times. Any errors on your side?

Nope

How about now? Please check the logs on your side…
I buried my Openfire C2S pool on a t2.small Amazon instance several times today. Your server is pretty fast compared to a t2.small, and I think it all comes down to improper channel-close logic and races when packet delivery fails. I can reproduce it fairly often on my t2.small with just a 1-1 chat, given a massive message flow and an accidental connection drop. The channel goes into some kind of recursion and freezes for future requests, so after several attempts the whole C2S pool is dead.
So from my point of view, and according to the stack trace, it is not related to MINA at all. See the stack trace I posted previously: NIOConnection.close and NIOConnection.deliver appear several times in one stack. It tries to store the message with backupDeliverer and finally fails when it reaches a connection where backupDeliverer == null; see line 314 of the NIOConnection class, which is where the failure finally lands.
This logic was changed while working on OF-857, but I guess the recursion is still there. It happens under massive connection drops (e.g. mobile clients, as we have) and leads to the death of the whole C2S connection pool.
That is how I see it…
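
To make the hypothesis above concrete, here is a minimal Java sketch of deliver and close calling each other, with the close body guarded so that it runs at most once per connection. It is a toy model, not Openfire code: `ToyConnection`, the `BackupDeliverer` interface as written here, `fallback`, `dropped`, and `closeRuns` are all hypothetical names, and the socket write is assumed to always fail. It only illustrates how the mutual calls can be made to terminate, and how a null backup deliverer can be handled without an NPE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: none of these types are Openfire classes.
class CloseGuardSketch {

    /** Hypothetical fallback that stores a stanza that could not be written. */
    interface BackupDeliverer {
        void deliver(String stanza);
    }

    static class ToyConnection {
        final String name;
        final BackupDeliverer backupDeliverer;          // may be null in this model
        final AtomicBoolean closed = new AtomicBoolean(false);
        final List<String> dropped = new ArrayList<>(); // stanzas with nowhere to go
        ToyConnection peer;                             // subscriber notified on close
        int closeRuns = 0;                              // how often the close body ran

        ToyConnection(String name, BackupDeliverer backupDeliverer) {
            this.name = name;
            this.backupDeliverer = backupDeliverer;
        }

        void deliver(String stanza) {
            if (!closed.get()) {
                // Assumption: the write always fails, which triggers a close.
                close();
            }
            fallback(stanza);
        }

        void close() {
            // Guard: only the first caller runs the close body, so a nested
            // close() reached through deliver() returns immediately.
            if (!closed.compareAndSet(false, true)) {
                return;
            }
            closeRuns++;
            if (peer != null) {
                peer.deliver("<presence from='" + name + "' type='unavailable'/>");
            }
        }

        private void fallback(String stanza) {
            if (backupDeliverer != null) {
                backupDeliverer.deliver(stanza);
            } else {
                dropped.add(stanza); // explicit null check instead of an NPE
            }
        }
    }

    /** Wires two connections to each other and fails one delivery on "a". */
    static ToyConnection[] run(List<String> stored) {
        ToyConnection a = new ToyConnection("a", stored::add);
        ToyConnection b = new ToyConnection("b", null);
        a.peer = b;
        b.peer = a;
        a.deliver("<message/>");
        return new ToyConnection[] {a, b};
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        ToyConnection[] c = run(stored);
        System.out.println("stored by a's backup: " + stored);
        System.out.println("dropped by b: " + c[1].dropped);
    }
}
```

In this model both connections run their close body exactly once. The original message and b's unavailable presence end up with a's backup deliverer, and a's unavailable presence is dropped because b has no backup deliverer.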

Interesting, I don’t find any new tracebacks during your testing. More boggling appears to be in store to figure out what may be happening.

Updated https://igniterealtime.org/issues/browse/OF-903

My main concern is that the same action repeats several times in the stack trace. It looks like recursive calls, but the changes made to NIOConnection were supposed to fix that, so from my point of view there is a bug in the logic that processes fallback messages and notifications on channel close. The NPE looks suspicious too: it fails in a place where backupDeliverer is not expected to be null (maybe an additional check is needed there, or this is an effect of improper post-mortem message delivery). So maybe update OF-903 (ISE attempting to write data to a closed/closing session) with the NPE too?

The C2S death is not related to the NIOConnection exception, so please ignore my remark that Openfire stops responding. Modified code and a bug of my own caused this, sorry.

@Daryl Herzmann, I looked into the stack traces again and am starting to think that there is no problem, only unnecessary error reporting. It looks like an unavailable presence is sent to all subscribers on channel close, and some of their connections are in the closing routine too, so it just reports that it can’t deliver the unavailable presence to one of the subscribers because that connection is no longer alive… What do you think?

I am worried about the case where this happens with other stanzas, though, and whether those are being lost. Perhaps in the past these were silently lost and nobody noticed. I am unsure whether anybody expects 100% message delivery fidelity in Openfire; I don’t think that is a tenable goal.

OK, maybe exclude only presence packets from error reporting?
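
A minimal Java sketch of that idea, assuming the failed stanza is available as an XML string: presence stanzas are logged at a low level, while everything else stays at error level so that lost messages remain visible. `DeliveryFailureLogSketch`, `Level`, and `levelFor` are hypothetical names, not part of Openfire's API.

```java
// Toy model only: this is not how Openfire decides its log levels.
class DeliveryFailureLogSketch {

    enum Level { DEBUG, ERROR }

    /**
     * Hypothetical helper: picks a log level for a stanza that could not
     * be delivered because the connection was already closed.
     */
    static Level levelFor(String stanzaXml) {
        String s = stanzaXml.trim();
        // Match "<presence" followed by whitespace, "/" or ">", so that an
        // element such as "<presenceX>" is not mistaken for a presence.
        boolean isPresence = s.matches("(?s)<presence[\\s/>].*");
        return isPresence ? Level.DEBUG : Level.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(levelFor("<presence type='unavailable'/>"));
        System.out.println(levelFor("<message to='someone'><body>hi</body></message>"));
    }
}
```

Keeping message and iq failures at error level addresses the worry about silently lost stanzas, while the expected presence failures on close no longer flood the error log.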

We have kept hitting the same error for the last couple of weeks, after successfully running 3.10.2 on CentOS 7 under JRE 7 build 80 for 2 months!

Issue 903 is inaccessible within JIRA and we don’t know whether it has been addressed in a nightly build or not.

I attach our log with the error.
openfire_localsession_internal_error.txt.zip (7036 Bytes)

The Jira issue this morning has been resolved, tickets should be accessible again.

We upgraded to 3.10.2 some time ago, but we’ve been receiving more and more complaints of users appearing online when they are confirmed to be offline. We’re seeing the same errors we were seeing in 3.10:

2015.11.05 17:15:01 org.jivesoftware.openfire.nio.NIOConnection - Failed to deliver packet: awayI'm not here right now<c xmlns="http://jabber.org/protocol/caps" node="http://pidgin.im/" hash="sha-1" ver="I22W7CegORwdbnu0ZiQwGpxr0Go="/><photo/>

2015.11.05 17:15:01 org.jivesoftware.openfire.session.LocalSession - Internal server error

java.lang.IllegalStateException: Connection closed

Is there a solution for this issue in Openfire 3.10.2? I need to restart my Openfire server every now and then.

Is there something preventing you from using the 3.10.3 or even the 4.0.1 release?

It’s very hard to get a window for an upgrade, as Openfire is used as the corporate IM in our organization, spanning 5 continents. Is the issue already resolved in the 3.10.3 and 4.0.1 releases?

Has the issue been resolved in 3.10.3 and 4.0.1?