Repeated exception (MINA / SSL related?)

Hiya,

Sadly, I can’'t be sure when this started, but we get a lot of the Exception shown below in log.error. Two things that stick out:

  • The error occurs every 30 seconds to the second.

  • The error re-occurs after we restart Openfire, but not with the same predictable frequency.

Possible results MAY include (I can’'t be certain that they are related) a complete service freeze: no new connections allowed, and stable file descriptor count and heap space usage (as in, a completely horizontal line).

One full stacktrace of the Exception looks like this:

2007.03.19 22:08:59 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:218)
        at org.apache.mina.filter.support.SSLHandler.messageReceived(SSLHandler.java:303)
        at org.apache.mina.filter.SSLFilter.messageReceived(SSLFilter.java:393)
        at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
        at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
        at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:800)
        at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.messageReceived(AbstractIoFilterChain.java:617)
        at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:362)
        at org.apache.mina.common.support.AbstractIoFilterChain.fireMessageReceived(AbstractIoFilterChain.java:353)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:246)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:206)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$500(SocketIoProcessor.java:44)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:502)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:43)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)

The exception is caused by a MINA class (org.apache.mina.filter.support.SSLHandler) that in its messageReceived() method calls Buffer.position(int newPosition) with a value that’'s either negative or larger than the buffer size:

// ... snippet from org.apache.mina.filter.support.SSLHandler.messageReceived()
if( isInboundDone() )
{
    // Rewind the MINA buffer if not all data is processed and inbound is finished.
    buf.position( buf.position() - inNetBuffer.position() );
    inNetBuffer.clear();
}

As an indicator of the frequency, I’'ll give you an idea of what our error logs look like.

Doing this:

cd wildfire/logs/
grep -h "org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught" error*|sort

gives us:

2007.03.18 21:32:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:32:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:33:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:33:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:34:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:34:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:35:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:35:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:36:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:36:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:37:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:37:55 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)] 2007.03.18 21:38:25 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]

After the server restart, the frequency changed to something less predictable. This is an hour or so after the restart:

2007.03.20 09:08:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:08:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:08:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:10 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:11 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:12 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:12 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:13 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:13 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:13 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:13 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:13 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:40 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:41 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:42 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:42 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]
2007.03.20 09:09:43 [org.jivesoftware.wildfire.nio.ConnectionHandler.exceptionCaught(ConnectionHandler.java:109)]

We’'re running Openfire 3.2.2, with one manual adjustment that sets changes:

ioSession.write(buffer).join();

to

ioSession.write(buffer).join(70000);

in org.jivesoftware.wildfire.nio.NIOConnection.java

to work around JM-993. Openfire 3.2.3 includes a similar fix, but sets the default timeout to two seconds, instead of seventy.

Hey Guus,

Thanks for this detailed input. I reported this issue in MINA mailing list and I’'m talking with Trustin (MINA main developer) about it. If we could provide a way to reproduce it it will make things a lot easier to fix.

Thanks,

– Gato

As we have no idea as of what might have caused the problem, reproduction has failed so far.

Edit: The problem did go away after another server restart though. No clue as to why.

Message was edited by: Guus