BOSH doesn't work properly since 3.10 (async)

Since OF 3.10(.2) BOSH doesn’t work properly anymore. Large packets don’t arrive at the client. I am not even sure, it’s caused by large packets, but I think it is.

I was thinking it might be related to OF-908 (any maybe it is), but it’s supposedly fixed.

I also tested with 4.0.0 Alpha, but same issue.

Any help appreciated!

In the Server Logs I am seeing this:

2015-12-03 17:32:29,555 [WARN ] org.eclipse.jetty.server.HttpChannel                     - complete failed java.nio.channels.WritePendingException
    at org.eclipse.jetty.server.HttpConnection$SendCallback.reset(HttpConnection.java:624)
    at org.eclipse.jetty.server.HttpConnection$SendCallback.access$100(HttpConnection.java:594)
    at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:479)
    at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:768)
    at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:801)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
    at org.eclipse.jetty.servlets.gzip.GzipHttpOutput.write(GzipHttpOutput.java:74)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:135)
    at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:166)
    at org.eclipse.jetty.server.Response.closeOutput(Response.java:1017)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:421)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:724)

and this:

2015-12-03 17:32:53,310 [WARN ] org.jivesoftware.openfire.http.HttpBindServlet           - Error writing response data to [127.0.0.1] java.lang.IllegalStateException: generateResponse=NEED_INFO
    at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:713)
    at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:246)
    at org.eclipse.jetty.util.IteratingCallback.succeeded(IteratingCallback.java:367)
    at org.eclipse.jetty.io.WriteFlusher$PendingState.complete(WriteFlusher.java:270)
    at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:383)
    at org.eclipse.jetty.io.SelectChannelEndPoint.onSelected(SelectChannelEndPoint.java:111)
    at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:636)
    at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:607)
    at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:545)
    at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:724)

and also this:

2015-12-03 17:33:27,901 [ERROR] org.jivesoftware.openfire.http.HttpBindServlet           - Error sending packet to client. org.jivesoftware.openfire.http.HttpConnectionClosedException: The http connection is no longer available to deliver content
    at org.jivesoftware.openfire.http.HttpConnection.deliverBody(HttpConnection.java:116)
    at org.jivesoftware.openfire.http.HttpSession.forwardRequest(HttpSession.java:598)
    at org.jivesoftware.openfire.http.HttpBindServlet.handleSessionRequest(HttpBindServlet.java:249)
    at org.jivesoftware.openfire.http.HttpBindServlet.processContent(HttpBindServlet.java:204)
    at org.jivesoftware.openfire.http.HttpBindServlet$ReadListenerImpl.onAllDataRead(HttpBindServlet.java:423)
    at org.eclipse.jetty.server.HttpInput.run(HttpInput.java:441)
    at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1175)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:354)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:724)
2015-12-03 17:33:27,901 [WARN ] org.jivesoftware.openfire.http.HttpBindServlet           - Error writing response data to [127.0.0.1] java.lang.IllegalStateException: s=ASYNC_IO i=true a=COMPLETE
    at org.eclipse.jetty.server.HttpChannelState.complete(HttpChannelState.java:439)
    at org.eclipse.jetty.server.AsyncContextState.complete(AsyncContextState.java:92)
    at org.jivesoftware.openfire.http.HttpBindServlet$WriteListenerImpl.onWritePossible(HttpBindServlet.java:458)
    at org.eclipse.jetty.server.HttpOutput.run(HttpOutput.java:812)
    at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1175)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:365)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:724)
2015-12-03 17:33:27,902 [WARN ] org.eclipse.jetty.server.HttpChannel                     - /http-bind/ java.lang.IllegalStateException: s=ASYNC_IO i=true a=COMPLETE
    at org.eclipse.jetty.server.HttpChannelState.complete(HttpChannelState.java:439)
    at org.eclipse.jetty.server.AsyncContextState.complete(AsyncContextState.java:92)
    at org.jivesoftware.openfire.http.HttpBindServlet$WriteListenerImpl.onError(HttpBindServlet.java:464)
    at org.eclipse.jetty.server.HttpOutput.run(HttpOutput.java:792)
    at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1175)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:365)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:724)

That’s interesting … I’m guessing there is a related Jetty configuration setting somewhere. How large are the failing packets?

We can investigate further starting here.

Chapter 6. Configuring Jetty Connectors

Packet size: I have to investigate it deeper. They contained avatars, I think I saw 90-200 KB, not sure.

WritePendingException actually looks more like a bug in Jetty.

Bug 432901 – java.nio.channels.WritePendingException logged on WARN level when client breaks connection

as well as this one:

Bug 425422 – java.lang.IllegalStateException: AsyncContext completed

This one looks like it might also be a bug (misuse) in Openfire:

Re: [jetty-users] Meaning of "java.lang.IllegalStateException: s=ASYNCIO

We should also probably take a close look at this PR: OF-885: Use non-blocking, async API for BOSH servlet by guusdk · Pull Request #195 · igniterealtime/Openfire · GitHub

… particularly with respect to the timing of the context.complete() calls.

I did some further investigation:

I’ve tried the latest Jetty jars (9.3.6.v20151106) without success, same issue. Some less exceptions, but still the WritePendingException. One new Exception: “java.lang.IllegalStateException: Committed” with Jetty only stacktrace (occurred only once).

The packet size, which arrived at the client was: 65536, which is exactly 64 KB. This at least is suspicious. The Content-Length header however was 261279, so there was a lot of data missing.

(This was with the latest Jetty).

Re-tested it now with Openfire’s Jetty (9.2.9): The packet size at the client was 98304 (exactly 96 KB). Everything larger seems to cause issues.

edit: Just saw some (incomplete) packet having 128 KB at the client, so it’s probably random.

I suspect this issue is always reproducible, when trying to deliver 64+ KB packets via BOSH.

Tracked as OF-989.

I’ve already got the solution :slight_smile:

My first try on reproducing this issue failed. I used the current head of MASTER (4.0.0alpha).

I’m successfully sending and receiving a little over 100,000 characters of characters from JSJac to Spark and back. On both transmissions, I checked that I received all of the data (also, the content-length header was OK). I disabled compression during the tests. I used port 7070.

I’ve fixed it in Prevent WritePendingException during delivery of BOSH packets. by sco0ter · Pull Request #426 · igniterealtime/Openfire …

Got the inspiration from here:

Bug 483091 – AsyncContext.complete fails (WritePendingException) when called right after out.write

The same issue is described there as well.

Nice work!