Jetty Thread Pool Issue

We are having an issue with BOSH.

The Jetty threads just slowly keep building up until Openfire crashes.

We are using Jetty 7. I am not sure whether this also happened with Jetty 6.

As you can see in the attachment from the probe, the threads always increase and never decrease, even during low-activity periods.

When the service isn’t busy the threads just flatten out. This issue is causing Openfire to crash every couple of days.

Could the issue be in the HttpSessionManager? I am not that familiar with how it works, and wanted to see if anyone had any clue before I start delving into all the complex code surrounding this.

If you stop and start the bind service in Openfire admin then the threads will drop down again, and then start increasing. This is the only thing we can do to prevent it from crashing.

In the thread dump the threads look like this. I would expect to have a lot of these when the site is busy but shouldn’t they close when the site is quiet? I am only guessing as I don’t know how it works.

“qtp5300374-572” prio=10 tid=0x0a003400 nid=0x4a00 waiting on condition [0x29cb7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x34963100> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
    at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:319)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:450)
    at java.lang.Thread.run(Thread.java:619)

Thanks in advance,

Daniel

From the graph, it appears that the threads are levelling off after a certain amount of time. This could indicate that Jetty is using a thread pool with liberal reclaim-strategies. If that’s the case, then there’s not a lot to worry about (Jetty should start re-using threads after it hits the threadpool limit). We should look into this though. I’ve reopened OF-46 to be safe.
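For reference, these are (roughly) the knobs on Jetty 7’s QueuedThreadPool that govern that reclaim behaviour. Openfire wires the pool up internally, so this is only a sketch to show which settings are involved - the values below are illustrative, not Openfire’s actual defaults, and the method names differ in later Jetty versions:

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.util.thread.QueuedThreadPool;

    // Minimal sketch of a Jetty 7-era thread pool configuration. The numbers
    // are made up; Openfire configures its own pool, so this is illustrative only.
    public class ThreadPoolSketch {
        public static void main(String[] args) throws Exception {
            QueuedThreadPool pool = new QueuedThreadPool();
            pool.setMinThreads(10);        // threads kept alive even when idle
            pool.setMaxThreads(254);       // the hard ceiling seen in the graphs
            pool.setMaxIdleTimeMs(60000);  // idle threads above the minimum are
                                           // reclaimed after this many milliseconds
            pool.setName("qtp-bosh");      // readable "qtp..." names in thread dumps

            Server server = new Server();
            server.setThreadPool(pool);    // must be set before the server starts
            // ... connectors, handlers, server.start() ...
        }
    }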

Actually, it levels off while the site isn’t busy. It will start going up tonight and hit the limit during our peak time, and probably crash. Tonight won’t be as busy as the night before, but it will still go up, which seems odd. It never goes down. I’ll send you an updated graph if that happens.

I am not sure whether this is related to the version of Jetty or something else, but I’ll gather more information.

Thanks

Daniel, do you have more information available? One of the Jetty devs is working with us on this. He’s adding comments in the JIRA issue (OF-46).

Interestingly, our Openfire has been stable for the last 3 days, which is a welcome change!

As such, I am not sure if Jetty is an issue or not but I have attached the latest Daemon Threads Graph.

The graph shows the threads maxed out, but this behavior may be intentional and not the cause of the crashing we were experiencing.

I was waiting to have something more definitive but hopefully this is helpful.

Thanks

Daniel

Just a thought…

We would have between 300 and 1000 connections via BOSH at any time (about 10% of the connections - the rest are socket connections). Is it possible that the lower end - 300 - isn’t actually that quiet and still requires the use of all the threads? I really don’t know enough about the internal workings to understand this.

Should I simply consider increasing this maximum? To what degree has this been tested under load?

Thanks

As mentioned in the issue, we managed to stop this rapid growth in thread usage by reducing the number of HTTP requests against a servlet in our plugin.

We still use the servlet for some operations, and as a result the threads are still growing (much more slowly than before), but it is still an issue.

I have attached the graph where you can see the slow growth.

Guus - I have your changes with the thread naming and will get them into live as soon as possible to test this further.


So what could cause the threads to build up like that in our plugin?

It is almost an exact copy of the Presence plugin but we have added some more options.

I don’t even know where to begin looking.

Thanks

How many requests does this servlet handle now, in a given time period? If I’m looking at this from an optimistic perspective (read: wishful thinking), I notice two things:

  • The increase in threads appears to be levelling off;
  • The threads are mostly idle (there is no increase in the number of threads that are in ‘ThreadsRunnable’ state).

Basically, most of the threads are just ‘sitting there’, doing very little work. This can indicate that for a significant number of new servlet requests, Openfire (Jetty) chooses to use a new thread from the pool instead of reusing an existing one that is idle. This in itself doesn’t have to be a problem. I would expect problems to start only after you see a significant increase in threads that are in the “runnable” state.
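If you want to check that ratio directly, something like the following counts the Jetty pool threads per state using the standard java.lang.management API, filtering on the “qtp” name prefix from the dump above. Note that it inspects the JVM it runs in, so it would need to run inside Openfire (e.g. from a plugin) or be adapted to a remote JMX connection:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.EnumMap;
    import java.util.Map;

    // Counts Jetty pool threads ("qtp..." names) per thread state, so you can
    // see how many are actually RUNNABLE versus just parked in TIMED_WAITING.
    public class QtpThreadStates {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            Map<Thread.State, Integer> counts =
                    new EnumMap<Thread.State, Integer>(Thread.State.class);
            for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
                if (info == null || !info.getThreadName().startsWith("qtp")) {
                    continue; // skip non-Jetty threads and any that died mid-scan
                }
                Thread.State state = info.getThreadState();
                Integer current = counts.get(state);
                counts.put(state, current == null ? 1 : current + 1);
            }
            System.out.println("Jetty pool threads by state: " + counts);
        }
    }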

You first mentioned that Openfire crashed, later that it was stable. How were the threadpools behaving at both times? What was logged during the crash? I’d like to verify that the crash was related to Jetty thread pool usage.

Note that most likely, not all of the threads that are “ThreadsTimedWaiting” are threads related to your servlet. There will be some from other parts of Openfire too, which are completely unrelated to the issue at hand.
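Also, for what it’s worth: in the standard servlet model, a Jetty worker thread is only occupied while your doGet()/doPost() is running; as soon as the method returns, the thread parks back in the pool (the TIMED_WAITING state in your dump). A hypothetical handler along these lines - names made up, loosely modelled on the Presence plugin’s status servlet, not your actual code - would only pin a thread for as long as something inside it blocks:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical plugin servlet, loosely modelled on the Presence plugin's
    // status servlet. The Jetty worker serving a request is only occupied while
    // doGet() runs; a slow or blocking call in here is the kind of thing that
    // would keep a qtp thread out of the pool.
    public class ExampleStatusServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String jid = request.getParameter("jid");
            response.setContentType("text/xml");
            // Placeholder for the real presence lookup.
            response.getWriter().println("<presence jid=\"" + jid + "\"/>");
        }
    }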

Hi Guus,

Previously, the threads grew rapidly until they hit the maximum (254), all within a few hours (causing a crash) - and they never expired.

When this was happening we were probably doing at least 10 hits/second against the servlet via HTTP requests - maybe a lot more during peak time.

We switched to using Smack to communicate with our plugin interceptor via XMPP and the thread issue immediately improved.

(but now we have other connection issues with Smack - but that will be another post)

This is how I knew that BOSH wasn’t the problem, as removing all these requests from our servlet immediately alleviated the issue.

But now, after running for 4 days or so without a crash, I can see that the threads are still slowly and consistently going up - they never expire or decrease. I believe this is because we are still doing some requests against the servlets - possibly closer to 1 every few seconds (about a 20th of what we were doing previously). And the thread count is increasing at about a 20th of the speed it was previously. I hope that makes sense.

Anyhow, I am pretty sure (but may be wrong) that there is an issue with the servlet releasing the threads, but I am not sure whether it is something I have caused (as it is a custom plugin) or an issue in the underlying code in terms of how servlets are used with Jetty. I haven’t had any time to look into it, unfortunately, but I can see the server will eventually run out of threads in about a week at this rate.

A good test would be to do some load testing against the Presence plugin that comes with Openfire: just hit the servlet a few thousand times and see whether the Jetty threads are locking up. If I have time I will do something like that in our staging environment.
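Something like this quick-and-dirty loop is what I have in mind (the URL and parameters are guesses based on the stock Presence plugin, so they’d need adjusting for a real setup):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Fires a few thousand sequential requests at the Presence plugin's servlet
    // and reports how long it takes. The URL below is an assumption based on the
    // stock Presence plugin; point it at a real server and a real JID.
    public class PresenceServletLoadTest {
        public static void main(String[] args) throws Exception {
            String target = "http://localhost:9090/plugins/presence/status"
                    + "?jid=admin@example.com&type=xml";
            int requests = 5000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                InputStream in = conn.getInputStream();
                while (in.read() != -1) {
                    // drain the response so the connection closes cleanly
                }
                in.close();
                conn.disconnect();
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(requests + " requests in " + elapsed + " ms");
            // Watch the Jetty thread count (e.g. with the QtpThreadStates sketch or
            // a profiler) while this runs, and again once it has been idle a while.
        }
    }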

I’ll let you know when I find more information.

Thanks

I have attached the latest graph. As you can see, the threads are slowly but surely increasing… The drop on the left is from when I turned off all the HTTP requests against the servlet a few days ago.

I’m not sure if you’re getting the point I’m trying to make. A threadpool has a specific capacity. Threads can be instantiated in the pool, until that capacity is reached.

This doesn’t mean that an instantiated thread is active. In your graph, you can see that most of the threads are actually inactive (the purple bunch). They are instantiated, but are not doing anything (at the time that the graph was made - the graph is but a snapshot of the state at that exact point in time).

Most likely, every instantiated thread is active for a very short time every minute or so, preventing it from being inactive for longer than the time required to be removed from the thread pool. This typically isn’t a big deal. There are enough workers available (either uninstantiated, or instantiated but idle).

Openfire obviously isn’t firing up a new thread for every request - otherwise you’d see 254 threads within as many seconds.

Is the traffic that you get very bursty in nature? (A lot of requests arriving at the same time?)

The interesting bit is to see why Openfire crashes. Do you have the exceptions that were logged available?

Hi Guus, I do understand - I’m just not sure why it keeps instantiating new threads. There should be more than enough instantiated already. The traffic doesn’t burst - it is very consistent. Even when the site is relatively quiet, new threads keep getting added - why would it do this? Surely the number of TIMED_WAITING threads already there (>100) is enough, and it would be re-using these as intended. I am just not sure why it slowly adds more. Anyway, it will be interesting to see whether we get thread errors again when it reaches 100%.

The errors we were getting before were “Unable to instantiate a new native thread”, being returned by the servlet when the threads reached 100% of the maximum.

Thanks