What is RetryContinuation and why is it eating all my CPU?

My OpenFire server is going to 90+% CPU and if I turn on debugging, I get this (over and over):

2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@32320710
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@16343570
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@32320710
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@16343570
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@32320710
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@16343570
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@32320710
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@16343570
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@32320710
2010.05.12 18:36:00 JettyLog: continuation RetryContinuation@16343570

Anybody seen this or have a suggestion?

M.

Hi,

this may be a Jetty problem or related to http://www.igniterealtime.org/community/message/203237#203237

Does this happen also with 3.7.0-beta?

LG

I just solved it an hour ago. I also solved it a year ago, which is why I’m so pissed.

Unix has a per-process file-descriptor limit. On my machine, it happened to be 1024. At about 220 users (i.e., not very many), it would run out and start kicking people out. Since they would just try to reconnect, the problem spiralled up until the CPU was pegged.

The solution, for those who care (us lonely few who use Unix-based machines and have more than a handful of users), is simple. As superuser:

ulimit -n 5000 # or whatever

sh bin/openfire start
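
To check whether a running server is actually getting close to its descriptor limit, something like the following might help. It is only a rough sketch: it assumes Linux with /proc mounted, the fd_usage helper is a made-up name and not part of Openfire, and you need to run it as the process owner or as root to read another process's entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of Openfire): report how many file
# descriptors a process has open and what its soft limit is.
# Assumes Linux with /proc mounted.
fd_usage() {
    pid="$1"
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # /proc/<pid>/limits has a row: "Max open files <soft> <hard> files".
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}

# Example: inspect the current shell. Replace $$ with the Openfire PID.
fd_usage "$$"
```

If the open count sits near the soft limit, the limit is the likely culprit. Note that ulimit -n only affects the shell it is run in and the processes started from that shell, so it has to be run in the same session that launches Openfire.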

Since I brought down my last company’s system in 2008 as well as this one’s yesterday, I for one hope this issue is made more prominent in the control console and the documentation.

When is 3.7 going into production?

“Experience is that marvelous thing that enables you to recognize a mistake when you make it again.”

– Franklin P. Jones

Nope, LG was right. It was a 3.6.4 problem and when I upgraded to 3.7, it went away.

I am seeing this same problem with Openfire 3.6.4 using http-bind with Sparkweb. Is this issue fixed in 3.6.4 trunk or is the recommended solution to upgrade to 3.7 beta?