Spark login / connections failing (LDAP)

Spark 2.5.8, Openfire 3.5.2, Red Hat Enterprise 5, LDAP / Active Directory

Very strange happenings here - suddenly this morning users cannot log in to Spark. I’ve had sporadic problems like this before but they’ve always “fixed themselves”, which in itself is worrying. Neither SSO nor non-SSO login works; both just time out. I have extended the timeout in Spark to 30 seconds and enabled the debug window. Either NOTHING is transmitted or received during the entire 30 seconds, or I eventually get a connection and a small amount of traffic, but none of my vCard / profile data is there, there are no Shared Groups and no users are searchable - so it is pretty much useless. Any users who connected before the problem started are able to use Spark normally - it’s just new logins that fail.

When this has happened in the past, only restarting the server itself has fixed it immediately - restarting just the Openfire service makes no difference. There is nothing of note in the logs other than some warnings like this:

2008.07.22 11:03:37 Error or result packet could not be delivered

These warnings reference the various services Openfire is trying to provide (Search, FastPath, Broadcast etc.). Networking all seems OK: port 5222 responds when tested from the client, the Openfire server can get a connection to the DC on port 389, and so on.
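For anyone wanting to repeat the port checks described above, here is a minimal sketch in Python. The function name `port_is_open` and the host names in the comments are placeholders, not anything from my setup. Note that it only confirms a TCP connection is accepted, which is exactly what I am seeing: the port answers even while logins time out.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (hypothetical host names, substitute your own):
#   port_is_open("openfire.example.com", 5222)  # XMPP client port, run from a client
#   port_is_open("dc.example.com", 389)         # LDAP on the DC, run from the Openfire server
```

A result of True here says nothing about whether Openfire or the DC actually answers requests in good time, so it rules out basic networking only.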

Then - the strangest thing is - eventually, if left long enough, it will all start working again. No changes, just a lot of head scratching. That’s great of course, but it doesn’t instill confidence, especially as I want to extend the number of users significantly over the next few weeks! I don’t think resources on the OF server are the problem: there are only 100 or so users connected, the server has 2GB RAM, the Java memory in Openfire has been increased to 1GB, and it’s all running very “unstressed”.

So the question - does this sound familiar to anyone, and if so have you been able to resolve it? I really want to roll these products out across our company, but the frequency of these occurrences is beginning to worry me…

Nick

OK, I think I’ve tracked this down to the CPU on the server - it is absolutely maxed out. Openfire is using only about 5% of its available Java memory at the time, so it can’t be connected to that. I can see there are a few threads on this subject, but most seem to point to Java memory, which doesn’t seem to be the problem here. Any other clues?
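One way to narrow down a maxed-out CPU in a Java process, sketched here under assumptions rather than taken from my server: on Linux, `top -H -p <pid>` lists per-thread CPU usage with decimal thread IDs, while a HotSpot thread dump (from `jstack <pid>` or `kill -3 <pid>`) labels each thread with the same ID in hex as `nid=0x...`. The helper below, with the hypothetical name `threads_by_native_id`, maps one to the other so a busy thread ID can be matched to a thread name.

```python
import re

# Thread header lines in a HotSpot dump start with the quoted thread name
# and carry the native thread id in hex, e.g. nid=0x1a2b.
NID_PATTERN = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def threads_by_native_id(dump_text):
    """Map native thread id (as a decimal int) to thread name."""
    threads = {}
    for line in dump_text.splitlines():
        match = NID_PATTERN.match(line)
        if match:
            # Convert the hex nid to the decimal form that top -H displays.
            threads[int(match.group("nid"), 16)] = match.group("name")
    return threads


# Example usage, assuming the dump was saved to a file named dump.txt:
#   names = threads_by_native_id(open("dump.txt").read())
#   names.get(6699)  # look up the thread id reported as busiest by top -H
```

The stack trace under the matching thread header in the dump then shows what that thread was doing when the dump was taken.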

Closing this thread - transferring to http://www.igniterealtime.org/community/thread/34152