Problem with Openfire 4.6.4

I wasn’t able to attach the files here, so I uploaded them: openfire – Google Drive

At what time did the crash happen? Are these logs what was present when it ‘crashed’?

I don’t know the exact time.
Yesterday it was fine until 7 PM.
Today everybody connected normally, but messages were not reaching their recipients. I noticed this at 9 AM.
If necessary, I could increase the log rotation time, but I don’t know how.

Unfortunately, the debug log doesn’t seem to contain enough history to show the information you need: it is rotating too fast.

Can you increase your debug log rotation time?
To do so, edit “OPENFIREHOME”/lib/log4j2.xml and restart Openfire.

<?xml version="1.0" encoding="UTF-8"?>

<Configuration monitorInterval="30">
    <Appenders>
        <RollingFile name="debug-out" fileName="${sys:openfireHome}/logs/debug.log" filePattern="${sys:openfireHome}/logs/debug.log-%i">
            <PatternLayout>
                <Pattern>%d{yyyy.MM.dd HH:mm:ss} %c - %msg%n</Pattern>
            </PatternLayout>
            <Policies>
                <SizeBasedTriggeringPolicy size="100 MB"/><!-- Start a new log file once the current one reaches 100 MB -->
                <OnStartupTriggeringPolicy/><!-- Also start a new log file when Openfire starts -->
            </Policies>
            <Filters>
                <ThresholdFilter level="DEBUG"/>
                <ThresholdFilter level="INFO" onMatch="DENY" onMismatch="NEUTRAL"/>
            </Filters>
            <DefaultRolloverStrategy max="100"/><!-- Keep at most 100 files; at 100 MB each, the debug logs can take up about 10 GB of disk space -->
        </RollingFile>
...
</Configuration>
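If you would rather keep debug logs for a fixed number of days than for a fixed number of files, log4j2 can also roll over by date. The sketch below assumes standard log4j2 RollingFile semantics and reuses the layout and filters from the configuration above; the daily file name pattern and the 14-day retention are arbitrary example values, not Openfire defaults.

```xml
<RollingFile name="debug-out"
             fileName="${sys:openfireHome}/logs/debug.log"
             filePattern="${sys:openfireHome}/logs/debug.log-%d{yyyy-MM-dd}-%i">
    <PatternLayout>
        <Pattern>%d{yyyy.MM.dd HH:mm:ss} %c - %msg%n</Pattern>
    </PatternLayout>
    <Policies>
        <!-- Roll over once per day: the interval unit follows the most specific date field in filePattern (here: days) -->
        <TimeBasedTriggeringPolicy interval="1"/>
        <!-- Also roll over within a day if a file reaches 100 MB; %i numbers those files -->
        <SizeBasedTriggeringPolicy size="100 MB"/>
    </Policies>
    <Filters>
        <ThresholdFilter level="DEBUG"/>
        <ThresholdFilter level="INFO" onMatch="DENY" onMismatch="NEUTRAL"/>
    </Filters>
    <DefaultRolloverStrategy>
        <!-- On each rollover, delete rolled-over debug logs last modified more than 14 days ago -->
        <Delete basePath="${sys:openfireHome}/logs" maxDepth="1">
            <IfFileName glob="debug.log-*">
                <IfLastModified age="14d"/>
            </IfFileName>
        </Delete>
    </DefaultRolloverStrategy>
</RollingFile>
```

With this layout, how far back the logs go is bounded by age rather than by file count, so make sure the disk can hold 14 days of debug output at your logging volume.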

Could you also please look for these errors in your error.log?

java.sql.SQLException: ConnectionManager.getConnection() failed to obtain a connection after 11 retries. The exception from the last attempt is as follows: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object

I see them at around the time of the main issue…

Although I’m not sure that they’re relevant to the problem at hand, these exceptions are a red flag. They indicate that the database connection pool used by Openfire is undersized. This could mean that one or more queries are underperforming (maybe there’s a missing index?), that the database server is underperforming (it might be undersized), or that the sheer number of queries issued by Openfire is more than the default connection pool size can accommodate (this occasionally happens in Openfire instances that are under a very high load).

The Openfire admin console contains a couple of pages that you can use to review your current settings and usage patterns (under “Server → Server Management → Database”). If memory serves me well, the maximum number of connections in the database pool can be configured in the openfire.xml file. Do make sure that your database is configured to allow the additional concurrent connections (and is able to handle the additional load)!
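A minimal sketch of the relevant part of openfire.xml, assuming the default connection provider is in use. The element names (minConnections and maxConnections under database/defaultProvider) are as I recall them from a standard installation, so compare them with your own file before changing anything, leave the other elements (driver, server URL, credentials and so on) untouched, and restart Openfire afterwards. The values 25 and 100 are simply the ones mentioned later in this thread, not a recommendation.

```xml
<jive>
    <!-- ... other settings unchanged ... -->
    <database>
        <defaultProvider>
            <!-- ... driver, serverURL, username, password etc. unchanged ... -->
            <minConnections>25</minConnections>  <!-- minimum number of connections kept in the pool -->
            <maxConnections>100</maxConnections> <!-- upper limit on concurrent pooled connections -->
        </defaultProvider>
    </database>
</jive>
```

Raising maxConnections only helps if the database server accepts that many concurrent connections from Openfire, so check the limit on the database side as well.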

I had already set the pool size to min 25 and max 100 connections…
I can’t imagine that 400 users are too much for Openfire, as I read that it can handle over 10k!?
OF runs in a virtual machine (RHEL) with 8 GB RAM (6 GB available to OF), and it mostly used 2 GB at most.
The average wait time for a connection is 11 ms; the maximum was 1.04 s.
But I think this is a separate issue (maybe it has to do with the database: SQL Server in a VM with 8 GB RAM).

Coming back to the main issue of this thread: I also have this from time to time…
The mysterious part is that while the sessions are invalid, it is also impossible to open a fresh connection to OF and create a new session. Only a reboot solves it.
I think the invalid sessions could have to do with the network, but why is OF not accepting new connections?

I can’t imagine that the problem is related to the network.
Since I reported the problem in July and rolled back to 4.6.3, OF has been running fine and stable 24/7.

On the other hand, I wonder why more people aren’t reporting problems with 4.6.4.

I have these issues with versions prior to 4.6.3 too.

As I already wrote, not a single problem since I switched back to 4.6.3 in July :wink:
All earlier versions also ran perfectly.
Maybe your problem is different from the one described in this thread.

Did all your sessions become invalid at once, or one by one?

Hi,
Not exactly at the same time, but if I remember correctly, some clients are still shown as connected while others are already shown as “invalid session/connection”.

After a while all clients show “invalid session/connection”

OK, same here, hmm. Did you set the properties “xmpp.client.idle” and/or “xmpp.httpbind.client.idle” to -1?
(-1 = never disconnect idle clients)

No, I didn’t.
I used only the admin web interface for the whole setup, and I don’t use any special settings.

One more question, @guus: what does the following warning mean?

[socket_c2s-thread-2]: org.jivesoftware.openfire.nio.ConnectionHandler - Closing connection due to exception in session: (0x0000F4E3: nio socket, server, /172.254.1.38:63221 => /172.31.22.80:5222)
java.io.IOException: Closing session that seems to be stalled. Preventing OOM

This warning is flooding my logs :confused:

That message occurs when Openfire has queued data for a client up to a configurable limit (5 MB by default) without the client reading that data. If that happens a lot, it suggests that client connections disappear without Openfire realizing it (until this limit kicks in, that is).

OK, so this could also be a symptom of the original invalid-session problem :thinking:

Update: that points me to the XMPP ping… if I decrease the ping interval, Openfire should notice such dead connections earlier.
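For reference, the XMPP ping is defined in XEP-0199: the server sends an IQ of type get containing a ping element, and a live client answers with an empty IQ of type result carrying the same id. If no answer arrives, the server can treat the connection as dead instead of queuing data for it until the stall limit is reached. The exchange looks roughly like this; the server name, JID and id are hypothetical placeholders.

```xml
<!-- Server to client: ping request (XEP-0199) -->
<iq from='example.com' to='alice@example.com/desktop' id='ping-1' type='get'>
  <ping xmlns='urn:xmpp:ping'/>
</iq>

<!-- Client to server: a live client replies with an empty result carrying the same id -->
<iq from='alice@example.com/desktop' to='example.com' id='ping-1' type='result'/>
```

Whether and how often Openfire sends such pings is tied to its client idle settings (such as the “xmpp.client.idle” property mentioned above); check the client connection settings in the admin console of your version rather than relying on this sketch.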


BTW, I have noticed that I am not able to force-close “invalid session/connection” or “detached” connections on the admin console’s “session-summary.jsp” page.

Furthermore, I have noticed that OF closes detached sessions by itself, but invalid sessions stay, even though they are marked as offline and their status is closed.

@guus: I hope this won’t result in a memory leak…

In version 4.6.4, the conversation monitoring plugin is installed, but it does not appear in the main menu bar.

I have done some debugging and found that if the (MINA) Connection object in LocalSession is somehow corrupted (network errors or other …), the call to sess.getHostAddress() in session-row.jspf fails, which results in “Invalid session/connection”.

Clicking “Close connection” in the admin console (session-summary.jsp) always fails for detached or invalid sessions, because the call to sess.close(); (in session-summary.jsp) fails. It should call the close method of the Session object, but if that object is corrupt or null, the close event with its further clean-up tasks is never fired, so the session is not deleted. The problem is that new sessions are then not initialized correctly either, but I don’t know exactly why.

I found a Stack Overflow thread with a good explanation of what might be causing this issue (see the answer marked as the solution): java - Apache Mina, How to detect when you're sending messages using an invalid socket to the client side? - Stack Overflow