S2S: Openfire memory leak when remote server is inaccessible

I came across this in a high-load scenario (~40k users). I have set up Openfire in an S2S configuration. The local server has an adequate heap size and normally (when both servers are up) handles the load perfectly well. When the remote server goes down (due to firewall, network issues, maintenance, etc.) for an extended time (10 hours), the logs report ‘Error trying to connect to remote server’. After this time the local server throws an OutOfMemoryError.

Analyzing the generated core/heap dumps shows a very high number of objects accumulated within the OutgoingSessionPromise instance. In particular, they are Packet objects added to the packetQueue (a ConcurrentLinkedQueue). I modified the implementation of returnErrorToSender() (http://fisheye.igniterealtime.org/browse/openfire/trunk/src/java/org/jivesoftware/openfire/server/OutgoingSessionPromise.java?hb=true) and removed the routes iteration (passing ‘from’ to setTo()) for the Presence error replies. This improved the issue a bit, but the memory leak still appears in some cases.
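To make the failure mode concrete: a ConcurrentLinkedQueue has no capacity limit, so if nothing drains it while the remote server is unreachable, it can only grow. Below is a minimal sketch of one possible mitigation, a capped queue that rejects overflow instead of holding it. This is not Openfire's actual code; the class name BoundedPendingQueue, its methods, and the use of String in place of Packet are all assumptions made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a per-domain outgoing queue; not an Openfire class.
class BoundedPendingQueue {
    // LinkedBlockingQueue accepts a fixed capacity, unlike ConcurrentLinkedQueue.
    private final LinkedBlockingQueue<String> pending;
    // Only a counter is kept for rejected packets, so rejections cost no heap.
    private final AtomicLong rejected = new AtomicLong();

    BoundedPendingQueue(int capacity) {
        this.pending = new LinkedBlockingQueue<>(capacity);
    }

    /** Queues the packet, or counts it as rejected when the queue is full. */
    boolean enqueue(String packet) {
        // offer() returns false immediately when the capacity is reached.
        if (pending.offer(packet)) {
            return true;
        }
        // A real server would return an error stanza to the sender at this point.
        rejected.incrementAndGet();
        return false;
    }

    int pendingCount() {
        return pending.size();
    }

    long rejectedCount() {
        return rejected.get();
    }
}
```

With a capacity of 3 and ten packets offered while nothing drains the queue, three packets stay queued and seven are rejected, so memory use stays bounded however long the outage lasts.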

Have you come across this? Is this a known issue? I’ve seen some bugs reported in the same area but they seem to indicate different issues.

Thanks,

Kostas

The relevant section of RFC 6120 (http://www.rfc-editor.org/rfc/rfc6120.txt) states:

10.4.3. Error Handling

If routing of a stanza to the intended recipient’s server is unsuccessful, the sender’s server MUST return an error to the sender. If resolution of the remote domain is unsuccessful, the stanza error MUST be <remote-server-not-found/> (Section 8.3.3.16). If resolution succeeds but streams cannot be negotiated, the stanza error MUST be <remote-server-timeout/> (Section 8.3.3.17).

If stream negotiation with the intended recipient’s server is successful but the remote server cannot deliver the stanza to the recipient, the remote server MUST return an appropriate error to the sender by way of the sender’s server.

The Openfire code appears to be written up to spec, but the packetQueue continuously increases in size.
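For reference, the rule in Section 10.4.3 comes down to a two-way choice of stanza error condition. Here is a rough sketch of that mapping, not taken from the Openfire source. The enum S2SFailure and the class S2SErrorMapper are hypothetical names; only the two condition names come from RFC 6120.

```java
// Hypothetical failure categories for an outgoing S2S attempt.
enum S2SFailure {
    DOMAIN_RESOLUTION_FAILED,   // the remote domain could not be resolved
    STREAM_NEGOTIATION_FAILED   // resolution worked, but streams could not be negotiated
}

// Hypothetical helper; not part of Openfire.
final class S2SErrorMapper {
    private S2SErrorMapper() {
    }

    /** Returns the stanza error condition that RFC 6120, Section 10.4.3 requires. */
    static String conditionFor(S2SFailure failure) {
        switch (failure) {
            case DOMAIN_RESOLUTION_FAILED:
                return "remote-server-not-found";   // RFC 6120, Section 8.3.3.16
            case STREAM_NEGOTIATION_FAILED:
                return "remote-server-timeout";     // RFC 6120, Section 8.3.3.17
            default:
                throw new IllegalArgumentException("Unknown failure: " + failure);
        }
    }
}
```

Either way the sender gets an error back, so in principle nothing needs to stay queued once the connection attempt has failed.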