Troubleshooting 504 Gateway Timeout and Session Errors in Openfire WebSocket Connections

In our customized Openfire setup, a consumer maintains a pool of 100 WebSocket connections to Openfire for further processing.

Architecture Details:

The client maintains a WebSocket connection to Openfire (3) through an HTTP Load Balancer (1) and Nginx (2).

When the sender and receiver exchange packets for communication:

  1. The packet first reaches Openfire (3).
  2. Openfire then publishes this packet to the Node Kafka topic (4).
  3. From this topic, WsWrite (5) consumes the packet and pushes it to the corresponding Kafka topic.
  4. WsWrite maintains a WebSocket connection pool and writes the packet over the established WebSocket connection.
  5. Finally, the packet reaches Openfire (3) again, where it checks if the receiver’s WebSocket connection is active.
  6. If the connection exists, Openfire delivers the packet over the WebSocket connection to the receiver.
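To make step 4 concrete, here is a minimal sketch of a fixed-size pool that hands out connections round-robin and replaces a closed one before writing. It is not the actual WsWrite code: `WsPool`, `FakeConn`, and the `connect` factory are hypothetical names, and a real implementation would wrap an actual WebSocket client library.

```python
from itertools import count
from typing import Callable, List, Protocol


class Conn(Protocol):
    """Minimal interface the pool needs from a connection (hypothetical)."""
    closed: bool

    def send(self, data: str) -> None: ...


class WsPool:
    """Fixed-size pool that replaces closed connections on demand."""

    def __init__(self, connect: Callable[[], Conn], size: int = 100) -> None:
        self._connect = connect
        self._conns: List[Conn] = [connect() for _ in range(size)]
        self._next = count()

    def write(self, packet: str) -> None:
        # Pick the next slot round-robin.
        i = next(self._next) % len(self._conns)
        if self._conns[i].closed:
            # A stale socket is replaced rather than reused.
            self._conns[i] = self._connect()
        self._conns[i].send(packet)


class FakeConn:
    """Stand-in for a real WebSocket client; records what was sent."""

    def __init__(self) -> None:
        self.closed = False
        self.sent: List[str] = []

    def send(self, data: str) -> None:
        if self.closed:
            raise ConnectionError("socket is closed")
        self.sent.append(data)
```

With a pool like this, a consumer restart amounts to rebuilding every slot at once, so any per-connection setup performed by the `connect` factory runs for all connections at startup.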

504 Incident: After restarting our ws_writer consumers (5), we received a 504 error code from Nginx (2). At the same time, Openfire’s (3) debug logs showed the following messages:

  • Error: “Error detected; session”
  • Warning: “Closing session due to incorrect stream header”
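One thing that may be worth checking, assuming the ws_writer connections speak standard XMPP over WebSocket (RFC 7395): in that framing, the first frame a client sends on a fresh connection is an `<open/>` element in the `urn:ietf:params:xml:ns:xmpp-framing` namespace, not a `<stream:stream>` tag or a stanza. Whether this is what the warning above refers to in this customized setup is an assumption. The helper names below are hypothetical; the sketch only builds such a frame and checks whether a given frame is one.

```python
import xml.etree.ElementTree as ET

# Namespace defined by RFC 7395 for XMPP-over-WebSocket framing elements.
FRAMING_NS = "urn:ietf:params:xml:ns:xmpp-framing"


def build_open_frame(domain: str) -> str:
    """Return the <open/> frame a client sends first on a new connection."""
    el = ET.Element(
        "open",
        {"xmlns": FRAMING_NS, "to": domain, "version": "1.0"},
    )
    return ET.tostring(el, encoding="unicode")


def is_open_frame(frame: str) -> bool:
    """True if the frame parses as XML and its root is the framing <open/>."""
    try:
        root = ET.fromstring(frame)
    except ET.ParseError:
        return False
    return root.tag == f"{{{FRAMING_NS}}}open"
```

A writer that logs `is_open_frame(first_frame)` for the first frame of every new pool connection would show quickly whether a restarted consumer starts writing packets before opening the stream.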

This is a heavily customized instance of Openfire. Also, the 504 error is logged by another piece of software, and the trigger appears to be the restart of yet another piece of software.

Without intimate knowledge of the customization, and without basic information such as which software versions are used or how the integration is implemented, there’s little chance that anyone here will be able to help.

Finally, this has all the hallmarks of an elaborate product integration project. I don’t expect that you’ll find help within the open source community for such a specific setup. You could consider enlisting help from someone on our professional support partners listing.