Openfire slow startup 4.6.2 as a service

Hello, we are running Openfire 4.6.2 as a service on Windows Server 2012 R2.

After a reboot, Openfire takes about 5-6 minutes to start up. During this time we cannot access the admin console or log into Spark. Is there a way to speed up the process?

My guess is that there is a very slow database loading process or you have some sort of DNS lookup / network connection timeout issue for some blocking resource. Are you able to review the logs during the startup or watch the database for which queries it is running?

Now that you mention DNS, we do have this error on the dashboard. Could it be related?

There appear to be no DNS SRV records at all for this XMPP domain. With the current configuration of Openfire, it is recommended that DNS SRV records are created for this server.

Alternatively, where could we investigate the timeout issue, if that is the case?
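One rough way to check whether name resolution is slow on the server is to time a lookup from the same machine and JVM. The sketch below is only an illustration: the class name `DnsLookupTimer` is made up, the default host `localhost` is a placeholder for your own XMPP domain or database host, and it measures an ordinary address lookup, not the SRV records the dashboard warning refers to.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper (not part of Openfire): times a single hostname lookup. */
final class DnsLookupTimer {

    /** Returns the elapsed lookup time in milliseconds, whether or not the host resolved. */
    static long timeLookupMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            // A failed lookup is still interesting: a long time here hints at a timeout.
            System.out.println("Could not resolve " + host);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // "localhost" is a placeholder; pass your XMPP domain or database host instead.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " lookup took " + timeLookupMillis(host) + " ms");
    }
}
```

A lookup that takes several seconds, or fails only after a long wait, would point towards the DNS/network timeout theory. A result of a few milliseconds suggests looking elsewhere.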

We had the “XMPP Domain Name” entry on the OF console show an error until we configured it correctly in the properties (xmpp.domain), but that did not solve our slow start. We observe that on startup it hangs on this line for quite a while:

03:35:35.239 [main] INFO org.jivesoftware.openfire.muc.spi.MultiUserChatServiceImpl - Multi User Chat domain: conference.[REDACTED DOMAIN].

I found this line in org.jivesoftware.openfire.muc.spi.MultiUserChatServiceImpl.java (line 544 or thereabouts; search for “startup.starting.muc” in the Openfire code base, it appears only once outside of resource files):

    Log.info(LocaleUtils.getLocalizedString("startup.starting.muc", Collections.singletonList(getServiceDomain())));

    // Load all the persistent rooms to memory
    for (final LocalMUCRoom room : MUCPersistenceManager.loadRoomsFromDB(this, this.getCleanupDate(), router)) {
        localMUCRoomManager.addRoom(room.getName().toLowerCase(),room);

        // Start FMUC, if desired.
        room.getFmucHandler().applyConfigurationChanges();
    }

This particular code seems to load all the MUC rooms into memory right at startup. In our case we have 48K rooms, and I believe this is what slows down our startup, which can easily take about 10 minutes.

I should also mention that we have a cluster setup with 3 nodes, and we are on version 4.5.4.

Could this be the problem? And is it possible to defer loading the MUC rooms until they are actually used (on demand), or to add a property that defers this until after startup (for example, on a separate thread)?
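To make the "on demand" idea concrete, here is a minimal sketch of a lazily populated room cache. It is not Openfire code: `LazyRoomCache`, `getRoom` and the loader function are hypothetical names, and the loader stands in for whatever single-room database query would be needed.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical on-demand cache (not part of Openfire): loads a room on first access. */
final class LazyRoomCache<R> {
    private final ConcurrentMap<String, R> rooms = new ConcurrentHashMap<>();
    private final Function<String, R> loader; // assumed to fetch one room, e.g. from the DB

    LazyRoomCache(Function<String, R> loader) {
        this.loader = loader;
    }

    /** Returns the cached room, invoking the loader only the first time a name is requested. */
    R getRoom(String name) {
        return rooms.computeIfAbsent(name.toLowerCase(Locale.ROOT), loader);
    }

    /** Number of rooms loaded so far. */
    int loadedCount() {
        return rooms.size();
    }

    public static void main(String[] args) {
        LazyRoomCache<String> cache = new LazyRoomCache<>(name -> "room:" + name);
        cache.getRoom("Lobby");
        cache.getRoom("lobby"); // same key after lower-casing, so no second load
        System.out.println("Rooms loaded: " + cache.loadedCount()); // prints "Rooms loaded: 1"
    }
}
```

With this approach startup would not touch the room table at all, at the cost of a slower first access per room. Anything that needs the full room list (room discovery, cluster synchronisation) would still need its own handling, which this sketch does not cover.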

Thank you.
DT.

BTW, I tested with 4.6.2 on a standalone r5.2xl instance with Aurora RDS, and it still takes 15 minutes to start. It is still waiting on loading the MUC rooms into memory. I also increased the minimum DB connections in openfire.xml, and it still took 14 minutes (effectively the same time). I will keep digging, but any ideas on how to move this out of the startup path or defer it would help.
DT.

Any recommendations?

Did some more investigation but first a question:

@A.31 How many conference rooms do you have in your OF installation? This is important to see whether we are dealing with the same issue.

Our issue is related to the number of conference rooms we have. It seems that with 50-80K conference rooms (50K show up in the console, 85K rows in the ofMucRoom table; the reason for the discrepancy is not clear) it takes about 15 minutes to start up Openfire. Most of the time is spent in one method alone. If we remove all but 2K of the rooms, the startup time goes down to a few minutes (the decrease is not linear, but it is a lot faster).

Questions for @guus or openfire team:

  1. Must the conference rooms be loaded all at startup, or can they be loaded on demand/use? We have many rooms but not all need to be loaded at once/startup.
  2. Could the conference rooms be loaded in the background/separate thread?
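To illustrate the second question, the following is a minimal sketch of moving a slow load onto a separate thread so that the rest of startup can continue. It is not how Openfire is implemented: `BackgroundRoomLoader`, `loadAsync` and the thread name are hypothetical, and the supplier is a stand-in for the slow `loadRoomsFromDB` call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Openfire): runs a slow load off the startup thread. */
final class BackgroundRoomLoader {

    /** Starts the load on a dedicated daemon thread and returns immediately. */
    static <R> CompletableFuture<List<R>> loadAsync(Supplier<List<R>> slowLoad) {
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, "muc-room-loader");
            thread.setDaemon(true); // do not keep the JVM alive just for this load
            return thread;
        });
        CompletableFuture<List<R>> future = CompletableFuture.supplyAsync(slowLoad, executor);
        future.whenComplete((rooms, error) -> executor.shutdown()); // release the thread when done
        return future;
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> rooms = loadAsync(() -> {
            try {
                Thread.sleep(200); // stand-in for the slow database load
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return Arrays.asList("conf1", "conf2");
        });
        System.out.println("Startup continues while rooms load in the background");
        System.out.println("Loaded " + rooms.join().size() + " rooms"); // join() waits for the load
    }
}
```

The open design question is what the MUC service should do with requests that arrive before the future completes: block, reject, or fall back to loading the single requested room.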

Here is what I found (follow the log and I will add annotations at each point):

[1] - Launch Openfire:

[2] - Gets stuck on this line
05:40:39.570 [main] INFO org.jivesoftware.openfire.muc.spi.MultiUserChatServiceImpl - Multi User Chat domain: conference.REDACTED.com
05:40:39.570 [main] DEBUG org.jivesoftware.openfire.muc.spi.MUCPersistenceManager - Loading rooms for chat service conference

[3] - wait wait, about 15 min

[4] - seems to go through every user’s JID
05:55:58.221 [main] DEBUG org.jivesoftware.openfire.group.GroupJID - Parsing JID from string: USER1@REDACTED.com
05:55:58.223 [main] DEBUG org.jivesoftware.openfire.group.GroupJID - Parsing JID from string: USER2@REDACTED.com
… thousands of records (seems like at least once per user) - 10 seconds
05:56:08.642 [main] DEBUG org.jivesoftware.openfire.group.GroupJID - Parsing JID from string: USERN-1@REDACTED.com
05:56:08.642 [main] DEBUG org.jivesoftware.openfire.group.GroupJID - Parsing JID from string: USERN@REDACTED.com

[5] - finishes the loading
05:56:08.659 [main] DEBUG org.jivesoftware.openfire.muc.spi.MUCPersistenceManager - Loaded 50777 rooms for chat service conference

[6] - now switches to FMUC handler (16 seconds)
05:56:08.660 [main] DEBUG org.jivesoftware.openfire.muc.spi.FMUCHandler - (room: ‘CONF1@conference.REDACTED.com’): Changing outbound join configuration. Existing: null, New: null
05:56:08.660 [main] DEBUG org.jivesoftware.openfire.muc.spi.FMUCHandler - (room: ‘CONF2@conference.REDACTED.com’): Changing outbound join configuration. Existing: null, New: null
… thousands of records, seems like one per conference room (16 seconds later)
05:56:25.811 [main] DEBUG org.jivesoftware.openfire.muc.spi.FMUCHandler - (room: ‘CONFN-1@conference.REDACTED.com’): Changing outbound join configuration. Existing: null, New: null
05:56:25.811 [main] DEBUG org.jivesoftware.openfire.muc.spi.FMUCHandler - (room: ‘CONFN@conference.REDACTED.com’): Changing outbound join configuration. Existing: null, New: null

The problem is the extremely long time taken here:
https://github.com/igniterealtime/Openfire/blob/v4.6.2/xmppserver/src/main/java/org/jivesoftware/openfire/muc/spi/MultiUserChatServiceImpl.java (line 1486)

    // Load all the persistent rooms to memory
    for (final LocalMUCRoom room : MUCPersistenceManager.loadRoomsFromDB(this, this.getCleanupDate(), router)) {
        localMUCRoomManager.addRoom(room.getName().toLowerCase(),room);

        // Start FMUC, if desired.
        room.getFmucHandler().applyConfigurationChanges();
    }

Specifically, loadRoomsFromDB, implemented here:
https://github.com/igniterealtime/Openfire/blob/v4.6.2/xmppserver/src/main/java/org/jivesoftware/openfire/muc/spi/MUCPersistenceManager.java (line 543)
About 15 minutes pass between the first and the last debug line in this method.
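For reference, the phase durations can be worked out directly from the timestamps in the log excerpt above. This small sketch does the arithmetic; `StartupPhaseTimer` is a made-up name, the timestamps are copied from the log lines quoted in this post, and it assumes both timestamps fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper: computes elapsed time between two HH:mm:ss.SSS log timestamps. */
final class StartupPhaseTimer {

    /** Assumes both timestamps are from the same day (no midnight rollover). */
    static Duration between(String first, String last) {
        return Duration.between(LocalTime.parse(first), LocalTime.parse(last));
    }

    public static void main(String[] args) {
        // "Loading rooms..." to "Loaded 50777 rooms..."
        System.out.println("loadRoomsFromDB: " + between("05:40:39.570", "05:56:08.659")); // PT15M29.089S
        // First to last GroupJID parsing line shown
        System.out.println("GroupJID parsing: " + between("05:55:58.221", "05:56:08.642")); // PT10.421S
        // First to last FMUCHandler line shown
        System.out.println("FMUC configuration: " + between("05:56:08.660", "05:56:25.811")); // PT17.151S
    }
}
```

So roughly 15.5 minutes are spent inside loadRoomsFromDB, of which only the last 10 seconds or so are the GroupJID parsing, and the FMUC step afterwards adds about 17 seconds.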

This is an extremely long time for a server reboot to take. We have tried this on 4.5.4 and 4.6.2, and both are very slow.

Any help would be appreciated.
Thank you.
DT

The slow start-up of Openfire when there are many MUC rooms is a known issue. I’ve raised this ticket in our issue tracker: https://igniterealtime.atlassian.net/browse/OF-2259