PubSubModule - Internal server error

I get an internal server error when retrieving published items of a pub sub node.
There are no published items on my node yet, but somehow Openfire tries to put result of the getPublishedItem into the cache. The result is null, so the put in the itemCache fails:

debug.log:
2020.09.14 11:44:28 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - No cached item found. Obtaining it from delegate.

error.log:

2020.09.14 11:44:29 org.jivesoftware.openfire.pubsub.PubSubModule - Internal server error
java.lang.NullPointerException: Null value is not allowed!
at org.jivesoftware.util.cache.DefaultCache.checkNotNull(DefaultCache.java:634) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.util.cache.DefaultCache.put(DefaultCache.java:145) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.util.cache.DefaultCache.put(DefaultCache.java:59) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.util.cache.CacheWrapper.put(CacheWrapper.java:157) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.util.cache.CacheWrapper.put(CacheWrapper.java:30) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider.getPublishedItem(CachingPubsubPersistenceProvider.java:660) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.pubsub.LeafNode.getPublishedItem(LeafNode.java:393) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.pubsub.PubSubEngine.getPublishedItems(PubSubEngine.java:1219) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.pubsub.PubSubEngine.process(PubSubEngine.java:149) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.pubsub.PubSubModule.process(PubSubModule.java:187) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToComponent(RoutingTableImpl.java:451) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:266) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:336) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:118) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:74) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:380) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:95) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:322) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:209) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:183) [xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]

I am testing our Openfire plugin for dynamic chat forms against the latests Openfire xmppserver-4.6.0-SNAPSHOT jar (taken from the igniterealtime repo).

I think you’re hitting a bug in the new Pubsub code that will get into 4.6.0. I’ve raised this ticket to track this issue: https://issues.igniterealtime.org/browse/OF-2080

As a workaround, you could consider using a different persistence provider. By default, Openfire will use org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider. This implementation offers caching functionality, but uses another provider as a delegate, to do the actual database interaction. The one that’s used by default is org.jivesoftware.openfire.pubsub.DefaultPubSubPersistenceProvider.

Instead of using the cached variant, you can configure Openfire to use the org.jivesoftware.openfire.pubsub.DefaultPubSubPersistenceProvider directly. That will bypass the code that’s giving you this issue. The downside is that you’ll see more database interaction.

To configure what provider is used, you can use this Openfire property: provider.pubsub-persistence.className. It should be configured with the full class name of the implementation that you want to use (eg: org.jivesoftware.openfire.pubsub.DefaultPubSubPersistenceProvider)

1 Like

As a follow-up: can you provide the stanza that triggers this bug? From the stack trace that you provide, I think I can conclude that Openfire is processing a query that explicitly asks for an item by item-id (perhaps something like ‘current’ is used). That ID doesn’t seem to correspond with an existing item, which triggers this bug.

I’m still thinking that Openfire should behave nicer here, but if I’m right, then you might have another workaround available to you: prevent the client from requesting non-existing items.

I can confirm the workaround to set the persistence provider to DefaultPubSubPersistenceProvider does get me around the problem.

The stanzas that trigger this bug look like:

<iq xmlns="jabber:client" to="pubsub.openfire.xxxx.yyy" id="R30ws-4705" type="get">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <items node="fdp/submitted/rapidrequest">
      <item xmlns="http://jabber.org/protocol/pubsub" id="itemId"/>
    </items>
  </pubsub>
</iq>

where ‘itemId’ looks pretty much like the ‘current’ you are mentioning.

But I also see stanzas like these:

<iq xmlns="jabber:client" to="pubsub.openfire.xxxx.yyy" id="R30ws-4707" type="get">
  <pubsub xmlns="http://jabber.org/protocol/pubsub">
    <items node="fdp/submitted/rapidrequest">
      <item xmlns="http://jabber.org/protocol/pubsub" id="ef7070c2-241c-4b13-a249-faa551dacdce"/>
    </items>
  </pubsub>
</iq>

which seem to have a unique id tag, but the result is also an internal server error.

The id value for the item element references an identifier for an item. How do you determine this value? If you know that an item exists, then having this error is more problematic - apparently, Openfire has lost the item?

If you know that no item exists, then you shouldn’t request a specific item to be returned, I think.

I have committed a fix, which will be available in tomorrows nightly build.

I tested with last nights build and can confirm there is no internal server error anymore.

I now see the following message being logged:
2020.09.15 10:05:34 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - Item for item identifier UniqueIdentifier{serviceId='pubsub', nodeId='fdp/submitted/medevac', itemId='ffdab261-bd9d-4c37-92aa-e2fee1b25c38'} was not found on node UniqueIdentifier{serviceId='pubsub', nodeId='fdp/submitted/medevac'}

which is the message you added, so that is fine.

I am investigating why these item id’s are requested. Already found out (some of) these id’s come from items stored clientside as concept forms. Those should not be sent in requests to the server.

Great, thanks for the feedback.

The effort that you’re putting in to verify where these item IDs come from is useful, I think. There’s a good chance that this is totally benign (eg: the client just checking to see if an item exists, or using some kind of default), but if Openfire has previously (in the same session) told the client that this item exists (and thus is serving up an inconsistent state) then we’d need to take further action.

I’m very interested in your findings.

I have an issue now: after the restart of my Openfire server discovering pubsub items does return nothing.

When the server was running and configured with the casrep form a disco items request resulted in items:

<iq xmlns="jabber:client" to="pubsub.openfire.xxxx.yyy" id="lsCFT-434" type="get">
  <query xmlns="http://jabber.org/protocol/disco#items"/>
</iq>
<iq xmlns="jabber:client" to="niels@openfire.xxxx.yyy/2vyjvd5agb" from="pubsub.openfire.xxxx.yyy" id="lsCFT-434" type="result">
  <query xmlns="http://jabber.org/protocol/disco#items">
    <item jid="pubsub.openfire.xxxx.yyy" name="" node="fdp/xslt/casrep"/>
    <item jid="pubsub.openfire.xxxx.yyy" name="" node="fdp/template/casrep"/>
    <item jid="pubsub.openfire.xxxx.yyy" name="" node="fdp/submitted/casrep"/>
  </query>
</iq>

After the restart the result is empty:

<iq xmlns="jabber:client" to="pubsub.openfire.xxxx.yyy" id="5Vxuz-302" type="get">
  <query xmlns="http://jabber.org/protocol/disco#items"/>
</iq>
<iq xmlns="jabber:client" to="niels@openfire.xxxx.yyy/apemq959s2" from="pubsub.openfire.xxxx.yyy" id="5Vxuz-302" type="result">
  <query xmlns="http://jabber.org/protocol/disco#items"/>
</iq>

I don’t see errors in the log files. The debug log shows adding pubsub items during loading of our plugin:

2020.09.16 08:52:14 int_.nato.ncia.jchat.plugin.XepNodeManager - updateTemplate
2020.09.16 08:52:14 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - Saving published item UniqueIdentifier{serviceId='pubsub', nodeId='fdp/xslt/casrep'} 1
2020.09.16 08:52:14 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - Saving published item UniqueIdentifier{serviceId='pubsub', nodeId='fdp/template/casrep'} 1
2020.09.16 08:52:14 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - Added new (inbound) item to cache
2020.09.16 08:52:14 org.jivesoftware.openfire.pubsub.CachingPubsubPersistenceProvider - Added new (inbound) item to cache

I checked the Node Summary for pubsub via the Openfire admin console. That correctly shows the fdp/xslt/casrep, fdp/template/casrep and fdp/submitted/casrep nodes with the items I loaded (form template, form xslt and 1 submitted form).

So content is there, but I cannot retrieve it via the disco#items request.

I can reproduce this problem. I’ll continue to investigate. From an initial assessment, I believe that this can be worked around with again with using the Default- instead of the CachingPubsubPersistenceProvider.

My initial findings seem to indicate that this affects nodes that are being stored that do not contain any items. If true, then adding an item to the node would work around the issue.

For me using the DefaultPubSubPersistenceProvider instead of the CachingPubsubPersistenceProvider does not help. After restart of the Openfire server the disco#items query does not return the nodes.
This applies to nodes that have no items as well as nodes with an item on them.

In the logs I see the DefaultPubSubPersistenceProvider being used when loading our plugin:
2020.09.16 13:47:10 org.jivesoftware.openfire.pubsub.DefaultPubSubPersistenceProvider - Loaded item from DB

No errors or warnings in the logs when doing the disco#items query.

It really seems to be related the requesting the items after startup of the server.
When I remove a form and add it again via Openfire admin console (basically this is removing a node and add it again) the items of this form are returned fine in the disco#items response.

Also my unittest that publishes an item to one of these nodes, checks the item count via disco#items request and removes the item from the node is succesfull.
So the node is existing on the server (what was already shown in the Pubsub node summary in the admin console).

As a status update (no solution yet): I believe there’s at least two, potentially three different issues at play here:

  1. When shutting down Openfire fast enough, recently created nodes might not have been persisted
  2. The caching persistence provider might neglect to persist nodes at all, when there are no items to be stored (I suspect that any item on any node will do, but the persistence flush seems to be aborted when there’s no item at all).
  3. When re-initializing after a restart, Openfire fails to properly relate nodes to their parent root node.

I suspect that the latter is what’s causing you most grief, but all these combined make for a tricky investigation.

I’ve raised the following JIRA tickets to track these issues:

Fixes are in the works at https://github.com/igniterealtime/Openfire/pull/1704

The fixes were merged, and a new nightly build has been generated. @Niels_Kranendonk would you care to try if this resolves your issues?

I tested with the latest nightly build of this morning and can confirm the issue of retrieving pubsub items after restart of the server is resolved.

I have used the CachingPubsubPersistenceProvider and see the correct items being retrieved via the diso#items request. I tested already existing nodes as well as newly added ones, both are correctly in the list of returned items.

During testing removal of nodes one time I saw that some removed nodes were still being returned as part of the disco#items request although they were not shown on the Pubsub node summary anymore. However I could not reproduce this when I tried again.

1 Like

Thanks for the feedback!

That’s a bit concerning. Maybe there’s a race condition that we need to more properly account for. Was this in context of a cluster?

No, the server I use for testing is not setup with clustering.