
MUC & Unreliable s2s

I’ve got a somewhat unusual environment that I want to deploy XMPP servers in, and I’m looking for ideas.

The use case: we’ve got multiple sites, each with an XMPP server. Network connections are good within a site, but connections off-site are highly unreliable. I’d expect any open TCP connection to get whacked on a regular basis, which means s2s will be going up and down frequently (as in several times per hour), for minutes or even hours at a time.

I expect the MUC rooms to be hosted on a relatively reliable and well-connected server, though the individual server connections federated to it may be going up and down.

The problem is that several people from a given site may be in a MUC room hosted on the stable server. When the s2s connection goes down, I’d like a) the people within that remote site to be able to continue to talk to each other, and b) when the connection comes back up, the messages from each side to be played back to the other.

As near as I can tell (and I only have a glancing reading of the RFCs) this isn’t handled well by the existing protocol.

My initial impression is that I can do some sort of interceptor on the XML router on the remote site’s XMPP server. When the s2s connection goes down, the router hot-wires the locally generated messages to be delivered locally, and spools up a copy of all the locally generated messages. When the s2s connection comes back up, it gets the history from the MUC room from the time the connection went down, and sends off the spooled-up local messages (marked appropriately as old).
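To make the idea concrete, here is a rough sketch of that interceptor logic in Python. It is not tied to any real server API: `SpoolingInterceptor`, `Message`, and the `send_remote` / `deliver_local` callbacks are all hypothetical names standing in for whatever hooks the server's router actually offers, and the `delayed` flag stands in for whatever "this is old" marking the real stanzas would carry.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Message:
    sender: str            # JID of the local occupant (illustrative)
    body: str
    timestamp: float       # when the message was sent locally
    delayed: bool = False  # True when replayed after an outage


@dataclass
class SpoolingInterceptor:
    """Hypothetical hook on the remote site's router for groupchat messages."""
    send_remote: Callable[[Message], None]    # forward to the MUC host over s2s
    deliver_local: Callable[[Message], None]  # deliver to local occupants only
    s2s_up: bool = True
    down_since: Optional[float] = None
    spool: List[Message] = field(default_factory=list)

    def on_local_message(self, msg: Message) -> None:
        if self.s2s_up:
            # Normal path: the MUC host reflects the message to everyone.
            self.send_remote(msg)
        else:
            # Link is down: hot-wire to local occupants and keep a copy.
            self.deliver_local(msg)
            self.spool.append(msg)

    def on_s2s_down(self, now: float) -> None:
        # Remember only the start of the outage, even if signalled twice.
        if self.s2s_up:
            self.s2s_up = False
            self.down_since = now

    def on_s2s_up(self) -> Optional[float]:
        """Flush the spool, marked as old, and return the time from which
        room history should be requested from the MUC host."""
        self.s2s_up = True
        since, self.down_since = self.down_since, None
        for msg in self.spool:
            self.send_remote(replace(msg, delayed=True))
        self.spool.clear()
        return since
```

The caller would use the value returned by `on_s2s_up()` to ask the room for history since the outage began. The sketch ignores the harder parts, such as presence changes during the outage and avoiding duplicates when the room plays the replayed messages back to the local occupants who already saw them.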

Mucking about with the XML router on the server will make it slower, but I’m not extremely worried about that; it’s not really a high-traffic environment and we can throw hardware at the problem.

Is this completely off base? Is there a better solution?


There’s a lot of interest in this type of feature from various people. In fact, there have been discussions about extending the MUC protocol to deal with it more explicitly.

Check out the recent thread “MUC Across Unreliable S2S” on the jdev mailing list:


We’ll definitely track any standard that emerges and will likely contribute to it, especially if we’re working with any customers on the project. If you’d like to see this feature in Wildfire and can sponsor some of the work, drop us a line.




Has there been some progress in this area?