My Openfire 3.5.1 installation runs in a FreeBSD jail. I had severe server2server connection problems whenever the remote server did not run on the host named by the domain part of the JID and had to be located through DNS SRV records instead. My Openfire installation failed that DNS SRV lookup, fell back to connecting to the plain domain part and (of course) did not get a connection.
The reason for that behaviour is buried deep in the FreeBSD implementation of UDP sockets inside a jail: the Java DNS/JNDI code that Openfire uses reconnects the UDP socket for every DNS request. Only the first connect(2) call succeeds on my FreeBSD system; all subsequent calls fail with EINVAL (an error value that is undocumented for connect(2)). Some research indicates that this might be a restriction of the jail implementation on FreeBSD.
The fix for this problem is to acquire a fresh InitialDirContext for each DNS lookup and to close that context again afterwards. I have changed the Openfire DNSUtil class, which handles the DNS lookups for Openfire, and added a property that switches from the singleton DirContext used in the 3.5.1 implementation to one DirContext per lookup, which is what I need to work around this operating system restriction at my site. The default behaviour is that of Openfire 3.5.1.
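For readers who want the idea without opening the patch, here is a rough sketch of what a context per lookup could look like. It is not the attached patch: the class name SrvLookupSketch, the property name dnsutil.contextPerLookup and the helper methods are made up for illustration. It assumes the JDK's built-in JNDI DNS provider (com.sun.jndi.dns.DnsContextFactory) and the _xmpp-server._tcp SRV name used for server2server connections.

```java
import java.util.Hashtable;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

final class SrvLookupSketch {

    // Hypothetical switch; the property name used in the real patch is not given here.
    // Defaults to false, i.e. the shared-context behaviour.
    static final boolean CONTEXT_PER_LOOKUP = Boolean.getBoolean("dnsutil.contextPerLookup");

    // Shared context, created lazily and reused when the switch is off.
    private static DirContext sharedContext;

    // Creates a DNS-backed JNDI context using the JDK's DNS provider.
    static DirContext newContext() throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        return new InitialDirContext(env);
    }

    // Returns the first SRV record for the domain as a raw string, or null if there is none.
    static String lookupSrv(String domain) throws NamingException {
        String name = "_xmpp-server._tcp." + domain;
        String[] wanted = {"SRV"};
        if (!CONTEXT_PER_LOOKUP) {
            synchronized (SrvLookupSketch.class) {
                if (sharedContext == null) {
                    sharedContext = newContext();
                }
                return firstSrv(sharedContext.getAttributes(name, wanted));
            }
        }
        // Workaround path: a fresh context for this lookup only, always closed afterwards.
        DirContext ctx = newContext();
        try {
            return firstSrv(ctx.getAttributes(name, wanted));
        } finally {
            ctx.close();
        }
    }

    // Extracts the first value of the SRV attribute, if present.
    static String firstSrv(Attributes attrs) throws NamingException {
        Attribute srv = attrs.get("SRV");
        return (srv == null || srv.size() == 0) ? null : (String) srv.get();
    }

    // Splits "priority weight port target" into { host, port }, dropping a trailing dot.
    static String[] hostAndPort(String srvRecord) {
        String[] fields = srvRecord.trim().split("\\s+");
        String host = fields[3];
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return new String[] {host, fields[2]};
    }
}
```

Starting the JVM with -Ddnsutil.contextPerLookup=true would select the per-lookup path in this sketch. The price is one extra context creation per lookup, which should be negligible compared to the DNS round trip itself.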
If anyone is interested in the patch, see the attachment.