My Openfire 3.5.1 installation runs in a FreeBSD jail. I had severe server-to-server (s2s) connection problems whenever the remote server did not run on the machine named by the domain part of the JID and had to be located through DNS SRV records instead. My Openfire installation failed that DNS SRV lookup, fell back to connecting to the plain domain part, and (of course) did not get a connection.
The reason for that behaviour is buried deep in the FreeBSD implementation of UDP sockets inside a jail: the Java DNS/JNDI code used by Openfire reconnects the UDP socket for every DNS request. Only the first connect(2) call succeeds on my FreeBSD system; all subsequent calls fail with EINVAL (an error value that is not documented for connect(2)). Some research indicates that this might be a restriction of the jail implementation on FreeBSD.
The fix is to acquire a fresh InitialDirContext for each DNS lookup and to close it again afterwards. I have changed the Openfire DNSUtil class, which handles the DNS lookups for Openfire, and added a property that switches from the singleton DirContext used in the 3.5.1 implementation to one DirContext per lookup, which is what it takes to work around that operating system restriction at my site. The default behaviour remains that of Openfire 3.5.1.
If anyone is interested in the patch, see attachment.
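For readers without access to the attachment, the idea can be roughly sketched as below. This is not the actual patch: the class name SrvLookupSketch, the HostAddress holder and both method names are made up for illustration, the sketch assumes the JDK's built-in JNDI DNS provider (com.sun.jndi.dns.DnsContextFactory), and it simply takes the first SRV record returned instead of sorting by priority and weight.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical illustration class; not part of Openfire.
class SrvLookupSketch {

    /** Simple host/port pair for an s2s connection target (hypothetical helper). */
    static final class HostAddress {
        final String host;
        final int port;

        HostAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Parses one SRV record value of the form "priority weight port target". */
    public static HostAddress parseSrv(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected SRV record: " + record);
        }
        int port = Integer.parseInt(parts[2]);
        String host = parts[3];
        // DNS answers usually carry a trailing dot on the target name; strip it.
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return new HostAddress(host, port);
    }

    /**
     * Looks up _xmpp-server._tcp.<domain> with a DirContext that is created
     * for this one lookup and closed right after it, so no context (and no
     * UDP socket behind it) is reused between lookups.
     * Falls back to the plain domain and defaultPort if the lookup fails.
     */
    public static HostAddress resolveXmppServer(String domain, int defaultPort) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns:"); // use the platform's configured resolvers

        DirContext ctx = null;
        try {
            ctx = new InitialDirContext(env); // fresh context per lookup
            Attributes attrs = ctx.getAttributes(
                    "_xmpp-server._tcp." + domain, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            if (srv != null && srv.size() > 0) {
                // Simplification: take the first record, ignoring priority/weight.
                return parseSrv(String.valueOf(srv.get(0)));
            }
        } catch (NamingException e) {
            // Lookup failed; fall through to the plain-domain fallback.
        } catch (RuntimeException e) {
            // Malformed record; fall through to the plain-domain fallback.
        } finally {
            if (ctx != null) {
                try {
                    ctx.close(); // release the context after every lookup
                } catch (NamingException ignored) {
                    // nothing useful to do here
                }
            }
        }
        return new HostAddress(domain, defaultPort);
    }
}
```

Calling SrvLookupSketch.resolveXmppServer("gmail.com", 5269) would then return either the SRV target or, if the lookup fails, gmail.com:5269. Creating a context per lookup costs a little more than keeping a singleton, which is presumably why it makes sense to keep it behind a property.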
Thanks for taking my suggestion into account. I considered using the DNS override option once I had isolated the problem, but since I want to stay open for unlimited s2s connections, the dnsutil.dnsOverride property is not useful for my installation, so I went for the patch. Congratulations on the structure of the code: it was extremely easy to put that change in without too much fear of breaking everything else. Open source is great!
Has this been fixed in 3.6.2? If not, will the patch be compatible with 3.6.2? I seem to be having the same problem. It connects to some servers but not all.
Server Properties
Server Uptime: 44 minutes – started Dec 21, 2008 4:56:47 PM
Version: Openfire 3.6.2
Server Directory: /usr/local/share/java/openfire
Server Name: neko.im
Environment
Java Version: 1.6.0_07 The FreeBSD Foundation – Diablo Java HotSpot™ 64-Bit Server VM
Appserver: jetty-6.1.x
Host Name: muspelheim.nulani.net
OS / Hardware: FreeBSD / amd64
Locale / Timezone: en / Central European Time (1 GMT)
2008.12.22 14:41:52 [org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:258)] Error trying to connect to remote server: gmail.com(DNS lookup: gmail.com:5269)
java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:253)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.authenticateDomain(LocalOutgoingServerSession.java:144)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.sendPacket(OutgoingSessionPromise.java:239)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:216)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
I had forgotten to enable dnsutil.useLocalContext. I’ll see if it works tomorrow. Thanks!
Edit: How can I check whether the patch has been applied correctly? I untarred the source, replaced DNSUtil, re-tarred it, and fixed the checksums.