Strange server2server DNS problem with FreeBSD jail - Are patches welcome?

My Openfire 3.5.1 installation runs in a FreeBSD jail. I had severe server2server connection problems whenever the remote server did not run on a machine matching the domain part of the JID and had to be looked up using DNS SRV records instead. My Openfire installation failed that DNS SRV lookup, fell back to connecting to the plain domain part and (of course) did not get a connection.

The reason for that behaviour is deeply buried in the FreeBSD implementation of UDP sockets when running in a jail: the Java DNS/JNDI code used by Openfire reconnects the UDP socket for every DNS request. Only the first connect(2) call succeeds on my FreeBSD; all subsequent calls fail with EINVAL (an error value that is undocumented for connect(2)). Some research indicates that this might be a restriction of the jail implementation on FreeBSD.

The fix for this problem is to acquire a fresh InitialDirContext for each DNS lookup and to close that context again afterwards. I have changed the Openfire DNSUtil class, which handles the DNS lookups for Openfire, and added a property that switches from the singleton DirContext used in the 3.5.1 implementation to one DirContext per lookup, which is what I need to work around that operating system restriction on my site. The default behaviour is that of Openfire 3.5.1.
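For illustration, the per-lookup idea could look roughly like the sketch below. This is not the attached patch and not the real DNSUtil code: the class PerLookupSrvResolver, the HostPort type and the parseSrv helper are hypothetical names, and it assumes the JNDI DNS provider returns each SRV value as a "priority weight port target" string. The only point it is meant to show is that the context is created inside the lookup and closed in a finally block, so no DirContext is shared between requests.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical illustration class, not part of Openfire.
class PerLookupSrvResolver {

    /** Host and port taken from one SRV record (hypothetical helper type). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Environment selecting the JNDI DNS provider shipped with the JDK. */
    static Hashtable<String, String> dnsEnvironment() {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        return env;
    }

    /** Parses SRV record text, assumed to be "priority weight port target". */
    static HostPort parseSrv(String record) {
        String[] parts = record.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected SRV record: " + record);
        }
        int port = Integer.parseInt(parts[2]);
        String host = parts[3];
        // DNS answers are usually fully qualified; drop the trailing dot.
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return new HostPort(host, port);
    }

    /** Looks up the s2s SRV records using a context that lives for one lookup only. */
    static List<HostPort> lookup(String domain) throws NamingException {
        DirContext ctx = new InitialDirContext(dnsEnvironment());
        try {
            Attributes attrs =
                ctx.getAttributes("_xmpp-server._tcp." + domain, new String[] {"SRV"});
            Attribute srv = attrs.get("SRV");
            List<HostPort> result = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    result.add(parseSrv(values.next().toString()));
                }
            }
            return result;
        } finally {
            // Closing releases the provider's resources after every lookup.
            ctx.close();
        }
    }
}
```

Calling PerLookupSrvResolver.lookup("example.org") would then create and discard one context per call. Whether that really avoids the repeated connect(2) on the same UDP socket depends on the JNDI provider and JVM in use, so treat it as something to verify in the jail rather than a guarantee.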

If anyone is interested in the patch, see attachment.

Martin.

Hi,

I added this as a comment to JM-898. It is not implemented yet, but someone will take a look at it when resolving that issue.

If you allow only a few s2s connections, the “internal DNS server” (JM-711) may be an option for you.

LG

Thanks for taking my suggestion into account. I considered using the DNS override option once I had isolated the problem, but since I want to be open for unlimited s2s connections, the dnsutil.dnsOverride property is not useful for my installation, so I went for the patch. Congratulations on the structure of the code: it was extremely easy to put that change in without too much fear of breaking everything else. Open source is great!

Has this been fixed in 3.6.2? If not, will the patch be compatible with 3.6.2? I seem to be having the same problem. It connects to some servers but not all.

Server Properties
Server Uptime: 44 minutes – started Dec 21, 2008 4:56:47 PM
Version: Openfire 3.6.2
Server Directory: /usr/local/share/java/openfire
Server Name: neko.im

Environment
Java Version: 1.6.0_07 The FreeBSD Foundation – Diablo Java HotSpot™ 64-Bit Server VM
Appserver: jetty-6.1.x
Host Name: muspelheim.nulani.net
OS / Hardware: FreeBSD / amd64
Locale / Timezone: en / Central European Time (1 GMT)
Java Memory: 54.45 MB of 247.50 MB (22.0%) used

Using the patched file didn’t fix it.

I still get errors like this:

2008.12.22 14:41:52 [org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:258)] Error trying to connect to remote server: gmail.com(DNS lookup: gmail.com:5269)
java.net.ConnectException: Operation timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:253)
    at org.jivesoftware.openfire.session.LocalOutgoingServerSession.authenticateDomain(LocalOutgoingServerSession.java:144)
    at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.sendPacket(OutgoingSessionPromise.java:239)
    at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:216)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)

Any suggestions?

Bump.

It’s still a problem in Openfire 3.6.3. Any ideas?

I don’t like to keep bumping this, but is there any progress on this at all?

The DNSUtil.java sources are basically unchanged from 3.5.1 to 3.6.3: only comments have changed.

I have just built a version with my patch and it still works in my environment.

Sorry if the patch is not helpful in your environment.

EDIT: You might want to check that you have

  • applied the patch and
  • enabled the boolean property dnsutil.useLocalContext in the server configuration.

I had forgotten to enable dnsutil.useLocalContext. I’ll see if it works tomorrow. Thanks!

Edit: How can I check whether the patch has been applied as it should be? I untarred the source, replaced DNSUtil.java, retarred it, and fixed the checksums.

Are you running FreeBSD 6.4 or 7.1?

Edit: It works. Huzzah!