Server-to-Server TLS Problems

Server #1 (Openfire.Internal.MyDomain.Com) is inside the network boundary. Server #2 (Server.Invalid.Foo) is in a DMZ network segment.

Server #1 can send chat messages successfully, but does not receive presence / online information from Server #2.

Server #2 cannot send chat messages, nor does it receive presence / online information from Server #1.

If I drop the requirement for secure connections between servers, both servers communicate successfully and send/receive presence information properly. The problems only occur when I require TLS. Both servers have internal-CA-signed certs installed for both RSA and DSA. All four certs appear in the admin console as CA signed and have green checkmarks.

Server #1 Error Log:

2009.05.27 12:26:51 [org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:258)] Error trying to connect to remote server: foo(DNS lookup: foo:5269)
java.net.UnknownHostException: foo
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:253)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.authenticateDomain(LocalOutgoingServerSession.java:185)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.sendPacket(OutgoingSessionPromise.java:239)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:216)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Server #1 Warning Log:

2009.05.27 12:03:47 Closing session due to incorrect hostname in stream header. Host: internal.mydomain.com. Connection: org.jivesoftware.openfire.net.SocketConnection@1e74d83 socket: Socket[addr=/192.168.xxx.xxx,port=2716,localport=5269] session: null

Server #2 Error Log:

2009.05.27 12:03:40 [org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:258)] Error trying to connect to remote server: mydomain.com(DNS lookup: mydomain.com:5269)
java.net.ConnectException: Connection timed out: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(Unknown Source)
at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.createOutgoingSession(LocalOutgoingServerSession.java:253)
at org.jivesoftware.openfire.session.LocalOutgoingServerSession.authenticateDomain(LocalOutgoingServerSession.java:185)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.sendPacket(OutgoingSessionPromise.java:239)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:216)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Server #2 Warning Log:

2009.05.27 12:10:37 Closing session due to incorrect hostname in stream header. Host: invalid.foo. Connection: org.jivesoftware.openfire.net.SocketConnection@10eb2f0 socket: Socket[addr=/10.xxx.xxx.xxx,port=2453,localport=5269] session: null

I’d suspect DNS resolution given the errors, but these exact same servers work normally if I remove the requirement for TLS. Anyone?

61 look-sees and no responses… That’s encouraging…

Anyways, I looked at a Wireshark capture of the traffic between the two servers in question, and I noticed something odd… During a single connection attempt, the server makes numerous DNS lookups, and some of them don’t seem to make sense. I took the capture from the DMZ server (server.foo.invalid) while logging in via a SparkWeb client. The only friend the logged-in user has is located on the internal server (Server.internal.mydomain.com). During this process, the DMZ server appears to attempt 3 separate connections to the remote (internal) server, each time using a different name space.

First round, it’s looking for Server.Internal.MyDomain.Com. This resolves only when it falls back to a plain A request, because the server doesn’t have its own dedicated DNS zone. Second round, it tries Internal.MyDomain.Com and resolves early in the process, because I have SRV records built for both _xmpp-server and _jabber and it’s looking at the proper name space. Lastly, it attempts a third round of resolution, this time looking only for MyDomain.Com. This eventually falls back to requesting the A record, and it gets numerous results for the name space’s DCs.

To Summarize:

Query for SRV _xmpp-server._tcp.server.internal.mydomain.com
No Such Name
Query for SRV _jabber._tcp.server.internal.mydomain.com
No Such Name
Query for A server.internal.mydomain.com
Response A 10.xxx.xx.xx

msolap >> << xmpp-server

Query for SRV _xmpp-server._tcp.internal.mydomain.com
Response SRV 0 0 5269 10.xxx.xxx.xxx

tams >> << xmpp-server

Query for SRV _xmpp-server._tcp.mydomain.com
Response SOA DomainController.internal.mydomain.com
Query for SRV _jabber._tcp.mydomain.com
Response SOA DomainController.internal.mydomain.com
Query for A mydomain.com
Response (Multiple Domain Controller IP’s)
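
For anyone following along, the lookup order in the summary above can be sketched roughly like this. This is only an illustration of what the capture shows, not Openfire's actual code: the function names `parent_domains` and `srv_fallback_queries` are made up for the example, and the idea that each round simply drops the leftmost label of the previous name is an assumption based on the three names seen in the capture.

```python
def srv_fallback_queries(domain):
    """Return the (record type, name) queries tried for one domain, in order.

    Mirrors the order seen in the capture: the _xmpp-server SRV record,
    then the _jabber SRV record, then a plain A lookup. A real resolver
    stops at the first query that returns an answer.
    """
    return [
        ("SRV", f"_xmpp-server._tcp.{domain}"),
        ("SRV", f"_jabber._tcp.{domain}"),
        ("A", domain),
    ]


def parent_domains(hostname, min_labels=2):
    """Yield hostname, then each parent made by dropping the leftmost label.

    Assumption: stops once fewer than min_labels labels remain, so a bare
    TLD such as "com" is never queried.
    """
    labels = hostname.lower().split(".")
    while len(labels) >= min_labels:
        yield ".".join(labels)
        labels = labels[1:]


if __name__ == "__main__":
    # Print the three rounds of queries for the internal server's name.
    for name in parent_domains("server.internal.mydomain.com"):
        for rtype, qname in srv_fallback_queries(name):
            print(f"Query for {rtype} {qname}")
        print()
```

Running it lists the full fallback chain for each of the three names. In the capture the second round stops after the first SRV query, because that one gets an answer.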


Now, to me, this seems like a bug. Other than adding _xmpp-server and _jabber SRV records to MyDomain.Com, are there any other solutions you guys can see?

If someone with ideas finds this, I’ll keep checking back from time to time. I’m going to have one of our developers look at the source and see if he can fix this. If not, I guess we’ll look for another solution that doesn’t use Openfire.