Troubleshooting GTalk s2s Communication

Hey Everyone,

I’ve been running a Jive/Wildfire/Openfire XMPP server since 2005 and had never once had a problem with s2s communication until now. s2s communication between my server (hhsn.net) and Google Talk seems to have stopped working. s2s with other XMPP servers still seems to work fine: I created an account on another XMPP server and was able to IM users on my server. I hadn’t specified any SRV records before, so I added the appropriate ones for XMPP. Still no go. I’m not sure how to go about troubleshooting this!

Can somebody verify that DNS for my domain is set up properly? XMPP usernames are user@hhsn.net and the SRV records should be pointing XMPP and Jabber to ‘jabber.hhsn.net’.

Has Google made any recent changes that require some configuration changes on my end?

Any help would be greatly appreciated!

I just tried dig SRV _xmpp-server._tcp.hhsn.net and it resolves correctly to jabber.hhsn.net. I’m replying because our situation is suspiciously similar to yours: GTalk s2s was working fine up until about the same date that you mention, which suggests Google has changed something at their end that upsets Openfire.
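If anyone wants to repeat the SRV check from a script, here is a rough sketch. It assumes dig is on the PATH and that `dig +short SRV` prints one "priority weight port target" line per record. The names `SrvRecord`, `parse_dig_srv` and `lookup_xmpp_server_srv` are made up for this example and are not part of Openfire.

```python
import subprocess
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_dig_srv(output: str) -> List[SrvRecord]:
    """Parse `dig +short SRV` output: 'priority weight port target' per line."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines, dig comments, and anything that is not an SRV answer.
        if len(parts) != 4 or not all(p.isdigit() for p in parts[:3]):
            continue
        priority, weight, port, target = parts
        records.append(
            SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))
        )
    # Lower priority values are meant to be tried first.
    return sorted(records, key=lambda r: r.priority)


def lookup_xmpp_server_srv(domain: str) -> List[SrvRecord]:
    """Ask dig for the s2s SRV records of `domain` (assumes dig is installed)."""
    result = subprocess.run(
        ["dig", "+short", "SRV", f"_xmpp-server._tcp.{domain}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_dig_srv(result.stdout)


if __name__ == "__main__":
    for record in lookup_xmpp_server_srv("hhsn.net"):
        print(record)
```

An empty list means no _xmpp-server SRV record came back, in which case remote servers would normally fall back to connecting to the bare domain on port 5269.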

There is some stuff in the debug logs:

2010.05.10 12:34:12  LocalOutgoingServerSession: OS - Trying to connect to gmail.com:5269(DNS lookup: xmpp-server1.l.google.com:5269)
2010.05.10 12:34:12 LocalOutgoingServerSession: OS - Plain connection to gmail.com:5269 successful
2010.05.10 12:34:12 LocalOutgoingServerSession: OS - Going to try connecting using server dialback with: gmail.com
2010.05.10 12:34:12 ServerDialback: OS - Trying to connect to gmail.com:5269(DNS lookup: xmpp-server.l.google.com:5269)
2010.05.10 12:34:12 ServerDialback: OS - Connection to gmail.com:5269 successful
2010.05.10 12:34:12 ServerDialback: OS - Sent dialback key to host: gmail.com id: 0B52476E22BF4ECD from domain: jabber-test.warwick.ac.uk
2010.05.10 12:34:32 OutgoingServerSocketReader: Finishing Outgoing Server Reader. No session to close.
java.io.EOFException: no more data available - expected end tag </stream:stream> to close start tag <stream:stream> from line 1, parser stopped on START_TAG seen .../streams" xmlns="jabber:server" xmlns:db="jabber:server:dialback">... @1:141
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:3035)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3046)
at org.jivesoftware.openfire.net.MXParser.nextImpl(MXParser.java:76)
at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)
at org.dom4j.io.XMPPPacketReader.parseDocument(XMPPPacketReader.java:317)
at org.jivesoftware.openfire.server.OutgoingServerSocketReader$1.run(OutgoingServerSocketReader.java:93)

Then later, the final failure:

2010.05.10 12:36:12  ServerDialback: OS - Time out waiting for answer in validation from: gmail.com id: 0B52476E22BF4ECD for domain: jabber-test.warwick.ac.uk
2010.05.10 12:36:12 OutgoingSessionPromise: Error sending packet to remote server:
<presence to="thisuser@gmail.com" type="subscribe" from="n.howes@jabber-test.warwick.ac.uk"/>
java.lang.Exception: Failed to create connection to remote server
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.sendPacket(OutgoingSessionPromise.java:252)
at org.jivesoftware.openfire.server.OutgoingSessionPromise$PacketsProcessor.run(OutgoingSessionPromise.java:216)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)

The main thing seems to be the error parsing the XML. Perhaps that might point an Openfire dev in the right direction.
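To spot this pattern in a longer debug log, something like the sketch below could pair each "Sent dialback key" line with the matching "Time out waiting for answer" line by id. The regular expressions are based only on the two message formats pasted above, so treat them as assumptions about what other Openfire versions print. `find_dialback_timeouts` is a made-up helper name.

```python
import re
from datetime import datetime
from typing import Dict, List

# Timestamp format taken from the log lines quoted above, e.g. 2010.05.10 12:34:12
STAMP = re.compile(r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})")
SENT = re.compile(r"Sent dialback key to host: (\S+) id: (\S+) from domain: (\S+)")
TIMEOUT = re.compile(
    r"Time out waiting for answer in validation from: (\S+) id: (\S+) for domain: (\S+)"
)


def _stamp(line: str) -> datetime:
    """Return the leading timestamp of a log line."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("log line has no leading timestamp")
    return datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S")


def find_dialback_timeouts(log_text: str) -> List[Dict[str, object]]:
    """List dialback keys that were sent but timed out waiting for validation."""
    sent_at: Dict[str, datetime] = {}
    failures: List[Dict[str, object]] = []
    for line in log_text.splitlines():
        sent = SENT.search(line)
        if sent:
            # Remember when the key with this id was sent.
            sent_at[sent.group(2)] = _stamp(line)
            continue
        timeout = TIMEOUT.search(line)
        if timeout and timeout.group(2) in sent_at:
            remote, key_id, local = timeout.groups()
            waited = (_stamp(line) - sent_at[key_id]).total_seconds()
            failures.append(
                {
                    "remote": remote,
                    "local": local,
                    "id": key_id,
                    "waited_seconds": int(waited),
                }
            )
    return failures
```

A key that is sent and never validated means the outgoing connection was made but the remote side never confirmed the key, which is one reason to look at whether the remote server can connect back to you.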

Our problems with our test server might be related to a firewall - I’ll have to investigate and reply again once we’ve sorted that out, or if we manage to get it working.
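For anyone else checking the firewall angle, a quick TCP reachability test on the s2s port (5269) is a reasonable first step. This is only a sketch: `can_connect` is a made-up helper, and a successful connect only shows that the port is reachable from wherever you run it, not that dialback will succeed.

```python
import socket


def can_connect(host: str, port: int = 5269, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Outbound check, run from the XMPP server itself.
    # The host name is taken from the debug log above.
    print(can_connect("xmpp-server.l.google.com"))
```

Server dialback also needs the remote server to connect back to your own server on 5269, so the same check should be run from a machine outside your network against your own XMPP host. The log above shows the outgoing connection succeeding, so the inbound direction is the one to suspect.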

So we got an exception added to the firewall for our test server and now everything works. The other thing we had to change was a DNS override for gmail.com and googlemail.com in our Openfire system properties, which I think was there to work around an older bug where Openfire didn’t like multiple records in an SRV response. The override broke when Google changed IP addresses, but it seems it is no longer necessary, so we’ve taken it out.