S2S oddities with Google Talk

All:

I have an odd situation with s2s connections. If I try to add a Google buddy from the Wildfire side, everything works fine. A request is sent to the Google account, a confirmation is sent back to the Wildfire side, and everything works as you'd expect. However, if the Google buddy tries to add the Wildfire buddy, they get the same request/confirmation boxes, but status isn't shown on either end - both parties look offline. Messages can be delivered back and forth, however.

Has anyone else seen this sort of behavior?

I have almost the same problem. I'm using the Wildfire build from 2006-02-04, and on the Wildfire side I only sometimes see my Gtalk contacts.

I am using Wildfire 2.4.4. When I first start up the Wildfire server I can add contacts both ways and start chatting. But after a little while (5-10 minutes maybe) I can no longer communicate with my contacts and they disappear. For a short time I can still send messages to my Gmail contacts, but they cannot send back.

It shows jcluff@xmpp.securedminds.com/spark is offline and can't receive messages right now.

So I checked the admin console and saw that I only have an outgoing session. I then looked at the error log and found these errors:

2006.02.08 10:04:48 org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:159) Connection closed before session established

Socket[addr=/64.233.166.129,port=30649,localport=5269]

2006.02.08 10:05:30 org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:159) Connection closed before session established

Socket[addr=/64.233.166.129,port=30772,localport=5269]

Any thoughts as to the cause of this? It looks like Wildfire is closing the connection, but I am just not sure.

I have also had the seen-then-not-seen issue with Google Talk for the last few weeks. Lately, it has been more seen than not seen.

I also have problems with inviting Google IM accounts to a conference room. I tried using the PSI 0.10 Linux client. A Google IM account cannot connect to the conference rooms, but using the same PSI client a local account on Wildfire works… This could be a newbie issue, because I have not tested with any non-local accounts other than Google IM accounts.

I'm seeing this behavior also.

Things will work fine for about 5 minutes and then connectivity from gmail/gtalk to wildfire stops. The debug log entry is:

2006.02.15 22:44:31 Connect Socket[addr=/64.233.166.129,port=19502,localport=5269]

2006.02.15 22:44:31 RS - Received dialback key from host: gmail.com to: hoffmang.com

2006.02.15 22:44:31 RS - Error, incoming connection already exists from: gmail.com

I've tried both not closing idle server connections and closing them after 10 minutes of idle time. Neither seems to make much of a difference.

-Gene

I'm also having these issues… has anyone made any sense of them? Things go back to normal after I manually terminate the Google s2s connection.

Hey Guys,

Based on Gene's debug information I see that Google is trying to establish many connections to Wildfire. This issue was reported by another community member, and he is testing a version that removes the restriction that only 1 connection is allowed from a remote server. Once that testing is over we can include that fix so you don't have this problem.

Thanks,

– Gato

This problem can get more complicated if you depend on DNS SRV records to identify the XMPP server in your network. Google tries to connect to your hostname, instead of your domain name, which is a significant difference in these cases. Wildfire drops these connections, because "host name" does not equal "domain name." You'll see a lot of "Closing session due to incorrect hostname in stream header." messages in warn.log if you're suffering the same problem. As a side-effect of this problem, Google presence stanzas are routed to only one client per user: the client with the highest priority setting.
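To make the host name versus domain name distinction concrete, here is a minimal Java sketch. It is not Wildfire or Google code: the class SrvTarget and its methods are invented for illustration, and it assumes a hypothetical domain example.com whose SRV record points at the host jabber.example.com. The point is that the SRV record only says where to open the socket; the "to" attribute of the stream header should still carry the domain.

```java
// Illustrative only: SrvTarget and its methods are hypothetical, not part of Wildfire.
final class SrvTarget {
    final int priority;
    final int weight;
    final int port;
    final String host;

    SrvTarget(int priority, int weight, int port, String host) {
        this.priority = priority;
        this.weight = weight;
        this.port = port;
        this.host = host;
    }

    // Parses SRV record data in the usual "priority weight port target" text form.
    static SrvTarget parse(String rdata) {
        String[] parts = rdata.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed SRV data: " + rdata);
        }
        String target = parts[3];
        if (target.endsWith(".")) {
            // Drop the trailing dot of a fully qualified name.
            target = target.substring(0, target.length() - 1);
        }
        return new SrvTarget(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), target);
    }

    // The socket goes to target.host, but the stream header names the domain.
    static String streamHeaderTo(String domain, SrvTarget target) {
        return domain;
    }

    public static void main(String[] args) {
        SrvTarget target = parse("10 0 5269 jabber.example.com.");
        System.out.println("connect to " + target.host + ":" + target.port);
        System.out.println("stream to=" + streamHeaderTo("example.com", target));
    }
}
```

A remote server that puts target.host into the "to" attribute instead would trigger the warning quoted above.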

I've talked this over with Gato off forum, and he's talking to a number of people to get this fixed (it turned out to be a misinterpretation of the XMPP Core specification by Google, not Wildfire).

Returning to the problem at hand: Wildfire checks if a connection to or from a federating domain already exists. If it does, it denies the second attempt. I've turned off this check by commenting out lines 426 to 435 in the org.jivesoftware.wildfire.server.ServerDialback class (you'll have to make a few minor modifications to be able to compile again). As far as I can see, it works fine. You'll see more than one incoming connection for Google. Make sure that you set a timeout value for server connections though! You'll swamp your server with idle connections otherwise. (Timeout values for server connections can be set in the admin panel, under "Server to Server." The default value is 30 minutes, I think).
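For readers who want to picture the check, here is a rough Java sketch of a per-domain registry of incoming sessions. It is not the code in ServerDialback: the class IncomingServerSessions, its methods, and the allowMultiple flag are all made up for illustration. With allowMultiple set to false it refuses a second incoming session from the same remote domain, which is the behavior behind the "incoming connection already exists" debug message; with true it accepts them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not the actual Wildfire implementation.
final class IncomingServerSessions {
    private final Map<String, List<String>> sessionsByDomain = new HashMap<>();
    private final boolean allowMultiple;

    IncomingServerSessions(boolean allowMultiple) {
        this.allowMultiple = allowMultiple;
    }

    // Returns true if the session was accepted, false if it was refused.
    synchronized boolean register(String remoteDomain, String sessionId) {
        List<String> sessions =
                sessionsByDomain.computeIfAbsent(remoteDomain, d -> new ArrayList<>());
        if (!allowMultiple && !sessions.isEmpty()) {
            // Strict mode: only one incoming session per remote domain.
            return false;
        }
        sessions.add(sessionId);
        return true;
    }

    // Number of incoming sessions currently registered for a remote domain.
    synchronized int count(String remoteDomain) {
        List<String> sessions = sessionsByDomain.get(remoteDomain);
        return sessions == null ? 0 : sessions.size();
    }

    public static void main(String[] args) {
        IncomingServerSessions strict = new IncomingServerSessions(false);
        System.out.println(strict.register("gmail.com", "s1"));
        System.out.println(strict.register("gmail.com", "s2"));
    }
}
```

In the relaxed mode nothing in this sketch ever removes a session, which is exactly why an idle timeout matters once multiple connections are allowed.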

Because I've been suffering from both problems I described here, I can't be sure that my last solution works 100%. It's hard for me to pinpoint which symptom has which cause. I can only say that things -seem- to work a little bit smoother when you modify the code to allow for multiple connections, but I wouldn't recommend doing this on a production server without further testing.

Well, that's about it I think.

– "another community member"

It appears to me that a change has been made at Google. I have been seeing better S2S behavior with Google. It has been good for about 2-3 days…

FYI, Guus's enhancement has been incorporated and now (latest nightly build) it is possible to have multiple incoming connections from the same remote server. This will also be useful when dealing with high s2s load.

Regards,

– Gato

I've just installed the nightly wildfire_2006-02-26.tar.gz. So far so good, but I'll report back in a few hours to confirm that this is now resolved.

BTW - I am not running SRV records.

-Gene

Update: It seems to be working but has one issue. If I have a conversation with a test account, things work fine. However, if I let it sit idle for a while, the first IM sent from Gtalk (inside Gmail) to the Wildfire user is lost. The very next line does seem to come through though.

> This problem can get more complicated if you depend on DNS SRV records to identify the XMPP server in your network. Google tries to connect to your hostname, instead of your domain name, which is a significant difference in these cases. Wildfire drops these connections, because "host name" does not equal "domain name." You'll see a lot of "Closing session due to incorrect hostname in stream header." messages in warn.log if you're suffering the same problem. As a side-effect of this problem, Google presence stanzas are routed to only one client per user: the client with the highest priority setting.
>
> I've talked this over with Gato off forum, and he's talking to a number of people to get this fixed (it turned out to be a misinterpretation of the XMPP Core specification by Google, not Wildfire).

Guus,

I am seeing this exact issue in my environment. We are using DNS SRV records with our public DNS provider to get to our WF server (NAT'd behind a firewall with ports 5269 and 5222 open). Chat works okay with gmail.com users for a few minutes, then presence info quits working and chat breaks down. I have a ton of the following messages in my warn log:

2006.02.27 08:34:30 Closing session due to incorrect hostname in stream header. Connection: org.jivesoftware.wildfire.net.SocketConnection@daa6a6 socket: Socket[addr=/192.168.1.4,port=4835,localport=5269] session: null

Any work-arounds that you know of for this issue? You mentioned this is a Google problem, not WF. Is there any way to know their progress on this, if any?

-Erik

Hi Erik,

Last week I discussed the problem with Gato. While we weren't sure how to interpret the XMPP specs, Gato later discussed the problem with Peter Saint-André (Jabber Software Foundation). Peter and Gato agreed that Google made an error when they implemented the dialback functionality. Peter told Gato he'd have a talk with someone at Google. I haven't heard anything since then.

A workaround could be to alter the code a little to disable the check that leads to the error. I haven't tried that yet though.

– Guus

> Any work-arounds that you know of for this issue? You mentioned this is a Google problem, not WF, any way to know their progress on this if any?

It will help me debug the problem if you can provide examples of the problem stanzas (with private information redacted). If that's difficult to provide, can you provide a more detailed description of what the code checks before writing the “Closing session” log message?

gburd,

Are you asking me (end user) to supply something, or the Spark developers? Or both?

I am asking for more details about “Closing session due to incorrect hostname in stream header” log message from anybody who can provide them.

I was working with another user who switched his domain from jabber.example.com to example.com. His server stayed at the host jabber.example.com. After the switch, the Google Talk service continued to send dialback stanzas to the domain jabber.example.com. It turned out that the Google Talk service was trying to connect to the domain jabber.example.com because a Google Talk user still had a user@jabber.example.com contact in his roster.

Guus, could this be the cause of your server getting the host name instead of the domain name in the dialback stanzas?

I have the same problem, but in my case it's between two Wildfire servers I have running.

Basically, when person A sends a request to person B, person B gets the auth box etc. as usual. When person B accepts, person A is able to see their status and everything is fine. But person B is never able to see person A's status at all; person A always appears to be offline to them. Both are still able to send and receive messages to each other.

Gary,

Sorry for my late reaction, I was out for a couple of days.

> I am asking for more details about "Closing session due to incorrect hostname in stream header" log message from anybody who can provide them.

The error is thrown when someone tries to connect to your Wildfire server using a domain name other than the one configured for your server. The SRV records scenario which I described in a previous post will cause the error to appear. Note that this is not a Wildfire error: other servers should use a domain name, not a server name (or any other name) if they want to connect to your server.

In case you want to look at the code yourself: the error is thrown on lines 600 and 601 in org.jivesoftware.wildfire.net.SocketReader.java (http://www.jivesoftware.org/fisheye/viewrep/svn-org/wildfire/trunk/src/java/org/jivesoftware/wildfire/net/SocketReader.java?r=3484#l600). For future reference: check revision 3484.
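As a rough illustration of the kind of check being described, and not a copy of what SocketReader.java does, the comparison could look like the Java sketch below. The class name StreamHeaderCheck and the method accepts are hypothetical, and the treatment of a missing "to" attribute is purely an assumption made for the example.

```java
// Hypothetical sketch; see the linked revision for the real Wildfire code.
final class StreamHeaderCheck {

    // serverDomain: the XMPP domain this server is configured for.
    // toAttribute:  the "to" value from the incoming stream header, or null if absent.
    static boolean accepts(String serverDomain, String toAttribute) {
        if (toAttribute == null) {
            // Assumption for this sketch: a header without "to" is tolerated.
            return true;
        }
        // A host name such as jabber.example.com does not match the domain example.com.
        return serverDomain.equalsIgnoreCase(toAttribute);
    }

    public static void main(String[] args) {
        System.out.println(accepts("example.com", "example.com"));
        System.out.println(accepts("example.com", "jabber.example.com"));
    }
}
```

In this sketch, a remote server that addresses the stream to the SRV target host instead of the domain is rejected, which corresponds to the "incorrect hostname in stream header" warning.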

> I was working with another user who switched his domain from jabber.example.com to example.com. His server stayed at the host jabber.example.com. After the switch, the Google Talk service continued to send dialback stanzas to the domain jabber.example.com. It turned out that the Google Talk service was trying to connect to the domain jabber.example.com because a Google Talk user still had a user@jabber.example.com contact in his roster.

The behavior you describe is not an error, as far as I can see. If a Google Talk user wants to talk to someone in his roster at the domain jabber.example.com, Google will try to connect to that domain - even if the domain turns out to be invalid.

Regards,

Guus

Jason,

I'm not sure why this is happening. I'm not using two Wildfire servers, so I can't try to replicate your problem. Try to debug the packet streams - maybe you'll see something off.

If the problem is related to the problem described in this thread, take particular notice of the server and domain names that are transmitted. In general, Wildfire doesn't (or at least shouldn't) make use of specific server names - everything is routed based on domain names (which could coincide with a server name, of course).

Regards,

Guus

The Google Talk service sets the stream tag's “to” attribute to the receiving server's domain. The server's host name is not used in the stream tag.

I will remove the “to” attribute because it's not required. It's probably a good idea for Wildfire to remove the “to” attribute check because this attribute is ignored in the protocol.

Based on a quick look at the Wildfire code, I think one of these things is causing the log message:

  • A Google Talk user has a contact with your server's host name.

  • SocketReader.serverName is not set to the server's domain name.