Weirdness with 2.2.0rc

I tried upgrading to 2.2.0rc today and ran into a few problems. Unfortunately I don't have any logs to share at the moment, but here's what happened:

  • I had almost no s2s capability. I attempted to join several MUCs and failed in all but one of 15 attempts. Messages sent to other Jabber buddies went undelivered.

  • My transports for AIM/MSN/ICQ (all the pytransports) kept sending me presence information over and over and over again.

I just went back to 2.2.0 beta2 and figured I'd post here and see if anyone had ideas. I may have upgraded incorrectly too, as I just overwrote everything except the configuration files in conf (I'm using MySQL for my datastore). Has anyone else had these problems? Any ideas?

MysticOne

Yes. I've noticed quite a bit of instability with both local contacts and contacts on S2S connections. That's the only problem I've noticed, but it is a serious one. Right now I can connect to the servers, but there is a fairly high chance that all contacts fail to show any status or even function at all. Restarting the servers and reconnecting a few times fixes it, temporarily. Sorry I can't be more specific, but as MysticOne described, it's just been acting, well, weird. I'm not sure whether everyone is seeing my specific problem, as it may well be related to the issues I've been having with my subdomain setup. Thanks again for the support.

I may have upgraded incorrectly too, as I just overwrote everything except the configuration files in conf (I'm using MySQL for my datastore).

I did this a couple of times with nightly builds and ended up in much the same situation you're describing. My wife called me on Wednesday of last week to ask what was going on with my Jabber, as I was “rapid fire online / offline”. The weird thing was that my laptop was suspended in the back of my truck and I had been on the road for 12 minutes when this started.

I didn't worry too much about it, as I figured it was something with one of the nightly builds or the way I was upgrading, and I knew I was going to be moving it all onto a new Slackware box this weekend anyway.

I personally saw things go “nuts” on Friday when I got home and saw my work account (again on the laptop, suspended in the bag next to me) go rapid fire offline/online.

Since setting up the new Slack box with a fresh install of JM 2.2.0 RC1 today, nothing weird has happened, but I've been the only one connected to it, and I've chatted with myself via my jabber.org account.

So, in summary, I don't think the build is bad; I think it's something about the “overwrite/upgrade” route you took.

Peter

Just as a note, I didn't upgrade personally. I installed the Release Candidate fresh, then applied the nightly build. Thanks again!

Hey guys,

Do you see any error in the log files? Is there any way I can reproduce the problem you are seeing?

Thanks,

– Gato

Since I moved my JM2 onto a new box I have had no problems. I could bring the old box back up, but it would be a little messy…

Let me know…

Peter

Unfortunately, I can't tell you how to reproduce my problem; it occurs too randomly for me to trigger it at will. When it does occur, I can connect to each of my two servers, but one or the other shows all contacts (local and remote) as offline, and they do not work at all. If I restart the server, reconnect, and wait about 5 minutes, the contacts eventually come online again and work properly. They keep working for a limited time, then everything returns to the problematic state. As nobody else has reported a similar problem recently, I suspect this is still related to the subdomain problems. Feel free to connect to both of my servers again if you need to test anything. The hostnames are cloakedhunter.com and ryan.cloakedhunter.com. In the error.log file on both servers, the following two error messages repeat fairly often. These specific messages were taken from the cloakedhunter.com log file:

2005.08.02 23:02:36 [org.jivesoftware.messenger.server.ServerDialback.createOutgoingSession(ServerDialback.java:194)] Error creating outgoing session to remote server: ryan.cloakedhunter.com(DNS lookup: ryan.cloakedhunter.com)
java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(Unknown Source)
at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.<init>(Unknown Source)
at javax.net.DefaultSocketFactory.createSocket(Unknown Source)
at org.jivesoftware.messenger.server.ServerDialback.createOutgoingSession(ServerDialback.java:130)
at org.jivesoftware.messenger.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:113)
at org.jivesoftware.messenger.spi.RoutingTableImpl.getRoute(RoutingTableImpl.java:92)
at org.jivesoftware.messenger.net.SocketPacketWriteHandler.process(SocketPacketWriteHandler.java:51)
at org.jivesoftware.messenger.spi.PacketDelivererImpl.deliver(PacketDelivererImpl.java:65)
at org.jivesoftware.messenger.spi.PresenceManagerImpl.probePresence(PresenceManagerImpl.java:318)
at org.jivesoftware.messenger.handler.PresenceUpdateHandler.initSession(PresenceUpdateHandler.java:191)
at org.jivesoftware.messenger.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:94)
at org.jivesoftware.messenger.handler.PresenceUpdateHandler.process(PresenceUpdateHandler.java:141)
at org.jivesoftware.messenger.PresenceRouter.handle(PresenceRouter.java:87)
at org.jivesoftware.messenger.PresenceRouter.route(PresenceRouter.java:60)
at org.jivesoftware.messenger.PacketRouter.route(PacketRouter.java:73)
at org.jivesoftware.messenger.net.SocketReader.processPresence(SocketReader.java:301)
at org.jivesoftware.messenger.net.ClientSocketReader.processPresence(ClientSocketReader.java:49)
at org.jivesoftware.messenger.net.SocketReader.readStream(SocketReader.java:208)
at org.jivesoftware.messenger.net.SocketReader.run(SocketReader.java:111)
at java.lang.Thread.run(Unknown Source)

2005.08.02 23:11:27 [org.jivesoftware.messenger.net.SocketReader.run(SocketReader.java:145)] Connection closed before session established
Socket[addr=/71.35.209.222,port=1625,localport=5269]

On the other server, messages almost identical to these two are also logged. If necessary, I can provide those also. I am most appreciative of your assistance. Thanks!
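The "Connection refused" in the trace above means the plain TCP connect from cloakedhunter.com to ryan.cloakedhunter.com was rejected before any XMPP or dialback traffic was exchanged. One way to separate network problems from server problems is to probe the s2s port directly from each box. This is a minimal Java sketch, not part of Jive Messenger: the class name S2sPortProbe and the 5-second timeout are illustrative choices, and port 5269 is assumed only because it is the localport shown in these logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks raw TCP reachability of a server-to-server port.
// It does not speak XMPP or dialback; it only tests the TCP handshake.
class S2sPortProbe {

    // Assumed s2s port, taken from "localport=5269" in the logs.
    static final int DEFAULT_S2S_PORT = 5269;

    /**
     * Returns true if a TCP connection to host:port completes within
     * timeoutMillis. Returns false on "Connection refused", on timeout,
     * or if the hostname cannot be resolved (all are IOExceptions).
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java S2sPortProbe <host> [port]");
            return;
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_S2S_PORT;
        boolean ok = isReachable(args[0], port, 5000);
        System.out.println(args[0] + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

Running it on each server against the other's hostname (for example `java S2sPortProbe ryan.cloakedhunter.com`) shows whether the refusal is reproducible outside the server. A "NOT reachable" result only says the TCP handshake failed; it says nothing about dialback itself.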

Well, it seems to be working with the Release Candidate build 2005-08-02, so far. I'm going to try the final release and I'll tell you if I continue to experience this problem. Thanks!

With continued testing using the final release of 2.2.0, the problem still exists. The following error message still appears in the error.log file every 30 seconds while I am experiencing the problem. Any help with this? I'd appreciate any support you can provide. Thanks!

2005.08.04 16:54:48 [org.jivesoftware.messenger.net.SocketReader.run(SocketReader.java:145)] Connection closed before session established
Socket[addr=/71.35.208.77,port=3535,localport=5269]

Hey djholt,

Could you turn on debug.log and post what you are getting in that file? One thing to check is that the hostname of each server matches the domain of the packets. And if you are using whitelist permissions, make sure that the hostname of the remote server is present in the list.

Regards,

– Gato

As far as I can tell, all hostnames match everywhere, and no, I am not using whitelists. My error.log and debug.log files are available for you to look at (http://www.cloakedhunter.com/error.log and http://www.cloakedhunter.com/debug.log). After analyzing them, it's fairly obvious that the following two repeating messages are associated with the problem. Thanks again for your support!

DEBUG.LOG:

2005.08.05 13:00:14 Connect Socket[addr=/71.35.208.77,port=2204,localport=5269]
2005.08.05 13:00:14 RS - Received dialback key from host: ryan.cloakedhunter.com to: cloakedhunter.com
2005.08.05 13:00:14 RS - Error, incoming connection already exists from: ryan.cloakedhunter.com

ERROR.LOG:

2005.08.05 13:00:14 [org.jivesoftware.messenger.net.SocketReader.run(SocketReader.java:145)] Connection closed before session established
Socket[addr=/71.35.208.77,port=2204,localport=5269]

Any ideas on this, anyone?

Hey djholt,

I will try to reproduce this problem and get back to you later. Sorry for the problem.

Regards,

– Gato

Hey, thanks! I really appreciate your help. In fact, it's actually been working properly for the past 10 hours, which is very rare. During this time, I noticed the following message repeating in my debug.log file (although nothing appeared in error.log), so I thought I'd let you know in case it makes any difference. Thanks again for your support!

2005.08.08 12:01:39 OS - Trying to connect to ryan.cloakedhunter.com:5269
2005.08.08 12:01:39 OS - Connection to ryan.cloakedhunter.com:5269 successfull
2005.08.08 12:01:39 OS - Sent dialback key to host: ryan.cloakedhunter.com id: 9511aaef from domain: cloakedhunter.com
2005.08.08 12:01:39 OS - Unexpected answer in validation from: ryan.cloakedhunter.com id: 9511aaef for domain: cloakedhunter.com answer:<stream:error xmlns:stream="http://etherx.jabber.org/streams"></stream:error>
2005.08.08 12:01:39 Finishing Outgoing Server Reader. No session to close.
java.net.SocketException: socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2971)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3025)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1144)
at org.xmlpull.mxp1.MXParser.nextToken(MXParser.java:1100)
at org.dom4j.io.XPPPacketReader.parseDocument(XPPPacketReader.java:268)
at org.jivesoftware.messenger.server.OutgoingServerSocketReader$1.run(OutgoingServerSocketReader.java:91)

Hey djholt,

Yesterday I was trying to reproduce this problem and found something interesting, though not very encouraging. I think this issue may be what you are experiencing.

The "OS - Unexpected answer in validation from: ryan.cloakedhunter.com id: 9511aaef for domain: cloakedhunter.com answer:<stream:error xmlns:stream="http://etherx.jabber.org/streams"><not-authorized" message is a consequence of the "RS - Error, incoming connection already exists from:" error you mentioned you were having. What I found yesterday is that under certain circumstances (e.g. a router, firewall, or NAT involved in the s2s communication) the Java VM is not alerted that a socket connection was closed. Even sending heartbeats does not raise an exception, so Java assumes the packets were sent successfully.

Only after a lot of data has been sent does the TCP layer alert Java that the connection was lost. This generates missed online notifications like the ones you and other people reported. It can also generate the "RS - Error, incoming connection already exists from:" warning you are seeing, since the server assumes the remote server is still connected.
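What is described here is a half-open TCP connection: one side still believes the socket is open, so writes keep succeeding while the data goes nowhere. A common general technique (not necessarily the workaround Jive Messenger will adopt) is to stop trusting successful writes and instead require some reply within a deadline, with OS keepalive as a slower backstop. The sketch below assumes a protocol in which the peer is expected to answer; the class HalfOpenDetector and its methods are hypothetical. With SO_KEEPALIVE the probe timing comes from the operating system (on Linux, net.ipv4.tcp_keepalive_time defaults to 7200 seconds), so keepalive alone may take hours to notice a dead peer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketException;
import java.net.SocketTimeoutException;

// Hypothetical sketch: treat "no reply within a deadline" as a dead connection,
// instead of assuming the peer is alive because write() did not throw.
class HalfOpenDetector {

    /** Enables OS-level keepalive probes; their timing is set by the OS, not here. */
    static void enableKeepAlive(Socket socket) {
        try {
            socket.setKeepAlive(true);
        } catch (SocketException e) {
            // Socket already closed or option unsupported; nothing more to do.
        }
    }

    /**
     * Waits up to timeoutMillis for at least one byte from the peer.
     * Returns true if data arrived, false on timeout, EOF, or I/O error.
     * The byte read is consumed; a real server would pass it to its parser.
     */
    static boolean awaitReply(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            return in.read() != -1;   // -1 means the peer closed the stream
        } catch (SocketTimeoutException e) {
            return false;             // silence: possibly a half-open connection
        } catch (IOException e) {
            return false;             // reset, closed socket, etc.
        }
    }
}
```

The design choice is to act on silence rather than on a write failure: a peer that never answers is treated as suspect after a bounded wait, instead of only after TCP finally gives up on retransmissions.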

We are going to find a workaround for this problem, and I will keep you updated when a fix is available. Meanwhile, if you find more information that may help us understand this problem better, please post it.

Thanks,

– Gato

Well, I'm glad you may have found the problem. You mentioned NAT, so I thought I'd let you know that both of these servers are behind NAT routers/firewalls. Because of this, the servers cannot reach themselves using their assigned FQDNs, only by their direct IP addresses. In my experience, this can cause significant confusion for servers running behind NAT. I have no idea whether that is significant for this problem, but I thought you should be aware of the servers' network configuration. I'll let you know if I discover anything new. Thanks again for your help!

Figure anything out?

Hey djholt,

Some time ago an ACK JEP was proposed, but it was ultimately rejected. The solution would be to use some kind of ACK to confirm that a message was received by the remote server. I sent a post about this problem to the standards-jig mailing list (where the XMPP spec and JEPs are discussed). I will be on vacation next week, so we will have to wait for a fix. Have you checked whether there is a TCP setting that could be changed to fix this problem?
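The ACK idea can be pictured as bookkeeping on the sending side: record each outgoing stanza id, drop it when the remote server confirms receipt, and treat anything unconfirmed past a deadline as a sign that the connection is dead. This is a generic illustration only; it does not reproduce the wire format of the rejected ACK JEP, and the class PendingAcks, its methods, and passing timestamps in explicitly (so the logic can be tested without a real clock) are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sender-side tracking of application-level acknowledgements.
class PendingAcks {
    // Stanza id -> time it was written, kept in send order.
    private final Map<String, Long> sentAt = new LinkedHashMap<>();
    private final long timeoutMillis;

    PendingAcks(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a stanza with this id was written at nowMillis. */
    void sent(String id, long nowMillis) {
        sentAt.put(id, nowMillis);
    }

    /** The remote server confirmed receipt. Returns false if the id was unknown. */
    boolean acked(String id) {
        return sentAt.remove(id) != null;
    }

    /** Ids whose ack is overdue; a caller might resend them or drop the session. */
    List<String> overdue(long nowMillis) {
        List<String> late = new ArrayList<>();
        for (Map.Entry<String, Long> e : sentAt.entrySet()) {
            if (nowMillis - e.getValue() >= timeoutMillis) {
                late.add(e.getKey());
            }
        }
        return late;
    }

    int pendingCount() {
        return sentAt.size();
    }
}
```

For example, with a 30-second timeout, a stanza sent at t=10 s and never acknowledged shows up in overdue(40_000); at that point a server could resend it over a fresh connection or close the stale session.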

Regards,

– Gato

Gato, have you gotten any closer to finding a fix for my problem? Have you received a reply on the mailing list?

Anything?