Intermittent problem: No response on AIM, Yahoo works fine

I’ve written a bot (using Ruby and the xmpp4r library) that uses Openfire to talk to Yahoo and AIM. It’s been running fine in production for 4 or 5 months. But lately I’ve twice seen a severe problem affecting AIM but not Yahoo. (I’m running Openfire 3.5.0 with IM Gateway 1.2.3, on Linux in production.)

The problem is that IMs sent to the bot from AIM don’t get a response. The bot’s own logging shows that Openfire delivers the request to the bot, the bot processes it, and the bot apparently sends the response to Openfire successfully. The Openfire debug log, however, doesn’t show the response at all: there is no indication that Openfire even received it. (I’m guessing that it did receive the response, but that something went wrong before Openfire/IMGateway wrote its first log entry for that message. But that’s just a guess.) The Openfire debug log shows no indication of an error.

This has happened twice recently, once for about 24 hours, and once for about 48 hours. In both cases the problem eventually went away without anything being done to make it go away.

In the first case, I manually sent messages from AIM to the bot. It responded to 2 out of 9 messages. The other 7 got no response at all. In the bot logs, all 9 cases looked the same, and the bot seems to have handled every message correctly. In the Openfire debug log, the 2 cases that worked look fine. In the other 7 cases, there’s no evidence that Openfire/IMGateway even received the response from the bot (although the bot log shows that the response was successfully sent). I sent some of these messages as quickly as I could type, and the bot might fail to respond to the first, correctly respond to the second, then fail to respond to the third, all within a span of several seconds.

I’m not sure how to debug this further. I haven’t ruled out a problem in the bot, but the evidence currently seems to point to Openfire. I may try to dive into the IMGateway code, add more debugging messages, then rebuild and deploy the IMGateway. This is a bit awkward, as the problem is intermittent and only occurs on the live production server.

As I mentioned, while AIM is having this problem, Yahoo is working just fine.

On a related note, when the bot sends the response, the expected behavior is that it doesn’t get any reply from Openfire. Is there some way it could get a “message successfully received” reply, or alternatively, some other message it could send to Openfire that would provoke a response, just to check that the communication between the bot and Openfire is basically working?
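
To make that last question concrete, here is a rough sketch of the kind of round-trip check I have in mind. It leans on the general XMPP rule that an IQ stanza of type "get" must be answered with either a "result" or an "error", so even an error reply would show that the bot-to-Openfire link is alive. The payload is an XEP-0199 ping; I have not verified whether Openfire 3.5.0 supports it, but an error reply to an unsupported ping would serve the same purpose. The helper names (build_ping, round_trip_ok?) and the JIDs are made up for illustration, and the sketch only builds the stanza and classifies the reply. Actually sending it and waiting for the answer would be left to xmpp4r.

```ruby
require 'securerandom'

PING_NS = 'urn:xmpp:ping' # namespace defined by XEP-0199

# Escape a value for use inside a single-quoted XML attribute.
def xml_attr(value)
  value.to_s
       .gsub('&', '&amp;')
       .gsub('<', '&lt;')
       .gsub('>', '&gt;')
       .gsub("'", '&apos;')
       .gsub('"', '&quot;')
end

# Build an IQ "get" addressed to the server itself (hypothetical helper).
# The id is what lets the bot match the reply to this particular request.
def build_ping(from_jid, server_domain, id = SecureRandom.hex(4))
  "<iq from='#{xml_attr(from_jid)}' to='#{xml_attr(server_domain)}' " \
    "id='#{xml_attr(id)}' type='get'><ping xmlns='#{PING_NS}'/></iq>"
end

# Any reply with a matching id proves the round trip worked. A "result"
# means the server understood the ping; an "error" means it did not,
# but the stanza still reached the server and a reply came back.
def round_trip_ok?(sent_id, reply_type, reply_id)
  reply_id == sent_id && %w[result error].include?(reply_type.to_s)
end

# Example with made-up JIDs:
stanza = build_ping('bot@example.com/bot', 'example.com', 'ping-1')
puts stanza
puts round_trip_ok?('ping-1', 'error', 'ping-1')
```

Note that this would only exercise the connection between the bot and Openfire itself. It would not tell me whether the IM Gateway passed a particular message on to AIM.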

Thanks for any advice!

Wayne Vucenic