Constant Spark disconnects - "your connection was closed due to an error"

I am attempting to deploy OpenFire with FastPath and Spark. I have about 25 internal users; in my current test build I’m running with only about ten.

All users regularly see the “your connection was closed” error. Some users will ALWAYS see it upon messaging another specific user, while messaging others does not cause the error. Some users see it immediately upon messaging anyone. It’s vexing.

I’m using the newest builds of OpenFire and Spark. My network is robust, the servers (Win2012 Datacenter on VMware) are local. DB server is MySQL 5.6. User authentication is local, not via AD.

OF logs attached. I’m pretty stuck here, and I’d be most grateful for a fix. Thank you in advance for any help any of you might be able to contribute.

-Scott
OFlogs.zip (29738 Bytes)

Looks to me like you have a database configuration problem.

There are lots of log entries similar to:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table ‘openfire.ofrrds’ doesn’t exist

Did you follow the database setup guide? http://www.igniterealtime.org/builds/openfire/docs/latest/documentation/database.html

If you did not do this, you will need to do it and then re-run the first-time setup of Openfire. To trigger the setup again, follow sixthring’s advice in reply #4 of this thread: “Openfire is not responding. Can't login to admin gui or into spark client”
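If it helps to see exactly which tables the logs complain about, here is a rough Python sketch that counts them. The function name and regex are my own and only assume the MySQL message format shown above; paste in the contents of whichever log file holds the errors.

```python
import re
from collections import Counter

# Matches MySQL's "Table 'db.table' doesn't exist" message.
# Straight and curly quotes are both accepted, since forum copies often mangle them.
MISSING_TABLE = re.compile(r"Table\s+['‘’]([\w.]+)['‘’]\s+doesn['‘’]t exist")


def missing_tables(log_text):
    """Count how often each missing table is named in the given log text."""
    return Counter(MISSING_TABLE.findall(log_text))


if __name__ == "__main__":
    # Sample line taken from the error quoted in this thread.
    sample = (
        "com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: "
        "Table 'openfire.ofrrds' doesn't exist\n"
    )
    for table, count in missing_tables(sample).most_common():
        print(table, count)
```

If only one or two tables show up, that points at a specific table or plugin rather than the whole schema being absent.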

Thank you very much for your reply.

I did follow those instructions exactly.

I’ve already been looking at my DB (as well as the web server), for any possible causes. Nothing has jumped out so far. I’ll continue looking.

Which Openfire version are you using? It looks like you have a problem similar to OF-654.

I am currently using 3.9.1. I was using the previous version before the 3.9.0 release and had the same problem.

I’m looking over the referenced thread. ofroster.jid needs to be changed to varchar(255), is what seems to be indicated. Is this correct? I used the default DB setup, which created that table as it is.

Although I have not run into this myself, from the thread it does appear that is the fix.

That said, this ticket was marked as fixed in version 3.9.0, so I would have thought you would not have this problem… Anyhow, try it and we’ll see.

Change made, still seeing disconnects

I think my next move will be to put Openfire and MySQL on an otherwise sterile machine, together - 1 tier, no other products using the DB.

Hmm… well, the error you are seeing says the Openfire tables do not exist in your database, so either something has removed them, or they never existed (meaning there was a problem during the schema import phase of the database setup). Did you receive any errors during the database setup (when you first set up Openfire and followed the database guide)?

No, the tables are all there and intact… It’s more like connection to the DB, by the feel of it, although I see no problems there. Hence my plan to rebuild this environment on a single machine.

That could be. When you first set up Openfire, the credentials you enter on the setup page need to have privileges to create a new user and assign privileges, so some sort of admin account. It will then create the Openfire database user with a random password unknown to anyone but Openfire, and it then uses that. So maybe the credentials you used during setup could not create the user with proper privileges?

I used a DB user with DBA privileges, and was able to add my users manually after setup and DB creation.

I’m getting it running on the other machine now. I’ll report back.

Well, a completely new install on a different machine, with a new DB on the same machine, yields the same results. I’m honestly a little perplexed.

try it with SA account.

Also, ensure your DB is set up to allow connections, etc… all the normal stuff.
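A quick way to rule out basic reachability is to open a TCP connection to the database port from the Openfire machine. This is only a sketch: the function name is mine, the host below is a placeholder, and 3306 is simply MySQL's default port. It says nothing about credentials or grants, only whether the port answers.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    DB_HOST = "127.0.0.1"  # placeholder: replace with your database server
    DB_PORT = 3306         # MySQL default; adjust if yours differs
    print(can_connect(DB_HOST, DB_PORT))
```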

Make sure you got no errors during the database setup.

Try a slightly older version of MySQL (not 5.6, try 5.1 or something).

Just trying to determine where the error is occurring.

Just checking in. I have done multiple builds on different Win2012 server machines trying to get a stable instance. Today I am starting over on Debian w/ MySQL5.6. I’ll report back.

Are you using the e1000 NIC with ESX 5.x? If so, that's probably your issue. Change your NIC to vmxnet 3 or upgrade your host/VMware Tools to 5.1 update 2.

After upgrading my host to 5.x, I ran into some odd issues. Some guest machines were dropping packets and having connection issues. Only the guests running the e1000 NIC had the issues. After changing to vmxnet 3, the issue went away. A couple of months later VMware released the following KB.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2056468

I am using e1000 NICs in my Proliant ESXi cluster, 5.1 update 1. Looking at the KB article and I will do the update 2 patch. However…

ESXi is apparently not the problem. I have done another build on a Debian box in my cabinet at a datacenter downtown. All previous builds were in-house on my cluster. This clean offsite install did not help - Spark clients on my network still disconnect. This seems to imply that the Spark client is having trouble in my environment. I’m the network admin and my stuff is pretty solid, if I do say so myself, so I find this pretty confusing. I connect and authenticate fine; I just get a bunch of client disconnects.

I need to run Spark as FastPath is the goal here. Actually a customized version of FP, but clearly I can’t introduce exotic code until I get the base build running stably.

Are you still getting database errors in your logs? Maybe try switching to a previous version of MySQL, or a different database for testing (try the embedded database; it’s easy and will help determine whether we still have a database problem or something else). In the past, people have had issues with MySQL version 5.6…

My new build is on Postgres.

The DB errors we’ve seen in the logs don’t always coincide with the client disconnects. Fantastic, huh?
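For anyone wanting to check that kind of claim less by eye, here is a small Python sketch of the comparison. It assumes you have already collected two lists of timestamps yourself (DB errors from the server logs, disconnects from wherever you note them); the function name, the 5-second window, and the sample times are all made up for illustration.

```python
from datetime import datetime, timedelta


def disconnects_near_errors(disconnects, db_errors, window_seconds=5):
    """Split disconnect times into those with a DB error nearby and those without."""
    window = timedelta(seconds=window_seconds)
    near, unexplained = [], []
    for d in disconnects:
        # A disconnect counts as "near" if any DB error falls within the window.
        if any(abs(d - e) <= window for e in db_errors):
            near.append(d)
        else:
            unexplained.append(d)
    return near, unexplained


if __name__ == "__main__":
    # Illustrative timestamps only, not taken from the real logs.
    disconnects = [datetime(2014, 1, 1, 10, 0, 0), datetime(2014, 1, 1, 11, 0, 0)]
    db_errors = [datetime(2014, 1, 1, 10, 0, 3)]
    near, unexplained = disconnects_near_errors(disconnects, db_errors)
    print("near a DB error:", near)
    print("no DB error nearby:", unexplained)
```

A large "no DB error nearby" list would support the idea that the disconnects have a separate cause.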

I may have spoken too soon when I said “Spark is still disconnecting” in this scenario - I saw disconnects right off but it seems to have settled down now. I’ll continue to work on this and report back.

I really want to thank both of you who have taken the time to contribute to this discussion. I’m grateful.

I didn’t say - trying Spark from offsite, no disconnects. Internally, well, I did just get another disconnect but Spark was also behaving funny during a user search just then, so I have not yet deemed it stable or unstable in-house.

DNS resolver issue?
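One rough way to test that from an affected workstation is to time repeated lookups of the Openfire server's name. This is just a sketch: the function name is mine, the hostname below is a placeholder, and 5222 is the standard XMPP client port. Slow or intermittently failing lookups inside the network but not offsite would fit the pattern described above.

```python
import socket
import time


def time_lookups(hostname, attempts=5):
    """Resolve hostname several times; return a (seconds, error) pair per attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, 5222)
            error = None
        except socket.gaierror as exc:
            error = str(exc)
        results.append((time.perf_counter() - start, error))
    return results


if __name__ == "__main__":
    SERVER_NAME = "localhost"  # placeholder: use the name Spark connects to
    for seconds, error in time_lookups(SERVER_NAME):
        print(f"{seconds:.3f}s", error or "ok")
```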