What's the difference between clustering and connection managers?

both seem to deal with scalability. what are the details that affect the implementation of each?

thanks.

Hey Keith,

Connection Managers were created a long time ago, when Openfire was not able to handle asynchronous connections. In those very old days the server was not able to scale to more than 6,000 concurrent connections. So the idea of Connection Managers was to put several of them in front of the Openfire server so that people would connect to them; each Connection Manager was able to scale to 6,000 connections on its own.

When we added NIO support to Openfire, the use of Connection Managers became much less important, since a single Openfire was able to scale to 170,000 concurrent connections or more (note: we stopped our tests at 170K because we ran out of user accounts). Connection Managers also have NIO support, so their scalability improved quite a lot too.
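
For readers unfamiliar with NIO, the key idea is that one thread can multiplex many sockets through a Selector instead of blocking a thread per connection. A minimal sketch in plain java.nio (not Openfire's actual code, just an illustration of the technique):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioAcceptLoop {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.configureBlocking(false);
            server.socket().bind(new InetSocketAddress(5222)); // standard XMPP client port
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select(); // one thread waits on every registered channel
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(4096);
                        if (client.read(buf) < 0) {
                            client.close(); // client disconnected
                        }
                        // a real server would parse XML stanzas out of buf here
                    }
                }
            }
        }
    }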

But there was still a big problem with this solution. Even if you have 10 Connection Managers connected to a single Openfire, if that Openfire server goes down you will be affecting a large number of users. In other words, you would have a scalable solution, but with a single point of failure that would compromise the XMPP service. That is where clustering comes in.

With clustering you avoid the single-point-of-failure problem. You can now have several machines running Openfire and hosting the same XMPP domain, and if one of those machines goes down the XMPP service will still be available. Of course, you can still use Connection Managers in front of an Openfire cluster, but you might want to reconsider that architecture and instead use the CM machines as cluster nodes. With clustering you get scalability as well as fail-over.

Regards,

– Gato


thanks a bunch for such a robust answer.

along those lines, can you use connection managers along with server-to-server connections to mitigate the SPOF?

also, will clustering only be available via the “enterprise” version?

thanks,

along those lines, can you use connection managers along with server-to-server connections to mitigate the SPOF?

Currently, CMs are only useful for client-to-server connections.

also, will clustering only be available via the “enterprise” version?

Yes. Being a feature for mission-critical solutions, it was decided that it would be part of the Enterprise edition. BTW, the beta promotion is still active, so if you want to help us test the 3.4.0 version you can get valuable discounts.

Thanks,

– Gato

let me be a little more specific.

would a setup like the following be useful in terms of scaling without clustering?

  • ServerA with connection managers CM-A, CM-B, CM-C

  • ServerB with connection managers CM-D, CM-E, CM-F

  • s2s connection between ServerA and ServerB.

Hey Keith,

Do you know the traffic per second you will put on the servers? And how many concurrent users you will have per server? You might be fine even without using CMs.

Regards,

– Gato

at first we could easily run everything on one server with no CMs (super light load, ~10,000 users), but if i do my job right, we will definitely be pushing the limits for concurrent users (i.e. more than 150,000). :] i’m unsure about the expected traffic per second.

the main issue is that i need to have a server on each coast to reduce latency. i need multiple servers to work together so users connected to ServerA and ServerB could be in the same MUC room, chatting together.

i haven’t messed with s2s stuff yet, so i’m not sure if it would suit my purpose.

Hey kthorn,

at first we could easily run everything on one server with no CMs (super light load, ~10,000 users), but if i do my job right, we will definitely be pushing the limits for concurrent users (i.e. more than 150,000). :] i’m unsure about the expected traffic per second.

There are two variables to consider when dealing with scalability. One is the number of concurrent users and the other is traffic. Having just a few hundred users generating hundreds of thousands of packets per second could kill a server. On the other hand, having hundreds of thousands of barely active users will not be a great problem for the server. That is why I was asking about the expected traffic. That is the key piece of information you need to collect. High traffic means high CPU usage. Using CMs is a way to reduce the CPU burden put on the server and share it with the CMs.
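
As a back-of-the-envelope illustration of why traffic matters more than raw user counts (every figure below is a made-up assumption, not a measurement):

    // Rough capacity estimate; all numbers are assumptions for illustration.
    public class TrafficEstimate {
        public static void main(String[] args) {
            int users = 150000;                   // concurrent connections
            double msgsPerUserPerMinute = 0.5;    // assumption: mostly idle users
            double avgMucFanout = 20;             // assumption: recipients per MUC message

            double inbound = users * msgsPerUserPerMinute / 60.0; // packets/sec in
            double outbound = inbound * avgMucFanout;             // packets/sec out
            System.out.printf("inbound ~%.0f pkt/s, outbound ~%.0f pkt/s%n",
                    inbound, outbound);
        }
    }

Double the per-user message rate and the load doubles, even though the user count is unchanged.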

the main issue is that i need to have a server on each coast to reduce latency. i need multiple servers to work together so users connected to ServerA and ServerB could be in the same MUC room, chatting together.

If users on each coast will typically talk to users on the same coast, then your solution is just fine. Even if users are on the other coast, performance should be acceptable (it depends, of course, on your network setup and usage). Note that using 2 servers (without clustering) means that each server will have its own domain (e.g. server1: east.mycompany.com and server2: west.mycompany.com), so each user should have an account on only one server (e.g. john@east.mycompany.com). Rooms on each server can be accessed from other servers, so that part is fine too.
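
For example, with the old Smack 3.x client API, a user on the east server could join a room hosted on the west server roughly like this (the room name and the default "conference." MUC subdomain are assumptions, not part of the setup above):

    import org.jivesoftware.smack.XMPPConnection;
    import org.jivesoftware.smackx.muc.MultiUserChat;

    public class CrossCoastJoin {
        public static void main(String[] args) throws Exception {
            // Log in with an east-coast account.
            XMPPConnection conn = new XMPPConnection("east.mycompany.com");
            conn.connect();
            conn.login("john", "secret");

            // Join a room hosted on the west-coast server; the servers'
            // s2s connection carries the traffic transparently.
            MultiUserChat muc =
                    new MultiUserChat(conn, "status@conference.west.mycompany.com");
            muc.join("john");
            muc.sendMessage("hello from the east coast");
        }
    }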

Regards,

– Gato

again, thanks for all this information. i really appreciate it.

does your point about users having an account on only one server take into account external user maps? (via an external mysql database) it seems that if servers A and B both pointed to identical user databases, then a user could theoretically connect to either and still be uniquely identifiable.

same thing with chat rooms: if the storage engine for MUC is external (mysql), then both servers would see the same list of MUC rooms. if a chat room was created on west.mycompany.com and a user connected to east.mycompany.com and tried to join chatroom@east.mycompany.com, does the s2s connection route to the chat room on west.mycompany.com, or would i need to write a plugin to intercept the join, check for existence, and reroute?

don’t feel like you need to answer all my questions since you’ve already done so much. i’m sure i’ll find out if it doesn’t work when i actually start implementing stuff, but just trying to plan ahead. :]

Hey kthorn,

That won’t work. That would be like an ad-hoc solution for clustering. If you connect 2 servers to the same DB then both servers will share the same domain.

Regards,

– Gato

Hi,

if you use an external LDAP or JDBC auth provider you could indeed use the same user base for authentication. Maybe you need to modify the existing providers to make sure that the users get the right domain appended. Reference: http://www.igniterealtime.org/builds/openfire/docs/latest/documentation/db-integration-guide.html
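
For example, both servers could point their JDBC auth provider at the same external MySQL database in openfire.xml, roughly along these lines (paraphrased from the db-integration guide linked above; the host, table and column names are placeholders):

    <jdbcProvider>
      <driver>com.mysql.jdbc.Driver</driver>
      <connectionString>jdbc:mysql://db.example.com/shared_users?user=openfire&amp;password=secret</connectionString>
    </jdbcProvider>
    <provider>
      <auth>
        <className>org.jivesoftware.openfire.auth.JDBCAuthProvider</className>
      </auth>
    </provider>
    <jdbcAuthProvider>
      <passwordSQL>SELECT password FROM user_account WHERE username=?</passwordSQL>
      <passwordType>plain</passwordType>
    </jdbcAuthProvider>

Each user would still log in as john@east.example.com or john@west.example.com depending on which server they connect to, even though the password check hits the same table.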

Every Openfire server still needs its own local database, which stores the server name (east.example.com and west.example.com) and everything else, like MUC information. So if you are looking for the “cheap” solution without cluster support, you may have a lot of work to get it done; a packet interceptor which routes packets depending on the MUC rooms could probably do it.

One should limit the number of shared MUC rooms, maybe by a naming convention, so you could create all rooms whose names start with “east_” on east.example.com and route all packets for “east_” rooms from west to east. Maybe do the same thing for “west_” room names to keep the plugin simple. Keeping all other MUC rooms local would reduce traffic a lot. A sketch of such an interceptor follows below.
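
Something along these lines, using Openfire's PacketInterceptor hook (an untested sketch; this is the half that would run on the west server and forward “east_” rooms):

    import org.jivesoftware.openfire.interceptor.InterceptorManager;
    import org.jivesoftware.openfire.interceptor.PacketInterceptor;
    import org.jivesoftware.openfire.interceptor.PacketRejectedException;
    import org.jivesoftware.openfire.session.Session;
    import org.xmpp.packet.JID;
    import org.xmpp.packet.Packet;

    /** Rewrites packets addressed to "east_*" rooms so they reach the east server. */
    public class CoastRoutingInterceptor implements PacketInterceptor {

        public void start() {
            InterceptorManager.getInstance().addInterceptor(this);
        }

        public void interceptPacket(Packet packet, Session session,
                boolean incoming, boolean processed) throws PacketRejectedException {
            if (!incoming || processed || packet.getTo() == null) {
                return;
            }
            JID to = packet.getTo();
            // Running on west.example.com: forward "east_" rooms to the east
            // MUC service over the existing s2s connection.
            if ("conference.west.example.com".equals(to.getDomain())
                    && to.getNode() != null && to.getNode().startsWith("east_")) {
                packet.setTo(new JID(to.getNode(), "conference.east.example.com",
                        to.getResource()));
            }
        }
    }

Note that a real plugin would also have to rewrite the from address on the traffic coming back, so clients still see the room JID they originally joined.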

LG

Hi Gaston

I am wondering how you can create 170k TCP socket connections on one physical machine.

When we added NIO support to Openfire, the use of Connection Managers became much less important, since a single Openfire was able to scale to 170,000 concurrent connections or more (note: we stopped our tests at 170K because we ran out of user accounts). Connection Managers also have NIO support, so their scalability improved quite a lot too.

For the TCP protocol, only two bytes are allocated for the port number.

Am I wrong somewhere?
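
Or is it that the two bytes only limit ports per IP address, while a TCP connection is identified by the full (source IP, source port, destination IP, destination port) tuple, so a single listening port like 5222 can carry far more than 64K connections as long as clients come from enough distinct source addresses? A tiny sketch of what I mean:

    import java.net.ServerSocket;
    import java.net.Socket;

    public class FourTuple {
        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(5222); // one local port for everyone
            while (true) {
                Socket s = server.accept();
                // Each accepted connection is distinguished by the remote side of
                // the 4-tuple; the single local port 5222 is not the bottleneck.
                System.out.println(s.getRemoteSocketAddress() + " -> "
                        + s.getLocalSocketAddress());
            }
        }
    }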

Best Regards

Peter

This is the best news I have heard all day. It’s now May 11th, 2009.

I have to do an implementation of this with an approximate maximum of 25,000 users. We strongly suspect it is fewer people than this… but to be on the safe side we rounded up to 25,000.

From what you are saying… today we don’t have to worry about connection managers at all anymore? One server would easily handle 25,000 concurrent users?

My implementation plan was to have two machines, one running the server and a connection manager, and the other a connection manager, to handle all 25,000. From what you are saying it seems that I can get away with one server… but what I would now like to do is install a second Openfire on the second server, remove all connection managers, and use the two machines as a two-node cluster. Will this work well for our 25,000 users?

Thanks.