Scalability: Turn it to Eleven

The parody movie This Is Spinal Tap includes a hilarious scene where the members of the “loudest band in the world” point out that their amplifiers can be turned up to eleven. I was reminded of that scene as we’ve been doing scalability testing on the next version of Wildfire. It turns out we had little idea how far we could push the limits: we keep cranking the “scaling knob” louder and Wildfire just keeps scaling.

So far we’ve hit 33K concurrent users with a single connection manager, running on an (old) Sun 280R server. CPU usage in the connection manager and the core Wildfire server both hovered around 7% each. Those numbers are a huge improvement over the previous version of Wildfire, which was barely able to hit 7,500 concurrent users with maxed-out CPU and memory usage. We’re also only partway through the optimization process. The goal for the 3.2 release is to demonstrate 100K concurrent users on a single domain.

How did we get here? In Wildfire 3.2 we decided to replace our networking layer with Apache MINA, giving us support for asynchronous I/O and a foundation for better scaling. For testing, we created a Wildfire plugin that generates users, populates rosters and creates vCards. The rosters are populated with 20 to 30 contacts each. We’ve been using the load testing tool Tsung. Tsung is a master-slave tool, and for our tests we are using four slaves.
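To give a flavor of the programming model change: instead of dedicating a thread to each connection, MINA delivers events to a handler over java.nio selectors, so idle connections cost no threads. Here’s a minimal sketch in the MINA 1.x API of the time; the port (XMPP’s client port 5222) and the echo handler are illustrative only, not Wildfire’s actual code, which parses XMPP stanzas rather than echoing bytes:

```java
import java.net.InetSocketAddress;

import org.apache.mina.common.ByteBuffer;
import org.apache.mina.common.IoHandlerAdapter;
import org.apache.mina.common.IoSession;
import org.apache.mina.transport.socket.nio.SocketAcceptor;

public class EchoServer {
    public static void main(String[] args) throws Exception {
        // One acceptor multiplexes every client socket over NIO selectors;
        // this is the property that lets a single box hold tens of
        // thousands of concurrent sessions.
        SocketAcceptor acceptor = new SocketAcceptor();
        acceptor.bind(new InetSocketAddress(5222), new IoHandlerAdapter() {
            @Override
            public void messageReceived(IoSession session, Object message) {
                // Copy the incoming buffer before the asynchronous write;
                // MINA reclaims the received buffer after this method returns.
                ByteBuffer in = (ByteBuffer) message;
                ByteBuffer out = ByteBuffer.allocate(in.remaining());
                out.put(in);
                out.flip();
                session.write(out); // non-blocking
            }
        });
    }
}
```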

As we’ve pushed to ever-higher scalability numbers, we’ve made lots of core code improvements along the way. For full details, see my forum post. The goal for this week is to reach 50K concurrent users, but based on our experience over the past couple of weeks, that might not be much of a challenge at all.

Awesome! Glad to hear MINA is working out for ya.

Hey Peter,

I think that MINA was one of the easiest integrations I’ve ever done, considering that replacing an entire networking layer is no small task. After I finished the refactoring and integration work, I was expecting to find a large number of issues while testing the integration. Surprisingly, the only issue we found was with stream compression. Compression is still an open issue that I’m planning to fix this week. Besides that problem, the migration to MINA was quite smooth.
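To give a rough idea of where compression plugs in: MINA exposes a per-session filter chain, and a zlib filter gets inserted once the client negotiates stream compression (XEP-0138). A minimal sketch, assuming MINA 1.x’s JZlib-backed CompressionFilter; Wildfire’s actual wiring may differ:

```java
import org.apache.mina.common.IoSession;
import org.apache.mina.filter.CompressionFilter;

public class StreamCompression {
    // Called once the client has negotiated zlib stream compression.
    // addFirst() places the filter closest to the socket, so the whole
    // wire stream is compressed. Inserting it mid-stream is the delicate
    // part: bytes written before this point must not pass through the
    // compressor.
    public static void enable(IoSession session) {
        session.getFilterChain().addFirst("compression", new CompressionFilter());
    }
}
```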

Regards,

– Gato

Can you say which Java SE release this is? Also, which version of Solaris or Linux is the Sun 280R running? It would be interesting to know whether the good scalability you are seeing is with the Solaris /dev/poll or the Linux epoll SelectorProvider.

This looks wonderful!

Would you mind sharing the Tsung configuration that you are using? I am load testing some servers, and I am not sure if my Tsung test looks “real enough”.

The way you manage this software and the community is great. Keep going!!!

Alan – we’re using Solaris 10 and testing with Java 6. We haven’t had a chance to test on Linux with epoll, but we’re eager to see how those numbers come out. One of our goals is that by the end of this round of scaling testing we’ll have a matrix of what kind of performance one can expect to get on different platforms.
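If you want to check which provider a given JVM picked up, you can print the active SelectorProvider directly. The class names in the comment below are the usual Sun JDK defaults; they are internal classes and can vary by platform and release:

```java
import java.nio.channels.spi.SelectorProvider;

public class WhichSelector {
    public static void main(String[] args) {
        // Typically sun.nio.ch.DevPollSelectorProvider on Solaris 10 and
        // sun.nio.ch.EPollSelectorProvider on Linux with Java 6. A
        // different provider can be forced with the system property
        // java.nio.channels.spi.SelectorProvider.
        System.out.println(SelectorProvider.provider().getClass().getName());
    }
}
```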

Matiyam – thanks for your kind words! Can you post the Tsung config questions you have in the forums? Perhaps as a reply to Gato’s tuning thread linked in his blog entry? One thing we’re trying to do now is make Tsung do disco and vCard operations so that testing is as realistic as possible.

Wow, that’s awesome, guys. To see that kind of performance is a real testament to the quality of work you all have done with Wildfire, the MINA framework and Java in general.

100K is just one milestone; it’ll conquer even more of the private chat server space in the future.