Interesting - we are fighting similar problems with Java VM tuning at our site. We don't use a CM, but when we started to reach 5000 concurrent users, strange things began happening. I agree with the ulimit -n (per-process file descriptors) suggestion - we bumped ours from 8192 to 16384 (we have a big Linux box with 8GB of RAM).
However, we have seen various Java HotSpot compiler errors of the following form, which of course crash the server:
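For anyone who wants to see how close a JVM actually gets to the ulimit -n ceiling, here is a rough sketch using the standard java.lang.management API. The class name FdCheck is made up for illustration, and the com.sun.management.UnixOperatingSystemMXBean interface is Sun/HotSpot-specific, so treat this as an assumption about your VM. It also reports on the JVM it runs in, so to see Wildfire's numbers it would have to run inside that process (or you would read the same bean over JMX).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    // Returns {open, max} file descriptor counts for this JVM,
    // or null when the VM does not expose the Unix-specific bean.
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this VM");
        } else {
            System.out.println("open fds: " + counts[0] + ", limit: " + counts[1]);
        }
    }
}
```

If the open count sits near the limit while users are reconnecting, the ulimit bump is doing real work; if it stays far below, the crashes are probably coming from somewhere else.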
Exception java.lang.OutOfMemoryError: requested 2048000 bytes for GrET* in /BUILD_AREA/jdk1.5.0_06/hotspot/src/share/vm/utilities/growableArray.cpp. Out of swap space?
Exception in thread "CompilerThread0" java.lang.OutOfMemoryError: requested 1053808 bytes for Chunk::new. Out of swap space?
Now, I'm 99% sure that our system isn't running out of swap space (I checked), so a little Googling led to a suggestion to tune the VM by increasing the "permanent generation". Apparently, this is where the VM keeps "reflective" data, pointers to class objects on the heap, etc. I've already increased our max heap to 2GB, and we are barely touching that, but by default, when using the -server flag to the VM, the permanent generation is 64MB.
I've put the following flags into our wildfire.vmoptions file to pass to the VM when it starts (though I haven't restarted since adding them, as I'm afraid to touch the server right now). Only the PermSize flags are new, BTW - we've been running with the stack size and max heap flags for a long time:
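Rather than guessing whether the permanent generation is the pool that is filling up, it can be measured. Below is a minimal sketch using the java.lang.management memory pool beans that ship with Java 5. The class name PoolReport is hypothetical, and the pool names depend on the VM and collector in use (on HotSpot of this vintage I would expect one containing "Perm Gen", but that is an assumption to verify on your own box).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class PoolReport {
    // Formats one pool as "name [type]: used=..MB, max=..MB".
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            // getUsage() returns null when the pool is no longer valid
            return pool.getName() + ": no usage data";
        }
        long max = usage.getMax(); // -1 means the maximum is undefined
        String maxText = (max < 0) ? "undefined" : (max >> 20) + "MB";
        return pool.getName() + " [" + pool.getType() + "]: used="
                + (usage.getUsed() >> 20) + "MB, max=" + maxText;
    }

    // One line per memory pool known to this JVM (heap and non-heap).
    static List<String> report() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(describe(pool));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

Like any in-process check, this only describes the JVM it runs in, so for the live server the same beans would need to be read from inside Wildfire or over a JMX connection. If the permanent generation pool is nowhere near its max when the errors appear, raising PermSize is unlikely to be the fix.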
-Xss128k
-Xmx2100m
-XX:PermSize=128m
-XX:MaxPermSize=256m
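Since the new flags haven't been through a restart yet, it may be worth confirming afterwards that the VM really picked them up from the vmoptions file. A small sketch, assuming it runs inside the same JVM (the class name VmFlagCheck and the helper matching are made up for illustration); RuntimeMXBean.getInputArguments() is part of the standard Java 5 management API.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class VmFlagCheck {
    // Returns the arguments that start with any of the given prefixes.
    static List<String> matching(List<String> args, String... prefixes) {
        List<String> hits = new ArrayList<String>();
        for (String arg : args) {
            for (String prefix : prefixes) {
                if (arg.startsWith(prefix)) {
                    hits.add(arg);
                    break; // avoid adding the same argument twice
                }
            }
        }
        return hits;
    }

    public static void main(String[] unused) {
        // Arguments passed to the JVM itself, not to the application's main()
        List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : matching(vmArgs, "-Xss", "-Xmx", "-XX:PermSize", "-XX:MaxPermSize")) {
            System.out.println(flag);
        }
    }
}
```

If a flag from the file does not show up in the output, the launcher is not passing it through and the VM is still running with its defaults.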
Our production server takes almost an hour and a half to fully stabilize when 5000 people try to have their clients reconnect at once. Frankly, this is a huge problem that the Wildfire folks still haven't solved, and it makes restarts very painful for us.
Jive Software folk - numerous people have been asking for a "tuning guide" for this sort of thing for a long time. Do you guys have anything like that? Also, specifically around increasing the "permanent generation", what are your thoughts? If the heap is holding 5000 connection objects (plus who knows what else, since in our case we run about 130 conf rooms as well), could the permanent generation be running out of memory?
I know in the past you guys have said that tuning the VM wasn't in your purview as developers of the software, but IMHO, at larger installations, tuning the VM goes hand in hand with the stability of Wildfire, and this needs to be addressed. Thanks.
-Guy Martin