Again, I get the same error message, this time at user count 490:

2007.01.11 09:00:14 org.jivesoftware.wildfire.net.BlockingAcceptingMode.run(BlockingAcceptingMode.java:62) Trouble accepting connection
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:574)
    at org.jivesoftware.wildfire.net.BlockingAcceptingMode.run(BlockingAcceptingMode.java:52)
    at org.jivesoftware.wildfire.net.SocketAcceptThread.run(SocketAcceptThread.java:111)

What is the deal with this? My ulimit is currently at 4096, my Xms is 512m, and my Xmx is 1500m… Why am I still getting these errors?

jeff
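The stack trace above fails inside Thread.start, called from the accept loop, which suggests that a new thread is started for each accepted connection. "unable to create new native thread" generally means the operating system refused to create another thread (a per-user process or thread limit, or no native memory left for the thread stack) rather than the Java heap being full, so a larger Xmx will not necessarily help. A minimal sketch for watching the thread count from inside a JVM, using only the standard java.lang.management API; the class name ThreadReport is made up for illustration and is not part of Wildfire:

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Wildfire: reports how many threads this JVM runs.
class ThreadReport {

    // Number of live threads (daemon and non-daemon) in this JVM right now.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Highest live thread count seen since the JVM started (or since the peak was reset).
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        System.out.println("peak threads: " + peakThreads());
    }
}
```

This only reports on the JVM it runs in, so to watch a running server the same ThreadMXBean would have to be read from within the server process or over JMX. Comparing the number against the per-user process limit (ulimit -u) should show how close the server is to the ceiling.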

I THINK I MIGHT HAVE FOUND A BUG in 3.0.0. It isn’t shutting off the ports it uses to authenticate the users in Novell eDirectory (LDAP). I currently have 1300 active LDAP connections.

Message was edited by: jeff_garner

Jeff,

LDAP not shutting down the ports would be a huge problem and is likely the cause of the OOM. The LDAP code was updated a lot for 3.1 – any chance you could test with that release to see if the problem is resolved? Otherwise I’d like to get it fixed ASAP.

Thanks,

Matt

Currently I cannot upgrade the server that is in production; however, I AM going to download the 3.1.1 code for the test box and find a load test script or something to get this running. I have also increased my ulimits: the file descriptor limit is now 10490 and the -H (hard limit) value is unlimited.

I have turned the max locked memory down to 4 and decreased the number of processes per user, hoping to force a refresh of all connections. Hopefully this will cure what ails the server. Not sure if it will resolve it or not, but it seems that with every tweak I get a little closer. The magic number of users was 492; I currently have 502 users logged in and I do not have that error message.

As soon as I get production stable, I am moving to the test box to download 3.1.1 or 3.2 and test it. Know of any good load testers that will simulate 5000 to 10000 users?

Jeff

Hi Jeff,

Ah, an old Wildfire version and LDAP. So you FOUND A BUG in 3.0.0: it isn’t shutting off the ports it uses to authenticate the users in Novell eDirectory (LDAP), and you currently have 1300 active LDAP connections.

Did you already take a look at http://www.igniterealtime.org/forum/thread.jspa?messageID=127516? To make it short, I posted there:

  1. So I wonder if it would help to set “” to disable it completely; the auth and all other connections should then be closed immediately.

Pat did reply:

Disabling the LDAP pooling did the trick!

LG

Sorry, no, I did not attempt to search the forums. A short-sighted reaction on my part. As I see several TIME_WAIT entries in the netstat -a|grep ldap list I am monitoring, I will put this into place.

Also, I was reading on Sun’s website and wanted to get someone’s opinion on this.

I currently have -Xms512m -Xmx1500m. I believe it was you, LG, who suggested I drop Xmx to 1300m, which I will do. As this is a dual-processor 64-bit system, I believe it was also suggested to put -XX:+UseParallelGC in as well. The question that I have is: what impact do you think putting -XX:PermSize=128m -XX:MaxPermSize=128m will have? Unlike Pat in the other thread, I have 4 GB of RAM, so I have a little extra to play with; however, I want to make as many positive corrections as possible on the next server bounce, to minimize the impact on the 500 users that have already migrated. I do not know if you monitor this thread at night, so if I do not see a reply by 5 am CST I will implement:

Xmx1300m, PermSize, MaxPermSize, connection pooling false, and ParallelGC… Or if anyone else has a suggestion before LG can get back to me, please reply on the thread; at this point I am looking for any helpful advice.

Thanks for the Help, Matt, LG…

Jeff
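Before bouncing a production server with a new set of options, it can help to confirm on a test box that the JVM actually accepts and applies them. A minimal sketch using only standard Java APIs; the class name JvmSettings is made up for illustration and is not part of Wildfire:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Wildfire: prints the options this JVM was started with.
class JvmSettings {

    // Options passed to the JVM on the command line, e.g. -Xmx1300m or -XX:+UseParallelGC.
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Upper bound the heap may grow to, in megabytes (close to, but not always exactly, -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("os.arch  : " + System.getProperty("os.arch"));
        System.out.println("max heap : " + maxHeapMb() + " MB");
        for (String flag : startupFlags()) {
            System.out.println("flag     : " + flag);
        }
    }
}
```

Started as, for example, java -Xms512m -Xmx1300m -XX:+UseParallelGC JvmSettings, it should echo the flags back and report a maximum heap near 1300 MB. A misspelled option normally stops the JVM at startup with an error, which is better discovered here than during the production restart. The PermSize options only apply to JVMs that still have a permanent generation (before Java 8).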

OK, so now the new magic number is 499… haven’t gotten more than that to attempt login. One thing though: I set my max mem to 1300, and with 499 users logged in I see mem used at 540 MB… That is somewhat troubling, as I have far more users NOT migrated than I do actually chatting on the server… Guess I need to take it back to 1500 max or even higher… ewww…

Jeff,

So, to clarify:

  • You upgraded to 3.1?

  • The LDAP issue is fixed?

Also, the memory used should not be proportional to the number of users. In other words, 1000 users shouldn’t need 1080 MB of memory just because 499 users need 540 MB.

Regards,

Matt

Hi Jeff,

I mentioned the 1.3 GB of memory because I’m quite sure that a Java process can allocate only 2 GB on 32-bit servers. The early 64-bit JVMs were very memory intensive; they took nearly double the space of 32-bit JVMs to store data. I don’t know if this is still a problem. Anyway, setting Xmx to 1.5 or 3 GB should be no problem on 64-bit.

Did you disable the LDAP connection pool?

For the parameters see http://www.tagtraum.com/gcviewer-vmflags.html - it may be better not to set MaxPermSize so the JVM can allocate as much as it needs. To tune this value you need to enable a gc log or to connect to your JVM with VisualGC.

(-XX:+UseParallelGC … Under 1.5 this option is the default option, if the machine is a server-class machine.)

LG
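Besides a GC log or VisualGC, a JVM can also report its own memory pools through the standard java.lang.management API (available since Java 5). A minimal sketch, assuming it runs inside the JVM of interest; the class name MemoryPools is made up for illustration, and the pool names printed differ between JVM versions and garbage collectors:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Wildfire: lists the JVM's memory pools and their usage.
class MemoryPools {

    private static final long MB = 1024L * 1024L;

    // One line of text for a single pool: name, type, used size and configured maximum.
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            return pool.getName() + ": not available"; // the pool is no longer valid
        }
        long max = usage.getMax(); // -1 means the maximum is undefined
        String maxText = (max < 0) ? "undefined" : (max / MB) + " MB";
        return pool.getName() + " (" + pool.getType() + "): used "
                + (usage.getUsed() / MB) + " MB, max " + maxText;
    }

    // Descriptions of all pools the JVM exposes.
    static List<String> report() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(describe(pool));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

On a JVM with a permanent generation, one of the non-heap pools corresponds to it, so watching that pool over time gives a rough idea of whether a fixed MaxPermSize leaves enough headroom.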

Ok. To clarify:

I did not upgrade this server to 3.1… still using 3.0.

I was told a good general rule was 1 MB per user; however, this looks like it is gonna be too small, so as I get closer I will have to down the server for the bump in memory. The number of users got to 507 on Friday without an OOM error.

I did turn the connection pool to false, so the connections are no longer staying latent forever. They close in a few seconds, which is greatness.
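For context, at the JNDI level Sun’s LDAP provider switches pooling on or off per context through the environment property com.sun.jndi.ldap.connect.pool. How Wildfire maps its own setting onto JNDI is not shown in this thread, so the sketch below is only an assumption about the underlying mechanism. The class name, host and port are placeholders, and the code only builds the environment; it does not connect to anything:

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper, not Wildfire code: builds a JNDI environment for an LDAP context.
class LdapEnv {

    // Returns the environment a caller would pass to new InitialDirContext(env).
    static Hashtable<String, String> environment(String url, boolean pooled) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        // "false" asks the provider for an unpooled connection for this context.
        env.put("com.sun.jndi.ldap.connect.pool", Boolean.toString(pooled));
        return env;
    }

    public static void main(String[] args) {
        // Placeholder URL; replace with the real directory server.
        Hashtable<String, String> env = environment("ldap://ldap.example.com:389", false);
        System.out.println(env);
    }
}
```

With pooling off, the connection belongs to the context and goes away when the context is closed, which would fit the connections now closing within a few seconds.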

Yes, this is a server-class 64-bit machine running SLES 9: a Sun v40 with twin 3.2 AMD procs.

As I am not even a novice in regards to Java, I used the different parameters I found on the Sun website for what they were described to do… I have PermSize set, and I have UseParallelGC set too.

It is still running around 500 MB, but I noticed that once it settles down, the server will slip for a time to using 200 MB when resting (not caching LDAP entries).

So I don’t know if I am “fixed” yet, but I am certainly on my way.

Matt,

Currently waiting on information about professional support versus the current support structure we have in place. That is when my decision will be made on upgrading the box… don’t wanna down it too many times; too many big names in the company use it, and, well, I enjoy job stability… heh…

W00T! One of the settings seemed to do it (LDAP, I believe)… 570 users this morning, no OOMs…

Gonna call this “well on its way”; don’t wanna put out any bad mojo…

That’s good news. Keep us updated.

-Matt