Plenty of allocated memory left but still

UPDATE: Confirmed something about the user count of 1058. Every time it hits that number of users, the server stops allowing connections with the error below… anyone have an idea as to why?

Getting Out Of Memory errors.

2007.02.13 11:19:36 org.jivesoftware.wildfire.net.BlockingAcceptingMode.run(BlockingAcceptingMode.java:62) Trouble accepting connection

java.lang.OutOfMemoryError: unable to create new native thread

at java.lang.Thread.start0(Native Method)

at java.lang.Thread.start(Thread.java:574)

at org.jivesoftware.wildfire.net.BlockingAcceptingMode.run(BlockingAcceptingMode.java:52)

at org.jivesoftware.wildfire.net.SocketAcceptThread.run(SocketAcceptThread.java:111)

What is the deal with that? I currently have 1058 users active, but no one else can log in. I have noticed that it seems to hit this issue when it starts to cache the LDAP directory for authorized users. Anyone got an idea on this…

Jeff

Also, could someone PLEASE explain to me: if the server has 2.2 GB of RAM available, the memory allocated to Wildfire is 1500 MB, and it is only using 443 MB, how in the world can I be running out of RAM? Is this a limitation of Wildfire, or of Java?

Anything from anyone on this? Is this a Java error or a Wildfire issue? It has to be some sort of stop in one of these items…

Hey Jeff,

Based on the stack trace I would say that you are not using Wildfire 3.2.0. I would recommend upgrading to Wildfire 3.2.0 or the latest nightly build. Having said that, have you tried reducing the thread stack size? Could you take a thread dump of the JVM to confirm whether there are threads left dangling in the JVM?

Thanks,

– Gato

Hi Jeff,

is this Wildfire 3.2.0 or the “old” version?

For 32-bit processes there may be a 4 GB address-space limit: roughly 2 GB for native memory and 2 GB for the Java heap. There may also be a limit on the number of memory pages which can be used. For a 64-bit process there should be no such limit.

Java itself uses a native heap for the compiler and for managing the threads inside the VM. It may also create a native thread for every thread inside the VM, so you have a 1:1 mapping of native threads to Java threads.

Using -Xmx1400m you set the Java heap to 1400 MB; this memory is used by Wildfire and its data.

It could help to decrease the Xmx size to make sure that you have more native memory and more memory pages available.
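
To put some very rough numbers on the thread side of this (purely illustrative assumptions, not measurements from your system): every native thread needs its own stack outside of the Java heap, so

256 KB stack per thread (-Xss256k) x ~2000 threads ≈ ~500 MB of native memory

and that has to fit into whatever address space is left next to the 1400 MB heap, the permgen and the JVM's own data. When there is no room for one more stack, Thread.start() fails with "unable to create new native thread" even though the Java heap is mostly empty - which is exactly the error in your trace.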

LG

Yes, Gato, it is the 3.0.0 version, the same one I have been tweaking for a year.

So what you are saying is that if I have -Xms512m -Xmx1400m, that is a bad thing? So realistically I could just take out those two items and the server would run better? My total VM vars are -Xms512m -Xmx1400m -XX:+UseParallelGC -Xss256k.

The thing I don't understand is that the traditional idea of throwing more RAM at something so it runs better doesn't work properly here; instead you put less RAM in these fields to give it MORE RAM in the long term? Maybe I don't get Java very well. So to ask a simple question, should I set something like -Xms256m -Xmx1200m -XX:+UseParallelGC -Xss256k?

I need to know in order to implement this tonight (US CST); if one of you guys can answer, that would be great. Also, Gato, do you want a nohup.out dump?

Jeff

This is a SUSE 9 i686 system on a dual Athlon 64 X2, with 4 GB RAM and a 100 GB HDD, if that makes any difference.

Hey Jeff,

As LG explained in his post, the JVM uses memory for “native” operations and also for “Java” operations. My guess is that you have lots of never-ending LDAP connections that end up consuming all the JVM resources. We will need a thread dump (nohup.out dump) to confirm that. IIRC, that problem was fixed after 3.0.0, and a possible workaround would be to not use an LDAP pool (or vice versa).

Regards,

– Gato

Hi Jeff,

It may help to use -XX:ThreadStackSize=128 (without the k; the value is interpreted in KB). -Xss256k may not work as expected if you want to decrease the stack size.

LG

Yeah, turning off the pooling for LDAP got me this far.

Can you give me a command in Linux to dump the nohup? I'm not familiar with that.

As a question: is the idea of changing the Xms/Xmx values NOT a good one?

Jeff

Hi Jeff,

“kill -3 wildfire-pid” will dump a javacore to nohup.out.
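
If you are not sure of the pid, something like this should do it (the pgrep pattern is only a guess at how the process shows up on your box; substitute the real pid if it does not match):

# send SIGQUIT to the Wildfire JVM; the "Full thread dump ..." text goes to the
# JVM's stdout, which is nohup.out when the server was started with nohup
kill -3 $(pgrep -o -f wildfire)
tail -n 300 nohup.out    # the dump is appended at the end of the file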

As you use only 500 MB of your heap, you can set -Xmx1000m without a problem now. And try to add -XX:ThreadStackSize=128 - this may help even more.

I expect that there are 1500-2000 threads in your JVM, so you get this error. It's never a good idea to have so many threads - the thread handling alone costs a lot of time - but only a javacore will show which threads do not terminate.
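
To sanity-check that estimate on the running server without a full dump (assuming a 2.6 kernel, where every Java thread shows up as its own LWP):

# one line per thread of every java process
ps -eLf | grep '[j]ava' | wc -l
# or read the thread count for the Wildfire pid directly
grep Threads /proc/wildfire-pid/status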

LG

OK, these are the new params:

INSTALL4J_ADD_VM_PARAMS="-Xms256m -Xmx1000m -XX:+UseParallelGC -XX:ThreadStackSize=128"

Using this should allow for better operation?

I will apply the change and take the dump if you give the OK. I trust your judgement, Gato and LG.

As a side note, I now have a user base of 4000 active users, with 20K inactive at present.

550 users logged in.

Jeff

Sounds good to me.

Let us know how it goes.
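
Once the server is restarted it may be worth double-checking that the launcher really picked up the new values; a generic check (adjust the grep to however the java process shows up for you) is:

ps -ef | grep '[j]ava'
# the command line shown should now contain -Xmx1000m and -XX:ThreadStackSize=128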

– Gato

BTW, using UseParallelGC may or may not be a benefit depending on JVM version, OS version and number of CPUs (of course). Follow these links for more info: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html , http://java.sun.com/performance/reference/whitepapers/tuning.html.
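
If you want to see what the collector is actually doing before deciding, the standard GC logging switches are a low-risk addition to the same VM params; they only print to stdout (i.e. nohup.out):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps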

Regards,

– Gato

The process would not die…

I had to kill -9 it, and I know it didn't dump…

I'm watching the server in the morning to see what happens.

OK, I got up to user number 1150 without errors; however, NOW my MSN transport will not stay running on the server. Users report that when they go into their transports tab in Soapbox, it reports out as an unknown transport type. This worked yesterday, even before I had all of the issues that I am having now. The server is not using that much memory anymore… related? I think so, but I don't know where to look… the LDAP threads are much lower; however, it looks like something I have done in the config is dropping the MSN transport and not reconnecting.

Hey Jeff,

I think that the gateway problem is another issue, unrelated to your JVM changes. Have you posted this issue in the gateway forum? It would be very useful to post the XML exchanged between the server and the client to figure out what is going on. I think that the type of transport is determined by the disco#info sent to the gateway itself, so hunt for that packet and see what it is returning.

Regards,

– Gato

Well, I am having a difficult time with this thing. I'm still looking for the entry; however, the transport looks to be running on the box, just Wildfire is not communicating with it. It worked this morning before I got over 1000 users on the machine.

UPDATE: Fairly certain this IS a Wildfire issue; it still shows the connection even though I just shut it down… hmmm… maybe not refreshing…

“…it reports out as an unknown transport type.”

That text gave me the impression that the client is finding the gateway but fails to determine the type of gateway. Is that the case? If so, then we need to see the exchanged IQ disco packets.

Regards,

– Gato

And now I can't do a discovery on the server… this is great.

As for all of the problems I am having, can you tell me what the advantages of moving to 3.1.1 would be? I don't want the 3.2 version yet, as I need something a little more tested in production, but I think that 3.0 has outlived its usefulness…

Hey Jeff,

I think this is the first time I have heard of disco not working in Wildfire 3.0.*. Do you see any errors in the log files? In the change log you will find the list of things that were fixed in Wildfire 3.1.1 and Wildfire 3.2.0.

Regards,

– Gato

Gato, what DON'T I see in the logs lately…

I get nothing in the logs. Wildfire still believes the transport is working when it is totally unresponsive in disco; the server shows that it is still connected, but will not allow logging off of the transport, logging into it, registering, etc. Something is hosed with my Wildfire config or something, as nothing has changed with the MSN config for 3 months.

Jeff
