Numfile reaches black zone (hard limit) after 2 weeks of running Openfire persistently

Hi there,

On my machine I have a numfile (number of open files) hard limit of 16,448. With no Openfire process running, numfile is around 4,695; after starting Openfire it is around 7,461. That alone would be no cause for concern. But after two weeks of running Openfire persistently, numfile reaches the hard limit, and that screws up the whole system. In my mind that shouldn't happen. I am running Openfire 3.5.1, Ubuntu 6.06, Sun Java 1.6 SDK (Openfire has allocated 986.12 MB but uses only about 10-20% of it) with 2 GB RAM (up to 5 GB dynamically).

Has anyone noticed the same issue, or can anyone give me a hint about what is going wrong?

Thanks!

We seem to run into the same issue after a couple of weeks. The server keeps running but throws "too many open files" exceptions.

Hi,

You may want to install lsof and check the number of open files with "lsof -Pp Openfire-PID". Redirect the output to a file so you can easily compare snapshots and see where the number of files increases. A javacore (kill -3 Openfire-PID) does not really help here, as lsof does not print thread IDs.
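For example (a sketch; replace 7591 with your Openfire PID, and the snapshot names below are just illustrations), take timestamped snapshots and diff them later:

# snapshot the open files of the Openfire process
lsof -Pp 7591 > /tmp/lsof-$(date +%Y%m%d-%H%M).txt

# a week later, compare two snapshots to see what grew
diff /tmp/lsof-20080601-1200.txt /tmp/lsof-20080608-1200.txt | less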

LG

Hi,

I saved my lsof output to a file and attached it. Maybe you can tell me a bit more…

Hi,

If you could provide another file with many more open file handles, it would be much easier to find the problem. The current one looks fine as far as I can tell:

"mem" lines: 202 (my server: 131)

"non-mem" lines: 402 (my server: 107)

I have only a few plugins installed and an idle server, so these values are fine.

There are always two open file handles for every jar file, so the more plugins you load, the more open files you will have:

java 7591 root mem REG 0,76 427486 405962800 /opt/openfire/lib/postgres.jar

java 7591 root 18r REG 0,76 427486 405962800 /opt/openfire/lib/postgres.jar

java 7591 root 29r REG 0,76 427486 405962800 /opt/openfire/lib/postgres.jar

I wonder whether it is a Jetty or an Openfire issue that two file handles are kept open - other Java applications do not have this problem.

You also use mysql.jar and hsqldb.jar - I wonder whether your server really needs support for three databases. Anyhow, removing the unneeded db drivers will decrease the number of open files only by 4 (6 if you count the "mem" handles).
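To spot which files are held open more than once, a pipeline like this should work (a sketch; field 4 is the FD column in my lsof output, which may vary between versions):

# count the non-"mem" handles per file, most-opened first
lsof -Pp 7591 | awk '$4 != "mem" {print $NF}' | sort | uniq -c | sort -rn | head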

LG

Hm, but this is the whole lsof output - there is no more!

Have I mentioned that I am running on a virtual machine? 12 other customers are hosted on the same physical node and share its resources; I get guaranteed resources, and beyond that the rest is allocated dynamically - maybe this causes the limit? I asked my ISP whether the other customers on my node are using a lot of the physical machine's resources, and they told me that I am the only one who has reached the limit.

Without Openfire running I have a system utilisation of 20%. After starting Openfire it jumps right to 50%. Openfire has now been running for 5 days and 5 hours and my system utilisation is around 86%! At 89% I reach the "yellow zone" of my hardware limit - caused by numfile. The "black zone" is reached at 96% - when that happens, even SMTP and Apache stop responding. I think if I let Openfire run for 2 to 4 more days I should reach the yellow zone. If so, I will run another lsof and save the output - maybe it will contain more information.

I am using the embedded db for my Openfire installation. So how can I disable the unneeded jars? Or should I add some parameters as described in this doc: http://www.igniterealtime.org/community/docs/DOC-1033 ?

Thanks for your help!

EDIT: here is another lsof output!

Hi,

Which ulimit did you set for the user (likely `jive´) that runs Openfire? Did you actually increase it?

You may want to run what I ran and post the results:

root # su - jive
jive # ulimit -n
1024

To get rid of the unneeded jar files you may simply delete them. Anyhow, the next update will install them again.

http://www.netadmintools.com/part295.html is a nice page about file descriptors

So "ls -l /proc/{openfire-pid}/fd/" should list exactly the file descriptors which are counted as open files. For me there are only 103.
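To get just the count (substitute your own Openfire PID for 7591):

# count the open file descriptors of the Openfire process
ls /proc/7591/fd | wc -l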

LG

Hi,

Today I ran into the yellow zone of my system utilisation (92%). QoS tells me that's because of numfile.

ulimit -n also gives me 1024, but ls -l /proc/7591/fd/ returns 399.

I have attached the output of ls -l /proc/7591/fd/ to this post - I hope you can figure something out!?

Hi,

400 open files are usually not a problem. You may want to add

jive hard nofile 2048

jive soft nofile 2048

to `/etc/security/limits.conf´.
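After a fresh login of the jive user the new limit should show up (limits.conf is applied via PAM at login):

root # su - jive
jive # ulimit -n
2048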

Without Openfire running I have a system utilisation of 20%.

After starting Openfire it jumps right to 50%.

Openfire has now been running for 5 days and 5 hours and my system utilisation is around 86%!

Are the % values `user time´? An idle Openfire server hardly consumes any system resources, so simply starting OF should not change the value.

LG

Hm, this really confuses me. The % value tells me how heavily my system resources are being used at the moment. If it reaches 100%, my system has hit the hard limit of one of the configured "parameters", e.g. numfile, kmemsize…

I've run "cat /proc/user_beancounters" and attached the output.
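Something like this (a sketch, assuming the usual Virtuozzo beancounters layout; run as root) pulls out just the numfile row - the columns are held, maxheld, barrier, limit and failcnt:

# show current and maximum held numfile values against the limits
grep numfile /proc/user_beancounters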

I am starting to feel helpless

EDIT: Here are the open file counts per user (see the pipeline sketch below the list):

bind 33

drweb 28

list 327

mysql 137

ossec 37

ossecm 16

popuser 176

psaadm 189

qmaild 13

qmaill 8

qmailq 7

qmailr 7

qmails 14

root 1549

syslog 25

tomcat 153

www-data 418

Total 3137

Openfire 454, Apache 418
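For reference, I generated the list with a pipeline along these lines (a sketch; field 3 is the USER column in my lsof output):

# count open files per user, skipping the header line
lsof | awk 'NR>1 {print $3}' | sort | uniq -c | sort -rn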

Hi,

So we have a VPS which causes these problems. I should have noticed this earlier… "with 2 GB RAM (up to 5 GB dynamically)".

Maybe it is a problem with numproc and numfile - both values are quite high.

Anyhow, I wonder whether you will ever be happy with Openfire on a V-Server.

LG

Damn! But thanks for your reply. Is there anything I can "tune" in order to make Openfire more "stable" on my VPS? What are the approximate numfile and numproc values Openfire needs? I have a numfile hard limit of 16,448 and a numproc hard limit of 1,028 - but I have never had a problem with numproc.

My workaround for the last couple of weeks has been a cron job that restarts Openfire every Sunday night (see the crontab sketch below) - but that's not a perfect solution.
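The crontab entry looks roughly like this (the init script path depends on your installation):

# restart Openfire every Sunday at 04:00
0 4 * * 0 /etc/init.d/openfire restart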

Maybe disabling the message audit or something else will help - a bit?

Hi,

According to http://download.swsoft.com/virtuozzo/virtuozzo4.0/docs/en/lin/VzLinuxUBCMgmt/toc1871244.htm, numproc is a "primary" parameter while "numfile" is only mentioned under "Auxiliary Parameters". So I wonder whether the VM software throttles your performance because of the number of existing threads. But these are things you should ask your provider about, including whether there is a way to get Openfire running stably.

You may shut down Apache and MySQL, or configure Apache with "IdleServers=1, MaxServers=10" (or something like this) to make sure that Apache spawns at most 10 workers and keeps only one idle process running.
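The actual directive names for the prefork MPM are slightly different; a minimal sketch for httpd.conf / apache2.conf, assuming Apache 2.x with prefork:

# keep the Apache process count small
StartServers 1
MinSpareServers 1
MaxSpareServers 2
MaxClients 10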

LG

Hi,

Thanks for the hint. I'll try to configure my Apache to use as few resources as possible. Today I talked to my ISP in order to figure out why my machine reaches the numfile hard limit. My ISP told me that a numfile hard limit of over 16,000 is very high (that's true!) and that they have never had a customer hit this limit, so I'll be the first. Anyway, we figured out that either Openfire or Java could be the problem, but I don't know how to determine which one causes it.

Do you have any data for me about the numfile usage of Openfire and Java separately? Maybe I have misconfigured Openfire… but how can I be sure?

Openfire's behaviour is very strange according to this output:

lsof -p 32358 | wc -l (32358 = Openfire PID)

695

Openfire running: numfile 12253 12286 16448 16448 3720

Directly after stopping the Openfire daemon: numfile 3702 3890 16448 16448 3720

Numfile drops by over 10k - that's a lot!

I'll be very grateful for any further remarks!

Hi,

This could be a feature or a bug of the VM software in use. Either it opens new files for you which are not used (prefork, similar to Apache idle servers), or it does not close files correctly. You could try to connect, disconnect, connect, … with your XMPP client quite often and watch numfile.
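A simple way to watch it while you reconnect (a sketch; run as root, stop with Ctrl-C):

# log the numfile row with a timestamp every 10 seconds
while true; do
  echo -n "$(date '+%H:%M:%S') "
  grep numfile /proc/user_beancounters
  sleep 10
done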

In any case you may want to work with your ISP to solve this problem.

As plugins use only a few threads and file handles, uninstalling them will help to delay this problem, but it will likely not solve it.

Restarting Openfire daily at 4 am should be an acceptable option. Chatters who are still online will notice that it's time to go to bed (;

LG

Yes, you are right. It seems to be the Java VM. With each new unique user that connects to the server, numfile and numproc increase rapidly. I've started another thread in order to get the most out of my Java VM - maybe you want to have a look in there.

Today my server "crashed" because of a "java.lang.OutOfMemoryError: Cannot create GC thread. Out of system resources." The report is in the log:

# An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: Cannot create GC thread. Out of system resources.
#
# Internal Error (47433441534B3448524541440E4350500017), pid=7478, tid=3085376432
#
# Java VM: Java HotSpot™ Server VM (1.6.0-b105 mixed mode)
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp


--------------- T H R E A D ---------------


Current thread (0x08058400): JavaThread “Unknown thread”

Stack: [0xb7e22000,0xb7e72000), sp=0xb7e70fe0, free space=315k

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)


--------------- P R O C E S S ---------------


Java Threads: ( => current thread )

Other Threads:

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread: None

Heap

PSYoungGen total 25216K, used 0K [0xadb00000, 0xaf720000, 0xb4cc0000)

eden space 21632K, 0% used [0xadb00000,0xadb00000,0xaf020000)

from space 3584K, 0% used [0xaf3a0000,0xaf3a0000,0xaf720000)

to space 3584K, 0% used [0xaf020000,0xaf020000,0xaf3a0000)

PSOldGen total 230784K, used 0K [0x74cc0000, 0x82e20000, 0xadb00000)

object space 230784K, 0% used [0x74cc0000,0x74cc0000,0x82e20000)

PSPermGen total 16384K, used 0K [0x70cc0000, 0x71cc0000, 0x74cc0000)

object space 16384K, 0% used [0x70cc0000,0x70cc0000,0x71cc0000)

Dynamic libraries:

Can not get library information for pid = 7479

VM Arguments:

java_command: <unknown>

Launcher Type: SUN_STANDARD

Environment Variables:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games

LD_LIBRARY_PATH=/usr/lib/jvm/java-6-sun-1.6.0.00/jre/lib/i386/server:/usr/lib/jvm/java-6-sun-1.6.0.00/jre/lib/i386:/usr/lib/jvm/java-6-sun-1.6.0.00/jre/…/lib/i386

SHELL=/bin/sh

Signal Handlers:

SIGSEGV: [libjvm.so+0x51cf50], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

SIGBUS: [libjvm.so+0x51cf50], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

SIGFPE: [libjvm.so+0x43d040], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

SIGPIPE: [libjvm.so+0x43d040], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

SIGILL: [libjvm.so+0x43d040], sa_mask[0]=0x7ffbfeff, sa_flags=0x10000004

SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000

SIGUSR2: [libjvm.so+0x43f050], sa_mask[0]=0x00000000, sa_flags=0x10000004

SIGHUP: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000

SIGINT: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000

SIGQUIT: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000

SIGTERM: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000



--------------- S Y S T E M ---------------


OS:testing/unstable

uname:Linux 2.6.9-023stab046.2-enterprise #1 SMP Mon Dec 10 15:22:33 MSK 2007 i686

libc:glibc 2.3.6 NPTL 2.3.6

rlimit: STACK 8192k, CORE 0k, NPROC infinity, NOFILE 1024, AS infinity

load average:0.44 0.17 0.10

CPU:total 8 family 6, cmov, cx8, fxsr, mmx, sse, sse2

Memory: 4k page, physical 16611108k(201032k free), swap 32764556k(32739084k free)

vm_info: Java HotSpot™ Server VM (1.6.0-b105) for linux-x86, built on Nov 29 2006 01:11:40 by “java_re” with gcc 3.2.1-7a (J2SE release)


FYI: until I've solved the problem I'll leave this thread marked as unanswered - because it is.

Hi,

"java.lang.OutOfMemoryError: Cannot create GC thread. Out of system resources." could be a "native" OOM error, i.e. one which occurs in the native heap and not in the Java heap (the one sized with -Xmx). Is it possible that "numproc" was reached?

Anyhow, this is not a problem with Openfire but with your VM / VPS server. Run it on a physical server and you will not have these problems. I have updated the Linux Installation Guide with the possible VPS numproc and numfile problems.

LG

Yes, in this case numproc (hard limit 1,028) was reached! For testing purposes my ISP raised the numfile hard limit from 16,000 to 33,000 but didn't increase numproc. So this error was a result of reaching the numproc hard limit.

FYI: in all these cases (reaching the hard limit of numfile and/or numproc) only ~50 clients were connected to my Openfire. I am amazed that just 50 (!) users can put such a load on the system resources.

Anyway, I'll try to play around with my Java VM settings and hope for the best.

Btw: it was a good idea to update the Linux install guide! But there is a line about something I'd never heard of before: "Some VPS require that you create a file .hotspotrc with this single line for 64 MB: MaxHeapSize=64000000" - could that have any effect on my issue?
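If I read the guide correctly, creating that file would look like this (a sketch; the directory is a guess - per the guide, .hotspotrc has to live in the working directory the JVM is started from):

# cap the Java heap at ~64 MB via .hotspotrc
echo 'MaxHeapSize=64000000' > /opt/openfire/bin/.hotspotrc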

Thanks!