Wildfire on Redhat goes 99.9% and Spark won't connect

We're having a problem with Wildfire running on Red Hat ES v3u2.

We had no problems when it was running on a Windows 2003 server. It'll work for maybe 4 hours on the Red Hat server, then suddenly no one can send messages anymore. Everyone still shows as online, but top shows java at 99.9% CPU usage.

Or sometimes a user will show as Idle when he's actually active.

Another thing that occurs is that /opt/wildfire/bin/wildfire stop will not stop the daemon; I have to kill it manually.

Any ideas?

Thanks

Hi,

As the admin console still seems to be running, you could check the memory usage and the logs. Maybe you are getting a lot of OutOfMemory errors.

You could also add

INSTALL4J_ADD_VM_PARAMS="-XX:+PrintGCDetails -Xloggc:$/logs/gc.log"

to bin/wildfire, where $/logs/ should exist and be writable by the user running Wildfire.

To increase the memory available to Wildfire, add -Xmx128m or something like this to INSTALL4J_ADD_VM_PARAMS.
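Putting the two suggestions together, the edited line in bin/wildfire might look like this (a sketch only — the /opt/wildfire/logs path and the 256m heap size are example values, not the defaults; adapt them to your install):

```shell
# Sketch of the INSTALL4J_ADD_VM_PARAMS line in bin/wildfire.
# The -Xmx value and log path below are assumed examples: the directory
# must exist and be writable by the user running Wildfire.
INSTALL4J_ADD_VM_PARAMS="-Xmx256m -XX:+PrintGCDetails -Xloggc:/opt/wildfire/logs/gc.log"
```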

If you suspect that you don't have a memory problem, try running 'kill -3 wildfirepid' - this will write a javacore / stack trace to nohup.out. If you do this 3 times you should be able to identify the looping thread(s).
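A sketch of that procedure (the pgrep pattern, the nohup.out location, and the 10-second spacing are all assumptions; adjust them to your install):

```shell
#!/bin/sh
# Hypothetical helper: send SIGQUIT (signal 3) to the Wildfire JVM three
# times, a few seconds apart. Each signal makes the JVM append a full
# thread dump to its stdout (nohup.out when started via nohup); a thread
# sitting at the same stack position in all three dumps is likely the
# one that is looping.
PID=$(pgrep -f wildfire | head -n 1)   # assumed way to find the pid
if [ -n "$PID" ]; then
    for i in 1 2 3; do
        kill -3 "$PID"
        sleep 10
    done
else
    echo "wildfire process not found" >&2
fi
```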

LG

Hey mindtrap,

Which Wildfire version are you using? Could you get a thread dump of the JVM when the server becomes unresponsive? Execute kill -3 [process id] to trigger a thread dump. The info will be written to stdout.

Regards,

– Gato

The admin console is not accessible. The browser times out.

Funny thing, I did a kill -3 and no dump occurred. I would assume the signal IDs are the same across all Linux/Unix flavours; for the record, mine is labelled SIGQUIT.
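For what it's worth, signal 3 does map to SIGQUIT on Linux, which you can confirm with kill itself (standard kill behaviour, nothing Wildfire-specific):

```shell
# kill -l translates a signal number to its symbolic name.
kill -l 3    # prints: QUIT
```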

I'm wondering if perhaps it's trying to perform garbage collection.

You wouldn't happen to be using the Asterisk-IM plugin? I've had very similar issues with that plugin while changing settings on the Asterisk server.

I'm only using Wildfire, no plug-ins, and using the JVM that came with the package.

But as a side note, we are running the JRun application server on the same machine, and it's also using its own JVM. That application is behaving normally.

One thing I have noticed, though, which is new: the X server goes to 100%. We do have the GUI running, don't know which one, but we usually only

Hi,

it is the same for nearly all JVMs, and I assume you are using Sun's 1.5.0_07-b03?

Did you check nohup.out for the stack trace?

LG

Well DUH!! LOL!

Thanks. Totally forgot that Wildfire was writing to that file.

Here's what I found in it. These are the last living moments of this app:

Full thread dump Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode):

"Thread-1" daemon prio=1 tid=0x0821b8c8 nid=0x314 waiting on condition
    at java.lang.Thread.sleep(Native Method)
    at org.jivesoftware.database.ConnectionPool.run(ConnectionPool.java:370)
    at java.lang.Thread.run(Unknown Source)

"Thread-0" daemon prio=1 tid=0x41e2fee8 nid=0x312 waiting on condition
    at java.lang.Thread.sleep(Native Method)
    at com.install4j.runtime.Launcher$StopWatcherThread.run(Unknown Source)

"Low Memory Detector" daemon prio=1 tid=0x41e0a710 nid=0x310 runnable

"CompilerThread1" daemon prio=1 tid=0x41e09360 nid=0x30f waiting on condition

"CompilerThread0" daemon prio=1 tid=0x41e08410 nid=0x30e waiting on condition

"AdapterThread" daemon prio=1 tid=0x41e07490 nid=0x30d waiting on condition

"Signal Dispatcher" daemon prio=1 tid=0x41e065e8 nid=0x30c waiting on condition

"Finalizer" daemon prio=1 tid=0x081118a0 nid=0x30b in Object.wait()
    at java.lang.Object.wait(Native Method)
    - waiting on <0x82f59748> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x82f59748> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=1 tid=0x081113c0 nid=0x30a in Object.wait()
    at java.lang.Object.wait(Native Method)
    - waiting on <0x82f63180> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Unknown Source)
    - locked <0x82f63180> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

"VM Thread" prio=1 tid=0x0810d800 nid=0x309 runnable

"GC task thread#0 (ParallelGC)" prio=1 tid=0x08076418 nid=0x305 runnable
"GC task thread#1 (ParallelGC)" prio=1 tid=0x08076ff8 nid=0x306 runnable
"GC task thread#2 (ParallelGC)" prio=1 tid=0x08077bd8 nid=0x307 runnable
"GC task thread#3 (ParallelGC)" prio=1 tid=0x080787b8 nid=0x308 runnable

"VM Periodic Task Thread" prio=1 tid=0x41e0bc40 nid=0x311 waiting on condition

=======

It seems like it's running out of memory. I will confirm that the next time the system stalls.
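A couple of quick checks for confirming that next time (a sketch only — the paths are assumptions based on a default /opt/wildfire install, and the Full GC line below is a synthetic example, not real output):

```shell
#!/bin/sh
# Hypothetical checks; adjust the paths to your layout.

# 1. Any OutOfMemoryError in the error log or in nohup.out?
#    (-s keeps grep quiet about files that don't exist yet)
grep -s "OutOfMemoryError" /opt/wildfire/logs/error.log /opt/wildfire/bin/nohup.out || true

# 2. With -Xloggc enabled, repeated Full GC lines whose before/after heap
#    sizes are both near the capacity mean the heap is exhausted.
#    Synthetic example of what such a line looks like:
sample='311.970: [Full GC 130944K->130940K(130944K), 1.2345678 secs]'
case "$sample" in
    *"Full GC"*) echo "Full GC with nearly equal before/after sizes -> heap exhausted";;
esac
```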