Thread deadlock? Any way I can help debug?

Hi,

I'm using XIFF/Flash (group chat) on the client side and Wildfire on the server side.

I did something somewhat crazy today: I put the Flash chat, which connects to a certain room, on every result on my homepage, which gets at least 1 hit per second, causing massive numbers of connects/disconnects.

Last time I checked, I had over 300 concurrent users in the chat room.

Anyway, the chat room would freeze up for a few seconds every 5~6 minutes, which was still OK, but twice today the server just froze and did not respond even to

./wildfire stop
./wildfire start

./wildfire stop said that Wildfire had stopped, but it hadn't, and I had to kill it manually.

It froze "completely" twice today; once even kill didn't work, so I just rebooted (I didn't try kill -9, though).
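
When ./wildfire stop reports success but the JVM keeps running, it helps to capture a thread dump before killing anything, so the evidence of a possible deadlock does not die with the process. Below is a minimal sketch of that escalation (dump, then SIGTERM, then SIGKILL after a grace period). It assumes you already have the JVM's PID from ps; the name force_stop and the 10-second default grace period are hypothetical choices, not part of Wildfire.

```shell
#!/bin/sh
# Hypothetical helper: stop a hung JVM without losing the evidence.
# Usage: force_stop PID [GRACE_SECONDS]
force_stop() {
    pid=$1
    grace=${2:-10}

    # 1. Thread dump first: SIGQUIT (kill -3) makes a HotSpot JVM print all
    #    thread stacks to its stdout without terminating it.
    #    If the PID does not exist, there is nothing to stop.
    kill -3 "$pid" 2>/dev/null || return 0
    sleep 1   # give the JVM a moment to write the dump

    # 2. Polite shutdown request (SIGTERM).
    kill -TERM "$pid" 2>/dev/null || return 0

    # 3. Poll once per second until the process exits or the grace period ends.
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # 4. Last resort: SIGKILL (same as kill -9) cannot be caught or ignored.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
    return 0
}
```

For example, force_stop 7683 would request a dump, send SIGTERM, and fall back to SIGKILL if the process is still there 10 seconds later.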

The server was running only Wildfire (it is a LAMP setup, but only Apache, serving the Flash file, and Wildfire were in use; all the other homepage stuff is on a different server).

Anyway, I'm thinking a thread deadlock occurred. Is there any way I could help debug?

Send error.log etc.?

Not sure how to use debug.log…

BTW, the server load when it froze was exactly 0.0.

Thanks.

Hi,

There is a KB article on how to obtain a thread dump; kill -3 $PID should be fine on Unix systems. In the dump you would see whether a deadlock occurred.
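
On Unix this boils down to finding the JVM's PID and sending it signal 3 (SIGQUIT). A minimal sketch, assuming the Wildfire JVM can be recognised by the ServerStarter main class that shows up in the ps -ef output later in this thread, and that pgrep (from procps) is installed; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: trigger a thread dump of the Wildfire JVM.
# ASSUMPTION: the JVM command line contains Wildfire's ServerStarter main class.
# The brackets in "[o]rg" stop the pattern from matching any command line
# that merely contains this pattern text (e.g. a wrapper script).
WILDFIRE_PATTERN='[o]rg\.jivesoftware\.wildfire\.starter\.ServerStarter'

wildfire_pid() {
    # pgrep -f matches against the full command line; -o returns the oldest match.
    pgrep -o -f "$WILDFIRE_PATTERN"
}

wildfire_thread_dump() {
    pid=$(wildfire_pid)
    if [ -z "$pid" ]; then
        echo "Wildfire JVM not found" >&2
        return 1
    fi
    # kill -3 sends SIGQUIT: HotSpot writes a thread dump to the JVM's
    # standard output and keeps running; the process is not terminated.
    kill -3 "$pid"
}
```

Calling wildfire_thread_dump a few times while the chat is frozen appends one dump per call to wherever the JVM's stdout is redirected.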

LG

I'll try that.

Uh… what does "KB article" mean? :P

I searched Google; it seems to be a common phrase, but I can't find a definition :P

The article is "Different ways to obtain a thread dump of the Java VM". KB = (Jive Software's) Knowledge Base; you can browse it at http://www.jivesoftware.org/community/index.jspa if you want to take a look, as it contains some useful documents.

LG

Sorry :P I'm not that familiar with kill -3… but shouldn't kill -3 log to stdout.log, or to something like threaddump… in the jre directory or the log directory?

ps -ef|grep bot
bot 7683 1 2 18:10 ? 00:08:21 /opt/wildfire/jre/bin/java -server -Dinstall4j.jvmDir=/opt/wildfire/jre -Dinstall4j.appDir=/opt/wildfire -Dexe4j.moduleName=/opt/wildfire/bin/wildfire -classpath /opt/wildfire/.install4j/i4jruntime.jar:/opt/wildfire/lib/activation.jar:/opt/wildfire/lib/bouncycastle.jar:/opt/wildfire/lib/commons-el.jar:/opt/wildfire/lib/hsqldb.jar:/opt/wildfire/lib/hsqldbutil.jar:/opt/wildfire/lib/jasper-compiler.jar:/opt/wildfire/lib/jasper-runtime.jar:/opt/wildfire/lib/jdic.jar:/opt/wildfire/lib/jtds.jar:/opt/wildfire/lib/mail.jar:/opt/wildfire/lib/mysql.jar:/opt/wildfire/lib/postgres.jar:/opt/wildfire/lib/servlet.jar:/opt/wildfire/lib/startup.jar:/opt/wildfire/lib/wildfire.jar com.install4j.runtime.Launcher start org.jivesoftware.wildfire.starter.ServerStarter false false /opt/wildfire/bin/../logs/stderror.log /opt/wildfire/bin/../logs/stdoutt.log true true false true true 0 0 20 20 Arial 0,0,0 8 500 version 2.6.2 20 40 Arial 0,0,0 8 500 -1 -DwildfireHome=/opt/wildfire -Dwildfire.lib.dir=/opt/wildfire/lib start
root 16692 16424 0 23:57 pts/0 00:00:00 grep bot
kill -3 7683

pwd
/opt/wildfire/jre/bin
ls -al
total 1348
drwxr-xr-x 2 bot users 4096 May 31 21:41 .
drwxr-xr-x 4 bot users 4096 May 31 21:41 ..
-rwxr-xr-x 1 bot users 4153 Apr 21 02:12 ControlPanel
-rwxr-xr-x 1 bot users 64812 Apr 21 02:12 java
-rwxr-xr-x 1 bot users 26366 Apr 21 02:12 java_vm
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 keytool
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 kinit
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 klist
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 ktab
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 orbd
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 pack200
-rwxr-xr-x 1 bot users 72760 Apr 21 02:12 policytool
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 rmid
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 rmiregistry
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 servertool
-rwxr-xr-x 1 bot users 72752 Apr 21 02:12 tnameserv
-rwxr-xr-x 1 bot users 394235 Apr 21 02:12 unpack200

pwd
/opt/wildfire/logs
ls -la
total 296
drwxr-xr-x 2 bot users 4096 Jun 2 00:33 .
drwxr-xr-x 11 bot users 4096 May 31 21:41 ..
-rw-r--r-- 1 bot users 3063 Jun 1 17:53 admin-console.log
-rw-r--r-- 1 bot users 0 May 31 21:43 debug.log
-rw-r--r-- 1 bot users 96269 May 31 22:29 error_1.log
-rw-r--r-- 1 bot users 162680 Jun 1 20:12 error.log
-rw-r--r-- 1 bot users 3341 Jun 1 18:10 info.log
-rw-r--r-- 1 bot users 271 Jun 1 18:09 stderror.log
-rw-r--r-- 1 bot users 0 Apr 21 02:09 stderr.out
-rw-r--r-- 1 bot users 117 Jun 1 18:10 stdoutt.log
-rw-r--r-- 1 bot users 0 May 31 21:43 warn.log

ps -ef|grep bot
bot 7683 1 2 Jun01 ? 00:10:07 /opt/wildfire/jre/bin/java -server -Dinstall4j.jvmDir=/opt/wildfire/jre -Dinstall4j.appDir=/opt/wildfire -Dexe4j.moduleName=/opt/wildfire/bin/wildfire -classpath /opt/wildfire/.install4j/i4jruntime.jar:/opt/wildfire/lib/activation.jar:/opt/wildfire/lib/bouncycastle.jar:/opt/wildfire/lib/commons-el.jar:/opt/wildfire/lib/hsqldb.jar:/opt/wildfire/lib/hsqldbutil.jar:/opt/wildfire/lib/jasper-compiler.jar:/opt/wildfire/lib/jasper-runtime.jar:/opt/wildfire/lib/jdic.jar:/opt/wildfire/lib/jtds.jar:/opt/wildfire/lib/mail.jar:/opt/wildfire/lib/mysql.jar:/opt/wildfire/lib/postgres.jar:/opt/wildfire/lib/servlet.jar:/opt/wildfire/lib/startup.jar:/opt/wildfire/lib/wildfire.jar com.install4j.runtime.Launcher start org.jivesoftware.wildfire.starter.ServerStarter false false /opt/wildfire/bin/../logs/stderror.log /opt/wildfire/bin/../logs/stdoutt.log true true false true true 0 0 20 20 Arial 0,0,0 8 500 version 2.6.2 20 40 Arial 0,0,0 8 500 -1 -DwildfireHome=/opt/wildfire -Dwildfire.lib.dir=/opt/wildfire/lib start
root 18823 16424 0 00:39 pts/0 00:00:00 grep bot

The process is still alive.

Hi,

It will write to nohup.out in wildfire/bin, and of course it will not kill the process, so you can take more than one dump and keep the application running.
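
Because kill -3 leaves the process running, a common practice is to take several dumps a few seconds apart and compare them: threads that sit at the same lock in every dump are the suspects. A small sketch, assuming the PID is already known; the name take_dumps and the defaults of 3 dumps, 5 seconds apart, are arbitrary choices.

```shell
#!/bin/sh
# Sketch: take several thread dumps a few seconds apart.
# Usage: take_dumps PID [COUNT] [INTERVAL_SECONDS]
# Each dump is appended to the JVM's stdout (nohup.out in wildfire/bin here).
take_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        kill -3 "$pid" || return 1   # fails if the PID does not exist
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"        # no pause after the last dump
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, take_dumps 7683 5 10 would take five dumps ten seconds apart, all appended to /opt/wildfire/bin/nohup.out.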

LG

Thanks, this is very helpful!

pwd
/opt/wildfire/bin
ls -al
drwxr-xr-x 3 bot users 4096 Jun 2 00:47 .
drwxr-xr-x 11 bot users 4096 May 31 21:41 ..
drwxr-xr-x 2 bot users 4096 May 31 21:41 extra
-rw------- 1 bot users 272018 Jun 1 23:55 nohup.out
-rwxr-xr-x 1 bot users 7213 Apr 21 02:12 wildfire

The file is HUGE… what should my next step be? :P

Full thread dump Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode):

"Client SR - 11316080" daemon prio=1 tid=0xa2d01888 nid=0x4095 sleeping[0xa7c9d000..0xa7c9df30]
	at java.lang.Thread.sleep(Native Method)

…and so on, 272 KB of it.

Hi,

"grep -i dead nohup.out" should display some lines if there is a deadlock; then you can use an editor to locate the threads. Also check the content of stderror.log; I hope you don't see OutOfMemory errors there.
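
The two checks above can be bundled into one small script. This sketch assumes the default paths from the ls listings in this thread, and it searches for "deadlock" rather than "dead", which is slightly narrower than the grep above. As an extra heuristic it also counts threads whose header line says "waiting for monitor entry", the wording HotSpot uses for a thread blocked on a lock. The function name check_dump is hypothetical.

```shell
#!/bin/sh
# Sketch: scan a thread dump and the stderr log for the problems mentioned above.
# Usage: check_dump [DUMP_FILE] [STDERR_LOG]
# Default paths assume the /opt/wildfire layout shown in this thread.
# Returns 1 if a deadlock report or an OutOfMemoryError was found, else 0.
check_dump() {
    dump=${1:-/opt/wildfire/bin/nohup.out}
    errlog=${2:-/opt/wildfire/logs/stderror.log}
    found=0

    # HotSpot normally adds a "Found one Java-level deadlock:" section to a
    # dump when it detects one; print matching lines with line numbers.
    if grep -i -n 'deadlock' "$dump"; then
        found=1
    fi

    # Many threads blocked on monitors hint at lock contention even without
    # a true deadlock. grep -c prints 0 (and exits 1) when nothing matches.
    blocked=$(grep -c 'waiting for monitor entry' "$dump" || true)
    echo "threads waiting for monitor entry: $blocked"

    if [ -f "$errlog" ] && grep -n 'OutOfMemoryError' "$errlog"; then
        found=1
    fi
    return "$found"
}
```

A non-zero exit status means something worth reading was found; the printed line numbers make it easy to jump to the right spot in an editor.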

LG

Thanks. I tried it while the chat was in a "frozen for a few seconds" mode :P. The chat server is still responding, though, unlike the two times when it seemed to "really freeze". (I've been running Wildfire for months on a low-volume site with no problems, but only for 24 hours on my high-volume site, which is the one giving me problems.)

"grep -i dead nohup.out" -> no dead threads… stderror.log is essentially empty.

I'll try kill -3 next time I see the server "really freeze".

Right now I have only 150 concurrent users in the chat room; last time it froze after getting over 300 users… so I think I'll have to wait until peak visiting time again, unless I get rid of the chat room before then because it's harming the quality of my site.

Digirave, did you ever get more information on this problem? I'm having something similar.