Red Hat init script doesn't always successfully stop openfire service

Openfire on RHEL5 (32- and 64-bit) periodically runs out of memory and hangs; I’ve seen this documented in forum threads and on the issue tracker. As a workaround, I’ve configured my monitoring system to restart Openfire automatically as an initial troubleshooting step when the service stops responding. Unfortunately, when the java process hangs, it does not respond to SIGTERM. The init script doesn’t know this, since kill `cat $OPENFIRE_PIDFILE` returns success even when the java process ignores the signal. I’ve written a little patch, included below, which adds another check to make sure the java process really died.

--- openfire.orig       2008-11-17 09:55:27.000000000 -0500
+++ openfire    2008-11-17 10:08:06.000000000 -0500
@@ -133,6 +133,9 @@
        [ -f "$OPENFIRE_PIDFILE" ] && kill `cat $OPENFIRE_PIDFILE`
        RETVAL=$?
+
+       # if it's not dead, kill it harder
+       ps --no-headers `cat $OPENFIRE_PIDFILE` > /dev/null && kill -9 `cat $OPENFIRE_PIDFILE`
        echo
 
        [ $RETVAL -eq 0 -a -f "$OPENFIRE_PIDFILE" ] && rm -f $OPENFIRE_PIDFILE

I hope this will be of some help to someone; there is no doubt a more elegant way to implement this check, but this has been working for me so far.

-steve
openfire_init.patch (408 Bytes)
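
The point above, that kill reports success even when the target ignores the signal, can be reproduced without Openfire. The following is only a sketch: a sleep process that ignores SIGTERM stands in for the hung java process, and the variable names are made up for the demo.

```shell
#!/bin/bash
# Start a child that ignores SIGTERM, standing in for a hung java process.
# (An ignored signal stays ignored across exec, so the sleep inherits it.)
( trap '' TERM; exec sleep 30 ) &
pid=$!
sleep 1   # give the subshell time to install its trap

kill "$pid"
kill_status=$?          # reports only that the signal could be sent

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    still_alive=yes
else
    still_alive=no
fi
echo "kill exit status: $kill_status, still alive: $still_alive"

kill -9 "$pid"                    # clean up the demo process
wait "$pid" 2>/dev/null || true   # reap it; ignore the "killed" status
```

On a typical Linux box this should report an exit status of 0 while the process is still alive, which is why an init script cannot rely on the exit status of kill alone.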

Hi Steve,

Thanks for posting and providing a patch. I am a RHEL5 user as well and have noticed this problem. One thing I worry about with this approach, though, is killing Openfire too quickly, before it can fully flush its writes or finish whatever else it is doing on the shutdown path. Maybe it would be better to have a ‘hardstop’ service option that would do the -9 if necessary? Then one could do

service openfire stop

service openfire hardstop

What do you think?

daryl
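
For illustration, the hardstop idea could look roughly like this in the script's argument handling. This is a hypothetical sketch, not code from the Openfire init script: the stop and hardstop functions here are stubs that only print what they would do, and dispatch is a name invented for the example.

```shell
#!/bin/bash
# Stub actions so the dispatch can be run in isolation; a real init script
# would signal the PID stored in $OPENFIRE_PIDFILE instead of echoing.
stop()     { echo "stop: would send SIGTERM (-15)"; }
hardstop() { echo "hardstop: would send SIGKILL (-9)"; }

# Hypothetical dispatcher: maps the service argument to an action.
dispatch() {
    case "$1" in
        stop)     stop ;;
        hardstop) hardstop ;;
        *)        echo "Usage: openfire {stop|hardstop}" >&2; return 1 ;;
    esac
}

dispatch "${1:-stop}"
```

The operator stays in control of when the -9 is sent, at the cost of adding a non-standard action to the script.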

Hi Steve,

Your patch looks ugly. Openfire may need some seconds to shut down completely, so you should wait at least 30 seconds for a clean shutdown before you kill the process.

I usually add a “kill” option to scripts when needed, which sends “-9” to the process, while stop sends “-15”.

LG
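
A rough sketch of the "wait up to 30 seconds, then kill" approach suggested above. It polls once per second rather than sleeping for the full timeout, so a clean shutdown returns as soon as the process exits. The function name stop_with_timeout and its arguments are made up for this example and are not part of the Openfire script.

```shell
#!/bin/bash
# stop_with_timeout PID [TIMEOUT]  (hypothetical helper)
# Sends SIGTERM, polls once per second for up to TIMEOUT seconds,
# and falls back to SIGKILL only if the process is still there.
# Returns 0 if the process went away on its own, 1 if it had to be forced.
stop_with_timeout() {
    local pid=$1 timeout=${2:-30} waited=0

    kill -15 "$pid" 2>/dev/null || return 0   # nothing to signal: already gone
    while kill -0 "$pid" 2>/dev/null; do      # kill -0 only tests for existence
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In an init script this might be called as stop_with_timeout `cat $OPENFIRE_PIDFILE` 30. The sketch assumes the script runs as root, or as the owner of the process, so that kill -0 is permitted to test the PID.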

Thanks for the feedback! Below is a cleaned-up patch that’s a bit more careful about how it behaves. I decided not to add an extra “kill” argument to the script because it breaks the common Red Hat paradigm for services (i.e. service start|stop|restart|status); instead, I first try a clean shutdown, wait a little while, then check to see if the server is still hanging around, and only then try to kill it harder. -steve

--- openfire.orig 2008-11-17 09:55:27.000000000 -0500
+++ openfire 2009-01-13 10:55:04.000000000 -0500
@@ -33,6 +33,9 @@
 # If pid file path is not set in sysconfig, set to /var/run/openfire.pid.
 [ -z "$OPENFIRE_PIDFILE" ] && OPENFIRE_PIDFILE="/var/run/openfire.pid"
 
+# Give openfire some time to shut down safely (in seconds)
+TIMEOUT=30
+
 # -----------------------------------------------------------------
 
 # If a openfire home variable has not been specified, try to determine it.
@@ -135,13 +138,24 @@
        RETVAL=$?
        echo
 
+       # wait a bit, then kill it again to be sure
+       sleep $TIMEOUT
+       if [ -f "$OPENFIRE_PIDFILE" ]; then
+               REMAINING=`pgrep -f -x '.*java.*openfire.*'`
+               if ( [ $? ] && [ $REMAINING ] && [ $REMAINING -eq `cat $OPENFIRE_PIDFILE` ] ); then
+                       # ok, it's still hanging around
+                       kill -9 $REMAINING
+                       RETVAL=$?
+               fi
+       fi
+
        [ $RETVAL -eq 0 -a -f "$OPENFIRE_PIDFILE" ] && rm -f $OPENFIRE_PIDFILE
        [ $RETVAL -eq 0 -a -f "/var/lock/subsys/openfire" ] && rm -f /var/lock/subsys/openfire
 }
 
 restart() {
        stop
-       sleep 10 # give it a few moments to shut down
        start
 }


openfire_init.patch (1108 Bytes)
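
One possible variation on the check in the patch above, offered as a sketch rather than a correction. In shell, [ $? ] is true for any exit status, because a non-empty string always tests true, and pgrep can print more than one PID, which would break the numeric comparison. Testing the PID from the pid file directly avoids both issues. The helper name pidfile_process_alive is invented for this example.

```shell
#!/bin/bash
# pidfile_process_alive PIDFILE  (hypothetical helper, not in the Openfire script)
# Succeeds only if PIDFILE exists, holds a numeric PID, and that PID is running.
pidfile_process_alive() {
    local pidfile=$1 pid

    [ -f "$pidfile" ] || return 1
    # read fails on a file without a trailing newline but still fills $pid
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;      # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null        # sends no signal, only tests for existence
}
```

Inside stop() this could be used as: pidfile_process_alive "$OPENFIRE_PIDFILE" && kill -9 `cat $OPENFIRE_PIDFILE`. Unlike the pgrep pattern, it does not confirm that the PID still belongs to a java process, so a stale pid file whose PID has been reused would be a risk; that trade-off is an assumption of this sketch.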

Hi,

I filed this in Jira to hopefully get a developer's comments:

http://www.igniterealtime.org/issues/browse/JM-1515

thanks!

daryl