Server stops reading port 5222 and won't shut down

Wildfire 2.4.4 on RHEL 3

We're having a problem where the server suddenly stops reading port 5222 and locks up so tightly that it can't be shut down cleanly and requires a kill -9. The last thing the server seems to do before the lock-up is process a roster with a malformed JID entry.

2006.03.08 10:55:29 org.jivesoftware.wildfire.handler.IQRosterHandler.handleIQ(IQRosterHandler.java:118) Internal server error
java.lang.IllegalArgumentException: Illegal JID: first last@jabber.wildfire.example.com
    at org.xmpp.packet.JID.init(JID.java:398)
    at org.xmpp.packet.JID.<init>(JID.java:254)
    at org.xmpp.packet.Roster.getItems(Roster.java:236)
    at org.jivesoftware.wildfire.handler.IQRosterHandler.manageRoster(IQRosterHandler.java:196)
    at org.jivesoftware.wildfire.handler.IQRosterHandler.handleIQ(IQRosterHandler.java:103)
    at org.jivesoftware.wildfire.handler.IQHandler.process(IQHandler.java:48)
    at org.jivesoftware.wildfire.IQRouter.handle(IQRouter.java:256)
    at org.jivesoftware.wildfire.IQRouter.route(IQRouter.java:79)
    at org.jivesoftware.wildfire.PacketRouter.route(PacketRouter.java:65)
    at org.jivesoftware.wildfire.net.SocketReader.processIQ(SocketReader.java:395)
    at org.jivesoftware.wildfire.net.ClientSocketReader.processIQ(ClientSocketReader.java:50)
    at org.jivesoftware.wildfire.net.SocketReader.readStream(SocketReader.java:263)
    at org.jivesoftware.wildfire.net.SocketReader.run(SocketReader.java:119)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.jivesoftware.stringprep.StringprepException: Contains prohibited code points.
    at org.jivesoftware.stringprep.Stringprep.nodeprep(Stringprep.java:120)
    at org.xmpp.packet.JID.init(JID.java:347)
    ... 13 more

The actual JID involved contains an underscore ("first_last"), even though the log message appears to report the JID as "first last". Is it possible that a malformed JID in a user-supplied roster entry can hang the entire server? Is there some way we can sanity-check roster entries before the server accepts them? Or are we on the wrong track in diagnosing this problem?
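To make the question concrete, here is a minimal sketch of the kind of sanity check meant here. It is not Wildfire code: the class and method names are hypothetical, and it only approximates nodeprep by rejecting whitespace, control characters, and the ASCII characters that nodeprep prohibits in the node part (" & ' / : < > @). A real check would run the full stringprep profile.

```java
// Hypothetical helper, not part of Wildfire or its JID class.
final class RosterJidCheck {
    // ASCII characters that nodeprep prohibits in the node part of a JID.
    private static final String NODE_PROHIBITED = "\"&'/:<>@";

    // True if s contains whitespace, a control character, or any char in 'extra'.
    private static boolean hasBadChar(String s, String extra) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || Character.isSpaceChar(c)
                    || Character.isISOControl(c) || extra.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Simplified check of the node (the part before '@').
    public static boolean isPlausibleNode(String node) {
        return node != null && node.length() > 0 && !hasBadChar(node, NODE_PROHIBITED);
    }

    // Simplified check of a roster JID; any resource after the first '/' is ignored.
    public static boolean isPlausibleBareJid(String jid) {
        if (jid == null) {
            return false;
        }
        int slash = jid.indexOf('/');
        String bare = slash >= 0 ? jid.substring(0, slash) : jid;
        int at = bare.indexOf('@');
        String domain = at >= 0 ? bare.substring(at + 1) : bare;
        if (at >= 0 && !isPlausibleNode(bare.substring(0, at))) {
            return false;
        }
        // The domain must be non-empty and free of whitespace and stray '@'.
        return domain.length() > 0 && !hasBadChar(domain, "@");
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleBareJid("first_last@jabber.wildfire.example.com"));
        System.out.println(isPlausibleBareJid("first last@jabber.wildfire.example.com"));
    }
}
```

With this sketch the underscore form passes and the form with a space is rejected, which matches the "prohibited code points" complaint in the log. Where such a check could be hooked into the server is a separate question.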

Thanks,

Bryan.

Hey Bryan,

How is the CPU of the server doing? Is it close to 100%? Can you get a couple of thread dumps to see what the JVM is doing? On Unix: kill -3 <pid>, where <pid> is the process ID of the server's JVM. On Windows: Ctrl-Break in the server window. The logged info will be stored in stdout.log.
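For readers unsure what a thread dump contains, the following is a rough sketch that produces similar information from inside a JVM using the standard Thread.getAllStackTraces() call (Java 5 and later). The class name is hypothetical and this is not something Wildfire provides. It only works while the JVM can still run application code, so for a hung server the signal-based dump described above remains the tool to use.

```java
import java.util.Map;

// Hypothetical illustration, not part of Wildfire.
final class ThreadDumpSketch {
    // Builds a text listing of every live thread, its state, and its stack frames.
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking two or three dumps a few seconds apart and comparing which threads stay in the same frames is what makes them useful for diagnosing a lock-up.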

Regards,

– Gato

> How is the CPU of the server doing? Is it close to 100%? Can you get a couple of thread dumps to see what the JVM is doing? For Unix: kill -3 <pid>. For Windows: Ctrl-Break in the server window. The logged info will be stored in stdout.log.
>
> Gato

Can you define "server window"?

thanks

loonybin88

Hey loonybin88,

I just wrote this document that explains how to obtain a thread dump. Let me know if you have further questions.

Regards,

– Gato

In this particular instance the load was fine, but we did catch it right away. Previous occurrences coincided with huge spikes in CPU load, which is what initially caught our attention and why we were watching the server so closely today. It may be that the load builds the longer the server is hung.

Hey Bryan,

In that case another thing to monitor is GC activity of the JVM. You can enable -verboseGC to track the time the JVM spent doing Garbage Collection. If you have a multiprocessor server then you may want to enable parallel GC if that is not the default.
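As a complement to GC logging on the command line, cumulative GC time can also be read from inside the JVM through the standard java.lang.management API (Java 5 and later). The sketch below is only an illustration under that assumption; the class name is hypothetical and nothing here is specific to Wildfire.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration, not part of Wildfire.
final class GcActivitySketch {
    // Sums the approximate time, in milliseconds, spent in all collectors so far.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One line per collector: name, number of collections, accumulated time.
    public static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
        System.out.println("total GC ms: " + totalGcMillis());
    }
}
```

Sampling these counters periodically and comparing the growth in GC time against wall-clock time gives a rough idea of whether the CPU spikes come from garbage collection.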

Other useful links:

http://java.sun.com/performance/reference/whitepapers/tuning.html

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Regards,

– Gato

Hi,

It's -verbose:gc and not -verboseGC, and one should mention http://www.tagtraum.com/gcviewer.html, as it seems to be more suitable for production systems, while your links may be too detailed.

LG

We captured the thread dumps and logs from another lockup this morning. Is there someplace I can upload the tarball?

BTW, our current thinking is that it might be TLS-related. We're now running our server with TLS disabled and using SSL instead to see if it helps.

Hey Bryan,

Feel free to send them to me by email. I will review them. BTW, which server version are you using?

Thanks,

– Gato