Major issues with Wildfire

I am not sure what to do here. I am having major issues with Wildfire 2.5.0. I don’t know whether they are related, but here they are.

100% processor usage with huge amounts of virtual memory usage:

PID  USER   PR NI VIRT RES  SHR  S %CPU %MEM TIME+     COMMAND
5553 jabber 16 0  523m 155m 3852 S 99.9 41.3 326:02.36 java

S2S attempts to components? I also don’t really understand what most of these errors mean.

Entries in error log:

2006.02.22 20:20:09 org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:258) Error trying to connect to remote server: aol.jabber.corp.inthosts.net(DNS lookup: aol.jabber.corp.inthosts.net:5269)
java.net.UnknownHostException: aol.jabber.corp.inthosts.net
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:253)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:126)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

2006.02.22 20:20:10 org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:258) Error trying to connect to remote server: aol.jabber.corp.inthosts.net(DNS lookup: aol.jabber.corp.inthosts.net:5269)
java.net.UnknownHostException: aol.jabber.corp.inthosts.net
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:253)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:126)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

2006.02.22 20:20:11 org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:258) Error trying to connect to remote server: aol.jabber.corp.inthosts.net(DNS lookup: aol.jabber.corp.inthosts.net:5269)
java.net.UnknownHostException: aol.jabber.corp.inthosts.net
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.createOutgoingSession(OutgoingServerSession.java:253)
    at org.jivesoftware.wildfire.server.OutgoingServerSession.authenticateDomain(OutgoingServerSession.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:126)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Debug log entries:

2006.02.22 20:20:08 OS - Trying to connect to aol.jabber.corp.inthosts.net:5269
2006.02.22 20:20:08 Error sending packet to remote server:
java.lang.Exception: Failed to create connection to remote server
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2006.02.22 20:20:09 OS - Trying to connect to aol.jabber.corp.inthosts.net:5269
2006.02.22 20:20:09 Error sending packet to remote server:
java.lang.Exception: Failed to create connection to remote server
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2006.02.22 20:20:10 OS - Trying to connect to aol.jabber.corp.inthosts.net:5269
2006.02.22 20:20:10 Error sending packet to remote server:
java.lang.Exception: Failed to create connection to remote server
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2006.02.22 20:20:10 OS - Trying to connect to aol.jabber.corp.inthosts.net:5269
2006.02.22 20:20:11 Error sending packet to remote server:
java.lang.Exception: Failed to create connection to remote server
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.createSessionAndSendPacket(OutgoingSessionPromise.java:139)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise.access$300(OutgoingSessionPromise.java:37)
    at org.jivesoftware.wildfire.server.OutgoingSessionPromise$1$1.run(OutgoingSessionPromise.java:91)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2006.02.22 20:23:36 Connect Socket[addr=/10.46.4.22,port=34659,localport=5222]
2006.02.22 20:23:36 Logging off jabber.corp.inthosts.net/4605fd5a on org.jivesoftware.wildfire.net.SocketConnection@1a04f7f socket: Socket[addr=/10.46.4.22,port=34659,localport=5222] session: org.jivesoftware.wildfire.ClientSession@15af5f1 status: 1 address: jabber.corp.inthosts.net/4605fd5a id: 4605fd5a presence:

2006.02.22 20:29:16 Connect Socket[addr=/10.46.4.22,port=44717,localport=5222]
2006.02.22 20:29:17 Logging off jabber.corp.inthosts.net/8d7f51eb on org.jivesoftware.wildfire.net.SocketConnection@19a0ca9 socket: Socket[addr=/10.46.4.22,port=44717,localport=5222] session: org.jivesoftware.wildfire.ClientSession@b07b12 status: 1 address: jabber.corp.inthosts.net/8d7f51eb id: 8d7f51eb presence:

I have killed all components and removed nearly all plugins. Search and Asterisk IM are still installed, but everything else I added has been removed, and there is still no change in processor usage or the very high virtual memory usage.

Hi,

May I ask about the clients "pkzizi@aol.jabber.corp.inthosts.net", "randallnbills@aol.jabber.corp.inthosts.net" and "khirex@aol.jabber.corp.inthosts.net"?

They seem to be registered on the server aol.jabber.corp.inthosts.net, which is not resolvable.

LG
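The resolution failure can be confirmed from the server's own shell. The sketch below assumes a Linux host where `getent` is available; the helper names `resolves` and `report_hosts` are made up for illustration. It checks only the plain host lookup, which is the name shown after "DNS lookup:" in the error log.

```shell
#!/bin/sh
# Sketch: report whether hostnames resolve through the system resolver
# (hosts file, DNS, ...). Assumes getent exists; `resolves` and
# `report_hosts` are hypothetical helper names, not part of Wildfire.

# Succeeds only if the system resolver returns an address for $1.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Print one status line per hostname given as an argument.
report_hosts() {
    for host in "$@"; do
        if resolves "$host"; then
            echo "$host: resolves"
        else
            echo "$host: does NOT resolve"
        fi
    done
}

# Example, using the two names from the logs in this thread:
#   report_hosts jabber.corp.inthosts.net aol.jabber.corp.inthosts.net
```

If the component name fails here while the main server name succeeds, that matches the UnknownHostException in the error log.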

The server is jabber.corp.inthosts.net.

aol is a component on jabber.corp.inthosts.net.

Should I add entries to the hosts file for it? I don’t think I had to do that with ejabberd or jabber-1.4.2 when I used them, but if it will help Wildfire I will do it. I still don’t understand why it is attempting S2S connections, though.

I think my extremely high processor usage is related to garbage collection. I have looked at http://www.tagtraum.com/gcviewer-vmflags.html but I don’t understand what many of the switches do. I am sure they would help me fine-tune the server for about 30 users (more if possible with 256 MB), but I don’t understand them well enough to know how to set them.

Hi,

So you are using some components like pyAIM? I actually have no idea about them.

“Best results are achieved with: -Xloggc:<file> -XX:+PrintGCDetails” to monitor the GC behavior. If you see too many (Full) GCs and the GCs take too long, you may increase the memory (Xmx) value.

Don’t try the other available options. Of course you may speed Wildfire up a little if they are set right, but the gain is usually below 5%. Once you understand the difference between the eden, old and young generations, you may also set some more values.

LG

Okay, so if I understand you correctly, what I might want for my VM parameters would be:

export INSTALL4J_ADD_VM_PARAMS="-Xmx256m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:"$/logs/gc.log""

instead of:

export INSTALL4J_ADD_VM_PARAMS="-Xms32m -Xmx256m -XX:PermSize=35m -XX:MaxPermSize=70m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:"$/logs/gc.log""

I may have been tweaking it way too much.
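For reference, a minimal sketch of the trimmed-down setting with cleaner quoting. The `GC_LOG` variable and its default path are assumptions made for this example only; point it at wherever the Wildfire logs directory actually lives.

```shell
#!/bin/sh
# Sketch only. GC_LOG and its default are placeholders for illustration;
# adjust the path to the real Wildfire logs directory.
GC_LOG="${GC_LOG:-/tmp/wildfire-gc.log}"

# Heap cap plus GC logging and nothing else.
# HotSpot boolean flags are written -XX:+Name (enable) or -XX:-Name (disable).
INSTALL4J_ADD_VM_PARAMS="-Xmx256m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:${GC_LOG}"
export INSTALL4J_ADD_VM_PARAMS

# Show what the launcher will receive.
echo "$INSTALL4J_ADD_VM_PARAMS"
```

Keeping the log path in its own variable avoids the nested double quotes, which the shell would otherwise treat as the end and restart of the string.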

Hi,

It does no harm to set these parameters as they are, so you can keep them. They should increase speed by about 3%, as Java starts with larger initial memory values and thus does not need to run a lot of GCs just to start Wildfire.

LG

Should this be able to handle 15-30 users without using 100% processor on a PIII 450?

“should this be able to handle 15-30 users without using 100% processor on a pIII 450?”

Don’t worry jcluff, you’re not crazy. We upgraded from 2.4.x to fix a memory leak, and now 2.5.0 steals all our CPU time. It was bad enough that another admin had to disable the Wildfire service altogether. And we’re running a P4 2.8 GHz, with only about 20-30 users.

Hi,

You can log in about 100 concurrent clients with 32 MB, so it should not be a memory problem. 450 MHz is not really fast, but with the GC logs you should be able to measure how much time the JVM spends on GC every hour. During GC it usually uses 100% CPU.

LG
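A rough way to total the GC time from such a log is sketched below. It assumes the log was written with -XX:+PrintGCDetails and that each collection appears on one line whose last "<number> secs" figure is the total pause for that collection; if your JVM formats the log differently, the pattern needs adjusting. The function name `gc_pause_total` is invented for this example.

```shell
#!/bin/sh
# Sketch: sum the pause time recorded in a GC log.
# Assumption: one collection per line, and the LAST "<n> secs" on the
# line is that collection's total pause. `gc_pause_total` is a
# hypothetical helper name.
gc_pause_total() {
    awk '
        {
            rest = $0
            last = ""
            # Walk over every "<number> secs" on the line, keeping the last.
            while (match(rest, /[0-9]+\.[0-9]+ secs/)) {
                last = substr(rest, RSTART, RLENGTH)
                rest = substr(rest, RSTART + RLENGTH)
            }
            if (last != "") {
                sub(/ secs/, "", last)
                total += last
                n++
                if ($0 ~ /Full GC/) full++
            }
        }
        END {
            printf "collections=%d full=%d pause_seconds=%.4f\n", n, full, total
        }
    ' "$1"
}

# Example:
#   gc_pause_total /path/to/gc.log
```

Comparing pause_seconds against the wall-clock span of the log gives the share of time spent in GC. If that share is small while the process still sits near 100% CPU, GC is probably not the cause.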

“During GC it usually uses 100% CPU.”

Any way to get Wildfire to start up with a lower process priority and/or prevent it from pegging the CPU during GC?

Hi,

On Unix you’d just use nice, like “nice ~/bin/wildfire start”, or you may use renice as root to change the priority without restarting.

I don’t know about wildfiredaemon.exe. For wildfire.exe you may use the start command (see “start /?”) and specify “/LOW”. There are third-party programs available to change the process priority and the priority of each thread (also without restarting).

During GC the JVM stops all threads, so you really don’t want a GC to take long. IM is not a realtime application and TCP is a friendly protocol, but slowing down GCs is never a good idea. In any case, if another process is also requesting CPU cycles, the operating system will divide the available cycles between the requesting processes.

LG
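The two Unix options can be wrapped as small helpers, sketched below. `NICE_LEVEL`, `run_low_priority` and `lower_priority` are names invented for this example, and the value 10 is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only. NICE_LEVEL and both function names are hypothetical;
# 10 is an arbitrary niceness value.
NICE_LEVEL="${NICE_LEVEL:-10}"

# Start a command at lowered priority, for example:
#   run_low_priority ~/bin/wildfire start
run_low_priority() {
    nice -n "$NICE_LEVEL" "$@"
}

# Change the priority of a process that is already running:
#   lower_priority <pid>
# Whether renice treats the value as absolute or relative differs
# between implementations, so check the local man page first.
lower_priority() {
    renice -n "$NICE_LEVEL" -p "$1"
}
```

A higher niceness only matters when something else wants the CPU; on an otherwise idle machine the process will still use whatever is available.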

“During GC the JVM stops all threads, so you really don’t want a GC to take long. IM is not a realtime application and TCP is a friendly protocol, but slowing down GCs is never a good idea. Actually if ...”

I’m not even convinced that GC is the problem… The admin who restarted Wildfire yesterday (and then disabled it today) said that it was just locked at 100% CPU. Meanwhile, the max heap size is only 64 MB, so how long could GC possibly take? I wonder if it’s a deadlock somewhere else. If I had seen it myself I might have more details…

Unfortunately it’s a production server… and for that and other (more political) reasons, I can’t just start the service back up again to see what the problem is.

jeoffw,

Locked at 100% CPU definitely has me concerned. The best way to diagnose these types of issues is with a thread dump. On Unix, that’s done by sending “kill -3” to the process (it then prints the dump to stdout.log). The thread dump will let us see what’s going on in the server. Taking a couple in a row (10 seconds apart) is also very helpful.

Regards,

Matt

Hey guys,

For those of you who are seeing the Wildfire process consume > 90% CPU, I’m going to ask you to get some thread dumps and send them to me. To get a thread dump under *nix you will need to execute kill -3 <pid>. The information will be saved to the stdout.log file. To get a thread dump under Windows, press Ctrl-Break. Remember to get 2 or 3 dumps with a couple of seconds in between while the CPU is heavily used. Send the dumps to my email: gaston at jivesoftware dot com.

Thanks,

– Gato
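Taking several dumps in a row can be scripted. The sketch below sends SIGQUIT (signal 3, the same as kill -3) to a given PID a few times with a pause in between. The function name `take_thread_dumps` and its defaults (3 dumps, 10 seconds apart) are choices made for this example. The script only sends the signal; the dumps themselves appear wherever the JVM's standard output goes.

```shell
#!/bin/sh
# Sketch: request several JVM thread dumps in a row.
# `take_thread_dumps` is a hypothetical helper name.
# Usage: take_thread_dumps <pid> [count] [interval_seconds]
take_thread_dumps() {
    td_pid="$1"
    td_count="${2:-3}"      # assumed default: 3 dumps
    td_interval="${3:-10}"  # assumed default: 10 seconds apart

    if [ -z "$td_pid" ]; then
        echo "usage: take_thread_dumps <pid> [count] [interval_seconds]" >&2
        return 2
    fi

    td_i=1
    while [ "$td_i" -le "$td_count" ]; do
        # SIGQUIT makes the JVM print a thread dump; it keeps running.
        kill -QUIT "$td_pid" || return 1
        # Pause between dumps, but not after the last one.
        if [ "$td_i" -lt "$td_count" ]; then
            sleep "$td_interval"
        fi
        td_i=$((td_i + 1))
    done
}

# Example, with the PID taken from top:
#   take_thread_dumps 5553 3 10
```

Only send this signal to a Java process. Most other programs terminate on SIGQUIT, so double-check the PID first.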

Woops! Matt got there first.

– Gato

This is exactly what I am seeing. My server was locked at 99% processor usage for hours today, but it is a production server, so I had to wait until tonight:

5553 jabber 16 0 527m 202m 5304 S 99.9 53.8 1736:29 java

kill -3 didn’t have any effect. Neither did trying to kill it from the admin console web interface.

I removed some of the memory settings from the JVM parameters, so I now specify just -Xmx128m and nothing else except the GC-related printouts. Processor usage seems to have dropped, as has the memory footprint. I will monitor this and see how we do.

How do you get a thread dump when Wildfire is running as an NT service? Is that even possible?

Hey Jeoff,

I found this link that provides a tool to generate thread dumps and many other useful monitoring things when running as a windows service. I tried it and it worked just fine.

Links:

http://tmitevski.users.mcs2.netarray.com/trace.do

http://www.bpurcell.org/blog/index.cfm?mode=entry&entry=1062

Send me the thread dump as soon as you have them so I can analyze them.

Thanks,

– Gato

Hey jared,

Make sure that you are using the correct PID when executing kill -3. If you are running the server with nohup, then I think the information will be dumped to the nohup file instead of stdout.log.

Hope that helps.

Regards,

– Gato