I recently deployed openfire 3.5.2 into a production environment, using exodus as the client piece.
In our test environment I noted some strange performance issues, described in this thread: http://www.igniterealtime.org/community/thread/33875, which appeared to be related to this thread: http://www.igniterealtime.org/community/thread/32740
I believed I had fixed the problem, and it did not occur during a month's worth of testing. After deploying to our production environment and adding many more users, however, the problem is back.
Using the statistics plugin, I have noticed that the packets per minute will “randomly” jump to the 60,000 - 250,000 range, when 100 - 500 is typical. The java process then jumps to between 30% and 70% CPU, where 0.0% - 1.0% is typical.
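To make these bursts easier to spot, here is a rough sketch in Python. It assumes the per-minute packet counts have been copied out by hand into a list; the `samples` values and the `find_spikes` helper are hypothetical and not something the statistics plugin provides. It treats any minute above ten times the median as a spike, which is an arbitrary threshold.

```python
import statistics

def find_spikes(samples, factor=10):
    """Return (index, count) pairs for minutes far above the typical rate.

    samples: packets-per-minute readings, one per minute (hypothetical input).
    factor:  how many times the median a reading must exceed to be flagged.
    """
    if not samples:
        return []
    # The median is barely affected by a few huge outliers, unlike the mean.
    baseline = statistics.median(samples)
    threshold = baseline * factor
    return [(i, count) for i, count in enumerate(samples) if count > threshold]

# Made-up readings in the ranges described above.
samples = [120, 300, 450, 60000, 250000, 200]
print(find_spikes(samples))
```

The median is used as the baseline because the bursts are so much larger than normal traffic that they would drag an average upward and hide themselves.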
To troubleshoot, I turned on packet auditing. After a few seconds it had queued up over 70000 packets, so I turned it off and waited for openfire to dump the log file. While dumping the log file, the java process reached 99%.
I took a look at the audit logs, and it seems these packets are all from a single logged-on user. They all appear to be presence update and presence query packets to other, seemingly random, logged-on users.
I checked this particular user's session details, and noticed an insane number of packets sent/received.
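For anyone who wants to repeat this check, a rough sketch in Python of how the audit log can be tallied by sender. It assumes the log contains XMPP stanzas (`presence`, `iq`, `message`) written as XML with a `from` attribute; the exact audit file layout is an assumption here, and `count_senders` is a hypothetical helper, not part of Openfire.

```python
import re
from collections import Counter

# Matches the opening tag of a stanza and captures its kind and 'from' value.
# Assumes attributes are quoted with either single or double quotes.
STANZA_RE = re.compile(r'<(presence|iq|message)\b[^>]*?\sfrom=(["\'])(.*?)\2')

def count_senders(audit_text):
    """Count stanzas per (bare JID, stanza kind) in raw audit log text."""
    counts = Counter()
    for match in STANZA_RE.finditer(audit_text):
        kind, sender = match.group(1), match.group(3)
        # Drop the resource part so all of a user's sessions are grouped.
        bare_jid = sender.split("/", 1)[0]
        counts[(bare_jid, kind)] += 1
    return counts

# Made-up sample in the assumed format.
sample = (
    '<packet><presence from="alice@example.com/Exodus" to="bob@example.com"/></packet>\n'
    '<packet><presence to="carol@example.com" from="alice@example.com/Exodus"/></packet>\n'
    '<packet><iq from="bob@example.com/Exodus" type="get"/></packet>\n'
)
for (jid, kind), n in count_senders(sample).most_common(5):
    print(jid, kind, n)
```

With a real log, the text would be read from the audit file and the top few entries of `most_common()` should make a single runaway sender obvious.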
Session Statistics: Packets Received/Sent: 5,009,948/5,010,013
This doesn’t look like the exact same issue as the above threads since the IQ packets are being sent to random users instead of gateway or other server subdomains.
Comments or advice? Has anyone else run into this and come up with a fix for 3.5.2? (I can’t use nightly builds since this is in production…)
It is also worth noting that if I clear out all of the server caches, the packets stop spraying… but after I do that, all the group chat rooms and presence updates seem to break down (users cannot see each other as online, cannot get into chat rooms, etc.) and I end up having to restart openfire. I am not sure whether there is one particular cache that stops the spraying (instead of clearing out everything). I haven’t gone through them one by one to test, since I cannot risk breaking group chats during working hours again, and after hours the packet spraying does not seem to happen.