I recently installed PacketFilter to try to prevent users of a particular group from messaging each other. When I created a filter rule (Drop all Messages from GROUP1 to GROUP1) the CPU utilization shot up to 100%. I was unable to access the plugin’s admin page or remove the plugin. I ultimately had to shut the service down and manually remove the plugin because users were unable to log into Openfire and the users that were already logged on were unable to get messages through. Are there any performance-enhancing steps I could take?
I’m using Openfire 3.5.1 and PacketFilter 2.0.1. The host machine is a 1.8 GHz Xeon with 1 GB RAM. I’m using the built-in database, and the system uses LDAP for AD integration. Right now we’re in a pilot program, so there are only 10 users on the server.
I’ve tried reproducing your problem and haven’t had any success. I’m going to try again later today with an AD setup to see if that allows me to reproduce the problem.
I just tried this with an OF 3.5.1/PF 2.0.1/AD setup and couldn’t reproduce the CPU spike. To help debug, could I get a couple of thread dumps in sequence while the CPU is spiking? Let me know if you need some more instruction on how to take a thread dump of a running Java process.
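For anyone following along, here is a minimal sketch of taking several dumps in a row. It assumes a JDK's `jstack` tool is on the PATH and that you already know the process ID of the Openfire JVM. The function names, the output file names, and the 5-second interval are made up for illustration.

```python
import subprocess
import time
from pathlib import Path


def jstack_command(pid: int) -> list[str]:
    """Build the jstack invocation; -l adds lock information to the dump."""
    return ["jstack", "-l", str(pid)]


def take_thread_dumps(pid: int, count: int = 3, interval_s: float = 5.0,
                      out_dir: str = ".", runner=subprocess.run) -> list[Path]:
    """Write `count` thread dumps, `interval_s` seconds apart, to numbered files."""
    paths = []
    for i in range(1, count + 1):
        # check=True raises if jstack fails (wrong pid, no permission, ...).
        result = runner(jstack_command(pid), capture_output=True, text=True, check=True)
        path = Path(out_dir) / f"dump{i}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count:
            time.sleep(interval_s)  # give the stuck threads time to move (or not)
    return paths


# Example (hypothetical pid): take_thread_dumps(1234, count=3)
```

Comparing the dumps side by side is the useful part: threads that sit in the same frames across all of them are the likely culprits. On Unix-like systems, `kill -3 <pid>` also makes the JVM print a thread dump to its standard output.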
Sorry it took me so long to get back to this thread - I was working on implementing the system without the packet filter. While doing some internal testing, I determined that the packet filter only choked when I attempted to filter using groups with spaces in the names. Removing the whitespace and recreating the rules works perfectly.
When I looked in the logs, I noticed that, even though I specified a group with whitespace, the LDAP parser (or whatever) was attempting to locate that group, only without whitespace. This query failed, but used up all the available CPU time before doing so. Then, when I went back into the packet filter admin page to verify my settings, I noticed that the plugin removed the whitespace. It appears as though Packet Filter can’t handle AD groups with whitespace?
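To make the failure mode concrete, here is a toy sketch of what seems to be happening. It is not the actual Packet Filter or Openfire code; the group names and both helper functions are hypothetical. It only shows why a rule stored without whitespace can never match a directory group whose real name contains spaces.

```python
# Hypothetical group names as they might exist in the directory.
DIRECTORY_GROUPS = frozenset({"Call Center Agents", "Supervisors"})


def normalize_rule_group(name: str) -> str:
    """Mimic the observed behaviour: all whitespace is removed from the name."""
    return "".join(name.split())


def lookup_group(name: str, directory=DIRECTORY_GROUPS) -> bool:
    """Exact-match lookup, standing in for the directory query."""
    return name in directory


stored = normalize_rule_group("Call Center Agents")
print(stored)                       # CallCenterAgents
print(lookup_group(stored))         # False: the stripped name is not in the directory
print(lookup_group("Supervisors"))  # True: names without spaces are unaffected
```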
I talked with Jadestorm about this and he said it was a known issue in OF. It should be fixed in 3.6.0.
Wonderful! Thank you for looking into this, and thanks for an amazing product - we’ve got about 70 people in our office on the server now, and it’s a huge hit!
No problem. It’s always good to hear when people are using what I wrote. Be sure to upgrade to Packet Filter 2.0.2; it has a bunch of performance improvements.
I’m having a similar issue where the lsass.exe process spikes on the DC that Openfire is pointing to, so I’m unable to put the packet filter into production and it’s hindering our current deployment. Is there an ETA on OF 3.6 and Packet Filter 2.0.2? I can start a separate thread if required for help on my specific issue, just let me know.
OF 3.5.1, packet filter 2.0.1, Windows 2003 SP2, AD integrated.
I’m pretty sure this is an Openfire issue. I’ll chat with Jadestorm about it this week to see what’s up with it.
I’m afraid I don’t know enough about Active Directory to speak on what is going on with the lsass.exe process. Whatever that is. =) Most likely the space has confused Openfire and Openfire is doing a query that results in not hitting indexes or something, and so the LDAP service is spiking trying to answer the query.
I appreciate the help and feedback… as per another msg I ran another test this morning. I renamed my AD groups, eliminating all spaces in the names. My DC normally runs about 8% CPU and my dedicated IM server runs about 15% (all openfire-service). Once I had renamed the AD groups I created 1 packet filter rule and all hell broke loose… ok not that bad, but the DC lsass.exe spiked to a continuous 40-50% and the openfire-service on my IM server spiked to 50-60%. LDAP queries from the IM server to the DC in question shoot through the roof and start tripping our OpManager alarms with 176 simultaneous sessions.
There were only about 20 people on IM at the time and there was msg lag of approx. 5 minutes for current users, and any new user was unable to log in to IM. The second I delete the msg rule everything goes back to normal within a minute or so. If there are any logs I can set and provide to help troubleshoot let me know and I’ll gladly put them up. We are a call center environment and at this time the pilot rollout is stopped, as the prime requirement is to disable communication between agents.
lsass.exe -> Local Security Authority Subsystem Service. It verifies the validity of user logons to your PC/Server. It generates the process responsible for authenticating users for the Winlogon service. This process is performed by using authentication packages such as the default Msgina.dll. If authentication is successful, Lsass generates the user’s access token, which is used to launch the initial shell. Other processes that the user initiates inherit this token. On a domain controller it also hosts the Active Directory service, so heavy LDAP query traffic shows up as lsass.exe load.
Thanks thus far and hope to have this resolved soon… OF3.5.1, W2k3 SP2, AD
Packet Filter 2.0.2 is mentioned earlier by Nate? I can’t seem to find this in downloads? Only 2.0.1 which is what I’m running…
I’ll cut you a build of 2.0.2 tonight if you would like to test it out.
Here is a cut of 2.0.2 that should work with your setup. Let me know if you run into any issues.
packetFilter.jar (59088 Bytes)
Posting up a snip of the error and warning log after a reboot and service start. Also did a 6 part thread dump of OF service:
-dump1 : packet filter plugin loaded, no rules created
-dump2 : taken during the creation of the filter rule
-dump3 : taken after the filter rule was created
-dump4 : taken after the filter rule was created
-dump5 : taken after the filter rule was created
-dump6 : taken after the filter rule was deleted and CPU back to normal
I know I have to clean up my security group memberships as noted in the error log. Those “non-existant username” errors are due to disabled user accounts still in the Openfire security groups.
Hope this helps shed some light… deployment to non-management staff has been put on hold until I can resolve the packet filter issue. Much appreciated.
OFLogs.rar (63283 Bytes)
Sorry it has taken me so long to reply. I looked at your thread dumps and I do see where the problem is happening in Packet Filter. I suspect that the root problem is in the populateGroups method for the ldapGroupProvider. A couple things I would ask you to do :
Ensure that all your caches are full and effective. You can do this in the admin console.
I saw a bunch of threads in :
org.jivesoftware.openfire.archive.ArchiveIndexer.updateIndex(ArchiveIndexer.java:228). Although this isn’t causing your issue, it could be contributing to it. Can you take a look at your archive settings?
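If it helps anyone doing the same kind of triage, here is a rough sketch of counting how many threads in a dump are sitting in a given method. It assumes the usual HotSpot text layout (a quoted thread name on a header line, followed by indented `at ...` frames). The sample dump is invented, including the package name on the populateGroups frame, and the function name is just for illustration.

```python
import re
from collections import Counter

# A thread section starts with the thread name in double quotes.
THREAD_HEADER = re.compile(r'^"[^"]+"')


def count_threads_in(dump_text: str, needles: list[str]) -> Counter:
    """Count threads whose stack mentions each needle (at most once per thread)."""
    counts = Counter()
    current_frames = None  # None until the first thread header is seen

    def flush():
        # Tally the thread collected so far, if there is one.
        if current_frames is not None:
            stack = "\n".join(current_frames)
            for needle in needles:
                if needle in stack:
                    counts[needle] += 1

    for line in dump_text.splitlines():
        if THREAD_HEADER.match(line):
            flush()
            current_frames = []
        elif current_frames is not None and line.strip().startswith("at "):
            current_frames.append(line.strip())
    flush()  # do not forget the last thread in the dump
    return counts


# Invented sample data, only shaped like a real dump.
SAMPLE = '''
"pool-1-thread-1" prio=6 tid=0x01 nid=0x10 runnable
   at org.jivesoftware.openfire.ldap.LdapGroupProvider.populateGroups(LdapGroupProvider.java)
   at example.Caller.run(Caller.java)

"pool-1-thread-2" prio=6 tid=0x02 nid=0x11 runnable
   at org.jivesoftware.openfire.archive.ArchiveIndexer.updateIndex(ArchiveIndexer.java:228)

"pool-1-thread-3" prio=6 tid=0x03 nid=0x12 waiting
   at java.lang.Object.wait(Native Method)
'''

result = count_threads_in(SAMPLE, ["populateGroups", "ArchiveIndexer.updateIndex"])
for needle, n in result.items():
    print(needle, n)
```

Running this over each dump in a sequence shows whether the number of threads stuck in a method grows while the CPU is spiking.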
I have a test AD server with a large user base that I should be able to reproduce and hopefully fix this issue with. The bad news is that it will have to be done when I get some spare time which might not be for a week or so. I’ll do my best to get to it this weekend.
I’m very happy you have a lead on a fix for this… I’ve attached a screenshot of my Cache Summary so you can take a look but I think all is well. I have tried clearing this in the past to see if it would help after making changes but no effect. If you have any suggestions on any tweaks I can do with Cache or any other just let me know.
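For readers wondering what an "effective" cache buys in this situation, here is a tiny sketch of caching group membership lookups so that a filter rule doesn't go back to the directory for every packet. This is an illustration only, not how Openfire's caches are implemented; `fetch_members`, the 300-second TTL, and the group and user names are all assumptions.

```python
import time


class MembershipCache:
    """Cache group -> members for ttl_s seconds and track hits and misses."""

    def __init__(self, fetch_members, ttl_s: float = 300.0, clock=time.monotonic):
        self._fetch = fetch_members   # stand-in for the expensive directory query
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}            # group name -> (expiry time, members)
        self.hits = 0
        self.misses = 0

    def members(self, group: str) -> frozenset:
        now = self._clock()
        entry = self._entries.get(group)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: query the directory once and remember the answer.
        self.misses += 1
        members = frozenset(self._fetch(group))
        self._entries[group] = (now + self._ttl, members)
        return members

    def effectiveness(self) -> float:
        """Fraction of lookups answered from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


queries = []

def fake_directory(group):
    queries.append(group)             # record each simulated directory round trip
    return {"alice", "bob"}

cache = MembershipCache(fake_directory)
for _ in range(100):                  # e.g. 100 packets checked against one rule
    cache.members("Agents")
print(len(queries), cache.effectiveness())  # 1 0.99
```

If lookups like this bypass the cache, or the entries expire or get evicted too quickly, every filtered packet turns into a directory query, which would be consistent with the LDAP session counts reported above.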
I appreciate the time and effort troubleshooting this, so no need to apologize for the ETA…