Possible resource leak?

Hey guys,

My Setup:

Openfire 3.8.2

Spark 2.7.0 (from TRUNK on June 27th) - customized with internal company branding + SPARK-1515 + SPARK-1538 - built with the system’s Oracle JDK 1.7.0_25 and embedded with JRE 1.7.0_21 via install4j (I’ve also tried embedding 1.7.0_25) - plugins: Window Flashing, OTR, Roar, Spellcheck

I rolled this out to most of the office after testing it for a few days on my system. All appeared to be OK, but then after a few days users started to complain that their systems had suddenly become unresponsive - typed letters would show up slowly, one at a time, in whatever program was in use (such as Notepad), the system was sluggish to respond to clicks, etc. After checking it out for a while, I eventually discovered that if I could get Task Manager to open (some systems slowed down so badly that I could only restart them) and killed the Spark process, the system immediately returned to normal.

Watching this for a few weeks, it seems that once Spark passes the 55+ hour mark of continuous runtime, things start to get a little weird. Memory usage measured via Task Manager on my system shows Spark consuming almost 400 MB of RAM – I’ve tried setting up a script on the Openfire server to shut Openfire down for a period of time, forcing all attached clients to be disconnected and have to reconnect, but the same thing still happens. The only solution I’ve found so far is to have users exit and re-open Spark after a few days…

I’ve attached JProfiler to the Spark instance running on my system, and I can see that after a while of runtime the GC seems to get overwhelmed with excessive object instantiations, although I’m stumped as to the actual root cause (whether it’s the embedded JRE, the JRE version, a plugin, one of the new patches for JTattoo, etc.). Basically, what I’m seeing is that the GC will kick in – then immediately after it’s finished, a ton of new objects get instantiated and heap memory usage climbs right back up. When the latest lockup happened on my system, the heap had gotten down to 5 MB available; shortly before the lockup, it had gotten below 1 MB available.
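
In case anyone wants to watch this without keeping a profiler attached for days, here’s a minimal sketch using the standard java.lang.management API. The HeapUsageLogger class name and the one-minute interval are just my own choices (nothing from Spark itself) – the same loop could run on a background thread inside a test build or a throwaway plugin instead of as a standalone program.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Hypothetical standalone logger (not part of Spark): prints heap usage once a
    // minute so the slow climb and the post-GC rebound show up in a plain log.
    public class HeapUsageLogger {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("%tF %<tT used=%dMB committed=%dMB max=%dMB%n",
                        System.currentTimeMillis(),
                        heap.getUsed() / (1024 * 1024),
                        heap.getCommitted() / (1024 * 1024),
                        heap.getMax() / (1024 * 1024));
                Thread.sleep(60000);
            }
        }
    }

Redirect the output to a file and the multi-day climb (or the lack of one) should be obvious at a glance.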

Is anyone else experiencing this issue while running on an embedded Java 7 for long periods of time? This could possibly be a bug in the newer JTattoo release, as that’s really the main thing that has changed in the trunk I pulled from…?

I see that both tickets are In Progress. Were they actually committed to SVN? My users are currently on the 598 build from Bamboo. Though we usually shut down PCs at the end of the day, so the longest Spark run should be only around 8-12 hours. No such issues anyway. Spark only occasionally locks up on some low-resource systems, but the system remains responsive.

No such thing in our internal build, which has been in use for several months.

Hmm, thanks for the input, fellas. I’m going to continue to troubleshoot this locally, as it’s appearing to be something I’ve added then (either one of those tickets that are “In Progress” or something else). I’m not concerned about the memory usage; all the systems over here have at least 2 GB, while most have 4 GB+. When this problem happens, systems are not running out of RAM (as evident on my system with 8 GB of RAM), but simply stop responding normally and become really sluggish. It’s strange to say the least. Exacerbating the problem, it only occurs after so much time of continuous runtime, so it’s difficult to know if a small change has fixed this until a few days later. :-/ – I’ll post back if I figure anything out…

Were you able to figure out the issue? I am experiencing similar problems and have a similar setup to yours. I have a customized Spark, but I’ve taken things out and locked it down. It’d be nice to know which build is stable. Thanks!

Hello Vinh

Unfortunately I have not solved this as of yet. I haven’t had much time the past few weeks to troubleshoot this much more than before, and coupled with the extreme length of continuous runtime before the issue occurs… it’s making for a difficult one to track down.

Assuming there is no issue with the TRUNK, the problem would have to reside either in Install4j (I’m using the latest version from the ej-technologies website) and/or in the bundled JRE version Install4j uses (from the automatic download within the Install4j GUI wizard). If it’s something Install4j is causing, it could be something with the libs and runtimes it uses to launch Spark.

My custom build was taken from TRUNK, branded with the company logo and such, and locked down a bit so users don’t get too curious. It was then compiled using ANT 1.9.x with JDK 1.7.0_25 x86 from Oracle on a 64-bit Windows 7 machine, then packaged with the 64-bit version of Install4j 5.1.6 (I had to swap out Install4j’s bundled JRE for a 32-bit Java 7 JRE from Oracle’s website, since the included bundle is actually still JRE 6 and I got some packaging errors initially). This produced an EXE which I pushed to everyone in the office after doing a full removal of all previous versions of Spark and killing local user profiles so that everything was fresh. My userbase spans Windows 2000, XP, and Win7 workstations with varied installed RAM… the majority are 2 GB+.

From using JProfiler, I can see huge volumes of object instantiations… then GC does its thing and wipes them out (causing heap memory usage to drop as well), but almost immediately it goes back up. At its peak, there were over 1 million objects in memory, compared to well below 100K when first launching Spark. This happens for me after Spark has been running continuously for a long while (several days; it seems to be about every 3+ days that the problem happens and machines start to go sluggish until I kill Spark via Task Manager and relaunch it).

So something is indeed leaking, somewhere. The question is what’s causing it… whether it’s TRUNK codebase related or something else (like packaging or compile tools, etc.). Possibly it’s an issue that exists in the TRUNK but is only aggravated and brought to light by something (such as the company branding and/or locking Spark settings down and/or the bundled JRE, etc.). It’s really tough to say…

In the meantime, I’m going to roll most of my userbase back to release 2.6.3, which I know works perfectly in our environment. Myself and a few select workstations will continue attempting to debug this (users who I know won’t complain too much lol).

Thanks for the information. This is actually the first attempt to roll this out, so I have nothing to roll back to unfortunately, but we haven’t widely released yet either. I’m actually experiencing the same issues and symptoms on a modified version of 2.6.3. One difference though is that it was not compiled the same way; I used IzPack instead of Install4j. Pretty safe to assume it is not that then.

We have a logoed version that locks down features under preferences. All are on machines with 4 GB minimum, running a 64-bit version of Windows 7 or Server 2003+. When it’s running OK, it uses about 100 MB of memory, and when it starts acting up, it is using about 300 MB. Like you said, weird things start occurring after a period of time. Sometimes there is no Spark.exe process anymore and just a javaw.exe.

Anyway, I’d appreciate it if you could hit me up if you make any headway, and I’ll do the same. Thanks!

The javaw.exe thing occurs sometimes when Spark gets disconnected from the server or is logged out… although for me it’s fairly rare that I notice it. Basically it has to do with how Spark gets restarted… or at least I think so.

By “modified 2.6.3” I’m assuming you mean taken from the SVN codebase (then tweaked and compiled/packaged), which would be more or less closer to 2.7.0 (the next release) than 2.6.3 at this point (a ton of improvements and bugfixes have gone into the codebase since 2.6.3 was released).

Your memory consumption is exactly in line with what I’m seeing; normal runtime is 100 MB-ish… but then over time it gradually climbs and peaks out around 300 MB+. A profiler shows the GC starts to get overwhelmed when this happens: as usage peaks and runs out of heap space, the GC kicks in – freeing a few MB (not a lot really), but then it climbs right back up.
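
If you want a rough number to back up the “GC getting overwhelmed” impression without JProfiler, the same java.lang.management package also exposes cumulative collection counts and times. This is only a sketch along the lines of the heap logger I posted above (the GcStats name and the interval are made up); per-minute deltas that keep growing while the heap barely shrinks would line up with what the profiler shows.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Hypothetical snippet (not from Spark): logs how many times each collector
    // has run and how much total time it has spent, once a minute.
    public class GcStats {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("%s: collections=%d totalTimeMs=%d%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(60000);
            }
        }
    }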

I’m starting to wonder if it has something to do with the branding and/or locking down of settings – and to be honest, I haven’t tried the same codebase (local copy pulled from the TRUNK) without the branding added in or the lockdown. I’m branding it with our company logos and such just so it looks a little more “native” alongside our other internal business apps… we use Spark internally only, no public-facing use.

I’ll try to experiment with this next week.

Yup, you are right. I am using the SVN code and compiling it after locking down some of the features. I never really considered it 2.7.0 since it still says 2.6.3, but I get your point.

I’ll try to take out the branding and see if that works today. Can’t really test it immediately since it takes a while for the issue to present itself though. Will let you know the results. Thanks!

Welp… I ended up rolling some of my users to one of the beta builds instead of release 2.6.3:

http://bamboo.igniterealtime.org/browse/SPARK-INSTALL4J-604/artifact/JOB1/Install4j/

and… today the same issue happened. I cannot recall if I killed the profile or not… I’ll have to check, when my users leave or before they get in tomorrow, for the earliest transcript history date…

If I didn’t kill the Spark profile, then this was a bad sample case, since something left over from the branded/customized TRUNK version could be causing this. If I did kill the Spark profile, then I’d be worried this is something in the TRUNK causing the issue.

I’ll report back after I’ve had a chance to check the local machines…

Walter or Wroot, by any chance, are your users staying logged into Spark for excessive periods of time? Like 3+ days without a shutdown/reboot or logout… so 3+ days of continuous Spark runtime?

Some of my users do (they do not shut down their PCs when going home, so they sleep all night). But I tend to hunt such users down and teach them to turn off their PCs. I’ve had a few issues lately where Spark wasn’t reacting to the mouse, showing a super minimized window (titlebar only), so I had to kill it and restart, but I can’t say whether those PCs had been running long and only Spark was hanging while other programs were fine. And this usually happens on old slow machines with Vista.

So I removed the branding, but the issue still persists. You’re right, it’s right at about 3 days when the symptoms start exhibiting. Oddly enough, though, one of the five machines has now been running for 10 days. Totally lost for things to try.

Yeah… some of my users refuse to turn off their machines (mostly management who have VPN access to the office in case they need to do work from home and whatnot). I personally keep my machine on 24/7/365 so I can remote into the office and do stuff as needed.

The symptom you are describing where Spark is minimized and you try to bring it back up but only the menu bar shows IS actually one of the symptoms I am seeing (maybe Vinh can chime in on whether he’s seeing this also). Wroot, next time that occurs to one of your users, can you check in Task Manager for Spark’s memory usage? In all my cases, it has ballooned to around 300 MB+… which in itself is not an issue since there is plenty of available RAM in all my cases (my system has 8 GB of RAM), but it’s a symptom of what happens when the issue occurs.

Oh, I just noticed you said your systems sleep when left on… my systems do not; they run normally with just Windows locked (Windows key + L). Perhaps you can test one system by disabling sleep and anything that will reboot it automatically, then log in to Spark using a beta build, let it sit for a few days, and then try to play around with it…?

I think it’s starting to look like something in the codebase added recently… since release 2.6.3 works for me: http://www.igniterealtime.org/downloadServlet?filename=spark/spark_2_6_3.exe

I guess we can only find this issue because of the way we are using Spark… since it seems not many users keep systems on 24/7 without the system sleeping, auto-rebooting, or shutting down…

If someone else can confirm this in a beta build, we’ll need to go back to a build that works as expected and then examine all commits to the Trunk since then to try to find the source of the issue. This will possibly be a lengthy process, since we’ll have to compile and then run for a few days before the issue becomes apparent, etc.

Yes, the minimized window is a symptom. To go into a little more detail, the actual window becomes frozen and can be transparent. Clicking the program in the task bar does nothing. To users, the status does not change and messages are not received. Like Jason mentioned, the memory goes to 300 MB when normally it’s 100 MB.

I have a machine that hibernates every night and has not been exhibiting the problem. The sleep case you bring up is interesting. It’s easier for me to push a group policy that makes everyone’s machine sleep than one that reboots them. Please let me know your results. Thanks!

I’ve already planned to test a long run on my own machine, though I would have to disable sleep as well. Or should I try with sleep enabled first? Anyway, I will only be able to do that after a week.

Hmm… maybe try without sleep first to see if you can replicate the issue we are seeing. If so, then I think we can agree there’s something going on.

A quick note that might help pinpoint the issue.

I have the same experience as the one described earlier.

More or less 3 days running continuously (no sleep), profile removed before the last test, only the client running, connected to the Openfire server (no conversation). Memory bubbles up and the station becomes unresponsive… from ~60 MB at startup to ~200 MB 24 hours later to ~300 MB 36 hours later.

I have the only station in our office using the builds from the bamboo repository, so (to my knowledge) the only one experiencing the issue.

The setup is on 64-bit Windows 7. I am using the Install4j artifact for Windows directly, including the bundled Java JRE (not the “_online” version); the revision I am currently using is 13681 (#596).

Now, I remember having had this issue for a while; I think I realized the sluggishness of my station was being caused by the Spark client around April or May.

I do know the issue was not present with the artifacts at the beginning of the year (or I can’t say I experienced the symptoms before). My guess is that the issue popped up somewhere around February or March, so somewhere in the first quarter of the year.

I did not report this earlier since I had not seen any reports of the issue and no one else had complained about it, so I thought it was something specific to my setup, but it looks like that’s not the case.

Edit: I just logged off and logged back on again (without closing the Spark client) and the memory usage was reset from ~160 MB after about 13 hours of use to the initial launch size of ~60 MB.