We are running an Openfire 3.8 cluster with the Hazelcast distributed cache. We are noticing that the Client Session Info Cache grows non-stop. I have added cluster_wide_map_size to restrict the growth but am still running into issues.
Does anyone know why this cache would grow continuously? It seems like data is not being evicted.
Does the latest version (3.8.2) resolve this issue?
What box are you running it on?
We are running it on large Amazon EC2 instances (7.5 GB of memory and 4 ECU). We are starting our instances with 6 GB of max memory.
How are you measuring the memory leak growth?
I’m just looking at the cache summary. It’s exceeding the max size and continuously growing. It seems entries are being written without eviction. It grows to >100 MB per day.
What rate is the leak at (MB/min)?
I haven’t measured it in MB/min, but over a day or so it grows to more than 100 MB.
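For comparison with the MB/min question, here is a rough conversion of the figure above. It is only a sketch: it assumes the growth is steady across the day and treats ">100 MB per day" as exactly 100 MB, so the real rate would be somewhat higher. The function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def mb_per_minute(mb_per_day: float) -> float:
    """Convert a daily growth figure to MB/min, assuming steady growth."""
    return mb_per_day / MINUTES_PER_DAY


# 100 MB/day is the lower bound quoted in this thread.
rate = mb_per_minute(100.0)
print(f"{rate:.3f} MB/min")  # prints "0.069 MB/min"
```

At that rate the growth would be hard to spot minute by minute, which is why it only shows up in the cache summary over a day or more.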
What level of XMPP activity (messages/min) is the box at?
We have a very low number of messages but a large number of connected clients (100k users, with 10k connected concurrently). The message rate is less than 20/min.
How many clusters are there?
There are 4 machines in the cluster.
What’s the appropriate cluster_wide_map_size value for the Client Session Info Cache? I currently have it set to 100000. Is that too high? I also had to tweak the eviction delay and eviction rate.
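As a rough sanity check on that value, the numbers from this thread can be compared directly. This is a sketch under two assumptions that are not confirmed here: that cluster_wide_map_size counts entries (not bytes), and that the limit is split evenly across cluster members.

```python
# Figures quoted in this thread.
CLUSTER_WIDE_MAP_SIZE = 100_000   # configured cluster_wide_map_size
CLUSTER_MEMBERS = 4               # machines in the cluster
CONCURRENT_SESSIONS = 10_000      # concurrently connected clients

# Assumption: the cluster-wide limit is divided evenly among members.
per_member_limit = CLUSTER_WIDE_MAP_SIZE // CLUSTER_MEMBERS

# How many times larger the entry cap is than the live session count.
headroom = CLUSTER_WIDE_MAP_SIZE / CONCURRENT_SESSIONS

print(f"per-member entry limit: {per_member_limit}")
print(f"cap is {headroom:.0f}x the concurrent session count")
```

If those assumptions hold, size-based eviction would not start until the map holds about ten times as many entries as there are live sessions, so a cap closer to the expected number of concurrent sessions may be worth testing.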
That seems pretty low in terms of activity and concurrency. Are the CPUs screaming, and is the number of interrupts trending up? I hope you have Munin or Cacti to track trends in resource usage.
With all due respect, is there a possibility that it may just be the cache at work… caching? I wonder how long you have kept a machine running in its cluster. I’d let one climb until it nearly maxes things out.
Also, have you gone through the settings to tighten the spigot on client connection durations? I’m curious how many seconds it takes to disconnect clients automatically.
The CPUs do trend up to exceed 50% utilization, so I’m worried about that. I don’t think we have a lot of data, so 50% utilization seems high. I will check our monitoring servers to see if there are more details on resource usage.
My concern is that the Client Session Info Cache size does not seem to adhere to the max size shown on the cache summary page at all. It just exceeds that limit and keeps on growing.
I’m not sure about client connection durations, since we use XMPP to send notifications to clients and need to maintain a connection while our clients are running. I will try a lower value to see if it makes a difference.
Have you tried setting the cache value to -1 and letting it be unlimited? It sounds to me like you believe the cache is not being effective because it is too small and is therefore impacting performance.