Connection to AD keeps dropping - HELP!

Hi,

I just installed Openfire 3.6.3 on Windows 2003, and it authenticates through AD on a separate server. I have also installed SparkWeb on the same server, under a separate Apache service. I have set SparkWeb to use HTTPS binding and set the Openfire server security setting to “Required”.

I tested all the features, including FTP transfer, and everything worked fine until yesterday, when more people (fewer than 10) tested Openfire. Three people could not log in and received “Not Authorized” messages, and some people got kicked out. At that time I tried to log in to the admin console and could not log in either. So I restarted Openfire, and after a while I was able to log in; users were kicked out as well and were then able to log in again. But this didn’t last long: it happened again an hour later. I managed to get the following messages from the log:

2009.04.30 03:54:02 [org.jivesoftware.util.log.util.CommonsLogFactory$1.error(CommonsLogFactory.java:88) ] Line=19 The content of element type "dwr" must match "(init?,allow?,signatures?)".

2009.04.30 08:23:11 [org.jivesoftware.openfire.ldap.LdapAuthProvider.authenticate(LdapAuthProvider.java:122) ] Error connecting to LDAP server
javax.naming.CommunicationException: sample.com:389 [Root exception is java.net.UnknownHostException: sample.com]
    at com.sun.jndi.ldap.Connection.<init>(Unknown Source)
    at com.sun.jndi.ldap.LdapClient.<init>(Unknown Source)
    at com.sun.jndi.ldap.LdapClientFactory.createPooledConnection(Unknown Source)
    at com.sun.jndi.ldap.pool.Connections.getOrCreateConnection(Unknown Source)
    at com.sun.jndi.ldap.pool.Connections.get(Unknown Source)
    at com.sun.jndi.ldap.pool.Pool.getPooledConnection(Unknown Source)
    at com.sun.jndi.ldap.LdapPoolManager.getLdapClient(Unknown Source)
    at com.sun.jndi.ldap.LdapClient.getInstance(Unknown Source)
    at com.sun.jndi.ldap.LdapCtx.connect(Unknown Source)
    at com.sun.jndi.ldap.LdapCtx.<init>(Unknown Source)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(Unknown Source)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(Unknown Source)
    at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(Unknown Source)
    at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(Unknown Source)
    at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
    at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
    at javax.naming.InitialContext.init(Unknown Source)
    at javax.naming.ldap.InitialLdapContext.<init>(Unknown Source)
    at org.jivesoftware.openfire.ldap.LdapManager.getContext(LdapManager.java:480)
    at org.jivesoftware.openfire.ldap.LdapManager.findUserDN(LdapManager.java:684)
    at org.jivesoftware.openfire.ldap.LdapManager.findUserDN(LdapManager.java:637)
    at org.jivesoftware.openfire.ldap.LdapAuthProvider.authenticate(LdapAuthProvider.java:112)
    at org.jivesoftware.openfire.auth.AuthFactory.authenticate(AuthFactory.java:158)
    at org.jivesoftware.openfire.net.XMPPCallbackHandler.handle(XMPPCallbackHandler.java:87)
    at org.jivesoftware.openfire.sasl.SaslServerPlainImpl.evaluateResponse(SaslServerPlainImpl.java:112)
    at org.jivesoftware.openfire.net.SASLAuthentication.handle(SASLAuthentication.java:245)
    at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:161)
    at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:133)
    at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:570)
    at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)
    at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)
    at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)
    at org.apache.mina.common.IoFilterAdapter.messageReceived(IoFilterAdapter.java:80)
    at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)
    at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)
    at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)
    at org.apache.mina.filter.codec.support.SimpleProtocolDecoderOutput.flush(SimpleProtocolDecoderOutput.java:58)
    at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:185)
    at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)
    at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)
    at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)
    at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:239)
    at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:283)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.UnknownHostException: sample.com
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at com.sun.jndi.ldap.Connection.createSocket(Unknown Source)
    ... 47 more
=====
Note: in the log above I replaced the real AD server name with sample.com.
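
The trace above shows a javax.naming.CommunicationException whose root exception is a java.net.UnknownHostException, which suggests a name-resolution failure rather than a rejected password. As a minimal sketch (the class and method names here are hypothetical and not part of Openfire), a helper like the following could tell the two cases apart by walking the cause chain:

```java
import java.net.UnknownHostException;
import javax.naming.CommunicationException;

/** Illustrative helper, not Openfire code: classify an LDAP connection failure. */
final class LdapFailureCheck {

    /** Returns true if any exception in the cause chain is an UnknownHostException. */
    static boolean isDnsFailure(Throwable error) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the logged error: a JNDI exception wrapping a DNS failure.
        CommunicationException logged = new CommunicationException("sample.com:389");
        logged.setRootCause(new UnknownHostException("sample.com"));
        System.out.println("DNS failure? " + isDnsFailure(logged)); // prints "DNS failure? true"
    }
}
```

If a check like this returns true, the “Not Authorized” errors seen by clients are more likely a side effect of the server failing to reach the directory than a problem with the clients themselves.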

Openfire was unstable for the whole day.

Our Openfire runs in a VMware environment with 512 MB of RAM and 1 CPU assigned to it.

So I increased the Java VM memory from 64 MB to 200 MB. Last night I tried it and it worked fine, although only my test user and I logged in. This morning I got a report that one user, who uses Pidgin 2.5.5, had a hard time logging in. She kept getting “Not Authorized”, and in the Info log I saw a lot of messages like this: “User Login Failed. PLAIN Authentication failed”. After a while she could log in without any intervention from me, such as restarting Openfire.

My questions:

  1. Does anybody know whether this is a coding issue, a client issue, or whether I need to assign more Java VM memory to it?

  2. If the Java VM memory is not enough, what is the best formula for sizing it correctly? Is it 80% of available memory? I might have about 200 users on this Jabber service.

  3. On some forums, people mentioned that newer clients, especially Pidgin 2.5, have a “heartbeat check” capability that uses a lot of Java VM memory on Openfire. They suggested enabling the parallel garbage collector with “-XX:+UseParallelGC” and setting xmpp.client.idle to -1.

  4. I also noticed a lot of VCard errors in the debug log, which I believe might be a burden on the Openfire process. Should I just modify the VCard settings through the admin console so that they reflect our users' AD profiles?

Thank you in advance for your help.

regards,

Charlie

I hope that, 8 months later, the community has found something to fix this, as I had to quit using Openfire after not being able to get on.

We had 50 users, though, and a somewhat beefier server than yours. It was originally believed to be a resource issue.

Thanks Ryan for sharing

Last night the AD connection dropped again, after I had increased the Java memory from 200 MB to almost 1 GB in the afternoon. No users were active at the time, but for some reason the connection just dropped. Interestingly, I noticed that during that time I was not able to ping the AD host: I got “unknown host”. I tried to telnet to the AD host name on port 389, and the connection failed because of the unknown host name. So I restarted Openfire and it worked again. I was also able to ping and telnet to the AD server.

I believe something in Openfire (Java) caused a DNS resolution issue rather than a general connection issue. I was able to RDP to the server. I added the AD host to the hosts file, and so far it has been up and running for about 20 hours.
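
The ping and telnet checks described above can also be run from inside a JVM, which shows what Java itself sees rather than what the operating system tools see. The following is a rough sketch only: the class name is made up for illustration, and sample.com stands in for the real AD host just as it does in the log.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Illustrative probe, not Openfire code: DNS and TCP checks from within the JVM. */
final class AdReachabilityProbe {

    /** True if this JVM can resolve the name (IP literals need no DNS lookup). */
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    /** True if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers unresolved hosts, refused connections, and timeouts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // "sample.com" is a placeholder; pass the real AD host as the first argument.
        String host = args.length > 0 ? args[0] : "sample.com";
        System.out.println("resolves:      " + resolves(host));
        System.out.println("LDAP port 389: " + canConnect(host, 389, 3000));
    }
}
```

If `resolves` fails while the hosts-file entry is in place, the problem is probably not the DNS server; if it fails only without the entry, the DNS server or the JVM's lookup caching is the more likely suspect.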

I am out of ideas. I might need to upgrade to the latest version, 3.6.4.

Any ideas?

Thanks.

Charlie

Hi Charlie,

Java 1.5 set “networkaddress.cache.ttl” to -1 by default, which means DNS lookup results are cached forever. Java 1.6 sets it to an implementation-specific value (30 seconds for Sun). So if you have problems with your DNS server, the JVM can no longer look up host names once the cached entries expire.

Adding the entry to your hosts file is a very good idea. If your server does not connect to other XMPP servers, you could set “networkaddress.cache.ttl” to -1 to make sure that the JVM / Openfire looks up each host name only once.
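
For reference, “networkaddress.cache.ttl” is a Java security property, so inside a Java program it is read and written through java.security.Security. The sketch below only illustrates the mechanism; the class name is hypothetical, and for an existing server such as Openfire the same property would more typically be set in the JRE's java.security file than in code. As far as I know the value is read once, when the JVM first sets up its address cache, so it has to be in place before the first lookup.

```java
import java.security.Security;

/** Illustrative only: inspect and set the JVM's positive DNS cache lifetime. */
final class DnsCachePolicy {

    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Human-readable description of the current setting. */
    static String describeTtl() {
        String value = Security.getProperty(TTL_PROPERTY);
        if (value == null) {
            return "unset (JVM default applies)";
        }
        return "-1".equals(value) ? "cache forever" : value + " seconds";
    }

    /** Ask the JVM to cache successful lookups forever; call before any lookup. */
    static void cacheForever() {
        Security.setProperty(TTL_PROPERTY, "-1");
    }

    public static void main(String[] args) {
        System.out.println("before: " + describeTtl());
        cacheForever();
        System.out.println("after:  " + describeTtl());
    }
}
```

Caching forever trades resilience against DNS outages for the risk of holding on to a stale address, which is why it fits best when the server only ever talks to a few fixed hosts.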

LG

Hi LG,

Thank you for the info.

Since I added one of the AD host names to the hosts file, we haven’t experienced any more AD connection losses.

To add “networkaddress.cache.ttl” with the value “-1”, can I do it through “System Properties” in the admin console?

This server is going to have around 100-150 people online concurrently. Previously the server had only 512 MB; I increased it to 1.5 GB and raised the Java memory from 64 MB to 900 MB (80% of available free memory). Do you think it is necessary to assign this much memory to it?

Some people suggested that I add a garbage collector parameter. Do you think it will help performance?

Thanks again for the advice.

regards,

Charlie

Hi Charlie,

You need to set the parameter as a JVM option, the same way you set “-Xmx1500m”: “-Dnetworkaddress.cache.ttl=-1”.
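
One way to confirm which options a JVM actually picked up is to ask the JVM itself. The sketch below (a hypothetical helper, not part of Openfire) prints the startup arguments, the maximum heap, and the DNS cache settings. A caveat worth checking: “networkaddress.cache.ttl” is documented as a security property, and the system-property counterpart that can be passed with -D is “sun.net.inetaddr.ttl”, so it may be worth verifying that the -D form is really honoured on your JVM.

```java
import java.lang.management.ManagementFactory;
import java.security.Security;
import java.util.List;

/** Illustrative helper: report the settings the running JVM actually uses. */
final class JvmSettingsReport {

    /** Options the JVM was started with, such as -Xmx or -D flags. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        // Security property (java.security file or Security.setProperty); null if unset.
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
        // System property counterpart (set with -Dsun.net.inetaddr.ttl=...); null if unset.
        System.out.println("sun.net.inetaddr.ttl = "
                + System.getProperty("sun.net.inetaddr.ttl"));
    }
}
```

To be meaningful, this has to be started with the same JVM options as the Openfire process; run on its own it only reports the defaults of the JVM that launched it.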

If you have not already read “JVM Settings and Debugging”, you may want to read it and set at least “-XX:+PrintGCDetails -Xloggc:/tmp/gc.log”. Adjust the Xloggc path and file name, and make sure that Openfire can write to it.

As you have 80% free memory, you should start with “-Xms64m” and monitor how much memory Openfire uses under normal workload. Setting Xms to that observed value is usually a good idea. Then you can decide how to size Xmx; 2*Xms may be a good choice unless Xms is already very high.
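
The sizing rule above can be written down as a small calculation. This is only a sketch of the rule of thumb from this post (Xms at the observed usage, Xmx at twice that); the class and method names are invented for illustration, and the helper ignores the caveat about Xms already being very high.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Illustrative sketch of the "Xms = observed usage, Xmx = 2 * Xms" rule of thumb. */
final class HeapSizingSketch {

    /** Heap currently in use by this JVM, in megabytes. */
    static long usedHeapMegabytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024L * 1024L);
    }

    /** Builds JVM flags from the heap usage observed under normal workload. */
    static String suggestFlags(long observedMegabytes) {
        return "-Xms" + observedMegabytes + "m -Xmx" + (2 * observedMegabytes) + "m";
    }

    public static void main(String[] args) {
        System.out.println("Heap in use now (MB): " + usedHeapMegabytes());
        // 64 MB is the starting value suggested above; substitute the measured figure.
        System.out.println("Suggested flags: " + suggestFlags(64));
    }
}
```

A single reading says little; sampling the used heap at intervals during a normal working day (or reading it from the GC log) gives a figure that is safer to base Xms on.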

LG

Hi LG,

Thanks again for your suggestion. I will try later.

regards,

Charlie