Since severals years our application running in a tomcat is connected to ejjaberd using smack 4.3.4 (4.3.4-4.3 2019-05-27).
Since few days, tomcat sometime freeze at start up a a kill -3 gives the following trace :
at org.jivesoftware.smack.SmackConfiguration.getVersion(SmackConfiguration.java:93)
at org.jivesoftware.smack.provider.ProviderManager.<clinit>(ProviderManager.java:122)
at com.XXX.chat.xmpp.messages.XmppExtensionManagerConfiguration.initializeExtensionProvider(XmppExtensionManagerConfiguration.java:28)
....
The code is blocked on the getVersion call and never exit.
I can’t understand why it suddenly happen.
In our code we are using spring to add ExtensionProvider :
@Configuration
public class XmppExtensionManagerConfiguration {
private static final Logger LOGGER = LoggerFactory.getLogger(XmppExtensionManagerConfiguration.class);
/**
* Post-construction
*/
@PostConstruct
public void initializeExtensionProvider() {
ProviderManager.addExtensionProvider(ChatSystemPropertyExtension.ELEMENT,
ChatSystemPropertyExtension.NAMESPACE, new ChatSystemPropertyExtension.Provider());
ProviderManager.addExtensionProvider(EmitterExtension.ELEMENT,
EmitterExtension.NAMESPACE, new EmitterExtension.Provider());
}
}
During that call getVersion is called but never returns.
Does any one have an Idea of the cause of this lock ?
Curious. That getVersion call should simply return a static field value. I’m not immediately seeing synchronization blocks that should block this call.
Are any other errors being logged (particularly around SmackInitialization)?
Do you have any indication why this started happening only in the last few days? Did something change?
You’re right I forget to say that an other thread is waiting on that stack :
"xmppTransaction-1" #53 prio=5 os_prio=0 cpu=315.58ms elapsed=216270.05s tid=0x00007f1e37bce800 nid=0x4d67 in Object.wait() [0x00007f1d931d1000]
java.lang.Thread.State: RUNNABLE
at org.jivesoftware.smack.initializer.UrlInitializer.initialize(UrlInitializer.java:53)
at org.jivesoftware.smack.SmackInitialization.loadSmackClass(SmackInitialization.java:212)
at org.jivesoftware.smack.SmackInitialization.parseClassesToLoad(SmackInitialization.java:173)
at org.jivesoftware.smack.SmackInitialization.processConfigFile(SmackInitialization.java:143)
at org.jivesoftware.smack.SmackInitialization.processConfigFile(SmackInitialization.java:128)
at org.jivesoftware.smack.SmackInitialization.<clinit>(SmackInitialization.java:93)
at org.jivesoftware.smack.SmackConfiguration.getVersion(SmackConfiguration.java:93)
at org.jivesoftware.smack.ConnectionConfiguration.<clinit>(ConnectionConfiguration.java:64)
at com.xxx.chat.xmpp.XmppConnectionServiceImpl.createConnection(XmppConnectionServiceImpl.java:222)
at com.xxx.chat.xmpp.XmppConnectionServiceImpl.lambda$connect$2(XmppConnectionServiceImpl.java:245)
at com.xxx.chat.xmpp.XmppConnectionServiceImpl$$Lambda$1769/0x00000001012fd840.apply(Unknown Source)
In our log4j log or tomcat logs there are no errors visible.
Any way, may be I missed something but I did not find any synchronisation mechanism in the getVersion method so I don’t understand why getVersion never returns.
Both stack traces seem to be related, as they’re touching the same initialization routines. I’m guessing that initialization of one Smack provider … somehow fails?
Have you tried bumping up the verbosity of the logging system? The URL initializer logs what it’s doing, just before it ends up on that line where it gets stuck:
Loading providers for providerUri [" + providerUri + "]
Maybe those logs will give you some kind of clue.
Also, I found that it often pays off to double-check if the logging configuration also captures std-out and std-err. If those get dumped in something like syslog, then it’s easy to miss important log statements.
As we are using log4j and don’t have JUL bridge logs are not written to output file.
With the debugger I’ve found that provider loading is : “classpath:org.jivesoftware.smack.extensions/extensions.providers”
But the debugger gave me other informations.
Recently someone I our company added a home made threading mechanism to perform some execution on XMMP in an asynchronous way. I suspect it to be the root cause.