SSO stopped working...sorta

I’ve had SSO working on a Windows AD domain for over a year now without issue. It’s still working for the most part, but as of yesterday one client will not connect with SSO and I can’t figure out why.

• Openfire server: Windows Server 2008 R2

• Clients: mostly Windows 7 x64, a few 32-bit XP; all use SSO and are working fine

• I do not use krb5.ini; Kerberos settings come solely from DNS

• DNS is set up properly, PTR records are all intact, and all other clients connect fine using DNS

• Clients’ DNS servers are set properly

• spark.config is a single master file that Group Policy copies to every client at each login, to standardize settings and discard any changes users make during a session

• Client Java is up to date; no other changes have been made to AD, the client, the server, etc.

• Spark is the latest version

The client that isn’t working is a Windows Server 2003 terminal server. It worked fine for a year until yesterday, and no changes have been made. When we start the Spark client, the Username, Account and Server fields all populate correctly. When we log in, we get “Unable to connect using Single Sign-On. Please check your principal and server settings.”

The warn.log file contains:

###########################
WARNING: Exception in Login:
SASL authentication failed:
  -- caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
    at org.jivesoftware.smack.sasl.SASLMechanism.authenticate(SASLMechanism.java:121)
    at org.jivesoftware.smack.sasl.SASLGSSAPIMechanism.authenticate(SASLGSSAPIMechanism.java:86)
    at org.jivesoftware.smack.SASLAuthentication.authenticate(SASLAuthentication.java:319)
    at org.jivesoftware.smack.XMPPConnection.login(XMPPConnection.java:203)
    at org.jivesoftware.LoginDialog$LoginPanel.login(LoginDialog.java:1014)
    at org.jivesoftware.LoginDialog$LoginPanel.access$1200(LoginDialog.java:219)
    at org.jivesoftware.LoginDialog$LoginPanel$4.construct(LoginDialog.java:730)
    at org.jivesoftware.spark.util.SwingWorker$2.run(SwingWorker.java:141)
    at java.lang.Thread.run(Unknown Source)
Nested Exception:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(Unknown Source)
    at org.jivesoftware.smack.sasl.SASLMechanism.authenticate(SASLMechanism.java:117)
    at org.jivesoftware.smack.sasl.SASLGSSAPIMechanism.authenticate(SASLGSSAPIMechanism.java:86)
    at org.jivesoftware.smack.SASLAuthentication.authenticate(SASLAuthentication.java:319)
    at org.jivesoftware.smack.XMPPConnection.login(XMPPConnection.java:203)
    at org.jivesoftware.LoginDialog$LoginPanel.login(LoginDialog.java:1014)
    at org.jivesoftware.LoginDialog$LoginPanel.access$1200(LoginDialog.java:219)
    at org.jivesoftware.LoginDialog$LoginPanel$4.construct(LoginDialog.java:730)
    at org.jivesoftware.spark.util.SwingWorker$2.run(SwingWorker.java:141)
    at java.lang.Thread.run(Unknown Source)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Unknown Source)
    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
    ... 10 more
Caused by: KrbException: Server not found in Kerberos database (7)
    at sun.security.krb5.KrbTgsRep.<init>(Unknown Source)
    at sun.security.krb5.KrbTgsReq.getReply(Unknown Source)
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(Unknown Source)
    at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(Unknown Source)
    at sun.security.krb5.Credentials.acquireServiceCreds(Unknown Source)
    ... 13 more
Caused by: KrbException: Identifier doesn't match expected value (906)
    at sun.security.krb5.internal.KDCRep.init(Unknown Source)
    at sun.security.krb5.internal.TGSRep.init(Unknown Source)
    at sun.security.krb5.internal.TGSRep.<init>(Unknown Source)
    ... 18 more
###########################

Any ideas? A reboot might resolve it, but I can’t reboot at the moment because the server is in production. Is there anything else I can look into until I can?

Update: the reboot didn’t fix it.

Is it one user or all users on the terminal server?

All users. After the reboot yesterday I was the only one logged on, with full admin rights. I can telnet to the Openfire server on port 5222 and it connects. I can connect if I turn off SSO and enter the password manually. I uninstalled and reinstalled Spark, manually copied the known-good .config client file, and reinstalled Java.
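The telnet test only shows that TCP port 5222 on the Openfire server accepts connections; it exercises neither SASL nor Kerberos, which fits with password login working while SSO fails. Where a telnet client isn't installed, a minimal Java sketch of the same reachability check might look like this. The class name and the default host `openfire.example.local` are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /**
     * Returns true if a TCP connection to host:port succeeds within timeoutMillis.
     * Like telnet, this proves only that the port is reachable; no XMPP,
     * SASL or Kerberos traffic is exchanged.
     */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default host name; pass the real Openfire server as the first argument.
        String host = args.length > 0 ? args[0] : "openfire.example.local";
        int port = 5222; // XMPP client port used in the thread
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A `true` here rules out firewalls and basic routing, and points back at authentication as the thing to investigate.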

Did you double-check the registry key? Also, just to rule it out, did you disable any AV running on the server? If it’s working for other workstations and users, it’s likely an issue specific to the terminal server and not your SSO setup or AD.

Yes, I forgot to mention that I verified the registry key and checked its permissions as well. Virus scan is disabled. I’m sure it’s something on this specific machine, but for the life of me I can’t figure out what’s happening. SSO still works perfectly on 100 other machines, but half the users are on this TS, and now that everyone is accustomed to Spark, it’s kind of a big deal. You know users: too much hassle to enter a password yet again (eye roll).

Does SSO work for a domain admin account? If so, the issue would point to a break in permissions somewhere. Also, what if you used a krb5.ini on that machine instead of DNS? Does that work?

Same result regardless of account or permissions; I’ve been doing everything as a domain admin. If I clear out all the settings and open Spark, all fields are blank; when I set it to SSO and DNS, it immediately pulls in the username and account as it should. I never got it to work with krb5.ini files, but I can manually specify the realm and KDC, and it pulls in the user and account properly that way too. When I log in, it still throws “Login error: Unable to connect using Single Sign-On. Please check your principal and server settings.”
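For reference, the JDK's built-in Kerberos client can be pointed at an explicit realm and KDC, bypassing both krb5.ini and DNS discovery, through the `java.security.krb5.realm` and `java.security.krb5.kdc` system properties; `sun.security.krb5.debug=true` additionally prints a verbose trace of the Kerberos exchange, which can help show what the client is actually requesting. Whether Spark's manual realm/KDC fields map onto these properties is an assumption, and `EXAMPLE.LOCAL` and `dc01.example.local` are hypothetical values. A minimal sketch:

```java
public class KerberosJvmSettings {
    /**
     * Points the JDK's built-in Kerberos client at an explicit realm and KDC,
     * bypassing krb5.ini and DNS discovery, and optionally enables its debug trace.
     * The JDK expects realm and kdc to be set together; set them before the
     * first Kerberos/GSS-API call in the JVM.
     */
    public static void apply(String realm, String kdc, boolean debug) {
        if (realm == null || realm.isEmpty() || kdc == null || kdc.isEmpty()) {
            throw new IllegalArgumentException("realm and kdc must both be set");
        }
        System.setProperty("java.security.krb5.realm", realm);
        System.setProperty("java.security.krb5.kdc", kdc);
        // Verbose trace of the Kerberos exchange, written to standard output.
        System.setProperty("sun.security.krb5.debug", Boolean.toString(debug));
    }

    public static void main(String[] args) {
        // Hypothetical values: substitute your AD realm (upper case) and a domain controller.
        apply("EXAMPLE.LOCAL", "dc01.example.local", true);
        System.out.println("realm=" + System.getProperty("java.security.krb5.realm")
                + " kdc=" + System.getProperty("java.security.krb5.kdc"));
    }
}
```

The same settings can be passed on the command line instead, e.g. `-Djava.security.krb5.realm=EXAMPLE.LOCAL -Djava.security.krb5.kdc=dc01.example.local -Dsun.security.krb5.debug=true`; either way they need to be in place before Kerberos is first used in that JVM, since the configuration is typically read once and cached.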

RESOLVED

So it turns out the solution was very simple. Sometimes we learn and experience so much that we overlook the simple things. At one point in my troubleshooting I searched for all files on the TS that had been modified since the problem was reported. No one admitted to adding or changing anything, but we know how that goes. “Hmm, why has the hosts file been modified recently? Let’s have a look.” Sure enough, there were two entries redirecting facebook.com and www.facebook.com to the IP of the KDC. I asked the co-worker responsible why he did that, and he said he wanted to block Facebook and didn’t have access to the UTM server, so he did it that way. Grr.
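The step that cracked it, listing every file changed since the problem started, is easy to script. Below is a minimal Java sketch, assuming you pass a root directory and a look-back window in days; the class name and the 2-day default are hypothetical. On Windows the hosts file lives in `%SystemRoot%\System32\drivers\etc`, which makes a sensible default root. Entries the account cannot read are skipped rather than aborting the walk.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecentlyModified {
    /** Returns regular files under root whose last-modified time is after cutoff, sorted by path. */
    public static List<Path> modifiedSince(Path root, Instant cutoff) throws IOException {
        List<Path> hits = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && attrs.lastModifiedTime().toInstant().isAfter(cutoff)) {
                    hits.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries we cannot read (e.g. access denied) instead of aborting.
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                // Also ignore errors raised while listing a directory.
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions: the Windows hosts-file folder and a 2-day window.
        Path root = Paths.get(args.length > 0 ? args[0] : "C:\\Windows\\System32\\drivers\\etc");
        long days = args.length > 1 ? Long.parseLong(args[1]) : 2;
        Instant cutoff = Instant.now().minus(Duration.ofDays(days));
        for (Path p : modifiedSince(root, cutoff)) {
            System.out.println(p);
        }
    }
}
```

Pointing it at the whole system drive also works, just more slowly.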

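I didn’t pick it up earlier because nslookup returned everything properly: a lookup on the hostname returned the IP, and a lookup on the IP returned the correct hostname. In hindsight that makes sense, since nslookup queries the DNS server directly and never reads the hosts file. So I guess Kerberos did a reverse lookup that hit the hosts file and used the last entry for the KDC’s IP rather than the first, correct one... weird.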
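Because the bad entries lived only in the hosts file, a quick check is to list every name that file maps to a critical server's address. The sketch below does that in Java; the class and method names are hypothetical, and `10.0.0.5` and `EXAMPLE.LOCAL` are illustrative stand-ins for the KDC's address and the realm. It also prints, as an assumption rather than a confirmed diagnosis, the principal a client would request if it built an `xmpp/<host>@<REALM>` name from one of those bogus entries: such a principal would not exist in AD, which would be consistent with the "Server not found in Kerberos database (7)" error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HostsFileCheck {
    /**
     * Returns every host name the hosts-file text maps to ip, in file order.
     * Each non-comment line is an address followed by one or more names;
     * '#' starts a comment that runs to the end of the line.
     */
    public static List<String> namesForIp(String hostsText, String ip) {
        List<String> names = new ArrayList<>();
        for (String rawLine : hostsText.split("\\R")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields[0].equals(ip)) {
                for (int i = 1; i < fields.length; i++) {
                    names.add(fields[i]);
                }
            }
        }
        return names;
    }

    /** Hypothetical helper: the xmpp/<host>@<REALM> principal a client would request for host. */
    public static String xmppPrincipal(String host, String realm) {
        return "xmpp/" + host.toLowerCase(Locale.ROOT) + "@" + realm;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative data: 10.0.0.5 stands in for the KDC, EXAMPLE.LOCAL for the realm.
        String hosts = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "127.0.0.1   localhost\n"
                  + "10.0.0.5    facebook.com      # 'block' Facebook\n"
                  + "10.0.0.5    www.facebook.com\n";
        String kdcIp = args.length > 1 ? args[1] : "10.0.0.5";
        for (String name : namesForIp(hosts, kdcIp)) {
            System.out.println(kdcIp + " -> " + name
                    + "  (would request " + xmppPrincipal(name, "EXAMPLE.LOCAL") + ")");
        }
    }
}
```

Run it with the real hosts file path and the KDC's IP as arguments; any output at all for a domain controller's address deserves a closer look.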

Anyway, it’s back to normal now.

Thanks for your help and input, speedy. Your posts helped me immensely when I set up SSO last year! I shall now go collect the 12-pack a nameless co-worker owes me.