Unicode (UTF-16) vs UTF-8 in database

Hi,

As I'm using the embedded DB, it could be the cause of the problem. It has entries like these (other UTF-8 JIDs also fail, but I want to keep this post simple):

INSERT INTO JIVEUSER VALUES('\u20ac','test','',' ','001139750982769','001139750982769')

INSERT INTO JIVEUSER VALUES('b\u20acn','test','',' ','001139753299991','001139753299991')

I created these users using Spark, but I cannot log in. Using the admin console to edit these users causes an exception. A short application using Smack can log in, so it seems to be a Spark problem, which will hopefully not be discussed here.

\u20ac is the Unicode escape for the € “Euro Sign”. I wonder why a Unicode escape is used to store the user name instead of UTF-8 (“0xE2 0x82 0xAC”) (Unicode/UTF-8 reference: http://de.wikipedia.org/wiki/UTF-8#Beispiele ).
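To make the difference concrete, here is a minimal Java sketch (the class and helper names are made up for illustration). It shows that \u20ac only names the code point U+20AC, while UTF-8 and UTF-16 are two different byte encodings of that same code point:

```java
import java.nio.charset.StandardCharsets;

class EuroSignEncodings {
    // Formats bytes as space-separated hex, e.g. "0xE2 0x82 0xAC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The code point U+20AC (Euro Sign), independent of any byte encoding.
        String euro = "\u20ac";
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));    // 0xE2 0x82 0xAC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16BE))); // 0x20 0xAC
    }
}
```

So the escape in the database entry and the UTF-8 bytes describe the same character; they only differ in how it is written down.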

Something to be discussed:

Is encoded Unicode the right format to store JIDs?

Why are some characters stored as is and some characters as encoded Unicode? Is this a feature or a bug?

I'd like to store everything as UTF-8.

Maybe MySQL or other lame databases need special treatment on the Wildfire side.

LG


A note just for myself (http://www.xmpp.org/specs/rfc3920.html#nodeprep)

An XMPP node identifier is the optional portion of an XMPP address that precedes a domain identifier and the '@' separator; it is often but not exclusively associated with an instant messaging username.

These processing rules are intended only for XMPP node identifiers and are not intended for arbitrary text or any other aspect of an XMPP address.

In this case: username = XMPP node identifier

The character repertoire that is the input and output to stringprep: Unicode 3.2 (UCS-4, 32 Bit)

One may store these values in any format (UCS-2 if possible, UCS-4, UTF-8, …)

How will one be able to export values which are stored like this? I'll test the export/import plugin one day.


The admin console uses UTF-8 encoding for the URL (/user-properties.jsp?username=b%E2%82%ACn) for “b€n” while the value is stored as a Unicode escape in the database.

Doing some research I can update this thread:

Is encoded Unicode the right format to store JIDs? Why are some characters stored as is and some characters as encoded Unicode?

Wildfire / Java works internally with Unicode, so it's the JDBC driver or the HSQL database which stores it like this.
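One plausible explanation for the mix of plain and escaped characters is that the embedded database writes its file as plain ASCII and escapes everything else. That is an assumption on my part, not something confirmed in this thread. The sketch below uses a hypothetical helper (not HSQLDB code) to show how such a rule leaves “test” untouched but turns “b€n” into the form seen in the INSERT statements:

```java
class AsciiScriptEscaper {
    // Hypothetical helper, NOT taken from HSQLDB: keeps 7-bit ASCII characters
    // as they are and writes every other UTF-16 code unit as a backslash-u
    // escape with four lowercase hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("test"));     // pure ASCII, printed unchanged
        System.out.println(escape("b\u20acn")); // the Euro sign is printed as an escape
    }
}
```

With this rule the first line prints test and the second prints b\u20acn, which would make the escaping a storage detail rather than a different value.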

The admin console uses UTF-8 encoding for the URL (/user-properties.jsp?username=b%E2%82%ACn) for “b€n” while the value is stored as a Unicode escape in the database.

My browser sends URLs as UTF-8.
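For reference, the percent-encoded form in that URL is exactly what one gets when the user name is URL-encoded as UTF-8. A minimal sketch using the standard java.net.URLEncoder (the class and method names are mine, for illustration only):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UsernameUrlEncoding {
    // Percent-encodes a user name using its UTF-8 bytes.
    static String encodeUtf8(String username) throws UnsupportedEncodingException {
        return URLEncoder.encode(username, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Prints /user-properties.jsp?username=b%E2%82%ACn
        System.out.println("/user-properties.jsp?username=" + encodeUtf8("b\u20acn"));
    }
}
```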

Users and groups which contain Unicode characters cause trouble: the admin console cannot edit or delete them. Is there someone who can confirm this?

Still no JM issue, and still the same with Wildfire 2.5.1.

ping

Unfortunately, I can't find any information about character encoding issues with HSQLDB. Any chance you could try this with an external database? For example, MySQL using UTF-8 encoding?

Also, does this issue only happen when creating the user from Spark or also if you create the user directly from inside the admin console?

Regards,

Matt

Hi Matt,

I just wonder if one can edit users and groups which contain Unicode characters using the web admin console. Can you edit the user b€n@jivesoftware.com using the web admin console (/user-properties.jsp?username=b%E2%82%ACn)?

Can anyone report whether they can or cannot edit users or groups with Unicode characters using the admin console, and post the result here together with the database used?

LG

PS: I don't want to discuss the Spark login issue here (it works on your server, so it seems you are not using the embedded database).

LG,

Yep, I just confirmed that I can edit that user on our server. We're using MySQL as the database. So, it appears this is a problem specific to the embedded database?

Regards,

Matt

I'm able to edit the properties of username داى (which escapes as user-properties.jsp?username=%D8%AF%D8%A7%D9%89 ) through the admin panel. I created the user using our in-house client.

In contrast to what I reported in JM-497 I am now able to login, subscribe to other users and send messages.

I'm running an SVN version that I pulled March 13th. Our database is Postgres 8.0.

Instant update:

At my end, the username b€n (bravo, euro-sign, november) doesn't cause any problems either.

Back to the “The requested user (null) was not found.” exception. Maybe it is a Tomcat problem; I see it with the embedded DB and with Oracle. So I edited DefaultUserProvider.java, lines 100ff (#loadUser):

// Debugging aid: log the username passed to loadUser and dump the call stack.

Log.debug("Username: " + username);

Exception e = new Exception();

e.printStackTrace();

Clicking on “b€n” writes this to the debug log:

2006.04.15 17:21:25 Username: b€n (/user-summary.jsp)

(some delay before clicking on /user-properties.jsp)

2006.04.15 17:21:28 Username: b€n

2006.04.15 17:21:28 Username: b€n

So the first time (/user-summary.jsp) the user is right, but the next two calls cause trouble. The three exceptions are found in Tomcat's stdout_20060415.log:

1st:

java.lang.Exception

at org.jivesoftware.wildfire.user.DefaultUserProvider.loadUser(DefaultUserProvider.java:108)

at org.jivesoftware.wildfire.user.UserManager.getUser(UserManager.java:171)

at org.jivesoftware.wildfire.admin.user_002dproperties_jsp._jspService(user_002dproperties_jsp.java:101)

at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)

2nd + 3rd:

java.lang.Exception

at org.jivesoftware.wildfire.user.DefaultUserProvider.loadUser(DefaultUserProvider.java:108)

at org.jivesoftware.wildfire.user.UserManager.getUser(UserManager.java:171)

at org.jivesoftware.wildfire.user.UserCollection$UserIterator.getNextElement(UserCollection.java:94)

at org.jivesoftware.wildfire.user.UserCollection$UserIterator.hasNext(UserCollection.java:57)

at org.jivesoftware.wildfire.admin.user_002dsummary_jsp._jspService(user_002dsummary_jsp.java:220)

at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:94)

LG

Update: this happens with the .war version, not with the standalone one

--> may be an encoding problem of the URL while using Tomcat
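A minimal sketch of how such a URL encoding problem could mangle the user name, assuming the container decodes the query string as ISO-8859-1 instead of UTF-8. As far as I know older Tomcat versions default to ISO-8859-1 unless URIEncoding is set on the connector, but whether that is what happens here is not confirmed. The class and method names are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryParamDecoding {
    // Decodes a percent-encoded query parameter with the given charset.
    static String decode(String raw, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "b%E2%82%ACn"; // what the browser sends for the user name

        // Decoded as UTF-8: three characters, with U+20AC in the middle.
        String ok = decode(raw, "UTF-8");

        // Decoded as ISO-8859-1: each byte becomes its own character,
        // giving five characters (b, U+00E2, U+0082, U+00AC, n).
        String wrong = decode(raw, "ISO-8859-1");

        System.out.println(ok.length());    // 3
        System.out.println(wrong.length()); // 5
        System.out.println(ok.equals(wrong)); // false, so a lookup by name would miss
    }
}
```

If this is the cause, a lookup with the five-character name would not find the user, which would fit the “user not found” exception. Setting URIEncoding="UTF-8" on the Tomcat connector would be the thing to try.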

Message was edited by: it2000

One can track the progress, if any, here: JM-685