TDS Protocol error

https://issues.igniterealtime.org/browse/OF-1202

I’ve checked in a fix for OF-1202, but in any case, a query that takes longer than 5 minutes is very undesirable.

From the stack trace that was posted in the first message of this thread, I think we can conclude that this is the query that is giving you problems:

SELECT roomID, creationDate, modificationDate, name, naturalName, description,
       lockedDate, emptyDate, canChangeSubject, maxUsers, publicRoom, moderated, membersOnly,
       canInvite, roomPassword, canDiscoverJID, logEnabled, subject, rolesToBroadcast,
       useReservedNick, canChangeNick, canRegister, allowpm
FROM ofMucRoom WHERE serviceID=? AND (emptyDate IS NULL or emptyDate > ?

You mentioned that you had several thousands of MUC rooms. This query tries to load them all.

Please try to run this query on your database directly, and use the debugging tools provided by your database vendor to see why it is taking so long. Perhaps we should add an index, or perhaps you simply need stronger hardware?

Yes above mentioned query is giving timeout. Round Trip time from openfire server to MSSQL DB is approx 19 milliseconds. Is this causing issue?

Running the above query directly on SQL is showing the results instantaneously on the same millisecond. Also DB trace on openfire query returns instantaneously, but DB shows wait on ASYNC_NETWORK_IO.

I have no experience with MSSQL. I don’t know if the ASYNC_NETWORK_IO state is an indication of a problem. A quick Google query gives me some results that describe network-related issues. Is the network connection between Openfire and you database of good stability and throughput?

Is there a way i get detailed log on application side to get more clue? Network is stable except the round trip delay of 19ms.

this sounds like an interesting issue. I wish I had a way to recreate your issue to play around with it. I wouldn’t think 2000 rows would be a problem. if you run wireshark, do you see any network issues? I wish I had a way to test this on my end to see if I could reproduce your issue.

red color indicates the SQL Server and green color indicates the openfire server

Sql batch looks like

But in case of working trace I have seen something like below (doesnt have any Sql batch at all)

im not a sql person or a java person at all. I did do some googling, it looks like perhaps setFetchSize isn’t being used here, and defaults to 10? what does wireshark look like right before connection times out?

Here is the observation:

  1. we have chat rooms growing everyday at the rate of 50.
  2. every time when chat room is loading,openfire performs the following
  • Fetch all rooms in the ResultSet
  • for each Row -
    • it creates new instance of LocalMUCRoom
    • Load serviceID for the given subdomain from DB (why everytime?)
    • Load properties from ofMucServiceProp table (why everytime?)
    • Do the above 2 for each of the 12 room properties every time (why everytime for each of the properties?)
  • if 19 ms for each of the query RTT then if I have 3000 chat rooms in the table, rough time estimate to complete loading the room is 3000 x 19 x 12 x 2 = ~23 minutes
  1. 23 minutes to load 3000 MUC rooms are high I believe. Can we do something to reduce this time? Also does it make sense to cache serviceID and properties first time instead of loading from DB everytime?

Speedy, Default fetch size is 128. I have tried change it various values including the larger value to hold the entire muc rooms from DB. it doesn’t help out.