I’m not sure if this is a ‘Smack’ issue or an OpenFire server issue but I’m seeing it when using the PubSub Extension with a very simple client application. I’m just testing to see what I can do with PubSub so my use case is extremely simple. I have the latest versions of OpenFire (3.7.1) and of Smack (3.2.2).
The client application does the following:
Logs into the OpenFire server successfully with a valid user name and password
Creates a “PubSubManager” to use
Gets an existing leaf node which was created earlier
Fills in and submits a ‘ConfigureForm’ with the following data:
ConfigureForm form = new ConfigureForm(FormType.submit);
Sends the form over via the leaf node’s “sendConfigurationForm” method.
10 times a second, “sends” a message to the server (basic publishing)
For a while it will work without problems. I have written another client app that subscribes and listens for the messages and it receives them. I also see the messages flowing in the debug window. However, after a while errors will occur. The information shown is something like (error code=“500” type=“wait” internal-server-error). This will continue for a while but then it will start working again… for a while. It flips between working and not working without restarting any processes.
So, does anyone know why it would work for a while and then oscillate between not working and working? Is there some limit that is being hit as far as throughput? Could it have anything to do with my server using the internal database engine (HSQL) or the fact my server is set up on “localhost” and not a separate machine? Any insight would be appreciated.
Are you continuously sending 10 messages per second? Do you have a payload included? (I’m curious about the message size.) Since your setup does work, I can only guess that it is possible that the server is overworked (although that isn’t that much throughput). The more likely cause is that you are running out of memory, as there is a known memory leak in pubsub, and if you are continuously publishing 10 messages per second, you are probably running into that issue.
Here are a few things you can do to see what is happening:
- Monitor your memory usage on the server.
- Turn on the debug log in your server (via the admin console) and check for errors and stacktraces. I am curious as to where the problem may be occurring.
- The other thing to do is to increase the message timeout on the client to see if this helps.
- Change your test app to slow down delivery and see how this affects your problem.
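For the last suggestion, one way to make the publish rate adjustable is to drive the publishing from a scheduler instead of a sleep loop. The following is only a minimal sketch using the standard library: ThrottledPublisher and its methods are hypothetical names, and the Runnable passed in stands in for whatever the test app really does to publish (for example a blocking send on the leaf node).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Smack): runs a publish action at a fixed rate
// and counts how many attempts succeeded or failed.
class ThrottledPublisher {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    // Delay between two publishes for a given rate, in microseconds.
    static long periodMicros(int messagesPerSecond) {
        if (messagesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return 1_000_000L / messagesPerSecond;
    }

    // Runs one publish attempt and records the outcome. Exceptions are caught here
    // because an uncaught exception would cancel a scheduleAtFixedRate task.
    void attempt(Runnable publishAction) {
        try {
            publishAction.run();
            sent.incrementAndGet();
        } catch (RuntimeException e) {
            failed.incrementAndGet();
        }
    }

    // Starts publishing at the requested rate until stop() is called.
    void start(Runnable publishAction, int messagesPerSecond) {
        scheduler.scheduleAtFixedRate(() -> attempt(publishAction),
                0, periodMicros(messagesPerSecond), TimeUnit.MICROSECONDS);
    }

    void stop() { scheduler.shutdownNow(); }
    int sent() { return sent.get(); }
    int failed() { return failed.get(); }

    public static void main(String[] args) throws InterruptedException {
        ThrottledPublisher publisher = new ThrottledPublisher();
        // Placeholder action; a real test would publish to the node here.
        publisher.start(() -> { }, 10);
        Thread.sleep(1000);
        publisher.stop();
        System.out.println("sent=" + publisher.sent() + " failed=" + publisher.failed());
    }
}
```

Lowering the rate passed to start() and watching whether the failed count drops is a quick way to see if the errors are load related.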
Shouldn’t make a difference since he is using a small test app that is posting to pubsub, not via pep, unless there is other usage of the server besides what was posted.
I am pretty sure that the pep problem is actually the exact same thing, which will hopefully be fixed shortly.
Thanks for the input and sorry for the delayed response. I have been trying lots of scenarios writing a test app and got side tracked. I had switched over from HSQL to MySQL and I was thinking that might have fixed it. However, I just switched back to HSQL to make sure it was still broken but now the test program (modified quite a bit) is working. :-/ So, I think it must have been something with the payload or other ‘form’ settings I was using. It was an odd scenario where it would work and then not work and then start working again all in mid stream without restarting any server or apps…
I’m testing throughput from a test publishing app to a subscriber app with small messages (like the “book payload examples” on the web site). I’m seeing somewhere around 2k messages per second. I have turned off debugging and also set setPersistentItems to false. This is all running on my local machine which has 12 GB of RAM (i7 Extreme running at 3.6 GHz). I’m sending or publishing the messages one by one since I want them to stay in the order they were sent (I had tried sending batches of 100 at a time but it seems the order in which they are received is not guaranteed). Anyway, does 2k messages a second seem like what I should expect the maximum throughput to be (assuming no more hardware, etc.)?
Lastly, I’m seeing an odd problem in that sometimes not all the messages are received by the subscriber. For example, if I send over 10k messages, the publisher sends them all (verified in my test code and the debug logs) but the subscriber may only receive the first 9600 or so (also seen in the debugger). I was thinking it might be that the debugger’s screen buffer might need to be ‘flushed’ but I added test code and verified that the subscriber is not getting all the messages. However, it isn’t 100% of the time as sometimes it receives them all. I will continue working on these programs and if I can’t figure it out I would like to post my code so that someone else could run it and see if they get the same behavior.
Oh, and I did check the “Java Memory” in the OpenFire Admin Console and it seems to be ‘ok’ if I’m sending over 10k messages. At one point I was sending over 500k or more and the Java Memory did hit the maximum and then the server locked up. I’m not sure how to handle sending too much data from my end or how to find out programmatically if the server is overtaxed…
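A check like the one described above can be made more precise if the publisher puts a running sequence number in each payload, so the subscriber can tell which messages went missing and not just how many. This is a rough sketch of that idea only: DeliveryTracker is a hypothetical class, and it assumes the test payload carries an integer sequence number starting at 0.

```java
import java.util.BitSet;

// Hypothetical subscriber-side helper: records which sequence numbers arrived
// so that missing and duplicate messages can be reported after a test run.
class DeliveryTracker {
    private final BitSet seen = new BitSet();
    private final int expected;
    private int duplicates;

    // expected = number of messages the publisher will send (sequence 0..expected-1)
    DeliveryTracker(int expected) {
        this.expected = expected;
    }

    // Call this from the subscriber's item listener with the sequence number
    // extracted from the payload.
    synchronized void onReceived(int sequence) {
        if (sequence < 0 || sequence >= expected) {
            throw new IllegalArgumentException("unexpected sequence " + sequence);
        }
        if (seen.get(sequence)) {
            duplicates++;
        } else {
            seen.set(sequence);
        }
    }

    synchronized int receivedCount() { return seen.cardinality(); }
    synchronized int missingCount() { return expected - seen.cardinality(); }
    synchronized int duplicateCount() { return duplicates; }

    // Lowest sequence number that never arrived, or -1 if nothing is missing.
    synchronized int firstMissing() {
        int index = seen.nextClearBit(0);
        return index < expected ? index : -1;
    }
}
```

If firstMissing() always lands near the end of the run, that would point at messages still in flight when the test stops rather than at messages dropped mid-stream.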
If you have persist items set to false, then the memory issue that I was mentioning will not be an issue.
Seems like you are doing some good performance testing, you may also want to try some more real world scenarios, such as multiple subscribers and multiple publishers (same or multiple nodes) assuming this matches with your end goal. Multiple subscribers will have some impact on your throughput. I couldn’t say what expectations you should have, as I have only done functional testing against the server.
I should warn you though, you should not code based on the assumption messages will be delivered in order. That would happen to be the case at this point in time simply due to the current implementation, but will probably change in the future to make the pubsub service more scalable.
As for your other problem, I can’t think of any reason for all the messages to not be delivered.
Good luck and you’re welcome.
As far as messages being delivered in order goes (for a particular node), I assumed that ‘publish()’ being asynchronous would not guarantee order but ‘send()’ being synchronous would… is that not correct? If not, the client application would have to add a sequencing field (order number, time stamp, etc.) and also add logic to ‘re-order’ the messages (assuming it cares about message order). This would be important for multiplayer games, time and sales / trading applications, etc. where the order of messages is important. The PubSub specification (XEP-0060) does not mention sequencing (as far as I can see) so it may be implementation specific.
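The sequencing field plus ‘re-order’ logic mentioned above could look roughly like this on the subscriber side. It is a sketch under assumptions, not anything Smack or the server provides: ReorderBuffer is a hypothetical class, and it assumes a single publisher that stamps each payload with a consecutive sequence number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical application-level re-ordering: holds back messages that arrive
// early and releases them once every lower sequence number has been seen.
class ReorderBuffer<T> {
    private final Map<Long, T> pending = new HashMap<>();
    private long nextExpected;

    ReorderBuffer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    // Accepts one message and returns the messages that can now be delivered
    // in order (possibly none). Late duplicates are ignored.
    synchronized List<T> accept(long sequence, T payload) {
        Objects.requireNonNull(payload, "payload");
        List<T> ready = new ArrayList<>();
        if (sequence < nextExpected) {
            return ready; // already delivered earlier
        }
        pending.put(sequence, payload);
        T next;
        while ((next = pending.remove(nextExpected)) != null) {
            ready.add(next);
            nextExpected++;
        }
        return ready;
    }

    // Number of messages held back while waiting for a gap to fill.
    synchronized int pendingCount() { return pending.size(); }
}
```

One caveat with this design: if a message is never delivered, everything after it stays in the buffer, so a real application would also need a timeout or a way to skip a gap.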
I have a question about setting ‘persistent messages’ to false. When I publish a message the most recent value is recorded in the SQL table (I saw it on MySQL anyway) which is probably to ensure that if the server goes down that it can re-publish the last known value for a node. I was wondering for performance reasons if this behavior could be “turned off” so that the server would not write the value to SQL but would just keep it in a memory cache (transient mode only). I’m assuming this would have a good performance benefit as there would be no disk IO. The side effect would be that if the server went down it would not have a value when it came back up but that could be addressed by making publishers (in this mode) re-publish their data if they saw the server just came up. Anyway, that is just my two cents…
I will continue working on some test programs to see how far I can push the server.
Send and publish actually do the exact same thing. In the send case, the client makes a blocking call while in the publish case it does not. This means that if the publish fails there is no indication that it didn’t work. This is totally contained within Smack and has nothing to do with the server. The spec makes no guarantee of order, and it isn’t really practical to implement.
Think of this scenario.
- Client publishes message 1 to node A which has 100 subscribers.
- Client publishes message 2 to node B which has 1 subscriber.
You don’t want to force the second message to wait for the first one to be delivered as this would cripple a scalable solution.
Similar situations exist if the second message is sent to node A as well. There is no guarantee that all subscribers will receive message 1 before message 2. In all likelihood a server will probably disseminate the work for delivery to subscribers across multiple worker threads, and there is nothing in the spec to enforce some kind of order or coordination between those threads to ensure one message gets delivered first.
Throw the whole thing into a cluster and it complicates matters even more. Any kind of ordering will have to occur at the application level.
I wouldn’t worry about the last item being persisted. Persistence is actually done on a timer, so it doesn’t actually persist every time the item is published.
Thanks for the detailed answer and for clearing all that up.