Bug in PubSub ItemProvider in context of escaped XML

The current implementation of the ItemProvider incorrectly handles items that contain escaped XML sequences. This causes the toXML method to return invalid XML content. Subsequently, the validation of the returned XML may fail.

E.g., when receiving an item whose mywrapper element contains the escaped text

&lt;escapedTag/&gt;

the toXML() method returns that text unescaped,

<escapedTag/>

which is wrong.

It seems that the problem is in the ItemProvider class, which parses the received content using the XML pull parser and reassembles the XML in a StringBuffer.

…

else if (parser.getEventType() == XmlPullParser.TEXT)
{
    payloadText.append(parser.getText());
}

…

The parser.getText() fetches the text content of the mywrapper element and transforms the escape sequences. However, the content is not properly escaped again when added to the StringBuffer.
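The effect can be reproduced outside of Smack. The following is only a minimal sketch: it uses the JDK's StAX reader as a stand-in for the XML pull parser (an assumption made so the example runs without extra libraries), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; StAX stands in for XmlPullParser here.
final class UnescapedTextDemo {

    // Collects the text content the way the snippet above does:
    // every text event is appended as-is, without re-escaping.
    static String collectText(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringBuilder payloadText = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    // getText() returns the text with escape sequences already resolved
                    payloadText.append(reader.getText());
                }
            }
            reader.close();
            return payloadText.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String received = "<mywrapper>&lt;escapedTag/&gt;</mywrapper>";
        // The escaped text comes back as markup: <escapedTag/>
        System.out.println(collectText(received));
    }
}
```

Appending this result between the mywrapper tags would produce a child element instead of text, which matches the behaviour described above.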

If I understand you correctly, then changing

payloadText.append(parser.getText());

to

payloadText.append(StringUtils.escapeForXML(parser.getText()));

should fix this. Or am I misreading your bug report?
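As a rough illustration of what the changed line would do, here is a self-contained sketch. The escapeForXml helper below is a hypothetical stand-in written for this example; it is not Smack's StringUtils.escapeForXML implementation, only an assumption about its general behaviour.

```java
// Hypothetical sketch; not part of Smack.
final class EscapeTextSketch {

    // Stand-in for StringUtils.escapeForXML: replaces the five
    // characters that have predefined XML entities.
    static String escapeForXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What the parser's getText() hands back for "&lt;escapedTag/&gt;"
        String parsedText = "<escapedTag/>";
        StringBuilder payloadText = new StringBuilder();
        payloadText.append("<mywrapper>");
        payloadText.append(escapeForXml(parsedText)); // escape before appending
        payloadText.append("</mywrapper>");
        // prints <mywrapper>&lt;escapedTag/&gt;</mywrapper>
        System.out.println(payloadText);
    }
}
```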

Yes, this should fix this issue. However, I think it is also necessary to escape the attribute values in the same way.
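To show what escaping the attribute values could look like, here is a minimal sketch that rebuilds a start tag. It again uses the JDK's StAX reader instead of the XML pull parser, ignores namespace prefixes for brevity, and all names in it are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; StAX stands in for XmlPullParser.
final class AttributeEscapeSketch {

    // Stand-in for StringUtils.escapeForXML; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Rebuilds the first start tag of the input with escaped attribute values.
    static String rebuildStartTag(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    StringBuilder tag = new StringBuilder("<").append(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        tag.append(' ').append(reader.getAttributeLocalName(i))
                           .append("=\"")
                           // the parser returns the value unescaped, so escape it again
                           .append(escape(reader.getAttributeValue(i)))
                           .append('"');
                    }
                    return tag.append('>').toString();
                }
            }
            throw new IllegalArgumentException("no element found");
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // prints <mywrapper note="a &lt; b &amp; c">
        System.out.println(rebuildStartTag("<mywrapper note=\"a &lt; b &amp; c\"/>"));
    }
}
```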

I wonder whether there is a better way to implement this ItemProvider, such as storing the parser position at the opening and the closing tag of the item element and taking the content in between as a substring in one step. In my application I parse the string that is returned by toXML() using a DOM parser (that's why I noticed that the escape sequences were discarded during the processing). With the current implementation in the background, it seems that I am parsing the content of the item twice.

I just realized that the escapeForXML() call should be made when the String is transformed to XML. Therefore SimplePayload should make the escapeForXML() call, ideally in the constructor. Then it's also less like "parsing it twice".

Logged as SMACK-546

I am not sure if this will fix the problem. You’ll have to escape only the content of the XML elements and attribute values, but not the whole payload of the item.
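To illustrate the concern, here is a small sketch using the JDK's DOM parser. The escape helper is a hypothetical stand-in for StringUtils.escapeForXML, and the item and mywrapper strings are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch; not part of Smack.
final class WholePayloadEscapeSketch {

    // Stand-in for StringUtils.escapeForXML; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Counts the elements nested below the document's root element.
    static int elementCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<mywrapper>text</mywrapper>";
        // Payload kept as markup: mywrapper is still an element, prints 1
        System.out.println(elementCount("<item>" + payload + "</item>"));
        // Whole payload escaped: mywrapper has become plain text, prints 0
        System.out.println(elementCount("<item>" + escape(payload) + "</item>"));
    }
}
```

Escaping the complete payload string turns its tags into character data, so only text nodes and attribute values should go through the escape call.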

On a second look, it seems that you are right. I’ve pinged robin, the author of ItemProvider. Let’s hear what he thinks. But I assume we have to go with the escapeForXML(parser.getText()) combo.

I think Flow’s proposal should fix the problem. Escaping the getText() call in the provider makes sense and I believe this should always return the string as it was delivered.