Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML deserialization for fully whitespaced strings loads empty strings #109

Closed
volo-zyko opened this issue Jul 28, 2014 · 7 comments
Closed
Milestone

Comments

@volo-zyko
Copy link
Contributor

When I deserialize the following XML "<value> </value>" to std::string I get empty string. This happens because RapidXML ignores whitespaces. This could be fixed with "<value xml:space='preserve'> </value>" during serialization but this is not supported by RapidXML during parsing. What RapidXML supports is CDATA. So, during serialization std::strings could be saved as "<value><![CDATA[ ]]></value>" and RapidXML can parse this construct.

volo-zyko added a commit to volo-zyko/cereal that referenced this issue Jul 28, 2014
@volo-zyko
Copy link
Contributor Author

The first commit is invalid

volo-zyko added a commit to volo-zyko/cereal that referenced this issue Jul 31, 2014
@volo-zyko
Copy link
Contributor Author

After some more thinking on this issue I realized that the initial fix was not complete. Since XML CDATA doesn't escape '>' character then if an input string contains "]]>" then the resulting XML will become non-well-formed. This is restriction of CDATA and there is nothing we can do about it. However, we can escape '>' manually before constructing CDATA. In this case we serialize to a string "&gt;" in XML. Note: this manual processing also requires that we escape '&' since if user string contains "&gt;" it should not become ">" after deserialization.

Test is also added.

So, shall I create a new pull request or you want 2 commits collapsed to 1?

@AzothAmmo
Copy link
Contributor

No need to make a new pull request. I'll test this on all of our compilers and likely merge in the near future.

Note to self: add some documentation to relevant places for this change.

@volo-zyko
Copy link
Contributor Author

This change only partially fixes the problem. It doesn't work for serialization/deserialization of char type if char is any of whitespaces. :( Probably it is not worth merging to main repo. Support of xml:space attribute should be fixed in RapidXML instead. This will fix all problems with whitespace in any types. I think I can do this but not immediately.

Will you accept changes to RapidXML?

@AzothAmmo
Copy link
Contributor

Yes we've already made some small changes to RapidXML as well as RapidJSON internally - we are ideally looking to move away from RapidXML to something that also supports streaming, but in the interim any fixes can be directly applied to it.

@AzothAmmo AzothAmmo added this to the v1.2.0 milestone Aug 12, 2014
volo-zyko added a commit to volo-zyko/cereal that referenced this issue Aug 18, 2014
volo-zyko added a commit to volo-zyko/cereal that referenced this issue Aug 18, 2014
@volo-zyko
Copy link
Contributor Author

The fix I committed today should work. I added parse_trim_whitespace to parse flags in cereal which should make XML parsing a bit more flexible but I don't insist on having it. Concerning parse_normalize_whitespace, it also works but since the logic for adding xml:space=preserve in saveValue() is rather naive (it checks only the first and last characters) then for strings like "some   text" (with 3 spaces) it will not add xml:space attribute and thus after loading we obtain "some text" (with one space character).

Concerning moving away from RapidXML. From my outsider's point of view what RapidXML does is just enough for cereal and taking into account its speed and simplicity it would be a pity if you change to something else. If cereal is only missing a streaming support then it should be possible to add it to RapidXML. In particular right now I'm thinking on how to add to RapidXML loading of XML from std::istream. I need this because in the project that I work on in some cases I need to deserialize 100MB+ XML files. Well, this is not the same to serialization streaming (as you were discussing it in other threads) but it also can be useful.

@AzothAmmo AzothAmmo modified the milestones: v1.2.0, v1.1.0 Aug 19, 2014
@AzothAmmo
Copy link
Contributor

Accepted your changes to develop, see 8efbc0c.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants