-
Notifications
You must be signed in to change notification settings - Fork 779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XML deserialization for fully whitespaced strings loads empty strings #109
Comments
…L with CDATA node.
The first commit is invalid |
After some more thinking on this issue I realized that the initial fix was not complete. Since XML CDATA doesn't escape '>' character then if an input string contains "]]>" then the resulting XML will become non-well-formed. This is restriction of CDATA and there is nothing we can do about it. However, we can escape '>' manually before constructing CDATA. In this case we serialize to a string ">" in XML. Note: this manual processing also requires that we escape '&' since if user string contains ">" it should not become ">" after deserialization. Test is also added. So, shall I create a new pull request or you want 2 commits collapsed to 1? |
No need to make a new pull request. I'll test this on all of our compilers and likely merge in the near future. Note to self: add some documentation to relevant places for this change. |
This change only partially fixes the problem. It doesn't work for serialization/deserialization of char type if char is any of whitespaces. :( Probably it is not worth merging to main repo. Support of xml:space attribute should be fixed in RapidXML instead. This will fix all problems with whitespace in any types. I think I can do this but not immediately. Will you accept changes to RapidXML? |
Yes we've already made some small changes to RapidXML as well as RapidJSON internally - we are ideally looking to move away from RapidXML to something that also supports streaming, but in the interim any fixes can be directly applied to it. |
…ntain whitespaces
The fix I committed today should work. I added parse_trim_whitespace to parse flags in cereal which should make XML parsing a bit more flexible but I don't insist on having it. Concerning parse_normalize_whitespace, it also works but since the logic for adding xml:space=preserve in saveValue() is rather naive (it checks only the first and last characters) then for strings like "some text" (with 3 spaces) it will not add xml:space attribute and thus after loading we obtain "some text" (with one space character). Concerning moving away from RapidXML. From my outsider's point of view what RapidXML does is just enough for cereal and taking into account its speed and simplicity it would be a pity if you change to something else. If cereal is only missing a streaming support then it should be possible to add it to RapidXML. In particular right now I'm thinking on how to add to RapidXML loading of XML from std::istream. I need this because in the project that I work on in some cases I need to deserialize 100MB+ XML files. Well, this is not the same to serialization streaming (as you were discussing it in other threads) but it also can be useful. |
Accepted your changes to develop, see 8efbc0c. |
When I deserialize the following XML "<value> </value>" to std::string I get empty string. This happens because RapidXML ignores whitespaces. This could be fixed with "<value xml:space='preserve'> </value>" during serialization but this is not supported by RapidXML during parsing. What RapidXML supports is CDATA. So, during serialization std::strings could be saved as "<value><![CDATA[ ]]></value>" and RapidXML can parse this construct.
The text was updated successfully, but these errors were encountered: