Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add emitter support for integer bases and block strings #10

Merged

Conversation

cfrantz
Copy link

@cfrantz cfrantz commented Apr 22, 2022

  1. Add a Meta node type for emitter options which are not
    round-trip-able.
  2. Support emitting nodes with different integer bases.
  3. Support emitting nodes with different string formatting options.

Signed-off-by: Chris Frantz frantzcj@gmail.com

1. Add a `Meta` node type for emitter options which are not
   round-trip-able.
2. Support emitting nodes with different integer bases.
3. Support emitting nodes with different string formatting options.

Signed-off-by: Chris Frantz <frantzcj@gmail.com>
@cfrantz
Copy link
Author

cfrantz commented Apr 22, 2022

This ports some of my formatter extensions on top of your comment support. As I mentioned in dtolnay/serde-yaml#234, I thought I should refactor my extensions to the yaml-rust library a bit.

I don't have a use case for round-tripping these things through the library as you do for comments, so I didn't bother to try to make the parser preserve this information. The Meta ops are strictly for customizing output formatting.

@alevinval
Copy link
Owner

alevinval commented Apr 22, 2022

Hey @cfrantz , thanks for the PR. I'll try to merge stuff on Sunday when I come back. On another note... I was checking serde-yaml for the first time now because I was thinking about how to integrate the comment support from my branch. But I'm a bit discouraged because I've realized that either it's very hard, or impractical to do. For instance, I was thinking on starting very basic, having a struct with field headline: Yaml and make it of type Yaml::Comment, however when we are here it would need to not call to_yaml but directly insert into the hashmap and put the headline value as the key of the map (and discard the key). But this is basically impossible right now since all we have is a &T and no way to determine if it's an instance of Yaml. If you have more knowledge about serde-yaml... do you think there's a way to integrate my branch there?

@cfrantz
Copy link
Author

cfrantz commented Apr 23, 2022

I investigated extending serde-yaml quite a bit when I implemented my first attempt in dtolnay/serde-yaml#234. My conclusion was that I had to create a new trait to express what I wanted to do and then implement that trait on every Serializable struct for which I also wanted my comments/blocks/alt integer bases. I did this by writing a procedural macro to automatically implement the new trait (which I called YamlFormat).

For comments, I had implemented three options:

#[derive(Serializable, YamlFormat)]
struct Foo {
  #[yaml(comment="A fixed string")]
  pub a: u32,

  #[yaml(comment=_another_field)]
  pub b: u32,
  #[serde(skip)]
  _another_field: String,

  #[yaml(comment=_some_function()]
  pub c: u32,
}

impl Foo {
  fn _some_function(&self) -> String { "some computable whatever".to_string() }
}

This would yield the following output when serialized:

---
# A fixed string
a: 12345
# The contents of _another_field as this comment.
b: 67890
# some computable whatever
c: 54321

In order for the serializer in serde-yaml to know that it can call members of my trait, I need either:

  • Some form of trait specialization (Tracking issue for specialization (RFC 1210) rust-lang/rust#31844) so that I can provide a blanket do-nothing default implementation of YamlFormat and specialized implementations for structs with comment/format info.
  • A way to identify which types implement YamlFormat so I can safely cast to &dyn YamlFormat and invoke the functions.

The former is not in stable yet, so I had to opt for the latter. The biggest problem I had was that std::any::TypeId::of<T>() requires the type T to be 'static, but serde-yaml does not. I ended writing my own "type-id" function (see here), but I fear that it will run afoul of the optimizer.

Assuming my "type-of" does run afoul of the optimizer, the way attaching comments during serialization works is to have the proc-macro derive YamlFormat on appropriately annotated structs and then to register their type-ids at program start-up. I then modified serde-yaml to query the database (hashmap) of registered structs that implement YamlFormat when it serializes. If the struct implements YamlFormat, it can be safely cast to &dyn YamlFormat and the comment information can be extracted and inserted into the Yaml document as it is constructed.

I'd planned on reworking my serde-yaml PR to use your fork for yaml-rust. There may be a few more mods I want to make to yaml-rust. One that comes to mind is an Integer variant which can hold a 128-bit number (probably in the Meta enum). I don't think the Yaml Spec makes any strong statements about the size of integers other than "In general, integers representable using 32 binary digits should safely round-trip through most systems." As you know, yaml-rust represents integers as i64 and serde-yaml has special code to deal with 128-bit integers.

I'll send you an update for 128-bit support and when I refactor my fork of serde-yaml onto your fork.

@alevinval alevinval merged commit 789c554 into alevinval:feature-comment-support Apr 23, 2022
@alevinval
Copy link
Owner

alevinval commented Apr 23, 2022

Yeah, I was trying Any and downcasting but when I saw the 'static requirement I kinda gave up...
Ok, if you can pull that off, it will be amazing. And we are only talking serialization right? serde-yaml deserialization with comments will be a totally different beast, much, much harder, since it will need to somehow keep the yaml-rust document loaded (which will contain the Yaml::Comment nodes at the appropriate places, as map keys or array entries) and then somehow manage to re-introduce those when serializing a struct, but at the moment serialization is not re-using the original loaded YAML. And since you don't know where the user may add comments, you really need to see the original loaded YAML document to re-introduce the comments during serialization. Food for thought.

I've merged the two outstanding PRs. Now that I know you are working on the fork, let's work on master.

For the moment comment support is only available when cfg(test) is true.
I wonder if this fork should have it enabled by default, and perhaps add some builder pattern to YamlLoader to tune things, and have like disable_comments() etc...

I was thinking also of adding some configurations on the emitter to support human-friendly outputs... for example, adding a line break between top-level nodes or something like that, so you can easily navigate by paragraph.

@cfrantz
Copy link
Author

cfrantz commented Apr 24, 2022

Yes, currently my changes apply only to Serialization.

I'm curious about your use case for preserving comments on deserialzation. Are you writing a formatter, or do you have some other case, such as meta-information being conveyed in comments?

@alevinval
Copy link
Owner

Now that you ask me this question... You're making me wonder if it's really worth it. Let's say you have a tool that manipulates a configuration file. The user may annotate with a comment a section, for other people reading the file. If you were to run the tool such that it has to write the config again, you will lose these comments (eg https://github.com/alevinval/vendor/blob/master/example/.vendor.yml)

To be honest... thinking well about this, it's a seriously complicated problem... just having a vec of entries that you keep sorted, if a new entry is added you'd need to re-sort the entries but also keep into account the comments. Maybe this is just a terrible idea after all.

If you were to support such a case, then you'd probably be writing a custom converter from your struct to the Yaml tree rather than relying onserde-yaml lib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants