-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support semantic error messages by supporting deserializing spans #302
Comments
(I rolled by this issue because my project uses toml-edit for it's higher-quality serialization, but its low-quality spans for deserialization means I've been experimenting with preferring toml-rs for the other side of the equation, even though toml-edit/easy is explicitly designed to preserve more info, alas!) I've been using this API in toml-rs in anger a lot this week, so I figured I'd chime in with some details on how it works, and some issues it has: Spanned defines four magic strings: pub(crate) const NAME: &str = "$__toml_private_Spanned";
pub(crate) const START: &str = "$__toml_private_start";
pub(crate) const END: &str = "$__toml_private_end";
pub(crate) const VALUE: &str = "$__toml_private_value"; And then uses a custom visitor and does the following: let visitor = SpannedVisitor(::std::marker::PhantomData);
static FIELDS: [&str; 3] = [START, END, VALUE];
deserializer.deserialize_struct(NAME, &FIELDS, visitor) The deserializer then, at various points in the code, checks for that exact state and wraps itself in a SpannedDeserializer that emits the 3 magic fields, the first 2 being the span info: if name == spanned::NAME && fields == [spanned::START, spanned::END, spanned::VALUE] {
let start = 0;
let end = self.input.len();
let res = visitor.visit_map(SpannedDeserializer {
phantom_data: PhantomData,
start: Some(start),
value: Some(self),
end: Some(end),
});
return res;
} That the SpannedVisitor expects: fn visit_map<V>(self, mut visitor: V) -> Result<Spanned<T>, V::Error>
where
V: de::MapAccess<'de>,
{
if visitor.next_key()? != Some(START) {
return Err(de::Error::custom("spanned start key not found"));
}
let start: usize = visitor.next_value()?;
... I've been trying to use this in anger a fair bit this week and it seems to largely work for "normal" derived types but is more of a mess with custom deserialization. Because it relies on random places in the code detecting the magic fields, if you ever invoke it in a way it wasn't expecting (like variants on the "string or struct" pattern in serde's docs) then you get nasty "expected spanned" from it not creating the SpannedDeserializer and emitting the magic fields. I'm not sure if those problems are "I'm bad at serde" or "this design is buggy" or "serde makes this impossible to do robustly". The actual Spanned type also has a pretty suboptimal API. It's very opaque, and you have to explicitly call a method to get the wrapped value, so retroactively adding spans to your existing types is extremely disruptive to code. Thankfully because it communicates with magic fields, you're able to "fork" just Spanned. I did this to make it more like I am... incredibly bad at serde stuff so I'm not sure if I could help with bringing this stuff up, but here's a brain dump. |
Thanks for the brain dump! These details will be important when implementing this! |
Some extra notes from landing this:
|
I'm pretty sure the answer is "serde makes this impossible to do robustly". I looked into span support for serde things a bit ago in serde-rs/serde#1811 (comment), and as far as I can tell there's no way to propagate span info through serde without resorting to wild hacks like what It's possible to extend serde's API to better support spans. Unfortunately, with the designs I've found so far, you generally have two choices:
|
I wonder if it'd make sense to check that the magic strings are the same instead of checking if they are equal, that means comparing the address rather than contents. Unless I'm missing something this should make it more robust (if anyone for whatever reason uses these strings in the input it shouldn't confuse the parser). If there was some way to prevent the linker from merging same symbols they could all be empty strings which would be even nicer but I didn't find how to do that. |
Note that I'm keeping the scope of this issue fairly limited to "do what |
See toml::Spanned for inspiration
The text was updated successfully, but these errors were encountered: