-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce 'quip' data building helpers. #134
Conversation
Yeah, that's a nice pattern - opt-in error checking. I suppose it does impose an additional burden on the API consumer to ensure that success-dependent logic doesn't get invoked if there's an error part-way, but that can be clarified with documentation and a big 'ol caveat emptor I suppose. |
…rors correctly. If there's any furhter action to take after the callback, it's necessary to recheck the error slot before taking that action; and of course we definitely don't want to overwrite the error if one had been set during the callback. I wonder if it would ever be useful to have a variant of these functions which *does* attempt to call `Finish`, etc, and thus still build a partial tree at the end even if it stopped on error midway. (Right now, if something stops midway, the final `Build` call will panic, and tell you you haven't finished all the assemblers.) It would be a bit more complicated though: we'd potentially need to accumulate more than one error. And in practice, when working on schema'd data, it would often still result in invalid results (anything other than optional fields, or type-level maps and lists, will fail some other logical rule if not fully filled in), which makes me wonder how often this would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed this at length in chat - I personally think this is better than fluent, but there's still room to improve. I have an altered version in a branch at https://github.com/ipld/go-ipld-prime/tree/quip-mvdan.
We've agreed to merge both as separate packages, sort of like how quip will sit next to fluent even though both serve the same purpose. Longer term, we'll see which one we like best, and delete the others.
Once this is merged, I can post my PR and add my API to the benchmarks.
To add a bit more context - what I'm not a fan of in this API is:
|
Lots of individual things: - removed the "Begin*" functions; no need to expose that kind of raw operation when the callback forms are zero-cost. - renamed functions for consistent "Build" vs "Assemble". - added "Assign*" functions for all scalar kinds, which reduces the usage of "AbsorbError" (but also, left AbsorbError in). - renamed the ListEntry/MapEntry functions to also have "Assemble*" forms (still callback style). - while also adding Assign{Map|List}Entry{Kind} functions (lets you get rid of another callback whenever the value is a scalar). - added Assign{|MapEntry|ListEntry} functions, which shell out to fluent.Reflect for even more convenience (at the cost of performance). - moved higher level functions like CopyRange to a separate file. - new benchmark, for the terser form of working with scalars. (It's also evidently slightly faster, because fewer small function calls. Slightly surprising considering how much inlining we might expect, but, huh. Alright, surprise bonus; acceptable.) - example function updated to use the terser form. With these terseness improvements to handling of scalars, the overall SLOC count for using the quip system is now exactly on par with fluent. Varations on map key arguments (which could be PathSegment or even Node, in addition to string) still aren't made available this. Perhaps that's just okay. If you're really up some sort of creek where you need that, you can still just use the MapAssembler.AssembleKey system directly, which can do everything.
I've finished adding the rest of the helper methods that seem most important. (This includes making handling of scalar and leaf values take one function call instead of two, and no extra callbacks -- which saves a lot of keystrokes, and is a double boon because one of those two calls used to be that rather ugly general-purpose error gathering function.) I've also done a naming consistency pass. There are also now functions for working with NodeAssembler, but also a few for top-level usage that deal with the NewBuilder and Build dance for you, which shaves a few more lines off common usages. The extra methods for handling scalars probably is the biggest impact. The overall example usage is now this:
This still isn't perfect -- lots of Like @mvdan said, we're also still going to pursue other approaches and experiments in parallel. Hopefully something shakes out as the most preferred idiom after some time of usage, but there's no need to rush that process. As long as things begin and end in |
I developed a new pattern for building IPLD structures. This started as a little experiment, but worked so dang well I instantly want to ship it.
The basic idea is that every function in the package takes an
*error
as its first parameter, and gathers any errors during the function's operation in there, instead of returning them. If the*error
is already non-nil, it just returns without doing anything. This lets you write very linear code without a lot of fuss, and tends to "do the right thing" in over all flow control.A function that's the equivalent of each of the methods on
NodeAssembler
exists. (I've also sprinkled in a few other useful ones like "CopyRange" for lists.) These generally take aNodeAssembler
to work on as a parameter. For those functions that work on recursive kinds, I've done callback arguments, which causes tree structures to syntactically indent like trees (and also saves you the calling and error checking ofFinish()
functions), which I tend to think is good.Check out the example function around https://github.com/ipld/go-ipld-prime/pull/134/files#diff-46ca8eb0f742995be52debd745cb616db55f094a46c9182058da8b1d3b548ed9R15-R42 for a demonstration of how using these helpers looks.
Syntactically, this comes out a lot shorter than using the
NodeAssembler
manually and doing all the error checks in longhand. It's not quite as short as some of our other syntaxes (yet?) either, but it's most of the way there, and there's still room for improvement. Here's the linecounts summary, based on the examples that can be found in the benchmarks in this diff:18 SLOCupdate: now down to ~12 SLOC(
I think we can get this 18 down to the 12 of the others; there's some comments in the diff about avenues to explore. And then 12 is about as far as I'd push it on this example, apparently (on this example data, any less implies you're collapsing more than one tree structure element into each line of source, which is of dubious desirability for readability).update: Indeed we did get it down to 12!)("unmarshal" is arguably an odd thing to include in this comparison table, but I thought to myself: well, okay, what if we take that as an approximation for "how good could we get if we invented a whole DSL for this?". It's just interesting to ponder.)
The really good news about all this? It's also working with zero overhead. And I mean immeasurable. Here's a sample benchmark result:
Using the
quip
helpers is literally free. In fact, evidently, it's so amenable to compiler inlining and other compile-time optimizations that it's actually faster than the longhand direct usage ofNodeAssembler
. (Borderline hard to believe, I know, but, I think it's not even a sampling artifact, and the benchmark membench results also support this. I suspect the extra nested functions in thequip
style are letting the escape analyzer get quite clever and shift more things onto the stack, whereas my longhand function held onto more assemblers at once, in... ways that were harder to optimize. I dunno. Haven't looked that gift horse in the mouth too deeply, honestly.)By comparison, our other previous attempts at helper functions for assembling IPLD data had a 30% cpu time tax, and something like a 50% garbage tax. Those taxes meant you really couldn't have nice syntax and ideal performance at the same time... and that meant, in practice, a lot of code would get written twice: once the terse way, and a second time the fast way as soon as it showed up on an application level benchmark. Having to decide which of two styles to use based on whether you estimated you could stomach the performance hit was just an awful choice to have to make. With
quip
, that choice is gone: you can have nice syntax and fast at the same time.I've skimmed a little of the assembler output to see why its so fast. It seems that quip functions are generally inlinable. I suspect this means the "if err" branches can often be unrolled into one jump target per function in assembly. Also importantly, the anonymous functions used to make the syntactic tree structures (and save us from needing to make
Finish()
calls) get compiled into anonymous functions without closure state -- which is important because: it means they're not dynamically dispatched; and they don't do a sneaky allocation to create an anonymous struct that holds closure state (which can be one of the sources of allocs that can be the most syntactically difficult to track down). I suspect it's possible to write more complicated callbacks which would cause the closure alloc cost, but... if the most common usage doesn't, I'm happy enough with that.Worth noting that of the remaining allocations shown in the benchmark... most of them are probably from the fact we're using
basicnode
nodes to hold this data.basicnode
implementations are decidedly allocation-happy since they don't know anything about the structure of the data they're holding. If we implemented another benchmark against a codegen'd node that does know about the structure of the data, I suspect the alloc count would be in the single digits. Might even be able to get it down toone ortwo (which is about as low as I can imagine for a corpus like this one that involves two variable sized lists).There's probably more work to do here:
More helpers (especially around leaf values; see comments in diff) could make this even terser.update: There are now more helpers for scalar and leaf values.CopyRange
to keep adding, I'm sure!... but I think this is a pretty solid foundation to build more on.