Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trustfall_core: Optimize serde_type_serializer #297

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

nvzqz
Copy link
Contributor

@nvzqz nvzqz commented Jun 1, 2023

Rather than collect a new BTreeMap with Strings for Types, this instead will:

  • Serialize over the existing BTreeMap
  • Create shorter-lived Strings

Rather than collect a new `BTreeMap` with `String`s for `Type`s, this
instead will:
- Serialize over the existing `BTreeMap`
- Create shorter-lived `String`s
@nvzqz nvzqz force-pushed the pr/serde_variables_serializer branch from 1b3656e to a8ba954 Compare June 1, 2023 14:42
Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@obi1kenobi obi1kenobi enabled auto-merge (squash) June 1, 2023 18:55
@nvzqz
Copy link
Contributor Author

nvzqz commented Jun 1, 2023

Is there a benchmark that exercises this code path? Just curious to know what the numbers might be.

@obi1kenobi obi1kenobi merged commit eeeafe5 into obi1kenobi:main Jun 1, 2023
@nvzqz nvzqz deleted the pr/serde_variables_serializer branch June 1, 2023 19:04
@obi1kenobi
Copy link
Owner

Serializing IR is a rare use case, so this code is primarily used in the test suite. It does the usual (and clever!) compilers trick that allows it to run end-to-end tests but with all stages in parallel: the repo contains input queries and snapshots of the behavior/output produced by every stage (parser, frontend, interpretation, query outputs).

Each snapshot is the output of one stage and the input to the next, so we have tests for "each stage when fed proper input produces proper output" and the composition of those tests is an end-to-end correctness guarantee as well.

This is the primary place where we serialize and deserialize IR right now. The serialization path is used when generating a new IR file either for a new test input or to update an existing one.

@obi1kenobi
Copy link
Owner

The best thing to benchmark is the contents of ./scripts/recreate_all_ir.sh but the variables portion of that is relatively small: there are 0-4 variables per query and a lot more other IR content. So even that isn't a great benchmark target because it's noisy and large compared to the effect being measured.

But I'm sure there's more low-hanging fruit to be picked.

For example, DataContext in the interpreter gets cloned a lot, and I suspect that making its vertices / folded_contexts / folded_values / imported_tags maps be copy-on-write and structurally shared instead of deep-cloned would produce a serious perf improvement. It's been on my todo list for a while but I've never gotten around to it.

I wouldn't touch the other fields, the rest are either about to be changed to a different representation altogether or removed entirely or are almost-always empty so cloning is cheap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants