Introduce `HashCompressor` for name compression #396

bal-e · 2024-09-23T12:37:14Z

I noticed that 'TreeCompressor' will likely have a lot of overhead, as it has to copy each label into its internal tree and it maintains a separate hashmap for every known name (where each name likely has one or two parents at the most, even though hashmaps have a large capacity).

'HashCompressor' is an alternative name compressor built on top of the 'hashbrown' crate, which offers lower-level hash table access that lets entries indirectly reference the built message for hashing and equality checks. It maintains a single hashmap, it should be faster, and it does not require as much memory.

The implementation here makes it clear that domain names should have been stored in reverse order, at least in the wire format. A decent chunk of the logic goes into correctly reversing the domain name. In fact, the implementation likely quadratic runtime because of this (at least when 'Name' or 'RelativeName' are used, as backward iteration on them has quadratic runtime).

While 'std' essentially exposes the same data structures as 'hashbrown', it does not currently offer a way to perform low-level hash table access (although there is a nightly "raw entry" feature). This may change in the future.

If not for the added 'hashbrown' dependency, I would suggest deprecating 'TreeCompressor' entirely.

Cargo.toml

src/base/message_builder.rs

bal-e · 2024-10-07T14:53:23Z

@ximon18: Do you think this is a breaking change? It doesn't remove any existing APIs, we just discussed that TreeCompressor could be replaced by this for the next breaking-change release. I'd like to get it merged soon.

ximon18 · 2024-10-07T15:12:48Z

I'll take a look as soon as I can.

ximon18 · 2024-10-07T21:10:18Z

@ximon18: Do you think this is a breaking change? It doesn't remove any existing APIs, we just discussed that TreeCompressor could be replaced by this for the next breaking-change release. I'd like to get it merged soon.

It seems to be purely additive so I don't see why it would be a breaking change. The only observation I had when scanning it is that I saw an assert!() call which is not documented in the RustDoc as a known way the code can panic... and ideally it wouldn't panic on user input, especially for a function that is able to error out. I can't tell if hashbrown as an added dependency in some way limits the build targets compared to now, but given that hashbrown is used by the Rust standard library I assume it is fine. I don't know how you chose the feature flags to use for hashbrown, I don't see them documented in the crate docs.

Ah, I see you are asking because @partim and and I added the breaking label when looking through the list of PRs, and we at the time were talking about it replacing the current TreeCompressor but I see from your comment that is not the plan. I'll remove the label.

bal-e · 2024-10-08T11:11:41Z

@ximon18 the assert can only fail if somebody implements ToName incorrectly, which is a programming error and not something external users can cause. The hashbrown feature flags are there to disable some unnecessary default features to speed up compilation.

I noticed that 'TreeCompressor' will likely have a lot of overhead, as it has to copy each label into its internal tree and it maintains a separate hashmap for every known name (where each name likely has one or two parents at the most, even though hashmaps have a large capacity). 'HashCompressor' is an alternative name compressor built on top of the 'hashbrown' crate, which offers lower-level hash table access that lets entries indirectly reference the built message for hashing and equality checks. It maintains a single hashmap, it should be faster, and it does not require as much memory. The implementation here makes it clear that domain names should have been stored in reverse order, at least in the wire format. A decent chunk of the logic goes into correctly reversing the domain name. In fact, the implementation likely quadratic runtime because of this (at least when 'Name' or 'RelativeName' are used, as backward iteration on them has quadratic runtime). While 'std' essentially exposes the same data structures as 'hashbrown', it does not currently offer a way to perform low-level hash table access (although there is a nightly "raw entry" feature). This may change in the future. If not for the added 'hashbrown' dependency, I would suggest deprecating 'TreeCompressor' entirely.

The feature flag to enable it has been changed to be more specific.

bal-e added the enhancement New feature or request label Sep 23, 2024

bal-e requested review from ximon18 and tertsdiepraam September 23, 2024 12:37

bal-e self-assigned this Sep 23, 2024

bal-e force-pushed the hash-name-compressor branch 3 times, most recently from b09d0e8 to f31e229 Compare September 23, 2024 13:08

partim reviewed Oct 2, 2024

View reviewed changes

Cargo.toml Outdated Show resolved Hide resolved

src/base/message_builder.rs Outdated Show resolved Hide resolved

bal-e force-pushed the hash-name-compressor branch from f31e229 to baab897 Compare October 3, 2024 07:51

bal-e requested a review from partim October 3, 2024 08:22

ximon18 added breaking A PR that includes a breaking change. and removed enhancement New feature or request labels Oct 7, 2024

ximon18 removed the breaking A PR that includes a breaking change. label Oct 7, 2024

bal-e added the needs-review label Oct 10, 2024

bal-e requested a review from a team October 16, 2024 07:33

bal-e added 4 commits October 16, 2024 09:39

[base/message_builder] Mention 'HashCompressor' in API docs

b1028b4

Add internal docs for HashCompressor

c8034cf

The feature flag to enable it has been changed to be more specific.

Move 'HashCompressor' under the 'std' feature

072474a

bal-e force-pushed the hash-name-compressor branch from b918f97 to 072474a Compare October 16, 2024 07:41

partim approved these changes Oct 21, 2024

View reviewed changes

partim merged commit 97c0036 into main Oct 21, 2024
26 checks passed

partim deleted the hash-name-compressor branch October 21, 2024 13:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `HashCompressor` for name compression #396

Introduce `HashCompressor` for name compression #396

bal-e commented Sep 23, 2024

bal-e commented Oct 7, 2024

ximon18 commented Oct 7, 2024

ximon18 commented Oct 7, 2024 •

edited

Loading

bal-e commented Oct 8, 2024

Introduce HashCompressor for name compression #396

Introduce HashCompressor for name compression #396

Conversation

bal-e commented Sep 23, 2024

bal-e commented Oct 7, 2024

ximon18 commented Oct 7, 2024

ximon18 commented Oct 7, 2024 • edited Loading

bal-e commented Oct 8, 2024

Introduce `HashCompressor` for name compression #396

Introduce `HashCompressor` for name compression #396

ximon18 commented Oct 7, 2024 •

edited

Loading