-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent bug when reusing type ids in hashing #1075
Prevent bug when reusing type ids in hashing #1075
Conversation
What I meant was that And then "internally" (and not exposed to the rest of the crate) one could use a Maybe some stuff would end up in a different module or something to support that. And the point of doing this would just be to guarantee that But anyways, maybe that's irrelevant now and I'll have a gander at the approach you took! |
metadata/src/utils/validation.rs
Outdated
} | ||
|
||
cache.insert(id, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, I like the idea of caching the hashes like this in general; good thinking!
There's maybe a small chance that the first 4 bytes of type_hash
are actually [0,0,0,0]
, which would make that type look like the special recursive type that leads to an early return.
I also wonder; is it better to store the full [u8; HASH_LEN]
type hash in the cache? We use more memory with this, but then don't need to hash
it, and it means that the cache is truly caching the computed hashes for various types rather than caching something that's not actually the original hash we computed for a type. It feels "safer" to reuse the cache in other places if we know that it is storing the actual computed hashes over something a bit different.
Combining the two thoughts above, I wonder whether the cache should look like:
type Cache = HashMap<u32, CachedValue>;
enum CachedValue {
Recursive,
Hash([u8; HASH_LEN])
}
Where we insert CachedValue::Recursive
initially and update that to a final hash when done. (if get_type_hash
looks in the cache and sees a CachedValue::Recursive
it can return with eg MAGIC_RECURSIVE_HASH_VALUE
, which is functionally the same as right now where it would return with hash(0)
).
I'm suggesting a new enum above over using Option
just because it would be a little more self-documenting, but either works really!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could definitely do that, thought about something like that as well, it is more secure at the cost of a little bit more memory. I will change it! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be great, and then perhaps also we could add a test which uses the cache across a couple of invocations of hashing the same type to comfirm that indeed you get the same hash back both times (first time caches the hash, second time just returns it straight off). This should even work if the type has a recursive definition in which is cool :)
Ultimately whether we pass a new cache or the same cache in to multiple calls, the results should all be the same.
// if the cache has an entry for the id, just return a hash derived from it. | ||
// if the type has not been seen yet, mark it with `CachedHash::Recursive` in the cache and proceed to `get_type_def_hash()`. | ||
// -> During the execution of get_type_def_hash() we might get into get_type_hash(id) again for the original id | ||
// -> in this case the `CachedHash::Recursive` provokes an early return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it's more like "in this case, the CachedHash::Recursive
returns a fixed "magic" hash rather than recursing into the same type again" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Just one tiny suggestion to add a test showing that the cache is cahcing actual hashes and thus can safely be reused across multiple calls to hash the same types or whatever, but otherwise looks good to me!
let CachedHash::Hash(a_cache_hash) = cache[&a_type_id] else { panic!() }; | ||
let CachedHash::Hash(b_cache_hash) = cache[&b_type_id] else { panic!() }; | ||
|
||
assert_eq!(a_hash, a_cache_hash); | ||
assert_eq!(b_hash, b_cache_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check that a_hash == b_hash
; imo that's more iomportant than looking into the cached stuff :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want the hashes to be the same, because A
and B
are different structs, with different fields:
struct A { pub b: Box<B> }
struct B { pub a: Box<A> }
Maybe I understood you wrong, what did you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry yeah, I mean, it'd be nice to have a test like:
let a_type_id = registry.register_type(&meta_type::<A>()).id;
let a_hash1 = get_type_hash(®istry, a_type_id, &mut cache);
let a_hash2 = get_type_hash(®istry, a_type_id, &mut cache);
assert_eq!(a_hash1, a_hash2);
ie, we want to check that sharing the cache across different uses doesn't change the hashes that are produced for some type.
* practice TDD * implement a hashmap 2-phases approach * use nicer types * add test for cache filling * adjust test --------- Co-authored-by: James Wilson <james@jsdw.me>
* practice TDD * implement a hashmap 2-phases approach * use nicer types * add test for cache filling * adjust test --------- Co-authored-by: James Wilson <james@jsdw.me>
* remove config, doc tests are expected to fail (book not adjusted yet) * make doc tests pass * Prevent bug when reusing type ids in hashing (#1075) * practice TDD * implement a hashmap 2-phases approach * use nicer types * add test for cache filling * adjust test --------- Co-authored-by: James Wilson <james@jsdw.me> * remove the unnecessary intos --------- Co-authored-by: James Wilson <james@jsdw.me>
* routing and signing example * cliipy fix * submitting extrinsics * change order of lines * Skip call variants if there aren't any (#980) Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com> * Tidy up some metadata accessing (#978) * Reduce some repetition when obtaining metadata pallets/runtime_traits * make them pub * fix docs and clippy * Bump tokio from 1.28.1 to 1.28.2 (#984) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.1 to 1.28.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-1.28.1...tokio-1.28.2) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump regex from 1.8.2 to 1.8.3 (#986) Bumps [regex](https://github.com/rust-lang/regex) from 1.8.2 to 1.8.3. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](rust-lang/regex@1.8.2...1.8.3) --- updated-dependencies: - dependency-name: regex dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump quote from 1.0.27 to 1.0.28 (#983) Bumps [quote](https://github.com/dtolnay/quote) from 1.0.27 to 1.0.28. - [Release notes](https://github.com/dtolnay/quote/releases) - [Commits](dtolnay/quote@1.0.27...1.0.28) --- updated-dependencies: - dependency-name: quote dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump proc-macro2 from 1.0.58 to 1.0.59 (#985) Bumps [proc-macro2](https://github.com/dtolnay/proc-macro2) from 1.0.58 to 1.0.59. - [Release notes](https://github.com/dtolnay/proc-macro2/releases) - [Commits](dtolnay/proc-macro2@1.0.58...1.0.59) --- updated-dependencies: - dependency-name: proc-macro2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * restrict sign_with_address_and_signature interface (#988) * changing js bridge * dryrunresult ok * submitting extrinsic working * tiny up code and ui * formatting * remove todos * support tip and mortality * Prevent bug when reusing type ids in hashing (#1075) * practice TDD * implement a hashmap 2-phases approach * use nicer types * add test for cache filling * adjust test --------- Co-authored-by: James Wilson <james@jsdw.me> * small adjustment * Merge branch 'master' into tadeo-hepperle-browser-extension-signing-example * fix lock file * tell users how to add Alice account to run signing example * adjust to PR comments * fmt --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: James Wilson <james@jsdw.me> Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
fixes #1066.
I first thought a simple restructuring approach like @jsdw mentioned in #1066 would work, but correct me if I'm wrong, I think it cannot solve the recursive type problem, because creating a new HashSet in
get_type_id()
will eventually run into an infinite loop.Instead of using a
HashSet
for keeping track of the visited ids, we can use aHashMap
that saves some info about if the type has been visited yet and also if a complete hash has been calculated yet.I added a test to make sure it works (Without my code changes, it fails).