-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Why #[deriving(IterBytes)], not #[deriving(Hash)]? #8038
Comments
Because deriving(Hash) won't be deriving ... a hash function. It'll be deriving a way-to-iterate-over-your-fields to feed to the hash function. The hash function is in a central location and very carefully chosen. That said, we could probably rename it somehow. A lot of the ugliness in the way hash is implemented had to do with missing features in default methods and inheritence, which I think all work now. |
As an aside, I still think |
They're certainly related. Not sure if identical? I guess encoding is supposed to hit all the "meaningful" bytes and skip the "meaningless" ones. Maybe we could build a default method on the encode trait that funnels all the other encoding methods (including a prefix byte for each tycon) into a byte stream? Essentially a "default encoder" that delegates to an inner function with a byte slice? Or, I guess, hash could provide that encoder itself. But I bet it's useful elsewhere. |
To put it another way, it seems like iterbytes could be trivially implemented in terms of encodable, but encodable provides higher-level information. The only reason I see to separate them is that perhaps there are bits of data that one wants to encode but which do not affect the value when used as a hash; but I think this is subtle and a bit error-prone. I would personally be inclined to separate out the "hash key" value from the rest, in such cases. Also, this last bit presumes that IterBytes is only used for hashing, which is currently true afaik, but not obvious from its name. I guess I personally would rather just have one way of serializing types into bytes (the Encodable trait) and then use that for multiple purposes than to have many distinct serialization traits. |
By implementing |
@nikomatsakis: It shouldn't be that hard to reimplement the However, it would be nice for us to add a raw binary |
@erickt I'm not sure what you mean by "doesn't guarantee ordering"? In my head, at least, most of the methods are in fact ordered (i.e., a tuple must try to encode each of its elements in order, a struct each of its fields, etc), but I guess the higher-order methods for emitting a map might not be. I was envisioning creating a "HashEncoder" that just hashed data in response to calls to the encoder methods. Still, now that we have deriving modes, it doesn't hurt to have them be separate interfaces, and I can imagine it might allow for looser or more efficient encoding around maps and other data structures that are not obviously ordered. I cared about this more when there was no deriving for iterbytes. |
@nikomatsakis: Yeah, I'm talking about the higher ordered methods for emitting a map. For example:
That hypothetical If we wanted to prevent someone from using a
We could then either use
Then to use it, we just write:
This would save us from having to write another deriving syntax extension. |
(fwiw, writing new |
@huonw: It's true, but I worry that |
They're normal syntax extensions, so #1762 should cover that (if/when that happens). |
This is pretty messy. Should be fixed. |
I started playing around with a fix for this last night. Since we only seem to use
All hashable traits need to implement The other approach I had was to have a much lighter weight
This is a much smaller change, but we do end up duplicating a lot of work that Here is a prototype implementation of both schemes. Performance is identical for an optimized build. For non-optimized, I expect that this approach would also allow us to fix both #9075 and #5195. |
The |
@huonw: I got an updated version of my
I'm going to start working on a PR to switch over to this approach. |
@erickt (Also, some of your casts seem to have suffered from copy-paste. (That looks macro-able, fwiw).) Alsoalso, @pcwalton was saying that the IterBytes closures only get inlined very rarely, but this scheme has static dispatch and so could feasibly be (much) faster. :) |
I anticipate a lot of "GEE I WONDER WHY I DIDN'T THINK OF THAT" when newcomers come into the channel asking how to make their types hashable, since the hash trait has some weirdo type signature, and the answer is "Derive this other typeclass you've never heard of before."
Why can't we just make it be #[deriving(Hash)]? It also makes already-written code more legible.
The text was updated successfully, but these errors were encountered: