-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaner bitfields #1019
Cleaner bitfields #1019
Conversation
This draft PR try to fix a longstanding wart (see #196, #371, #667) regarding bitfields. We define bitfields as lists or vectors of `bool`s, as opposed to `bytes`. Benefits: * Remove ugly-as-hell helper function `verify_bitfield` * No more bit manipulation with `justification_bitfield` * Merkleisation-friendly `justification_bitfield` (i.e. `justification_bitfield` does not change unless a new epoch is justified). * Easy parametrisation of the number of bits in `justification_bitfield` (can be more or fewer than 64—more may be useful for dApps and light clients) * Semantically cleaner typing This requires a minor SSZ change (Merkleisation not affected) where lists and vectors of `bool`s should be packed, similar to packing of `uintN`. We could alias `bool` to `uint1` so that the packing logic only needs to be defined once for `uintN`.
I'm in favour of this because it adds clarity (bit manipulations should ideally account for as little as possible of the spec, as they are very hard to read). |
I still prefer the "bool" name as it is a functional description (and "bit" an implementation detail) but on the other hand, "bitfield" feels more natural than "boolfield". |
Wouldn't this also require a special case in SSZ? It makes the phase 0 spec cleaner but the SSZ serialization more complicated by about the same amount... Though you could argue that adding the bigint type to SSZ has a similar effect, and I've advocated for that :) |
Yes, the substantive change involves packing lists and vectors of bits similarly to packing lists and vectors of |
I think I would argue that even if it is not a complexity reduction, we should just do it because SSZ is the right place to deal with data types, and this is clearly a question of data types. Not having to worry about bitfields at higher levels of the spec makes it more readable; basically it should be possible to read the Eth2.0 spec without reading SSZ and just assuming that SSZ will do the right thing when it comes to bools/bits. |
Probably a bit more than 2 lines. Here's my attempt:
Note however that this is insufficient because unlike all other SSZ data structures, where we can deduce the number of items from the length, here we cannot do so, because an N byte list could correspond to a [8N .... 8N+7] size bitfield. So we'd need a custom length prefix (we don't need that today because we're feeding in the length from the committee size, so in some sense we have a "static" data type that's not really "static"). So perhaps we'd have to do something like My instinct is still that bitfields in SSZ are on net a good thing, but there definitely are costs to doing it.
Fully agree with this! |
Ooh, I have another idea for how to serialize variable-sized bitfields: use the generalized index scheme from https://github.com/ethereum/eth2.0-specs/blob/dev/specs/light_client/merkle_proofs.md#generalized-merkle-tree-index. As I describe here, there is a natural correspondence between generalized indices and bitfields, in that a generalized index is a bitfield, with 0s representing when to go left and 1s representing when to go right, but with an extra 1 bit to signify where the bitfield starts from. For example, 9 is the generalized index for "the right child of the left child of the left child of the root", 9 in binary is 1001, where the first 1 is the start marker, and then the 001 is the path from root to leaf. One could take this exact scheme and use it to re-interpret any bigint as a bitfield for any other purpose, and then we'd just need SSZ to support bigints. |
Serialisation is not part of consensus :) Serialisation-specific implementation details (such as a custom length prefix—which is a good point) do not contribute towards the "strict" line count. This PR is a natural continuation to #924. In the SSZ rewrite in progress (where the short Merkleisation-specific code is to be incorporated in
Interesting!
Can we simply alias |
We could, though I would argue that
The big problem here is that in pretty much every computer language, and the underlying architectures, things can't be N bits long for N % 8 != 0. The very output of
Everything I said above applies to tree-hashing too :) |
This is neat :) The main question is whether we want 0-based or 1-based indexing of generalised indices (#1008). An argument in favour of 0-based indexing is that bigints have support for 0, so with 0-based indexing the concepts of "bigint" and "generalised index" become equivalent.
It's still not clear to me why adding an SSZ type is necessary given we can use existing types.
Right, the current def chunkify(byte_string):
byte_string += b'\x00' * (-len(byte_string) % 32)
return [byte_string[i:i + 32] for i in range(0, len(byte_string), 32)] with something like def chunkify(bits):
bits += [0b0] * (-len(bits) % 256)
byte_string = bytes([sum([byte[b] << b for b in range(0, 8)]) for byte in zip(*(iter(bits),) * 8)])
return [byte_string[i:i + 32] for i in range(0, len(byte_string), 32)] |
I'm really inclined to say that making bits rather than bytes be the base unit of SSZ just for this single use case is overkill.
To preserve the norm that there's only one valid way to serialize something. If we have an injective map from bitfields to bigints, then we want there to be only one valid way to serialize a bigint, which means that deserializing eg.
That is interesting. So if we offset everything by 1 then we get a fully bijective map. Though that's only a really strong desideratum if the goal is to make |
Late to the party but very much agree with this. |
Are we closing this in favor of the bigint PR? |
I guess we want to merge the two. The bigint PR current only touches |
@dankrad Can we close this in favor of your coming PR? |
Closing in favour of #1224 |
This draft PR tries to fix a longstanding wart (see #371, #667) regarding bitfields. We define bitfields as lists or vectors of
bit
s (as opposed tobytes
anduint64
respectively). Benefits:verify_bitfield
andget_bitfield_bit
(save 28 lines)justification_bitfield
—more readablejustification_bitfield
(i.e.justification_bitfield
does not change unless a new epoch is justified)justification_bitfield
in case dApps and light clients find it useful (set toEPOCHS_PER_HISTORICAL_VECTOR == 8192
)bytes
anduint64
were hacks)bit
type forcustody_bit
This requires two minor SSZ changes:
bit
as an alias tobool
(cosmetic)bit
s similar to packing ofuintN
(substantive)