Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOS style layout #47

Merged
merged 2 commits into from
Apr 24, 2019
Merged

Conversation

pipermerriam
Copy link
Member

@pipermerriam pipermerriam commented Mar 15, 2019

WIP with the SOS style layout as specified in ethereum/consensus-specs#787

Depends on #58

What was wrong?

The spec for SSZ has been updated to a new serialization layout.

How was it fixed?

Updated the serialization and deserialization code for all Composite types to comply with the spec.

Some API updates:

  • Container sedes no longer operate on dictionaries or have any concept of fields names
  • Converted to using io.ByteIO streams for some decoding rather than tracking the current content index and end index.

Cute Animal Picture

eglu_classic_green_keeping_the_foxes_at_bay

pytest.ini Outdated
@@ -1,5 +1,5 @@
[pytest]
addopts= --showlocals --durations 10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to revert this.

)

TSerializable = TypeVar("TSerializable")
TDeserialized = TypeVar("TDeserialized")
TCanonical = TypeVar("TCanonical")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to revert this

@pipermerriam pipermerriam requested a review from jannikluhn April 5, 2019 19:22
@pipermerriam
Copy link
Member Author

@jannikluhn can you give this a preliminary review. Not really a merge candidate until ethereum/consensus-specs#787 has been merged.

Copy link
Contributor

@jannikluhn jannikluhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like that Container, Vector, and List have separate implementations of deserialize, even though they all more or less do the same thing. Is there a reason why we can't just put it in CompositeSedes, maybe relying on some abstract methods implemented in the subclasses such as _iter_sedes?

ssz/sedes/base.py Outdated Show resolved Hide resolved
ssz/sedes/base.py Outdated Show resolved Hide resolved
ssz/sedes/boolean.py Outdated Show resolved Hide resolved
ssz/sedes/base.py Show resolved Hide resolved
ssz/sedes/container.py Show resolved Hide resolved
ssz/utils.py Outdated
def read_exact(num_bytes: int, stream: IO[bytes]) -> bytes:
data = stream.read(num_bytes)
if len(data) != num_bytes:
raise DeserializationError(f"Tried to read {num_bytes}. Only got {len(data)} bytes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise DeserializationError(f"Tried to read {num_bytes}. Only got {len(data)} bytes")
raise DeserializationError(f"Tried to read {num_bytes}. Only got {len(data)} bytes")

ssz/utils.py Outdated
f"Content size is too large to be encoded in a {SIZE_PREFIX_SIZE} byte prefix",
)
def encode_offset(offset: int) -> bytes:
return offset.to_bytes(4, 'little')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return offset.to_bytes(4, 'little')
return offset.to_bytes(OFFSET_SIZE, 'little')

ssz/utils.py Outdated Show resolved Hide resolved
if len(value):
raise ValueError("Cannot compute trie hash for non-empty value using `EmptyList` sedes")
# TODO: This can just be a constant value, no need to compute at runtime
return merkleize(pack([]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return merkleize(pack([]))
return mix_in_length(merkleize(pack([])), 0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this resolved on purpose?

#
# Tree hashing
#
def hash_tree_root(self, value: Sequence[Any]) -> bytes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is only suitable for Vector, but not for List and it would make more sense to have this in Vector directly.

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch 2 times, most recently from 22bc497 to dd709d6 Compare April 9, 2019 18:08
@pipermerriam
Copy link
Member Author

@jannikluhn I addressed everything with the following exceptions.

  1. You suggested trying to combine things into a single generic deserialize and have subclass hooks. I'm 👎 on that approach. Each of the different algorithms have sufficient differences that I don't see combining them as being an improvement because I feel it would decreased readability with very marginal reduction in code duplication.
  2. You didn't like the mixing of values and offsets in Container deserialization. I segregated this into a small stand-alone function and have updated it so that the separation of values from offsets is no longer happening inline. I tinkered with an approach that didn't store them alongside each other at any point, but it added complexity at what I assessed to be no added safety since the segregation happens based on the sedes.is_fixed_size flag and any mechanism I thought up to keep them separate was still ultimately depending on that flag.

Copy link
Contributor

@jannikluhn jannikluhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good

There seem to be a few comments from the previous review that have been marked as resolved, presumably by accident? I reopened them just for double checking, most of them are minor, but at least the mix_in_length one for empty_list seems to be worth fixing.

ssz/sedes/byte.py Outdated Show resolved Hide resolved
ssz/sedes/byte.py Outdated Show resolved Hide resolved
def _validate_serializable(self, value: Sequence[Any]) -> bytes:
if len(value) != len(self.field_sedes):
raise SerializationError(
f"Not enough elements. Expected {len(self.field_sedes)}, Got {len(value)}"

This comment was marked as resolved.

if len(value):
raise ValueError("Cannot compute trie hash for non-empty value using `EmptyList` sedes")
# TODO: This can just be a constant value, no need to compute at runtime
return merkleize(pack([]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this resolved on purpose?

ssz/utils.py Outdated Show resolved Hide resolved
@pipermerriam
Copy link
Member Author

I'll update those. I think I force pushed over top of the commits that were made by the github UI in accepting your recommended changes.

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch from 5293899 to 34a330c Compare April 18, 2019 21:51
@pipermerriam
Copy link
Member Author

@jannikluhn I've revived the fixes that were squashed in my force pushes.

I believe this is ready for final review and merge once any issues have been addressed.

@hwwhww
Copy link
Contributor

hwwhww commented Apr 19, 2019

@pipermerriam

For syncing the spec versioning, the procedure we need:

  1. Merge Serializable metaclass cleanups #57, Root and SignedRoot #58
  2. Release py-ssz (sync with spec v0.5.1)
  3. Merge this PR (SOS style layout #47) (spec dev branch)

if not offset_pairs:
return fixed_size_values

dynamic_size_values = self.deserialize_section_1(offset_pairs, stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable v.s. dynamic

I'm fine with both! But we can sync it by either changing the spec or changing py-ssz side.

if next_field_start_index <= field_start_index:
raise Exception("Invariant: must always make progress")
field_start_index = next_field_start_index
# Verify that bothe iterables have been fully consumed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

Suggested change
# Verify that bothe iterables have been fully consumed.
# Verify that both iterables have been fully consumed.

pass


def _compute_section_0_size(element_sedes: Iterable[TSedes]) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO section_0 and section_1 are confusing names. fixed_parts and variable_parts from the current ethereum/consensus-specs#787 PR are better.

raise Exception("Invariant: must always make progress")
element_start_index = next_element_start_index
num_remaining_offset_bytes = first_offset - stream.tell()
if num_remaining_offset_bytes % 4 != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @jannikluhn mentioned: https://github.com/ethereum/py-ssz/pull/47/files#r272979440
Should we use OFFSET_SIZE instead of 4?

Suggested change
if num_remaining_offset_bytes % 4 != 0:
if num_remaining_offset_bytes % OFFSET_SIZE != 0:

num_remaining_offset_bytes = first_offset - stream.tell()
if num_remaining_offset_bytes % 4 != 0:
raise DeserializationError(
f"Offset bytes was not a multiple of 4. Got {num_remaining_offset_bytes}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Suggested change
f"Offset bytes was not a multiple of 4. Got {num_remaining_offset_bytes}"
f"Offset bytes was not a multiple of {OFFSET_SIZE}. Got {num_remaining_offset_bytes}"


yield element
num_remaining_offsets = num_remaining_offset_bytes // 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Suggested change
num_remaining_offsets = num_remaining_offset_bytes // 4
num_remaining_offsets = num_remaining_offset_bytes // OFFSET_SIZE

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch from 34a330c to 4ee2470 Compare April 19, 2019 16:18
@pipermerriam pipermerriam changed the title [WIP] SOS style layout SOS style layout Apr 19, 2019
@pipermerriam
Copy link
Member Author

THis has now been rebased on top of #58

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch 2 times, most recently from ea4e18c to 69f09ea Compare April 19, 2019 16:27
@pipermerriam
Copy link
Member Author

This PR should be ready to merge once the two dependent PR's are merged.

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch from 8574b7b to 3a871a4 Compare April 23, 2019 19:39
@pipermerriam
Copy link
Member Author

@jannikluhn I've rebased. All tests are passing. This should get one more review pass to be sure everything is fine with you and/or @hwwhww as I had to resolve a number of non-trivial merge conflicts and then resulting test failures.

Copy link
Contributor

@jannikluhn jannikluhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicks and a potential bug in list deserialization, other than that LGTM.

ssz/constants.py Outdated
@@ -9,3 +9,9 @@
MAX_CONTENT_SIZE = 2 ** (SIZE_PREFIX_SIZE * 8) - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like SIZE_PREFIX_SIZE and MAX_CONTENT_SIZE could be removed.

TDeserializedElement = TypeVar("TDeserializedElement")


class HomogenousSequence(CompositeSedes[Sequence[Any], Tuple[Any]]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class HomogenousSequence(CompositeSedes[Sequence[Any], Tuple[Any]]):
class HomogenousSequence(CompositeSedes[Sequence[Any], Tuple[Any, ...]]):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this class doesn't seem to be used (Vector, List, and EmptyList are just CompositeSedes).

raise ValidationError(
f"The following fields are duplicated {','.join(sorted(duplicate_field_names))}"
)
from ssz.sedes.base import BaseSedes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

)

TSerializable = TypeVar("TSerializable")
TDeserialized = TypeVar("TDeserialized")

EMPTY_LIST_HASH_TRIES_ROOT = mix_in_length(merkleize(pack([])), 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EMPTY_LIST_HASH_TRIES_ROOT = mix_in_length(merkleize(pack([])), 0)
EMPTY_LIST_HASH_TREE_ROOT = mix_in_length(merkleize(pack([])), 0)

As it's just a normal Merkle tree, not a trie.

Also, this could be moved to constants.py (but I guess here would be fine too as it's very specific).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it here because I don't want to import merkleize or pack from within constants.py


def hash_tree_root(self, value: Sequence[TSerializable]) -> bytes:
if len(value):
raise ValueError("Cannot compute trie hash for non-empty value using `EmptyList` sedes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Cannot compute trie hash for non-empty value using `EmptyList` sedes")
raise ValueError("Cannot compute tree hash for non-empty value using `EmptyList` sedes")

element_size = self.element_sedes.get_fixed_size()
data = stream.read()
if len(data) % element_size != 0:
raise DeserializationError("TODO: INVALID LENGTH")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover TODO

yield self.element_sedes.deserialize(segment)
else:
try:
first_offset = s_decode_offset(stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also fail if there are more than zero but less than 4 bytes in which case we should not return the empty list but raise an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added a test for this case.

@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch from b5849cc to 2cb7311 Compare April 24, 2019 13:12
@pipermerriam pipermerriam force-pushed the piper/sos-style-layout branch from 2cb7311 to 9cafc75 Compare April 24, 2019 13:14
@pipermerriam pipermerriam merged commit 91f0928 into ethereum:master Apr 24, 2019
@pipermerriam pipermerriam deleted the piper/sos-style-layout branch April 24, 2019 13:16
@pipermerriam
Copy link
Member Author

@hwwhww I just realized I merged when this had "DO NOT MERGE" label on it... feel free to revert or force push master back to previous state.

@hwwhww
Copy link
Contributor

hwwhww commented Apr 26, 2019

Whoops sorry, I need to revert it and add #62 bugfix and #60 tool into the next py-ssz release (mapping to spec v0.5.1).

@hwwhww hwwhww mentioned this pull request Apr 26, 2019
carver added a commit that referenced this pull request Aug 18, 2022
Better error if bump missing in make notes/release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants