Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unixfs: integrate pb.Data into FSNode to avoid duplicating fields #5098

Merged
merged 1 commit into from
Jun 13, 2018
Merged

unixfs: integrate pb.Data into FSNode to avoid duplicating fields #5098

merged 1 commit into from
Jun 13, 2018

Conversation

schomatis
Copy link
Contributor

@schomatis schomatis commented Jun 8, 2018

unixfs: integrate `pb.Data` into `FSNode`

To avoid duplicating fields and making the code easier to follow.

Remove all of `FSNode` previous fields in favor on a single `pb.Data` structure
that is not exported. Accessor methods are added only for the necessary internal
fields. This takes up more memory, `pb.Data` is always created inside `FSNode`
and it stays there instead of just being created and destroyed during the
(un)marshal operations.

The removed fields `Data`, `blocksizes` and `Type` had a direct counterpart in
the embedded `pb.Data` structure, in contrast (only) the `subtotal` field
doesn't have one, it was used as a temporary accumulator to track the
`Filesize`, which is now being kept updated on every modification (to ensure the
entire `FSNode` is always at a valid state), so `subtotal` could just be removed
without the addition of any other field (this temporary accumulator  was
obscuring how `Filesize` was computed).

To keep `Filesize` up to date a method was added (`UpdateFilesize()`) to adjust
its value in the two places where the file size could be modified, when changing
its data (in `SetData()`, accessor method added) and when adding or removing
child nodes (in `AddBlockSize()` and `RemoveBlockSize()`).

A constructor method was added (`NewFSNode()`) to initialize the required
fields, like `Type` which is explicitly set, this deprecates the previous
methodology of just calling `new(FSNode)` and relying in the default value of
`pb.Data_DataType` (`Data_Raw`) to avoid an explicit assignment. Also,
`Filesize` is initialized to avoid being left with a `nil` value before
marshaling empty nodes, which would result in a different hash from previous
versions, to be backwards compatible. Previous versions of `GetBytes()` always
set the `Filesize` value, even though it is reflected as an `optional` field in
the `.proto` file (this may be an inaccurate field rule).

Without the duplicated fields the functions `GetBytes()` and `FSNodeFromBytes()`
are now reduced to simple `Marshal()` and `Unmarshal()` operations respectively.

@schomatis schomatis requested a review from Kubuxu as a code owner June 8, 2018 15:32
@ghost ghost assigned schomatis Jun 8, 2018
@ghost ghost added the status/in-progress In progress label Jun 8, 2018
@schomatis
Copy link
Contributor Author

My main question is exactly how to integrate the pb.Data structure into FSNode:

  1. Whether like an anonymous structure providing direct access to its fields without the Format intermediary name. I prefer this one, although it would lead to ugly names like n.Data.Data (because the protobuf message is named Data but also one of its fields).

  2. Like a named structure (current PR) and check what name would be more suited, Format was the first thing that came to mind.

@Stebalien

@schomatis schomatis added the topic/files Topic files label Jun 8, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jun 8, 2018
@Stebalien
Copy link
Member

I'd like to just embed it however, that would expose a lot of internal state. I believe we may want to eventually expose these datastructures as a library so that may not be the best idea.


Are you planning on reusing the Type field as well? As it stands, I'm not convinced this code is any simpler and it has a few drawbacks:

  1. The Format field isn't really valid until you call GetBytes (potential foot gun).
  2. GetBytes is no longer thread-safe (probably not an issue, but getters that modify always weird me out a bit).

@schomatis
Copy link
Contributor Author

@Stebalien This was an incomplete PoC PR to get an idea of what it would look like refactoring these structures together, but I'm seeing now that leaving it halfway done actually creates more noise than signal. I'll give it another iteration and I'll ping you back. Thanks for the feedback.

that would expose a lot of internal state.

Yes, you're right, it would be better not to export the embedded format structure (leaving it as FSNode.format) and providing only the necessary accessor methods.

Are you planning on reusing the Type field as well?

Yes, this PR would remove all of the previous FSNode fields (they were all redundant), Type has a direct counterpart also in pb.Data and subtotal would also be replaced by keeping an up to date FSNode.format.Filesize every time the file is modified (the data is updated or a child is added).

  1. The Format field isn't really valid until you call GetBytes (potential foot gun).
  2. GetBytes is no longer thread-safe (probably not an issue, but getters that modify always weird me out a bit).

Yes, that's the haflway (not) done part, format.Filesize (and the entire format) would be kept updated on every modification so it would always be at a valid state, and hence GetBytes would be just a read-only function.

@whyrusleeping
Copy link
Member

Seems like we could just UpdateFilesize(0) in the constructor.

@schomatis
Copy link
Contributor Author

Agreed.

I was doubting about doing that, I added it in GetBytes() because it assures me that I'm always going to have the Filesize initialized before marshaling, but since I'm making the constructor mandatory anyways (as it's the only way to set the Type, otherwise GetBytes() would err) I can move all the responsibility of the FSNode integrity there.

To avoid duplicating fields and making the code easier to follow.

Remove all of `FSNode` previous fields in favor on a single `pb.Data` structure
that is not exported. Accessor methods are added only for the necessary internal
fields. This takes up more memory, `pb.Data` is always created inside `FSNode`
and it stays there instead of just being created and destroyed during the
(un)marshal operations.

The removed fields `Data`, `blocksizes` and `Type` had a direct counterpart in
the embedded `pb.Data` structure, in contrast (only) the `subtotal` field
doesn't have one, it was used as a temporary accumulator to track the
`Filesize`, which is now being kept updated on every modification (to ensure the
entire `FSNode` is always at a valid state), so `subtotal` could just be removed
without the addition of any other field (this temporary accumulator  was
obscuring how `Filesize` was computed).

To keep `Filesize` up to date a method was added (`UpdateFilesize()`) to adjust
its value in the two places where the file size could be modified, when changing
its data (in `SetData()`, accessor method added) and when adding or removing
child nodes (in `AddBlockSize()` and `RemoveBlockSize()`).

A constructor method was added (`NewFSNode()`) to initialize the required
fields, like `Type` which is explicitly set, this deprecates the previous
methodology of just calling `new(FSNode)` and relying in the default value of
`pb.Data_DataType` (`Data_Raw`) to avoid an explicit assignment. Also,
`Filesize` is initialized to avoid being left with a `nil` value before
marshaling empty nodes, which would result in a different hash from previous
versions, to be backwards compatible. Previous versions of `GetBytes()` always
set the `Filesize` value, even though it is reflected as an `optional` field in
the `.proto` file (this may be an inaccurate field rule).

Without the duplicated fields the functions `GetBytes()` and `FSNodeFromBytes()`
are now reduced to simple `Marshal()` and `Unmarshal()` operations respectively.

License: MIT
Signed-off-by: Lucas Molas <schomatis@gmail.com>
@schomatis
Copy link
Contributor Author

@Stebalien Could you please take another look?

@schomatis schomatis changed the title [WIP] unixfs: integrate pb.Data into FSNode to avoid duplicating fields unixfs: integrate pb.Data into FSNode to avoid duplicating fields Jun 11, 2018
@schomatis schomatis removed the status/in-progress In progress label Jun 11, 2018
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely prefer this.

@kevina kevina self-requested a review June 12, 2018 04:33
Copy link
Contributor

@kevina kevina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@whyrusleeping whyrusleeping merged commit 9f9ddd5 into ipfs:master Jun 13, 2018
@schomatis schomatis deleted the feat/unixfs/fsnode-include-pb branch June 13, 2018 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/files Topic files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants