Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bootstrap: make snapshot reproducible #50983

Closed
wants to merge 3 commits into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Nov 30, 2023

src: return non-empty data in context data serializer

For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

src: make sure that memcpy-ed structs in snapshot have no padding

To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

src: add utilities to help debugging reproducibility of snapshots

  • Print offsets in blob serializer
  • Add a special node:generate_default_snapshot ID to generate
    the built-in snapshot.
  • Improve logging
  • Add a test to check the reproducibilty of the snapshot

Refs: nodejs/build#3043

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup
  • @nodejs/v8-update

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Nov 30, 2023
@haraldh
Copy link

haraldh commented Jan 31, 2024

great work! keep it up!

@joyeecheung joyeecheung force-pushed the context-slot branch 2 times, most recently from 6df2ede to 8dc86b6 Compare March 22, 2024 19:07
@joyeecheung joyeecheung changed the title WIP: reproducible snapshot bootstrap: make snapshot reproducible Mar 22, 2024
@joyeecheung joyeecheung marked this pull request as ready for review March 22, 2024 19:08
@joyeecheung joyeecheung added request-ci Add this label to start a Jenkins CI on a PR. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Mar 22, 2024
@joyeecheung joyeecheung force-pushed the context-slot branch 2 times, most recently from c92d201 to 2d911be Compare March 22, 2024 19:11
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Mar 22, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Mar 22, 2024
@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

Somehow this is reproducible locally (macOS + x64 & arm64), and in the CI, on FreeBSD and SmartOS, but not on others. There are still 3-4 words of differences in the snapshot on those platforms. I'll investigate.

@joyeecheung joyeecheung force-pushed the context-slot branch 2 times, most recently from 57b8d25 to f03a7ba Compare May 20, 2024 07:38
@joyeecheung
Copy link
Member Author

Split the context data callback part to #53217 - at least that makes the context data slots serialized with some non-gibberish (empty values), otherwise V8 might run into issues serializing the gibberish pointer values.

@legendecas
Copy link
Member

legendecas commented May 31, 2024

Is this PR still useful after #53217 being splitted?

@joyeecheung
Copy link
Member Author

joyeecheung commented May 31, 2024

Yes, in this PR are still some utilities for checking reproducibility and a test for that (but it's not currently passing because a few other bytes are still not reproducible, I will work on those bytes here).

@joyeecheung
Copy link
Member Author

joyeecheung commented Jun 12, 2024

Some discoveries:

  1. We need to return non-empty data for pointer values in context data otherwise V8 would still serialize them verbatim (that was what this PR used to do, in src: use v8::(Des|S)erializeInternalFieldsCallback #53217 I changed it to return empty data and that added back some more variance)
  2. The MemoryMode in EmbedderTypeInfo and the EmbedderObjectType in InternalFieldInfoBase are padded. We need to get rid of the padding - the easiest way to do so is to just make the enums uint64_t (a harder way would be to implement custom serialization for all the wrapper structs but I feel that's an overkill for a couple of bytes of overhead for now).

After fixing the two on Linux I am getting reproducible snapshots. Will update the branch after I clean it up..

For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibilty of the snapshot
@joyeecheung joyeecheung added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. labels Jun 14, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Jun 14, 2024
nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
@nodejs-github-bot
Copy link
Collaborator

Landed in 922feb1...1872167

nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
nodejs-github-bot pushed a commit that referenced this pull request Jun 14, 2024
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibilty of the snapshot

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

written_total += w.Write<EnvSerializeInfo>(env_info);
w.Debug("Write code_cache\n");
w.Debug("0x%x: Write CodeCacheInfo\n", w.sink.size());
written_total += w.WriteVector<builtins::CodeCacheInfo>(code_cache);
w.Debug("SnapshotData::ToBlob() Wrote %d bytes\n", written_total);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth check that DCHECK_EQ(written_total, w.sink.size()).

targos pushed a commit that referenced this pull request Jun 20, 2024
For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this pull request Jun 20, 2024
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this pull request Jun 20, 2024
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibilty of the snapshot

PR-URL: #50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
eliphazb pushed a commit to eliphazb/node that referenced this pull request Jun 20, 2024
For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
eliphazb pushed a commit to eliphazb/node that referenced this pull request Jun 20, 2024
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
eliphazb pushed a commit to eliphazb/node that referenced this pull request Jun 20, 2024
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibilty of the snapshot

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
For pointer values in the context data, we need to return
non-empty data in the serializer so that V8 does not
serialize them verbatim, making the snapshot unreproducible.

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
To make the snapshots reproducible, this patch updates the size
of a few types and adds some static assertions to ensure that
there are no padding in the memcpy-ed structs.

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
bmeck pushed a commit to bmeck/node that referenced this pull request Jun 22, 2024
- Print offsets in blob serializer
- Add a special node:generate_default_snapshot ID to generate
  the built-in snapshot.
- Improve logging
- Add a test to check the reproducibilty of the snapshot

PR-URL: nodejs#50983
Refs: nodejs/build#3043
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
nodejs-github-bot pushed a commit that referenced this pull request Jul 6, 2024
…ding"

This reverts commit 4e58cde.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
nodejs-github-bot pushed a commit that referenced this pull request Jul 6, 2024
To prevent padding from making the snapshot unreproducible,
zero-initialize the data that are copied into the snapshot
so that the padding copied are all zeros. This is better
than enlarging the enums to align the fields since it doesn't
make the snapshot bigger than necessary, and it removes the
need of using static assertions to ensure alignment.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Jul 12, 2024
…ding"

This reverts commit 4e58cde.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Jul 12, 2024
To prevent padding from making the snapshot unreproducible,
zero-initialize the data that are copied into the snapshot
so that the padding copied are all zeros. This is better
than enlarging the enums to align the fields since it doesn't
make the snapshot bigger than necessary, and it removes the
need of using static assertions to ensure alignment.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Jul 16, 2024
…ding"

This reverts commit 4e58cde.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Jul 16, 2024
To prevent padding from making the snapshot unreproducible,
zero-initialize the data that are copied into the snapshot
so that the padding copied are all zeros. This is better
than enlarging the enums to align the fields since it doesn't
make the snapshot bigger than necessary, and it removes the
need of using static assertions to ensure alignment.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
@marco-ippolito marco-ippolito added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Jul 19, 2024
marco-ippolito pushed a commit that referenced this pull request Aug 19, 2024
To prevent padding from making the snapshot unreproducible,
zero-initialize the data that are copied into the snapshot
so that the padding copied are all zeros. This is better
than enlarging the enums to align the fields since it doesn't
make the snapshot bigger than necessary, and it removes the
need of using static assertions to ensure alignment.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
marco-ippolito pushed a commit that referenced this pull request Aug 19, 2024
To prevent padding from making the snapshot unreproducible,
zero-initialize the data that are copied into the snapshot
so that the padding copied are all zeros. This is better
than enlarging the enums to align the fields since it doesn't
make the snapshot bigger than necessary, and it removes the
need of using static assertions to ensure alignment.

PR-URL: #53563
Refs: #50983
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Daniel Lemire <daniel@lemire.me>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commit-queue-rebase Add this label to allow the Commit Queue to land a PR in several commits. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants