Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bytes): Deprecate BytesList class #3589

Merged
merged 1 commit into from
Aug 31, 2023

Conversation

aapoalas
Copy link
Collaborator

@aapoalas aapoalas commented Aug 30, 2023

As part of std cleanup, this PR deprecates the BytesList class and suggests plain Uint8Array[] usage instead. The two non-deprecated users of this class are also transformed to use a plain Uint8Array[].

These are:

  1. msgpack: Here the transformation is trivial and the bytes/concat function can be utilized directly.
  2. DelimiterStream: Here the transformation is very much nontrivial.

DelimiterStream

Switching to a plain Array based approach made quite a few optimisations more directly "visible" in the stream implementation. These are in no particular order:

  1. The generally expected case is that a chunk will include a delimiter. It's generally thus the case that the list of previous chunks is relatively short, and that at the end of a chunk's iteration all previous chunks will usually have been fully "forgotten" by the stream. This makes it a bit easier to reason about what is going on in the stream.
  2. All chunks are always fully iterated, meaning that the iteration index is always a local property: At the start of a chunk's iteration it is at the beginning of that chunk and at the end it is at its end. There is no need to keep it in-between iterations.
  3. Match index only needs to be "persisted" at the end of the chunk iteration. Otherwise it can be a purely local variable.
  4. Controversial / uncertain / API contract change: If a particular slice of data is coming from a single chunk, then there is no strict need to reallocate memory if we trust that the buffers we've been given are "ours" to give. This means that the buffers that our DelimiterStream outputs may be sub-views of (or direct references to) the buffers that it has been given, and any consumers should consider themselves free to touch or mutate only those sub-views they've been given. This is likely a fair performance boost, but it does mean that the API contract is much more strict suddenly.

closes #3549
part of #3489

@CLAassistant
Copy link

CLAassistant commented Aug 30, 2023

CLA assistant check
All committers have signed the CLA.

const delimitedChunkEnd = disposition === "suffix"
? inspectIndex
: delimiterStartIndex;
if (delimitedChunkEnd <= 0 && bufs.length === 0) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: Kind of an obvious downside here: I've written out all the possible cases in excruciating detail to optimise each of them individually, and added a fair bit of comments. The number of lines has gone way up.

@aapoalas aapoalas force-pushed the feat/bytes/cleanup branch from ae4b660 to 6768a50 Compare August 30, 2023 20:48
Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lower level utilities like DelimiterStream should be optimized for performance rather than ergonomics. The changes in it look fair and reasonable to me. LGTM

@kt3k kt3k merged commit 6d75c4e into denoland:main Aug 31, 2023
@johnspurlock
Copy link

Heads up - this seems to have broken DelimiterStream: see #3609

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up bytes module
4 participants