Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(archive): UntarStream and TarStream #4548

Merged
merged 104 commits into from
Sep 2, 2024

Conversation

BlackAsLight
Copy link
Contributor

@BlackAsLight BlackAsLight commented Apr 4, 2024

Following on from this pull request, #4538, I didn't know how to revert the changes on that branch so just made a new one since the idea is that I'm creating a new API and not replacing one.

Closes #1658

Usage Examples

import { TarStream, type TarStreamInput } from "@std/archive/tar-stream"

await ReadableStream.from<TarStreamInput>([
  { path: "potato/" },
  {
    path: "deno.json",
    size: (await Deno.stat("deno.json")).size,
    readable: (await Deno.open("deno.json")).readable,
  },
])
  .pipeThrough(new TarStream())
  .pipeThrough(new CompressionStream("gzip"))
  .pipeTo((await Deno.create("archive.tar.gz")).writable)
import { UntarStream } from "@std/archive/untar-stream"
import { dirname, normalize } from "@std/path"

const scope = "./archive/"
for await (
  const entry of (await Deno.open("archive.tar.gz"))
  .readable
  .pipeThrough(new DecompressionStream("gzip"))
  .pipeThrough(new UntarStream())
) {
  const path = normalize(scope + entry.path)
  await Deno.mkdir(dirname(path), { recursive: true })
  await entry.readable?.pipeTo((await Deno.create(path)).writable)
}

@BlackAsLight BlackAsLight requested a review from kt3k as a code owner April 4, 2024 11:02
@iuioiua iuioiua requested a review from crowlKats April 4, 2024 21:58
@iuioiua
Copy link
Contributor

iuioiua commented Apr 8, 2024

From a quick skim, this PR shows promise, but there are opportunities to simplify and make the implementation more straightforward to understand. @crowlKats, can you see this as a viable alternative to #1985? If so, I'll go ahead and help polish this up.

Copy link
Member

@crowlKats crowlKats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why this PR is wroking and mine is not is because this one does not use byte streams. Not sure if this should be a blocking action or not, but its definitively a downgrade when comparing to the non-stream API and doesnt line up with other streams we do in CLI that do file or io handling.

That aside, as I discussed with @kt3k, we are a bit concerned that since this PR implements tar from scratch, this will require a lot more exhaustive & thorough tests

@BlackAsLight
Copy link
Contributor Author

... when comparing to the non-stream API and doesnt line up with other streams we do in CLI that do file or io handling.

With regards to this comment, I'd have assumed there was an intention to remove all Deno references, making it compatible with other runtimes and therefore not offer file or io handling.

Although that was 2y ago now so things may have changed.

@crowlKats
Copy link
Member

that has nothing to do with what I have said. I am talking about streams behaviour being consistent across similar purpouses, in this case being byte streams instead of normal streams

@iuioiua
Copy link
Contributor

iuioiua commented Apr 10, 2024

@BlackAsLight, we're still determining whether to pursue this PR or #1985. We'll come back to you once a decision has been made.

Copy link

codecov bot commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 63.54916% with 152 lines in your changes are missing coverage. Please review.

Project coverage is 91.13%. Comparing base (8c87b7f) to head (dacaea5).

Files Patch % Lines
archive/tar_stream.ts 54.59% 83 Missing and 1 partial ⚠️
archive/untar_stream.ts 70.68% 68 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4548      +/-   ##
==========================================
- Coverage   91.44%   91.13%   -0.31%     
==========================================
  Files         480      482       +2     
  Lines       37324    37741     +417     
  Branches     5320     5391      +71     
==========================================
+ Hits        34132    34397     +265     
- Misses       3136     3287     +151     
- Partials       56       57       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BlackAsLight
Copy link
Contributor Author

Looking into what bytes streams are and how they work, I do think I could implement them into both streams. At least for when they're serving content out. I don't think I'd run into the same problem you're facing in implementing byte streams with it closing early or something. Although I am yet to actually test it.

On a side note the byte streams examples on MDN make use of both the byteRequest and enqueue for pushing, but my implementation only works on a pulling method so it would only ever be doing one or the other based off what the user provided in the options.

@BlackAsLight
Copy link
Contributor Author

I don't really understand why, but for some reason when I added byte support for TarStream it consumed the text variable from the tests instead of copying them. Even when { mode: 'byob' } wasn't provided.

Adding byte support for the UnTarStream does seem doable from my understanding of it, but will be more complex.

Even though you haven't decided if you want to pursue my implementation or not. I'm still going to try and make the improvements in the mean time as I have the time.

@kt3k kt3k changed the title refactor(archive): An implementation of Tar as streams feat(archive): An implementation of Tar as streams Apr 22, 2024
Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leo and I looked at this and we're mostly happy with it. Few things we'd like to see:

  1. Reverting UntarStream behavior to prioritize ergonomics.
  2. Throwing with appropriate error class instances. E.g. RangeError or TypeError, etc.
  3. Turning validation functions into assertion functions.

@iuioiua iuioiua changed the title feat(archive): An implementation of Tar as streams feat(archive): UntarStream and TarStream Aug 29, 2024
Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small bits remaining. Once they're addressed, this will LGTM! @kt3k @crowlKats, PTAL.

Comment on lines 6 to 37
export interface TarStreamFile {
/**
* The path to the file, relative to the archive's root directory.
*/
path: string;
/**
* The size of the file in bytes.
*/
size: number;
/**
* The contents of the file.
*/
readable: ReadableStream<Uint8Array>;
/**
* The metadata of the file.
*/
options?: TarStreamOptions;
}

/**
* The interface required to provide a directory.
*/
export interface TarStreamDir {
/**
* The path of the directory, relative to the archive's root directory.
*/
path: string;
/**
* The metadata of the directory.
*/
options?: TarStreamOptions;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crowlKats, did you still want to add a type field to these interfaces?

Copy link
Member

@crowlKats crowlKats Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, to easily distinguish in code what entry is what kind

@iuioiua iuioiua enabled auto-merge (squash) September 2, 2024 07:41
Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few tweaks, including adding a type: "file" | "directory" field for TarStream. Now, LGTM! Thank you for your work, @BlackAsLight. And thank you to all that reviewed and contributed to this PR! Let's test and continue to iterate on this experimental API.

@iuioiua iuioiua merged commit 9298ea5 into denoland:main Sep 2, 2024
17 checks passed
@BlackAsLight BlackAsLight deleted the archive_tar_stream branch September 5, 2024 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modernize std/archive/tar
8 participants