fix: properly parse tar header numbers as octals #6376

Yohe-Am · 2025-02-02T16:49:17Z

According to https://www.gnu.org/software/tar/manual/html_node/Standard.html

The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled octal numbers in ASCII. Each numeric field of width w contains w minus 1 digits, and a null. (In the extended GNU format, the numeric fields can take other forms.)

According to https://www.gnu.org/software/tar/manual/html_node/Standard.html > The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled *octal* numbers in ASCII. Each numeric field of width w contains w minus 1 digits, and a null. (In the extended GNU format, the numeric fields can take other forms.)

CLAassistant · 2025-02-02T16:49:23Z

All committers have signed the CLA.

BlackAsLight · 2025-02-02T18:19:05Z

While the numbers inside JavaScript were being considered as base 10, they were only taken in a literal sense letting people write 720 instead of 0o720. The reason was that if you console logged this information it would be easier to understand than seeing 464

Yohe-Am · 2025-02-02T23:07:46Z

@BlackAsLight This will break any other recipient APIs for these values (Deno.open for example). Shouldn't that use case be prioritized over "printing clarity"?

BlackAsLight · 2025-02-02T23:33:52Z

@BlackAsLight This will break any other recipient APIs for these values (Deno.open for example). Shouldn't that use case be prioritized over "printing clarity"?

I guess maybe. A compromise may be to just pass it around as a string instead?

If this change does happen though then the tests, and documentation will need to be updated, as well as these three lines here on for the encoder: https://github.com/denoland/std/blob/main/tar/tar_stream.ts#L225-L227

Yohe-Am · 2025-02-02T23:48:41Z

Fair enough but note, I'm also seeing some strange values on real archives. For example, files in https://github.com/rui314/mold/releases/download/v2.36.0/mold-2.36.0-x86_64-linux.tar.gz seem to have modes like 75 and 64 instead of what I assume are 750 and 640. I fear there might be more bugs in there and maybe rewriting the tests to also make use of artifacts from other tar impls might be in order. The current tests just use the companion tar implementation to validate itself.

Edit: yeah, it seems to be missing the last octal digit (group section). I.e. in the case of this archive, it was 0o755 that was truncated to 75.

Yohe-Am · 2025-02-03T00:08:33Z

Ah, I think I see the issue, it seems to be cutting out the last two bytes as can be seen here.

Edit: I'm guessing the tests are working is because the tar_stream has an extra empty space prefix in the string meant to add the \0 byte at the end.

BlackAsLight · 2025-02-03T00:24:51Z

Several specs of tar differ slightly in how the octals should be suffixed, some adding a space at the end as only 6 octals are actually needed, while others push it back and just have an extra padded 0 at the start.

GitHub seems to produce two types of tar formats when it creates the releases for repos, so some of those could be used to create additional tests to validate other implementations.

The current tests first work on the premise of validating that the encoding of tarballs are correct and then using that premise to validate the decoding is also correct.

Yohe-Am · 2025-02-03T00:53:14Z

Ah, I see. Sorry, I was just going of the GNU tar spec. And fair point I suppose needing only to validate the encoder.

And on the mode values, I personally think it might be preferable to use POSIX ustar's approach (zero padding prefix) as opposed to the current implementation (which the free bsd docs describe as the "Old-Style Archive") since parsing this doesn't lose information (you just get a 0 at the end in the wrong case). But yes, this is an unstable API I suppose and this PR is might be the wrong place for this.

kt3k · 2025-02-04T05:16:37Z

I'm in favor of this change because FS API in Deno runtime does the same (They decode octal numbers as octal, and accept mode value like 0o666 ( https://docs.deno.com/api/deno/~/Deno.chmod ))

@Yohe-Am Can you also work on encoder side and test cases?

Yohe-Am added 2 commits February 2, 2025 19:45

fix: typo

a8f1dbd

Yohe-Am requested a review from kt3k as a code owner February 2, 2025 16:49

github-actions bot added the tar label Feb 2, 2025

Yohe-Am changed the title ~~fix: properly parse tar header numbers of octals~~ fix: properly parse tar header numbers as octals Feb 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: properly parse tar header numbers as octals #6376

fix: properly parse tar header numbers as octals #6376

Yohe-Am commented Feb 2, 2025

CLAassistant commented Feb 2, 2025 •

edited

Loading

BlackAsLight commented Feb 2, 2025

Yohe-Am commented Feb 2, 2025 •

edited

Loading

BlackAsLight commented Feb 2, 2025 •

edited

Loading

Yohe-Am commented Feb 2, 2025 •

edited

Loading

Yohe-Am commented Feb 3, 2025 •

edited

Loading

BlackAsLight commented Feb 3, 2025

Yohe-Am commented Feb 3, 2025

kt3k commented Feb 4, 2025

fix: properly parse tar header numbers as octals #6376

Are you sure you want to change the base?

fix: properly parse tar header numbers as octals #6376

Conversation

Yohe-Am commented Feb 2, 2025

CLAassistant commented Feb 2, 2025 • edited Loading

BlackAsLight commented Feb 2, 2025

Yohe-Am commented Feb 2, 2025 • edited Loading

BlackAsLight commented Feb 2, 2025 • edited Loading

Yohe-Am commented Feb 2, 2025 • edited Loading

Yohe-Am commented Feb 3, 2025 • edited Loading

BlackAsLight commented Feb 3, 2025

Yohe-Am commented Feb 3, 2025

kt3k commented Feb 4, 2025

CLAassistant commented Feb 2, 2025 •

edited

Loading

Yohe-Am commented Feb 2, 2025 •

edited

Loading

BlackAsLight commented Feb 2, 2025 •

edited

Loading

Yohe-Am commented Feb 2, 2025 •

edited

Loading

Yohe-Am commented Feb 3, 2025 •

edited

Loading