Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly parse tar header numbers as octals #6376

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Yohe-Am
Copy link

@Yohe-Am Yohe-Am commented Feb 2, 2025

According to https://www.gnu.org/software/tar/manual/html_node/Standard.html

The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled octal numbers in ASCII. Each numeric field of width w contains w minus 1 digits, and a null. (In the extended GNU format, the numeric fields can take other forms.)

According to https://www.gnu.org/software/tar/manual/html_node/Standard.html
> The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled *octal* numbers in ASCII. Each numeric field of width w contains w minus 1 digits, and a null. (In the extended GNU format, the numeric fields can take other forms.)
@Yohe-Am Yohe-Am requested a review from kt3k as a code owner February 2, 2025 16:49
@CLAassistant
Copy link

CLAassistant commented Feb 2, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the tar label Feb 2, 2025
@Yohe-Am Yohe-Am changed the title fix: properly parse tar header numbers of octals fix: properly parse tar header numbers as octals Feb 2, 2025
@BlackAsLight
Copy link
Contributor

While the numbers inside JavaScript were being considered as base 10, they were only taken in a literal sense letting people write 720 instead of 0o720. The reason was that if you console logged this information it would be easier to understand than seeing 464

@Yohe-Am
Copy link
Author

Yohe-Am commented Feb 2, 2025

@BlackAsLight This will break any other recipient APIs for these values (Deno.open for example). Shouldn't that use case be prioritized over "printing clarity"?

@BlackAsLight
Copy link
Contributor

BlackAsLight commented Feb 2, 2025

@BlackAsLight This will break any other recipient APIs for these values (Deno.open for example). Shouldn't that use case be prioritized over "printing clarity"?

I guess maybe. A compromise may be to just pass it around as a string instead?

If this change does happen though then the tests, and documentation will need to be updated, as well as these three lines here on for the encoder: https://github.com/denoland/std/blob/main/tar/tar_stream.ts#L225-L227

@Yohe-Am
Copy link
Author

Yohe-Am commented Feb 2, 2025

Fair enough but note, I'm also seeing some strange values on real archives. For example, files in https://github.com/rui314/mold/releases/download/v2.36.0/mold-2.36.0-x86_64-linux.tar.gz seem to have modes like 75 and 64 instead of what I assume are 750 and 640. I fear there might be more bugs in there and maybe rewriting the tests to also make use of artifacts from other tar impls might be in order. The current tests just use the companion tar implementation to validate itself.

Edit: yeah, it seems to be missing the last octal digit (group section). I.e. in the case of this archive, it was 0o755 that was truncated to 75.

@Yohe-Am
Copy link
Author

Yohe-Am commented Feb 3, 2025

Ah, I think I see the issue, it seems to be cutting out the last two bytes as can be seen here.

Edit: I'm guessing the tests are working is because the tar_stream has an extra empty space prefix in the string meant to add the \0 byte at the end.

@BlackAsLight
Copy link
Contributor

Several specs of tar differ slightly in how the octals should be suffixed, some adding a space at the end as only 6 octals are actually needed, while others push it back and just have an extra padded 0 at the start.

GitHub seems to produce two types of tar formats when it creates the releases for repos, so some of those could be used to create additional tests to validate other implementations.

The current tests first work on the premise of validating that the encoding of tarballs are correct and then using that premise to validate the decoding is also correct.

@Yohe-Am
Copy link
Author

Yohe-Am commented Feb 3, 2025

Ah, I see. Sorry, I was just going of the GNU tar spec. And fair point I suppose needing only to validate the encoder.

And on the mode values, I personally think it might be preferable to use POSIX ustar's approach (zero padding prefix) as opposed to the current implementation (which the free bsd docs describe as the "Old-Style Archive") since parsing this doesn't lose information (you just get a 0 at the end in the wrong case). But yes, this is an unstable API I suppose and this PR is might be the wrong place for this.

@kt3k
Copy link
Member

kt3k commented Feb 4, 2025

I'm in favor of this change because FS API in Deno runtime does the same (They decode octal numbers as octal, and accept mode value like 0o666 ( https://docs.deno.com/api/deno/~/Deno.chmod ))

@Yohe-Am Can you also work on encoder side and test cases?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants