-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: properly parse tar header numbers as octals #6376
base: main
Are you sure you want to change the base?
Conversation
According to https://www.gnu.org/software/tar/manual/html_node/Standard.html > The name, linkname, magic, uname, and gname are null-terminated character strings. All other fields are zero-filled *octal* numbers in ASCII. Each numeric field of width w contains w minus 1 digits, and a null. (In the extended GNU format, the numeric fields can take other forms.)
While the numbers inside JavaScript were being considered as base 10, they were only taken in a literal sense letting people write 720 instead of 0o720. The reason was that if you console logged this information it would be easier to understand than seeing 464 |
@BlackAsLight This will break any other recipient APIs for these values (Deno.open for example). Shouldn't that use case be prioritized over "printing clarity"? |
I guess maybe. A compromise may be to just pass it around as a string instead? If this change does happen though then the tests, and documentation will need to be updated, as well as these three lines here on for the encoder: https://github.com/denoland/std/blob/main/tar/tar_stream.ts#L225-L227 |
Fair enough but note, I'm also seeing some strange values on real archives. For example, files in https://github.com/rui314/mold/releases/download/v2.36.0/mold-2.36.0-x86_64-linux.tar.gz seem to have modes like Edit: yeah, it seems to be missing the last octal digit (group section). I.e. in the case of this archive, it was |
Ah, I think I see the issue, it seems to be cutting out the last two bytes as can be seen here. Edit: I'm guessing the tests are working is because the |
Several specs of tar differ slightly in how the octals should be suffixed, some adding a space at the end as only 6 octals are actually needed, while others push it back and just have an extra padded 0 at the start. GitHub seems to produce two types of tar formats when it creates the releases for repos, so some of those could be used to create additional tests to validate other implementations. The current tests first work on the premise of validating that the encoding of tarballs are correct and then using that premise to validate the decoding is also correct. |
Ah, I see. Sorry, I was just going of the GNU tar spec. And fair point I suppose needing only to validate the encoder. And on the mode values, I personally think it might be preferable to use POSIX ustar's approach (zero padding prefix) as opposed to the current implementation (which the free bsd docs describe as the "Old-Style Archive") since parsing this doesn't lose information (you just get a 0 at the end in the wrong case). But yes, this is an unstable API I suppose and this PR is might be the wrong place for this. |
I'm in favor of this change because FS API in Deno runtime does the same (They decode octal numbers as octal, and accept mode value like 0o666 ( https://docs.deno.com/api/deno/~/Deno.chmod )) @Yohe-Am Can you also work on encoder side and test cases? |
According to https://www.gnu.org/software/tar/manual/html_node/Standard.html