Reversible decompression? #98

Closed
nitro322 opened this issue Sep 12, 2021 · 16 comments · Fixed by #131
Labels: enhancement (New feature or request), file format (Issues with the NSZ file format)

Comments

@nitro322

nitro322 commented Sep 12, 2021

I've recently been experimenting with compressing my XCI dumps with nsz. It yields both a better compression ratio and shorter time than my current solution (parallel XZ encoding using pixz), which is quite the feat.

However, decompressed files do not match the original, which is a big deal for me as I keep all my games in a state that is verifiable against a reference source like no-intro. If I compress with nsz, there seems to be no way to verify again in the future.

Am I perhaps decompressing wrong, or missing an important flag? If so, could you let me know what I'm missing?

If not, would you consider adding an option for reversible decompression as an enhancement? I'd gladly sacrifice file size to be able to decompress back to a bit-for-bit identical copy of my original XCI.
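
For readers who want to check a round trip themselves, here is a minimal Python sketch of the bit-for-bit comparison described above. It uses only the standard library and does not call nsz; the helper names `sha256_of` and `is_bit_identical` are hypothetical and not part of any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(original, restored):
    """True if both files have the same size and the same SHA-256."""
    original, restored = Path(original), Path(restored)
    # Cheap size check first, so obviously different files are not hashed.
    if original.stat().st_size != restored.stat().st_size:
        return False
    return sha256_of(original) == sha256_of(restored)
```

Calling `is_bit_identical("original.xci", "restored.xci")` (file names are placeholders) would then report whether the restored dump still matches the reference.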

Thanks for the consideration.

@nicoboss
Owner

nicoboss commented Oct 8, 2021

As described in https://gbatemp.net/threads/nsz-homebrew-compatible-nsp-xci-compressor-decompressor.550556/post-8866470 the NCA files are hash identical. The only thing that sometimes gives the NSP/XCI files a different hash is the order in which the NCA files are packed into the container. If you want your XCI files to be hash identical, just make sure to pack the files in the same order. Maybe consider hashing the NCA files instead, which are already SHA-256 hashed and signed. NCAs not being hash identical would break the signature and file integrity check, so obviously they are.

Maybe I should store the file order in a file and then restore it, similar to how I ended up doing it at the very end of nsZip's life. This won't have any impact on compression ratio and is exactly what you want. This definitely would work for NSP and would likely also work for XCI. I just don't see much reason to do so, as in the end the NSP/XCI hash doesn't matter: they are just containers storing the actual NCA files.

Feel free to implement this on your own and create a pull request. I might do it myself if I ever find time for it. If you can't wait, maybe look at the long-abandoned nsZip project, https://github.com/nicoboss/nsZip, which should be much closer to recreating XCIs with the exact same hash. I may never have completed that feature, as I abandoned it in favor of the NSZ file format, but maybe I did. Either way, nsZip shows what approach would be required to keep the NSP/XCI hash identical.
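
To illustrate why packing order alone changes the container hash, here is a toy Python sketch. The `pack` function is a made-up container layout (name, length, payload), not the real NSP/XCI format, and the file names and payloads are invented for the example.

```python
import hashlib


def pack(files, order):
    """Toy container: name, 8-byte length and payload of each file, in `order`."""
    blob = b""
    for name in order:
        payload = files[name]
        blob += name.encode("utf-8") + len(payload).to_bytes(8, "little") + payload
    return blob


def sha256_hex(data):
    """Hex SHA-256 of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


# Invented stand-ins for NCA files; their contents never change.
files = {"a.nca": b"A" * 16, "b.nca": b"B" * 16}

original = pack(files, ["a.nca", "b.nca"])
reordered = pack(files, ["b.nca", "a.nca"])

# Same member files, same total size, but the two containers hash differently.
containers_match = sha256_hex(original) == sha256_hex(reordered)
```

The individual payloads hash the same in both cases; only the container hash changes, which is the situation described above.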

@nitro322
Author

nitro322 commented Oct 8, 2021

Thanks for the comments. Didn't realize you had already addressed this issue, so I appreciate the heads up on that. Will just say that, from a functional perspective, your arguments about only needing to validate the NCAs are certainly valid, but there's obviously more data in NSPs and especially XCIs than just the NCAs, and from a larger preservation perspective being able to replicate and prove the validity of a ROM image is very useful. If you could eventually work something like that in that'd be great, but I understand that isn't a core requirement if your focus is only on the game data in the NCAs.

Also appreciate the pointer to nsZip. Will give that a try and see how it works.

As for implementing it myself... will probably poke around at this, but Python's really not my jam. I couldn't even figure out the decompression path issue in that other bug I commented on, so this is likely beyond me, but it would be interesting to get a better understanding of how NSZ works if nothing else.

@blawar
Collaborator

blawar commented Oct 12, 2021

I think all of the current XCIs are bad for preservation, because they mislead people into thinking they are exact copies and can be backed up to a game cart if they ever become available. This is not true: all of the XCIs people believe to be "exact copies" with "matching hashes" are missing significant data, which means the XCIs' only true purpose was playing game backups on SXOS, which is hardly preservation.

@nitro322
Author

all of the XCIs people believe to be "exact copies" with "matching hashes" are missing significant data

This is getting outside the scope of the request, but if you don't mind sharing, what's missing? I know there was some data in the initial/key area of the game cart that wasn't read by earlier versions of nxdumptool, but that's since been corrected. I'm not aware of other data that's missing.

@blawar
Collaborator

blawar commented Oct 12, 2021

The certificate is the obvious part. Everyone is autistic about exact data until it comes to that apparently 😂

The big thing is the missing lotus data for the handshake, which no one knows how to dump. It is not a true copy without that. Without that, it's basically just a bunch of NCAs, which is exactly what NSZ treats them as. I find it kind of funny that people obsess over gigabytes of padded zeros at the end of the file and over NCA order, but then they don't have the lotus data.

@mid-kid

mid-kid commented Oct 25, 2022

When it comes to preservation, data that differs for each separate cartridge hardly matters, since it'll be different for every cartridge, meaning none of the hashes would ever match.
Everything that would otherwise be exactly the same ought to be preserved, however.

EDIT: Can we have a clear overview of what file types the compressor is able to restore exactly in the readme? I think that'd be helpful.

@nitro322
Author

Right. To clarify, No-Intro records hashes with all unique data stripped out, so if I compare my own stripped hash with someone else's then the results will match (assuming all other bits are correct). So I still feel there's value in reversible decompression.

@nicoboss
Owner

nicoboss commented Nov 13, 2022

I just had a great and, in hindsight, quite obvious idea for how to solve this. We could remember the order of the files stored inside the NSP. When we repack the compressed files inside the NSZ, we could do so in the same order. If we then do the same during extraction, all the files should end up in the same order inside the recreated NSP file as well. No need to store the file order separately. Besides the order, we still have to worry about zero padding when recreating the NSP, as described in #101 and #116. Hash-identical recreation should indeed be possible to implement without much work. I guess enough people requested this for it to justify the effort. I still find it quite insane that users care about the order in which files are stored inside the PFS0 file system and how much zero padding there is.

@nicoboss nicoboss self-assigned this Nov 13, 2022
@nicoboss nicoboss added enhancement New feature or request file format Issues with the NSZ file format labels Nov 13, 2022
@nicoboss nicoboss added this to the v4.2 milestone Nov 13, 2022
@alucryd
Contributor

alucryd commented Feb 12, 2023

I would love to see this come to fruition so I can integrate nsz into oxyromon. I am focused on archival, so reversible conversion is key.

@nicoboss
Owner

I would love to see this come to fruition so I can integrate nsz into oxyromon. I am focused on archival, so reversible conversion is key.

We don't really talk about reversible compression, because the NCA => NCZ => NCA compression/decompression is already fully reversible and byte/hash identical. We only talk about the ability to pack the decompressed NCAs into an NSP file the same way they were originally packed. To make the issue easier to understand, a great analogy to NSP files is common tar files: both are file containers that contain files. What this issue is about is keeping the order of, and the padding between, the files inside the container identical to the original, so the NSP files before compression and after decompression are hash identical.

This feature has now been requested enough times that it will be the next feature I work on. I will start working on this next weekend if nothing more urgent pops up.
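
As a rough illustration of what "order and padding" means for a PFS0 container, here is a Python sketch that builds and parses a minimal PFS0 header. The layout used (16-byte header with magic, file count and string table size, followed by 0x18-byte entries and a string table) follows the publicly documented format and is an assumption not stated in this thread; `build_pfs0` and `parse_pfs0` are hypothetical helpers, not functions from nsz.

```python
import struct
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    offset: int  # relative to the start of the data region
    size: int


def build_pfs0(files, padding=0):
    """Build a minimal PFS0 blob from (name, payload) pairs, in the given order.

    `padding` extra NUL bytes are appended to the string table (assumed layout).
    """
    names, entries, data = b"", b"", b""
    for name, payload in files:
        entries += struct.pack("<QQII", len(data), len(payload), len(names), 0)
        names += name.encode("utf-8") + b"\x00"
        data += payload
    strtab = names + b"\x00" * padding
    header = struct.pack("<4sIII", b"PFS0", len(files), len(strtab), 0)
    return header + entries + strtab + data


def parse_pfs0(blob):
    """Return (entries in stored order, number of padding bytes in the string table)."""
    magic, count, strtab_size, _ = struct.unpack_from("<4sIII", blob, 0)
    if magic != b"PFS0":
        raise ValueError("not a PFS0 container")
    strtab_off = 0x10 + count * 0x18
    strtab = blob[strtab_off:strtab_off + strtab_size]
    entries, used = [], 0
    for i in range(count):
        offset, size, name_off, _ = struct.unpack_from("<QQII", blob, 0x10 + i * 0x18)
        raw = strtab[name_off:strtab.index(b"\x00", name_off)]
        used += len(raw) + 1  # name plus its NUL terminator
        entries.append(Entry(raw.decode("utf-8"), offset, size))
    return entries, strtab_size - used
```

Two containers holding the same files would need the same entry order and the same amount of padding to come out byte-identical, which is what the parsed values make visible.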

@alucryd
Contributor

alucryd commented Feb 12, 2023

Yeah, I understood all that already, and I'm familiar with the NSP file format (as of earlier today, after reading your posts here and there). Actual dumps from nxdumptool are either XCIs or NSPs, not individual NCAs, and I doubt No-Intro will elect NCA as the go-to format when they reintroduce Switch DAT files. Since XCI brings almost nothing to the table and firmwares are readily available on the internet, I'm focusing on NSP, hoping No-Intro will do the same, so being able to go back and forth between NSP and NSZ is what's important to me and, I believe, everyone in this thread. I'm glad this is now top of the list, thanks for the great piece of software :)

@nicoboss
Owner

nicoboss commented Jul 7, 2023

@nitro322 and @alucryd This is fixed in the latest NSZ v4.3.0 release. The padding of the source NSP is now kept during compression/decompression, making them always bit-identical (especially if --keep-delta is specified). The option --remove-padding was added to remove the padding of existing NSP/NSZ files to make them nxdumptool/no-intro compliant. This only fixes the issue for NSP/NSZ files. If there is interest in implementing the same for XCI/XCZ, please create a separate issue for it. XCZ is currently quite abandoned, but I plan to finally implement the ability to mount block-compressed NSZ/XCZ, so I will likely spend some time working on XCZ anyway.

@nitro322
Author

nitro322 commented Jul 7, 2023 via email

@alucryd
Contributor

alucryd commented Jul 7, 2023

Awesome, thank you very much! Dumping NSPs here so that covers my needs. Will see if no-intro decides to use XCI for their future DAT files, in that case reversible XCZ would be nice to add to my rom manager as well.

@mid-kid

mid-kid commented Jul 8, 2023

Chiming in to thank you for this! I agree with the notion that the only things worth preserving are the NCA containers, but it's not a very useful format without good tools to convert them back to installable packages. Given this lack of support currently, this is the next best thing, so thanks a lot!

@nitro322
Author

@nicoboss - Have been testing on my NSPs and this new feature generally works great. Compared to the max XZ compression I was using previously, this brings the total file size of the set I've tested so far from 39 GB down to 34 GB, giving me roughly a 12% increase in compression over XZ. That's great!

However, of the 48 files I've tested so far, only 47 decompressed identically back to the original NSP. I have one update NSP that decompresses back to the same file size, but with a different checksum. Tested twice with the same result.

It seems to verify properly, and I don't see any indications of a problem other than the mismatched checksums. I used these arguments for compression:

nsz -C -l 22 -L -B -s 25 -K -t $(nproc) -V file.nsp

But I get the same result even with the defaults (just -C).

Not sure what information is appropriate to post here for debugging. Will reach out to you on gbatemp with more details.
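
For a case like this one, where the size matches but the checksum does not, it can help to know where the two files first diverge. Below is a minimal Python sketch that does this with the standard library only; `first_difference` is a hypothetical helper, not part of nsz.

```python
def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first mismatch, or None if the files are identical."""
    position = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                # Locate the exact byte inside the mismatching chunk.
                for index, (byte_a, byte_b) in enumerate(zip(chunk_a, chunk_b)):
                    if byte_a != byte_b:
                        return position + index
                # One file ended early: the difference starts at the shorter length.
                return position + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without any mismatch
            position += len(chunk_a)
```

An offset near the start of the file would point at the container header, while an offset further in would point at a specific member file or the padding around it.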

5 participants