Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZIP support spec #2270

Merged
merged 12 commits into from
Aug 23, 2022
Merged

ZIP support spec #2270

merged 12 commits into from
Aug 23, 2022

Conversation

ryfu-msft
Copy link
Contributor

@ryfu-msft ryfu-msft commented Jun 22, 2022

Related to:
#140

Changes:

  • Added a spec doc for detailing how ZIP installers will be supported.
Microsoft Reviewers: Open in CodeFlow

@ryfu-msft ryfu-msft requested a review from a team as a code owner June 22, 2022 18:31

2. If a ZIP archive file contains nested archive files, we will support decompressing up to a maximum of 3 layers (2 additional layers including the initial ZIP file).

If either of these checks fail, the process will terminate and a warning will be displayed to the user. The user can include `--force` to bypass this check and continue installation at their own risk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If either of these checks fail, the process will terminate and a warning will be displayed to the user. The user can include `--force` to bypass this check and continue installation at their own risk.
If either of these checks fail, the process will terminate and a warning will be displayed to the user. The user can include `--override-security-check` to bypass this check and continue installation at their own risk.

As has been brought up by @JohnMcPMS , we should not be using a ubiquitous argument such as --force for security related overrides. Although my suggestion may not be the exact right argument, I feel that avoiding the use of force here minimizes future rework if/when the hash override parameter is changed

Copy link

@RokeJulianLockhart RokeJulianLockhart Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryfu-msft, why solely 3 layers? That decision appears arbitrary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why solely 3 layers? That decision appears arbitrary.

Although it is likely "arbitrary", it is actually an embodiment of good practices. While there are times that putting zipped files inside of zipped files does make sense, it is extremely rare that it is necessary to go more than one layer beyond that. In fact, in this thread Nosredna talks about how compressing a file too many times actually leads to the archive being bigger than it originally was.

So a first layer of compression, which is putting a file inside of a zip, makes a lot of sense. In some cases, a second layer of compression, which is putting a zip inside of a zip, also makes sense, especially when files have a lot of duplicate data. From there, it doesn't really save much extra space to go to that third layer of compression. It can be useful in some situations, but doesn't generally do much. Fourth, fifth, and even more layers of compression are actually more likely to be counterproductive than anything.

Because one layer is common, two layers is not unheard of, and three layers is rare, it doesn't make sense for winget to support more than 3 layers. Additionally, supporting a maximum of 3 layers will force developers to not go overboard on the compression if they want their package to be available in winget

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BEEDELLROKEJULIANLOCKHART, it is relatively arbitrary. I decided to opt for a conservative number as it doesn't really make sense to support more than a 3rd layer as @Trenly has explained. If the community strongly feels that 3 is too restrictive, we can increase it but for now I feel that 3 is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated suggested argument for overriding zip malware security checks. We also decided to leverage a zip file malware checker for scanning zip files rather than attempting to write our own.


1. For any archive file, we will start by using Windows Shell APIs which treat ZIP files as a Folder object that can be utilized for viewing the contents (we are not extracting to a specified location, yet). We will iterate through each item and determine the cumulative sum of the sizes of all items contained in the zip and compare that value with the size of the ZIP file. **The check will verify that the compression ratio is no greater than 5:1 (20%)**. Any file that has a larger compression ratio (<20%) will be deemed as "suspicious" and a warning will be displayed to the user. Lossless compression of common data files tend to achieve compression ratios of ~2:1 (50%). Any extraction or copying of files will not take place until this check is passed.

2. If a ZIP archive file contains nested archive files, we will support decompressing up to a maximum of 3 layers (2 additional layers including the initial ZIP file).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this recursive decompressing talking only about the threat detection, or is it part of the installation?

If part of the installation, how do you intend to use decompressing multiple nested archives?
For example:
Would I be able to specify in the manifest that I want to run the installer /archive.zip/setup.exe? Where would you decompress the nested archives to? What if I gave you a zip containing setup.exe and data.zip, and setup.exe used the compressed data.zip; would you still decompress it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be part of installation. If the zip file is safe, all nested archives would be extracted within the same folder.

For example, if foo.zip contained bar.zip which contained setup.exe. I would expect the full decompression to be performed on foo.zip to result in the folders /foo/bar/setup.exe. There are no plans to add support for an installer that depends on a compressed zip in our initial implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Just to be annoying bringing up edge cases...
Imagine you have foo.zip contains bar.zip which contains setup.exe. But foo.zip also contains bar/setup.exe (setup.exe inside a directory called bar). Both would be decompressed to bar/setup.exe. What do we do then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setup.exe with data.zip is a valid scenario. Maybe we should only unzip the top layer as default and add some manifest entries to let us know we should unzip additional layers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Just to be annoying bringing up edge cases... Imagine you have foo.zip contains bar.zip which contains setup.exe. But foo.zip also contains bar/setup.exe (setup.exe inside a directory called bar). Both would be decompressed to bar/setup.exe. What do we do then?

Not annoying :) I like Yao's suggestion to include a manifest entry like fullDecompression = true to indicate whether we decompress additional layers. If there is already an existing folder with the same name, maybe the decompressed folder would append a string to the end of the name like "bar (1)" to signify that there is a duplicate folder. This might be slighly confusing but my hope is that there wouldn't be any nested zips that conflict with nested folders that contain the same files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a use case for extracting nested archives? It seems more likely to break things than help, in the event that a data.zip file exists. I would say that until a concrete example exists, we should start with only extracting the top level archive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added explanation to specify that initial implementation will only extract the top-level archive to support the scenarios stated above

2. **Single-Layered (non-recursive)**
A ZIP bomb that expands fully after a single round of decompression. The extremely high compression ratio of the ZIP bomb is achieved by overlapping files within the ZIP container.

In order to protect our users from these possible threats, we will need to implement several basic checks to verify whether it is safe to proceed with the contents of the ZIP archive file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a way to prevent these threats on other installer types? For example, imagine I create an EXE with a ZIP bomb embedded and have the EXE decompress it. Would we have the same issues we are trying to avoid here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a way to prevent these threats on other installer types? For example, imagine I create an EXE with a ZIP bomb embedded and have the EXE decompress it. Would we have the same issues we are trying to avoid here?

Thats why we have the pipeline validation, isn’t it? If an exe were a zip bomb, it would hang the VM running the validation and trigger an Internal-Error, assuming it made it past the AV scan in the first place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's part of it, but we would like to be able to offer some protection for users accessing third party REST sources as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That brings up a good point, this scenario would need to additionally scan/verify whether the provided EXE has an embedded zip file, and once that is determined, we could apply the checks here. I do feel like scanning installers (as opposed to scanning a zip file before extracting to obtain an installer) seems to be out of the scope of winget.

Copy link

@RokeJulianLockhart RokeJulianLockhart Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryfu-msft, scanning what winget's default repository provides is certainly not out of its scope, because every Linux distribution's repositories do that, and all of the software that is available via the msstore repository has been, too.

Additionally, Microsoft's threat-detection services are so brilliant, not utilizing them would be really weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryfu-msft, scanning what winget's default repository provides is certainly not out of its scope, because every Linux distribution's repositories do that, and all of the software that is available via the msstore repository has been, too.

Additionally, Microsoft's threat-detection services are so brilliant, not utilizing them would be really weird.

Thats not what Ryan was saying at all. The winget-pkgs repository does have all of the packages scanned before they can be added. What Ryan was saying is that private sources and third-party REST sources are outside of the scope of winget in terms of scanning the installer files they provide for security.

You’re also conflating “Package manager” with “Package source”. On Linux especially, apt and snap are package managers, but they do not scan the entire distribution network as you said. Rather, they only fetch the package from the source - and in order to get into the source the package has to be scanned. The difference with apt, snap, and msstore is that they also host the binary files. Winget-pkgs does not host the binaries, instead it hosts the manifests which point to where to download the binaries and how to install them.

Apt and snap also allow users to host their own private repositories, and similar to winget, there is no guarantee that packages from those repositories are safe. This is because, like winget, apt and snap are just package managers - they get information about the package and have the ability to install, uninstall, and upgrade the package. It is, and almost always has been, the responsibility of the package source, or package repository, to maintain the integrity of their packages. The package managers do very little to ensure security outside of verifying that the installer is the correct installer and that the user has permission to perform the action which is being requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will now leverage a zip malware checking library to scan zip files rather than implementing our own malware detection.

@github-actions

This comment has been minimized.

@JohnMcPMS JohnMcPMS mentioned this pull request Aug 9, 2022
@ryfu-msft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@ryfu-msft ryfu-msft merged commit 45a3867 into microsoft:master Aug 23, 2022
@ryfu-msft ryfu-msft deleted the zipSpec branch June 7, 2023 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants