Why are different catalogers disabled (by default) in different scenarios? #1776

Closed
jsquyres opened this issue May 3, 2023 · 5 comments
Labels
question Further information is requested

Comments

@jsquyres
Contributor

jsquyres commented May 3, 2023

This is a (perhaps naïve) question, not a feature request.

We discovered something that was admittedly documented (we had missed this part of the docs): container image scans and filesystem scans have different default sets of catalogers enabled. It's an easy enough issue for us to resolve: we can use the --catalogers CLI option to get the behavior that we want.

But we're curious: why is there a difference in the default set of catalogers between these two scenarios? As a concrete example: the RPM file cataloger is disabled by default for images, but enabled by default for directory scans.

Granted, some catalogers are obviously not relevant in some scenarios (e.g., it's unlikely that there will be *.rpm files in Docker images). But is there harm (e.g., performance degradation or longer wall-clock run times) in having all catalogers enabled for all scenarios?

Thanks for any enlightenment you can provide!

@jsquyres jsquyres added the enhancement label May 3, 2023
@tgerla tgerla added this to OSS May 4, 2023
@tgerla tgerla added the question label and removed the enhancement label May 18, 2023
@jsquyres
Contributor Author

Bump. Can anyone share some knowledge here? Thanks!

@kzantow
Contributor

kzantow commented May 19, 2023

@jsquyres apologies for the delay getting back to you. Different catalogers are enabled because we have different expectations about what we will find in a source scan vs. an image scan. For example, during a source scan we'll find a package-lock.json and also a lot of package.json files in the node_modules directory. To avoid adding a bunch of packages that don't make a lot of sense, we have tried to organize the catalogers so that anything we would not expect to find in a given scan type (directory or image) is not cataloged by default.

That said, we are also actively pursuing a couple of improvements. The first is adding a "tagging" mechanism (part of PR #1383). The second is extending the --catalogers flag/configuration so that a cataloger name can be prefixed with + or - to include or exclude that cataloger, respectively. Would this solve the problem(s) for you?
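
For illustration only, the proposed + / - prefix behavior could look roughly like the sketch below. This is not Syft's implementation: the function name `select_catalogers`, the cataloger names, and the choice to reject unprefixed entries are all assumptions made for the example.

```python
def select_catalogers(defaults, expressions):
    """Apply '+name' / '-name' expressions on top of a default cataloger list.

    Hypothetical sketch: names and semantics are assumptions, not Syft's API.
    """
    selected = list(defaults)  # copy so the caller's defaults are untouched
    for expr in expressions:
        prefix, name = expr[:1], expr[1:]
        if prefix == "+":
            # Include: add the cataloger if it is not already selected.
            if name not in selected:
                selected.append(name)
        elif prefix == "-":
            # Exclude: drop the cataloger if it is currently selected.
            if name in selected:
                selected.remove(name)
        else:
            # Assumption: this sketch only handles prefixed entries.
            raise ValueError(f"expected '+name' or '-name', got {expr!r}")
    return selected


# Illustrative names only; they are not necessarily real Syft cataloger names.
image_defaults = ["rpm-db", "python-installed"]
result = select_catalogers(image_defaults, ["+rpm-file", "-python-installed"])
print(result)
```

With a mechanism like this, the situation in the original question (wanting the RPM file cataloger during an image scan) would be a one-entry adjustment to the defaults rather than a full replacement list.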

@jsquyres
Contributor Author

> An example is: during a source scan we'll find a package-lock.json and also a lot of package.json files in the node_modules directory.

I hear what you're saying, but let me ask my question in a slightly different way: just because you don't expect to find things via specific catalogers, is there a reason to disable them (by default)? E.g., is wall-clock execution time a concern? My assumption is that you would want to find everything that is in the filesystem / image, especially the things that you do not expect to be there. Is that a naïve perspective?

To be clear: I'm not debating the value of having a robust cataloger selection mechanism for those who want/need to have a specific set of catalogers. Having such functionality seems to be an obvious Good Thing.

@jsquyres
Contributor Author

@kzantow Ping.

@tgerla
Contributor

tgerla commented Jun 1, 2023

Hi @jsquyres, when we scan an image, we assume that the package install steps have already been executed, so we use narrower criteria and list only packages that have been installed. For example, we only report Python packages that have egg or wheel metadata files under a site-packages directory.

But when we scan a directory, we don't want to assume that the install steps have been run (we might be scanning a source repository), so we additionally include catalogers that return results based on declared dependencies, for example, from a Python requirements.txt file.

This is the general philosophy we've used to come up with the default catalogers for images and for directories, but it may not always be the right set of catalogers for every circumstance. It is a bit of a judgement call on our part.
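
As a rough illustration of that philosophy (not Syft's actual code), the sketch below classifies Python evidence as "installed" or "declared" and reports it depending on the scan type. The function names `evidence_kind` and `reported_evidence`, the scan-type strings, and the sample paths are assumptions made for the example; the only details taken from the explanation above are site-packages metadata versus requirements.txt.

```python
from pathlib import PurePosixPath


def evidence_kind(path):
    """Classify a path as 'installed' or 'declared' Python evidence, else None."""
    p = PurePosixPath(path)
    # Installed: wheel (.dist-info) or egg (.egg-info) metadata under site-packages.
    has_metadata = any(part.endswith((".dist-info", ".egg-info")) for part in p.parts)
    if "site-packages" in p.parts and has_metadata:
        return "installed"
    # Declared: a dependency list that may never have been installed.
    if p.name == "requirements.txt":
        return "declared"
    return None


def reported_evidence(paths, scan_type):
    """Filter paths by the evidence kinds this sketch assumes per scan type."""
    if scan_type == "image":
        allowed = {"installed"}  # assume install steps already ran
    elif scan_type == "directory":
        allowed = {"installed", "declared"}  # may be an uninstalled source tree
    else:
        raise ValueError(f"unknown scan type: {scan_type!r}")
    return [p for p in paths if evidence_kind(p) in allowed]


# Hypothetical sample paths for the example.
paths = [
    "/usr/lib/python3.11/site-packages/requests-2.31.0.dist-info/METADATA",
    "/app/requirements.txt",
    "/app/README.md",
]
print(reported_evidence(paths, "image"))
print(reported_evidence(paths, "directory"))
```

In this sketch the image scan ignores requirements.txt even though the file is present, which mirrors the trade-off discussed in this thread: fewer speculative packages by default, at the cost of not reporting everything that exists on disk.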

Hopefully this helps. If you'd like to discuss further, we'd be happy to chat with you on our Slack (https://get.anchore.com/join-anchore-community/) or our every-other-week community meeting (https://github.com/anchore/syft/#join-our-community-meetings).

tgerla added a commit to tgerla/syft that referenced this issue Jun 14, 2023
Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to anchore#1776.
@tgerla tgerla closed this as not planned Jun 14, 2023
@github-project-automation github-project-automation bot moved this to Done in OSS Jun 14, 2023
tgerla added a commit that referenced this issue Jun 20, 2023
Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to #1776.

Signed-off-by: Timothy Gerla <tim@gerla.net>
wagoodman pushed a commit that referenced this issue Jun 20, 2023
…es (#1887)

Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to #1776.

Signed-off-by: Timothy Gerla <tim@gerla.net>
GijsCalis pushed a commit to GijsCalis/syft that referenced this issue Feb 19, 2024
…es (anchore#1887)

Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to anchore#1776.

Signed-off-by: Timothy Gerla <tim@gerla.net>