Why are different catalogers disabled (by default) in different scenarios? #1776

jsquyres · 2023-05-03T11:03:45Z

This is a (perhaps naïve) question, not a feature request.

We discovered something that was admittedly documented (we had missed this part of the docs): container image scans and filesystem scans have different default sets of catalogers enabled. It's an easy-enough issue for us to resolve: we can use the --catalogers CLI option to get the behavior that we want.

But we're curious: why is there a difference in the default set of catalogers between these two scenarios? As a concrete example: the RPM file cataloger is disabled by default for images, but enabled by default for directory scans.

Granted, some catalogers are obviously not relevant in some scenarios (e.g., it's unlikely that there will be *.rpm files in Docker images). But is there harm (e.g., performance degradation or longer wall-clock run times) in having all catalogers enabled for all scenarios?

Thanks for any enlightenment you can provide!


jsquyres · 2023-05-19T17:05:50Z

Bump. Can anyone share some knowledge here? Thanks!

kzantow · 2023-05-19T17:32:23Z

@jsquyres apologies for the delay in getting back to you. Different catalogers are enabled because we have different expectations about what we would find in a source scan vs. an image scan. An example is: during a source scan we'll find a package-lock.json and also a lot of package.json files in the node_modules directory. To avoid adding a bunch of packages that don't make much sense, we have tried to organize the catalogers so that a cataloger whose findings we would not expect in a given scan type (directory or image) is not included by default for that scan type.

That said, we are also actively pursuing a couple of improvements. The first is adding a "tagging" mechanism (part of PR #1383). The second is adding functionality to the --catalogers flag/configuration that allows prefixing a cataloger name with + or - to include or exclude that cataloger, respectively. Would this solve the problem(s) for you?
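To make the proposed + / - prefix idea concrete, here is a minimal Python sketch of how such expressions might be applied on top of a default list. It is not Syft's implementation: the `resolve_catalogers` helper, the cataloger names, and the decision to reject unprefixed names are all assumptions made for illustration.

```python
def resolve_catalogers(defaults, expressions):
    """Apply '+name' / '-name' expressions on top of a default cataloger list.

    Hypothetical helper: the exact semantics of the proposed feature are not
    spelled out in this thread, so this is only one plausible reading.
    """
    selected = list(defaults)  # copy so the caller's defaults are not mutated
    for expr in expressions:
        if len(expr) < 2 or expr[0] not in "+-":
            # Assumption: this sketch only handles prefixed names.
            raise ValueError(f"expected '+name' or '-name', got {expr!r}")
        name = expr[1:]
        if expr[0] == "+":
            if name not in selected:
                selected.append(name)  # include a cataloger that is off by default
        else:
            selected = [c for c in selected if c != name]  # exclude a cataloger
    return selected


# Illustrative names only; they may not match Syft's real cataloger names.
image_defaults = ["rpm-db", "python-installed"]
print(resolve_catalogers(image_defaults, ["+rpm-file", "-python-installed"]))
```

With a mechanism like this, the RPM file case from the original question would come down to adding one `+` expression to an image scan rather than restating the whole cataloger list.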

jsquyres · 2023-05-20T13:41:14Z

> An example is: during a source scan we'll find a package-lock.json and also a lot of package.json files in the node_modules directory.

I hear what you're saying, but let me ask my question in a slightly different way: just because you don't expect to find things via specific catalogers, is there a reason to disable them (by default)? E.g., is wall-clock execution time a concern? My assumption is that you would want to find everything that is in the filesystem / image, especially the things that you do not expect to be there. Is that a naïve perspective?

To be clear: I'm not debating the value of having a robust cataloger selection mechanism for those who want/need to have a specific set of catalogers. Having such functionality seems to be an obvious Good Thing.

jsquyres · 2023-05-30T19:37:24Z

@kzantow Ping.

tgerla · 2023-06-01T21:07:24Z

Hi @jsquyres, when we scan an image, we assume that the package install steps have already been executed, so we use narrower criteria and list only packages that have been installed. For example, we only report Python packages that have egg or wheel metadata files under a site-packages directory.

But when we scan a directory, we don't want to assume that the install steps have been run (we might be scanning a source repository), so we additionally include catalogers that return results based on declared dependencies, for example, from a Python requirements.txt file.

This is the general philosophy we've used to come up with the default catalogers for images and for directories, but it may not always be the right set of catalogers for every circumstance. It is a bit of a judgement call on our part.
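As a rough illustration of that philosophy (not Syft's actual code), the default set can be thought of as a filter on the kind of evidence each cataloger relies on. The `Cataloger` type, the two cataloger names, and the `default_catalogers` helper below are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cataloger:
    name: str
    evidence: str  # "installed" (install artifacts) or "declared" (manifests)


# Hypothetical registry; the names are illustrative, not Syft's real ones.
CATALOGERS = [
    # egg/wheel metadata under site-packages: proof that an install happened
    Cataloger("python-installed-metadata", "installed"),
    # requirements.txt: only a declaration of intended dependencies
    Cataloger("python-requirements", "declared"),
]


def default_catalogers(scan_type):
    """Return default cataloger names for an 'image' or 'directory' scan."""
    if scan_type == "image":
        # Install steps are assumed to have run, so trust installed evidence only.
        allowed = {"installed"}
    elif scan_type == "directory":
        # Might be a source repository, so also accept declared dependencies.
        allowed = {"installed", "declared"}
    else:
        raise ValueError(f"unknown scan type: {scan_type!r}")
    return [c.name for c in CATALOGERS if c.evidence in allowed]


print(default_catalogers("image"))
print(default_catalogers("directory"))
```

The real defaults involve more judgement than a single evidence flag, which is why they will not fit every circumstance.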

Hopefully this helps. If you'd like to discuss further, we'd be happy to chat with you on our Slack (https://get.anchore.com/join-anchore-community/) or our every-other-week community meeting (https://github.com/anchore/syft/#join-our-community-meetings).

Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to anchore#1776.

jsquyres added the enhancement label May 3, 2023

tgerla added this to OSS May 4, 2023

tgerla added the question label and removed the enhancement label May 18, 2023

jsquyres mentioned this issue May 19, 2023

Minor cataloger improvements for the CLI #1831

Closed

tgerla mentioned this issue Jun 14, 2023

docs: clarify reasoning of default catalogers for images or directories tgerla/syft#1

Closed

tgerla closed this as not planned Jun 14, 2023

github-project-automation bot moved this to Done in OSS Jun 14, 2023

tgerla mentioned this issue Jun 20, 2023

docs: clarify reasoning of default catalogers for images or directories #1887

Merged
