Why are different catalogers disabled (by default) in different scenarios? #1776
Comments
Bump. Can anyone share some knowledge here? Thanks!
@jsquyres apologies for the delay getting back to you. The reason there are different catalogers enabled is that there are some expectations about what we would find in a source scan vs. an image scan. An example is: during a source scan we'll find a That said, we are also actively pursuing a couple of improvements. The first of which is adding a "tagging" mechanism (part of PR #1383). The second of which is adding some functionality to the
I hear what you're saying, but let me ask my question in a slightly different way: just because you don't expect to find things via specific catalogers, is there a reason to disable them (by default)? E.g., is wall-clock execution time a concern? My assumption is that you would want to find everything that is in the filesystem / image, especially the things that you do not expect to be there. Is that a naïve perspective? To be clear: I'm not debating the value of having a robust cataloger selection mechanism for those who want/need to have a specific set of catalogers. Having such functionality seems to be an obvious Good Thing.
@kzantow Ping.
Hi @jsquyres, when we're scanning an image, we assume that package install steps have been executed, so we use narrower criteria to list the packages that have been installed: for example, only reporting Python packages that have egg or wheel metadata files under a site-packages directory. But when we scan a directory, we don't want to assume that the install steps have been run (we might be scanning a source repository), so we additionally include catalogers that return results based on declared dependencies, for example, from a Python requirements.txt. This is the general philosophy we've used to come up with the default catalogers for images and for directories, but it may not always be the right set of catalogers for every circumstance. It is a bit of a judgement call on our part. Hopefully this helps. If you'd like to discuss further, we'd be happy to chat with you on our Slack (https://get.anchore.com/join-anchore-community/) or at our every-other-week community meeting (https://github.com/anchore/syft/#join-our-community-meetings).
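To make the distinction above concrete, here is a minimal sketch of the idea in Python. It is not syft's implementation: the cataloger names, the `DEFAULTS` table, and the `scan` helper are all hypothetical and exist only for illustration. It assumes an "installed" cataloger that matches egg/wheel metadata under a site-packages directory and a "declared" cataloger that matches requirements.txt files, with the image default being the narrower of the two sets.

```python
from pathlib import PurePosixPath


def is_installed_metadata(path: str) -> bool:
    """True for egg/wheel metadata located under a site-packages directory."""
    parts = PurePosixPath(path).parts
    if "site-packages" not in parts:
        return False
    idx = parts.index("site-packages")
    return any(p.endswith((".dist-info", ".egg-info")) for p in parts[idx + 1:])


def is_declared_dependency_file(path: str) -> bool:
    """True for a requirements.txt file, wherever it lives."""
    return PurePosixPath(path).name == "requirements.txt"


# Hypothetical cataloger names; these are NOT syft's real identifiers.
CATALOGERS = {
    "installed-python": is_installed_metadata,
    "declared-python": is_declared_dependency_file,
}

# Images: assume install steps ran, so only trust installed metadata.
# Directories: install steps may not have run, so also read declarations.
DEFAULTS = {
    "image": ["installed-python"],
    "directory": ["installed-python", "declared-python"],
}


def scan(source_kind: str, paths: list[str]) -> list[tuple[str, str]]:
    """Return (cataloger name, matching path) pairs for the default set."""
    hits = []
    for name in DEFAULTS[source_kind]:
        match = CATALOGERS[name]
        hits.extend((name, p) for p in paths if match(p))
    return hits
```

With the same list of paths, `scan("image", paths)` would ignore a stray requirements.txt, while `scan("directory", paths)` would report it. That is the behavior difference the original question is about.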
Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to anchore#1776.
…es (anchore#1887) Add some explanation around why there are different default sets of catalogers for image scans versus directory scans. Hopefully clarify questions related to anchore#1776. Signed-off-by: Timothy Gerla <tim@gerla.net>
This is a (perhaps naïve) question, not a feature request.
We discovered what was admittedly documented (but we had missed this part in the docs): that container images and filesystem scans have a different default set of catalogers enabled. It's an easy-enough issue for us to resolve: we can use the `--cataloger` CLI option to get the behavior that we want.
But we're curious: why is there a difference in the default set of catalogers between these two scenarios? As a concrete example: the RPM file cataloger is disabled by default for images, but enabled by default for directory scans.
Granted, some catalogers are obviously not relevant in some scenarios (e.g., it's unlikely that there will be `*.rpm` files in docker images). But is there harm -- e.g., performance degradation / longer wall-clock execution run times -- in having all catalogers enabled for all scenarios?
Thanks for any enlightenment you can provide!