Top-level API should be more composable #558
Adding my two cents:
---
Example of using syft as a lib today: https://gist.github.com/wagoodman/57ed59a6d57600c23913071b8470175b
From refinement:
---
Update: much of this has been addressed in #864.

Also, I think we should distinguish the scoped capabilities of syft from the pure definitions needed to interop with the primitives that syft works with. An example: syft can encode and decode various formats (SPDX, CycloneDX, etc.). Additionally, syft has the definition of what a format is. These are probably not strongly separated enough in today's API. The definition of a format should be more closely related to that of an SBOM, since that context is semantically useful (as opposed to a format on its own). The capabilities of syft should probably be elevated to a top-level package concern, away from the format definitions.

Based on some of the above points, here is a suggested update to the syft API:

```go
// in the top-level "syft" package

// migrated from today's "format" package
func FormatByOption(option FormatOption) sbom.Format
func FormatByName(name string) sbom.Format

// migrated from today's "format" package
func IdentifyFormat(by []byte) sbom.Format

func Encode(s sbom.SBOM, f sbom.Format) ([]byte, error)
func Decode(reader io.Reader) (*sbom.SBOM, sbom.Format, error)
```

Edit: in case there are functions within syft that may require encoding/decoding functionality, these should be migrated to their own package (say in …).
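To make the shape of this encode/decode layer concrete, here is a minimal, self-contained sketch of the idea. All names here (`SBOM`, `Format`, `jsonFormat`, the package-level `formats` registry) are simplified stand-ins for illustration, not syft's actual definitions:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SBOM is a hypothetical stand-in for sbom.SBOM.
type SBOM struct {
	Packages []string `json:"packages"`
}

// Format is a hypothetical stand-in for sbom.Format: it can encode/decode
// an SBOM and recognize its own serialized form.
type Format interface {
	Name() string
	Encode(s SBOM) ([]byte, error)
	Decode(b []byte) (*SBOM, error)
	Detect(b []byte) bool
}

type jsonFormat struct{}

func (jsonFormat) Name() string                  { return "json" }
func (jsonFormat) Encode(s SBOM) ([]byte, error) { return json.Marshal(s) }
func (jsonFormat) Decode(b []byte) (*SBOM, error) {
	var s SBOM
	if err := json.Unmarshal(b, &s); err != nil {
		return nil, err
	}
	return &s, nil
}
func (jsonFormat) Detect(b []byte) bool {
	return bytes.HasPrefix(bytes.TrimSpace(b), []byte("{"))
}

// formats is a stand-in registry of known formats.
var formats = []Format{jsonFormat{}}

// IdentifyFormat mirrors the proposed top-level helper: ask each
// registered format whether the bytes look like its output.
func IdentifyFormat(by []byte) Format {
	for _, f := range formats {
		if f.Detect(by) {
			return f
		}
	}
	return nil
}

// Decode identifies the format, then delegates decoding to it.
func Decode(by []byte) (*SBOM, Format, error) {
	f := IdentifyFormat(by)
	if f == nil {
		return nil, nil, fmt.Errorf("unknown format")
	}
	s, err := f.Decode(by)
	return s, f, err
}

func main() {
	raw, _ := jsonFormat{}.Encode(SBOM{Packages: []string{"bash", "openssl"}})
	decoded, f, err := Decode(raw)
	fmt.Println(f.Name(), decoded.Packages, err)
}
```

The point of the sketch is the dependency direction: the top-level `Decode`/`IdentifyFormat` functions know about the registry of formats (a capability concern), while each format only knows about the SBOM definition.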
A topic that was brought up by @westonsteimel and @luhring was how syft versions are expressed in SBOMs when using syft as a lib. Today we have a … Ideally the syft API should be able to raise up this information automatically, but additionally allow for an override in case wrapping tooling prefers to provide a different name and version. @luhring suggested that we could use the runtime buildinfo section from the running binary to provide this information. I think this is a good idea, however we need to account for the edge case where the buildinfo section is stripped from the binary.
Today we have this function call at the top-level syft package:

```go
func CatalogPackages(src *source.Source, cfg cataloger.Config) (*pkg.Catalog, []artifact.Relationship, *linux.Release, error)
```

A few points about this:

It would be great to move towards something like this:

```go
// where "options" would be declarative and augment/select cataloging capabilities
func CatalogSource(src *source.Source /*, ... options ... */) (*sbom.SBOM, error)
```

or maybe injection of pre-constructed catalogers:

```go
func CatalogSource(src *source.Source, catalogers ...pkg.Cataloger) (*sbom.SBOM, error)
```

The downside with the second suggestion:

... which makes me lean towards the first suggestion. The upside of taking the first suggestion:

We could additionally introduce a single call that would wrap source creation and cataloging together into one call:

```go
func Catalog(input string /*, ... cataloger options ... */) (*sbom.SBOM, error)
```

This could be the "one stop shop" call that the syft CLI could use.
Next up: what kind of cataloger options are we looking for here? Short answer: functional options that mostly represent existing configuration options seem the best fit, as this allows for flexibility and sane defaults, and isn't mutually exclusive with a user-provided struct instance approach.

Let's assume we have something that looks like:

```go
func Catalog(src *source.Source, options ...CatalogingOption) (*sbom.SBOM, error)
```

where

```go
type CatalogingOption func(*source.Source, *CatalogingConfig) error
```

I propose the following options as a start:

```go
// run all catalogers in serial fashion (default is to run in parallel)
WithoutConcurrency() CatalogingOption

// set the file resolver scope for all catalogers (except the secrets cataloger)
WithScope(scope source.Scope) CatalogingOption

// the tool name and version to show in the SBOM descriptor block
// (defaults to syft and the current library version if not provided)
WithToolIdentification(name, version string) CatalogingOption

// in the syft-json format, we allow for outputting application configuration (defaults to empty)
WithToolConfiguration(c interface{}) CatalogingOption

// set the package catalogers you wish to use (must be pre-constructed)
WithPackageCatalogers(catalogers ...pkg.Cataloger) CatalogingOption

// use the package catalogers suited to the source provided when cataloging
// (default is to use this over `WithPackageCatalogers`)
WithDefaultPackageCatalogers(cfg packages.SearchConfig) CatalogingOption

// enable the file metadata cataloger (default is off)
WithFileMetadata() CatalogingOption

// enable the file digests cataloger (default is off)
WithFileDigests(hashes ...crypto.Hash) CatalogingOption

// enable the secrets cataloger (default is off)...
// the argument is a struct with all of today's cataloger options
WithSecrets(secretConfig *file.SecretsCatalogerConfig) CatalogingOption

// enable the file classification cataloger with default classifiers (default is off)
WithFileClassification() CatalogingOption

// set the specific file classifiers, and enable file classification (default is off)
WithFileClassifiers(classifiers ...file.Classifier) CatalogingOption

// enable the file content cataloger (default is off)
WithFileContents(globs ...string) CatalogingOption

// set the file size limit for the contents and secrets catalogers (default is 1 MB for both)
WithFileSizeLimit(byteLimit int64) CatalogingOption
```

Additionally, in case an API user has a way to generate a `CatalogingConfig` on their own, an option to override the entire config:

```go
// override an entire config
WithConfig(override CatalogingConfig) CatalogingOption
```

This suits situations where you wish to bring your own defaults but still allow for augmentation via more functional options. It is also well suited for syft, the application, where we can build our own config from the application configuration. Using these options within the syft application would look something like this:

```go
func generateSBOM(src *source.Source) (*sbom.SBOM, error) {
	catalogingConfig, err := appConfig.ToCatalogingConfig()
	if err != nil {
		return nil, err
	}

	return syft.Catalog(src,
		syft.WithConfig(*catalogingConfig),
	)
}
```

Though other external users may choose to use it with defaults and augment behavior:

```go
// catalog all packages, file metadata, and digests
syft.Catalog(src,
	syft.WithFileMetadata(),
	syft.WithFileDigests(crypto.SHA256),
)
```

Questions:
---
Separate from the options (but related) is the configuration object used to describe behavior. Based off of the options and today's configuration, I propose:

```go
type CatalogingConfig struct {
	// tool-specific information
	ToolName          string
	ToolVersion       string
	ToolConfiguration interface{}

	// applies to all catalogers
	Scope                source.Scope
	ProcessTasksInSerial bool

	// package
	PackageCatalogers []pkg.Cataloger

	// file metadata
	CaptureFileMetadata bool
	DigestHashes        []crypto.Hash

	// secrets
	CaptureSecrets bool
	SecretsConfig  file.SecretsCatalogerConfig // struct derived from today's secrets cataloger options
	SecretsScope   source.Scope

	// file classification
	ClassifyFiles   bool
	FileClassifiers []file.Classifier

	// file contents
	ContentsConfig file.ContentsCatalogerConfig // struct derived from today's contents cataloger options
}
```

The idea is that all field types are useful in their current form and require no further processing.

With the default options that can be fetched:

```go
func DefaultCatalogingConfig() CatalogingConfig {
	return CatalogingConfig{
		Scope:           source.SquashedScope,
		ToolName:        internal.ApplicationName,
		ToolVersion:     extractSyftVersionFromLib(), // todo: this function does not exist, but differs from version.FromBuild()
		SecretsScope:    source.AllLayersScope,
		SecretsConfig:   file.DefaultSecretsCatalogerConfig(),  // similar concept, but provided next to the cataloger
		FileClassifiers: file.DefaultClassifiers(),             // similar concept, but provided next to the cataloger
		ContentsConfig:  file.DefaultContentsCatalogerConfig(), // similar concept, but provided next to the cataloger
	}
}
```

Why export the config and its fields when we already have options? This enables someone to provide their own config and options as they see fit (using `WithConfig`).
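The interplay between defaults, a full-config override, and individual functional options can be illustrated with a self-contained sketch. All names here are simplified stand-ins (the real proposal threads a `*source.Source` through each option and returns an `*sbom.SBOM`; this sketch returns the resolved config so the mechanics are visible):

```go
package main

import "fmt"

// CatalogingConfig is a simplified stand-in for the proposed config struct.
type CatalogingConfig struct {
	Scope                string
	ProcessTasksInSerial bool
	ToolName             string
}

// CatalogingOption mutates a config, mirroring the proposed
// func(*source.Source, *CatalogingConfig) error shape (source elided here).
type CatalogingOption func(*CatalogingConfig) error

func DefaultCatalogingConfig() CatalogingConfig {
	return CatalogingConfig{Scope: "squashed", ToolName: "syft"}
}

// WithoutConcurrency runs all catalogers serially.
func WithoutConcurrency() CatalogingOption {
	return func(c *CatalogingConfig) error {
		c.ProcessTasksInSerial = true
		return nil
	}
}

// WithScope sets the file resolver scope.
func WithScope(scope string) CatalogingOption {
	return func(c *CatalogingConfig) error {
		c.Scope = scope
		return nil
	}
}

// WithConfig replaces the entire config, letting callers bring their own
// defaults while still layering further options on top.
func WithConfig(override CatalogingConfig) CatalogingOption {
	return func(c *CatalogingConfig) error {
		*c = override
		return nil
	}
}

// Catalog applies defaults, then each option in order.
func Catalog(options ...CatalogingOption) (CatalogingConfig, error) {
	cfg := DefaultCatalogingConfig()
	for _, opt := range options {
		if err := opt(&cfg); err != nil {
			return cfg, err
		}
	}
	return cfg, nil // a real implementation would now run the catalogers
}

func main() {
	cfg, _ := Catalog(WithScope("all-layers"), WithoutConcurrency())
	fmt.Println(cfg.Scope, cfg.ProcessTasksInSerial)
}
```

Because options are applied in order, `WithConfig` composes naturally with the finer-grained options: pass it first to establish a baseline, then override individual fields.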
From refinement:
---
This is a great initiative, can't wait for it! I have a use case for defining a custom package cataloger. Pre-construction is achievable since the Cataloger interface is exposed. However, I'm not sure if there's a way to extend (not override) the list of default package catalogers with the proposed API.
@lyzs90 there is one way with the proposed API, but I admit it isn't straightforward:

```go
config := syft.DefaultCatalogingConfig()
config.PackageCatalogers = append(config.PackageCatalogers, <your catalogers>...)

syft.Catalog(src,
	syft.WithConfig(config),
	// ... any other overriding options
)
```

We could add an additional helper which would append to rather than override the list of package catalogers:

```go
// current proposed "set/override" operation
WithPackageCatalogers(catalogers ...pkg.Cataloger) CatalogingOption

// possible new "append" operation
WithAdditionalPackageCatalogers(catalogers ...pkg.Cataloger) CatalogingOption
```

How does that sound?
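The set-versus-append distinction is small but easy to get wrong, so here is a runnable sketch of the two semantics (stand-in types, not the real syft signatures):

```go
package main

import "fmt"

// Cataloger is a stand-in for pkg.Cataloger.
type Cataloger string

// Config is a stand-in for the proposed CatalogingConfig.
type Config struct{ PackageCatalogers []Cataloger }

// Option is a stand-in for CatalogingOption.
type Option func(*Config)

// WithPackageCatalogers replaces the list ("set/override" semantics).
func WithPackageCatalogers(cs ...Cataloger) Option {
	return func(c *Config) { c.PackageCatalogers = cs }
}

// WithAdditionalPackageCatalogers extends the list ("append" semantics),
// preserving whatever defaults are already configured.
func WithAdditionalPackageCatalogers(cs ...Cataloger) Option {
	return func(c *Config) { c.PackageCatalogers = append(c.PackageCatalogers, cs...) }
}

func main() {
	c := Config{PackageCatalogers: []Cataloger{"apkdb", "dpkgdb"}}
	WithAdditionalPackageCatalogers("my-custom-cataloger")(&c)
	fmt.Println(c.PackageCatalogers)
}
```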
@wagoodman Now that I think of it, …
Update: see #558 (comment) for a more up-to-date suggestion.

Most of this conversation has been about the … Here is the current cataloger organization today:
This is good, but there is room for improvement on what the package names convey:
This culminates in a final tree that looks like so:
Lastly, a helper function that ties together source schemes and sets of package catalogers could be provided within the `packages` package:

```go
package packages

func CatalogersBySourceScheme(scheme source.Scheme, cfg SearchConfig) []pkg.Cataloger {
	switch scheme {
	case source.ImageScheme:
		return InstalledCatalogers(cfg)
	case source.FileScheme:
		return AllCatalogers(cfg)
	case source.DirectoryScheme:
		return IndexCatalogers(cfg)
	}
	return nil
}
```
In a similar vein to the last comment about package organization, I wanted to touch on the differences between the The current organization of
The current organization of
Ideally the organization of
The final layout would look like:
This paves the way to do a few more things:
---
Focusing on the relationship between the
Take for example the definition of `sbom.Artifacts` today:

```go
type Artifacts struct {
	PackageCatalog      *pkg.Catalog
	FileMetadata        map[source.Coordinates]source.FileMetadata
	FileDigests         map[source.Coordinates][]file.Digest
	FileClassifications map[source.Coordinates][]file.Classification
	FileContents        map[source.Coordinates]string
	Secrets             map[source.Coordinates][]file.SearchResult
	LinuxDistribution   *linux.Release
}
```

With the proposed changes this would become:

```go
type Artifacts struct {
	PackageCatalog      *pkg.Catalog
	FileMetadata        map[file.Coordinates]file.Metadata
	FileDigests         map[file.Coordinates][]file.Digest
	FileClassifications map[file.Coordinates][]file.Classification
	FileContents        map[file.Coordinates]string
	Secrets             map[file.Coordinates][]file.SearchResult
	LinuxDistribution   *linux.Release
}
```

Lastly, the same migrations should apply to file abstractions as well:
The advantage of this last suggestion is that catalogers would only depend on objects in a package that has no relation to the stereoscope image or source packages.
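To make the keying concrete, here is a minimal stand-in showing why a small, comparable `Coordinates` struct works well as the map key for all file-level results (field names here are illustrative, not syft's actual definitions):

```go
package main

import "fmt"

// Coordinates identifies a file within a source; because it contains only
// comparable fields, it can key maps directly (stand-in for file.Coordinates).
type Coordinates struct {
	RealPath     string
	FileSystemID string // e.g. a layer digest for images, empty for directories
}

// Digest is a stand-in for file.Digest.
type Digest struct {
	Algorithm string
	Value     string
}

func main() {
	digests := map[Coordinates][]Digest{}

	c := Coordinates{RealPath: "/bin/busybox", FileSystemID: "sha256:abc"}
	digests[c] = append(digests[c], Digest{Algorithm: "sha256", Value: "deadbeef"})

	// an equal Coordinates value constructed elsewhere finds the same entry
	lookup := Coordinates{RealPath: "/bin/busybox", FileSystemID: "sha256:abc"}
	fmt.Println(len(digests[lookup]))
}
```

Since every per-file result (`FileMetadata`, `FileDigests`, `Secrets`, ...) shares this key type, results from different catalogers can be joined by coordinates without any dependency on the image or source packages.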
Focusing on the
---
I've made a mockup of the proposed changes in the above comments; feel free to try it out here: https://github.com/anchore/syft/tree/api-wip (this is just for illustration and prototyping, not for merging or cleaning up)
Incorporating comments from refinement: the proposed package structure for …
Given these points, and combining with the original points in other comments (e.g. package names should communicate what they operate on) here is an alternative proposed package structure:
Specific changes:
---
Hi guys,
---
Summary of tasks

Group: organize definitions (details in …)
Group: organize catalogers (details in …)
Group: new top-level API
---
@samj1912 if you have more thoughts from the community meeting re: using syft as a library please feel free to add them here |
After reading through this ticket, I have a few ideas and questions that I'd like to raise. First off, I generally like the structure as proposed in #558 (comment). However, I find the usage of …

Also, maybe I've missed this, but what will happen to the currently existing …? My suggestion would be:

I went quite heavy on the renaming here, but in general feel free to … Additionally, I've added the top-level …

And I could imagine the API being pretty much what is proposed here. I'm also curious to hear why the name "Cataloger" was chosen, as "Scanner" is the term that comes to mind first for me. To give an example, I'd imagine a book cataloger to look at the cover and back and put the book into a specific section, while a scanner would actually check the contents of the book. That analogy also holds true for the files/artifacts that we're talking about here, I'd say.
Thanks @tommyknows for your thoughts! I've been starting to get some of these changes in recently with #1383 (just part of this issue) so good timing!
me too... this was an attempt to separate definitions and capabilities, which helps tremendously when avoiding package cycles. That being said, I think there is another approach we can take here (see the update below)
I didn't explicitly have that in the tree, but it would remain in the
This came out of a necessity since
Today we use the term artifact to mean "something that was cataloged" (in a noun-less way)... so we catalog file metadata, digests, secrets (soon to be removed), packages, etc. Each of these is normalized by its ability to be identified and related to the others. The only other spot this is inconsistent with is the syft JSON output, where we have a top-level … I'm quite hesitant to make such a large change with this issue (which helps us get to 1.0) but am open to talking about how it could change in the future.
Yeah, naming is hard! "Scanner" came up during initial development, but when trying to craft a package-searching interface we wanted the nouns and verbs to be self-descriptive and as specific as possible. The verb "scan" doesn't say what the ultimate action is and can be used to generically describe any "scanning-like" action, whereas "catalog" has a more specific connotation. With the raw definitions:
I think where the term
Agreed! We ended up going with something like this in #1383, happy for opinions/comments! Based off of some of your comments, other internal comments, and time passing and thinking about it again, here's an updated package organization proposal:
---
Thanks a lot for the detailed feedback @wagoodman, very nice write-up! With that background, I definitely agree with what you're saying and proposing.
Adding two cents as I've just looked again at using Syft as a lib with "fresh eyes" 😄. I like the idea of … For consumers, I think that's "step 2" in the flow. The …

I have two major thoughts so far: I want to be sure there are no "smarts" being invoked, and I want to learn about as few proprietary or library-specific concepts as possible before I can create the SBOM.

**Smarts vs. no smarts**

This is how I think about it: the Syft CLI experience has a lot of "smarts". It will do things like auto-detect the availability of the Docker daemon API in the environment, and use that when it can. It can parse "schemes" like …

As a lib consumer, I don't want smarts. I'm usually writing a program where I want the flow to be as simple and reliable as possible, and I want to explicitly opt in to any more complicated behavior if I ever need it, since I'm going to have to be the one to support the experience for my users.

As a concrete example, I'm interested in writing a tool that uses Syft (as a lib) to detect dark matter in container images. I want a straightforward flow: given an image reference (e.g. …).

**Minimizing the number of custom objects to learn about before I can do a thing**

I spent an hour or so this past weekend trying to figure out how I'd construct a … I see functions like:

```go
func New(in Input, registryOptions *image.RegistryOptions, exclusions []string) (*Source, func(), error)

func NewFromRegistry(in Input, registryOptions *image.RegistryOptions, exclusions []string) (*Source, func(), error)
```

So before I can even begin constructing a "source", I need to figure a few more things out, like this `Input` type. Instead, I'd love 😍 for my journey to start with something like:

```go
src, err := source.NewFromRemoteImageRef("registry.fun/foo@sha:bar123")
```

And then I'm good to call "make the SBOM!".
@luhring I've got some draft code that we could talk through conceptually (with some options not entirely figured out yet)... think of this as a conversation starter (with a couple more fictitious source objects thrown in for good measure):

```go
package source

import (
	"github.com/anchore/stereoscope/pkg/image"
	"github.com/anchore/syft/syft/artifact"

	v1 "github.com/google/go-containerregistry/pkg/v1"
)

type StereoscopeImageConfig struct {
	Reference       string
	From            image.Source
	Platform        *image.Platform
	RegistryOptions *image.RegistryOptions // TODO: takes platform? as string?
	// name?
}

type GGCRImageConfig struct {
	Image              v1.Image
	ContentPath        string
	AdditionalMetadata []image.AdditionalMetadata
	// name?
}

// the below configs could just be simple constructor args, but I kept them as configs to start...

type DirectoryConfig struct {
	Path string
	// name?
	// root?
}

type FileConfig struct {
	Path string
	// name?
	// root?
}

type GitConfig struct {
	URI string // url or path
	// name?
	// root?
}

func NewFromDirectory(cfg DirectoryConfig) (Source, error) {
}

func NewFromFile(cfg FileConfig) (Source, error) {
}

func NewFromImage(cfg StereoscopeImageConfig) (Source, error) {
}

func NewFromGGCRImage(cfg GGCRImageConfig) (Source, error) {
}

func NewFromGit(cfg GitConfig) (Source, error) {
}

type StereoscopeImageSource struct {
	// ...
	// implements ImageInterpreter
}

type GGCRImageSource struct {
	// ...
	// implements ImageInterpreter
}

type DirectorySource struct {
	// ...
	// implements PathInterpreter
}

type FileSource struct {
	// ...
	// implements PathInterpreter
}

type GitSource struct {
	// ...
	// implements GitInterpreter
}

type ImageMetadata struct {
	UserInput      string          `json:"userInput"`
	ID             string          `json:"imageID"`
	ManifestDigest string          `json:"manifestDigest"`
	MediaType      string          `json:"mediaType"`
	Tags           []string        `json:"tags"`
	Size           int64           `json:"imageSize"`
	Layers         []LayerMetadata `json:"layers"`
	RawManifest    []byte          `json:"manifest"`
	RawConfig      []byte          `json:"config"`
	RepoDigests    []string        `json:"repoDigests"`
	Architecture   string          `json:"architecture"`
	Variant        string          `json:"architectureVariant,omitempty"`
	OS             string          `json:"os"`
}

// LayerMetadata represents all static metadata that defines what a container image layer is.
type LayerMetadata struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type GitMetadata struct {
	Commit string
	// ...
}

type Source interface {
	artifact.Identifiable
	FileResolver(Scope) (FileResolver, error)
}

type ImageInterpreter interface {
	Metadata() ImageMetadata
}

type PathInterpreter interface {
	Path() string
}

type GitInterpreter interface {
	Metadata() GitMetadata
}
```
What would you like to be added:
More reusable primitives when syft is used as a library. This would be able to do at least the following tasks:

- `sbom.Document` (as proposed in Promote cataloging task pattern #554)

This should probably be done after (that is, this issue is most likely related to):
Why is this needed:
The existing top-level API functions perform a full parse-and-catalog processing unit, whereas separating parsing and cataloging into separate functions is useful if you want to reuse the same parsed image (which takes a while) across different cataloger calls (which, relatively speaking, don't take that long).
This would additionally improve interoperability with `anchorectl`.

Tasks