Replace core SBOM-creation API with builder pattern #1383

Merged on Jan 12, 2024 (35 commits)
35 commits
be31e1e
remove existing cataloging API
wagoodman Dec 15, 2023
4e224b5
add file cataloging config
wagoodman Dec 15, 2023
4172a41
add package cataloging config
wagoodman Dec 15, 2023
afc5783
add configs for cross-cutting concerns
wagoodman Dec 15, 2023
f511367
rename CLI option configs to not require import aliases later
wagoodman Dec 15, 2023
f36de9d
update all nested structs for the Catalog struct
wagoodman Dec 15, 2023
768e232
update Catalog cli options
wagoodman Dec 15, 2023
143b0f6
migrate relationship capabilities to separate internal package
wagoodman Dec 15, 2023
e5c582f
refactor golang cataloger to use configuration options when creating …
wagoodman Dec 15, 2023
62b19c2
create internal object to facilitate reading from and writing to an SBOM
wagoodman Dec 15, 2023
839b017
create a command-like object (task) to facilitate partial SBOM creation
wagoodman Dec 15, 2023
44d8543
add cataloger selection capability
wagoodman Dec 15, 2023
87f3eac
add package, file, and environment related tasks
wagoodman Dec 15, 2023
4848648
update existing file catalogers to use nested UI elements
wagoodman Dec 15, 2023
0ed13db
add CreateSBOMConfig that drives the SBOM creation process
wagoodman Dec 15, 2023
b811336
capture SBOM creation info as a struct
wagoodman Dec 15, 2023
a5fe920
add CreateSBOM() function
wagoodman Dec 15, 2023
a3a3961
fix tests
wagoodman Dec 15, 2023
473605c
update docs with SBOM selection help + breaking changes
wagoodman Dec 15, 2023
63c23e2
fix multiple override default inputs
wagoodman Dec 16, 2023
2550e62
fix deprecation flag printing to stdout
wagoodman Dec 16, 2023
098fbd7
refactor cataloger selection description to separate object
wagoodman Jan 3, 2024
208333c
address review comments
wagoodman Jan 3, 2024
e561879
keep expression errors and show specific suggestions only
wagoodman Jan 3, 2024
3f38495
address additional review feedback
wagoodman Jan 10, 2024
dbfcf26
Merge remote-tracking branch 'origin/main' into refactor-cataloging-api
wagoodman Jan 10, 2024
81fa9b2
address more review comments
wagoodman Jan 11, 2024
9fcbbef
addressed additional PR review feedback
wagoodman Jan 11, 2024
81d621b
fix file selection references
wagoodman Jan 11, 2024
af42ef5
remove guess language data generation option
wagoodman Jan 11, 2024
498870d
add tests for coordinatesForSelection
wagoodman Jan 12, 2024
5628045
rename relationship attributes
wagoodman Jan 12, 2024
f8626b1
add descriptions to relationships config fields
wagoodman Jan 12, 2024
f4fb2e1
improve documentation around configuration options
wagoodman Jan 12, 2024
55b4c1d
add explicit errors around legacy config entries
wagoodman Jan 12, 2024
280 changes: 161 additions & 119 deletions README.md
@@ -109,21 +109,21 @@ nix-shell -p syft

To generate an SBOM for a container image:

```
```bash
syft <image>
```

The above output includes only software that is visible in the container (i.e., the squashed representation of the image). To include software from all image layers in the SBOM, regardless of its presence in the final image, provide `--scope all-layers`:

```
```bash
syft <image> --scope all-layers
```

### Supported sources

Syft can generate an SBOM from a variety of sources:

```
```bash
# catalog a container image archive (from the result of `docker image save ...`, `podman save ...`, or `skopeo copy` commands)
syft path/to/image.tar

@@ -155,69 +155,114 @@ If docker is not present, then the Podman daemon is attempted next, followed by

This default behavior can be overridden with the `default-image-pull-source` configuration option (See [Configuration](https://github.com/anchore/syft#configuration) for more details).

### Default Cataloger Configuration by scan type

Syft uses different default sets of catalogers depending on what it is scanning: a container image or a directory on disk. The default catalogers for an image scan assume that package installation steps have already been completed. For example, Syft will identify Python packages that have egg or wheel metadata files under a site-packages directory, since this indicates software actually installed on an image.

However, if you are scanning a directory, Syft doesn't assume that all relevant software is installed, and will use catalogers that can identify declared dependencies that may not yet be installed on the final system: for example, dependencies listed in a Python requirements.txt.

You can override the list of enabled/disabled catalogers by using the "catalogers" keyword in the [Syft configuration file](https://github.com/anchore/syft#configuration).

##### Image Scanning:
- alpmdb
- apkdb
- binary
- dotnet-deps
- dotnet-portable-executable
- dpkgdb
- go-module-binary
- graalvm-native-image
- java
- javascript-package
- linux-kernel
- nix-store
- php-composer-installed
- portage
- python-package
- rpm-db
- ruby-gemspec
- sbom

##### Directory Scanning:
- alpmdb
- apkdb
- binary
- cocoapods
- conan
- dartlang-lock
- dotnet-deps
- dotnet-portable-executable
- dpkgdb
- elixir-mix-lock
- erlang-rebar-lock
- go-mod-file
- go-module-binary
- graalvm-native-image
- haskell
- java
- java-gradle-lockfile
- java-pom
- javascript-lock
- linux-kernel
- nix-store
- php-composer-lock
- portage
- python-index
- python-package
- rpm-db
- rpm-file
- ruby-gemfile
- rust-cargo-lock
- sbom
- swift-package-manager

##### Non Default:
- cargo-auditable-binary

### File selection

By default, Syft will catalog file details and digests for files that are owned by discovered packages. You can change this behavior by using the `SYFT_FILE_METADATA_SELECTION` environment variable or the `file.metadata.selection` configuration option. The options are:

- `all`: capture all files from the search space
- `owned-by-package`: capture only files owned by packages (default)
- `none`: disable capturing any file information
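
As a rough illustration of what the three selection modes mean, the sketch below filters a list of files in Python. `FileEntry` and `select_files` are hypothetical names made up for this example and are not part of Syft; only the option values come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record for a file found in the search space."""
    path: str
    owned_by_package: bool


def select_files(files: List[FileEntry], selection: str = "owned-by-package") -> List[FileEntry]:
    """Return the files that would be captured for a given selection mode."""
    if selection == "all":
        # capture every file from the search space
        return list(files)
    if selection == "owned-by-package":
        # capture only files that a discovered package claims (the default)
        return [f for f in files if f.owned_by_package]
    if selection == "none":
        # capture no file information at all
        return []
    raise ValueError(f"unknown selection: {selection!r}")


if __name__ == "__main__":
    found = [
        FileEntry("/usr/bin/python3", owned_by_package=True),
        FileEntry("/tmp/scratch.txt", owned_by_package=False),
    ]
    for mode in ("all", "owned-by-package", "none"):
        print(mode, [f.path for f in select_files(found, mode)])
```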


### Package cataloger selection

#### Concepts

> [!IMPORTANT]
> Syft uses a different set of catalogers by default when scanning files directly than it does when scanning images.

The catalogers for an image scan assume that package installation steps have already been completed. For example, Syft will identify Python packages that have egg or wheel metadata files under a `site-packages` directory, since this is how the canonical tooling `pip` installs Python packages.

The catalogers for a directory scan look for installed software as well as declared dependencies that are not necessarily installed, for example, dependencies listed in a Python `requirements.txt`.

Because the default set of catalogers adapts to the source type, Syft can be used in a variety of contexts while still generating accurate SBOMs.
Overriding the default set of catalogers is not recommended for most purposes, but it is possible if needed.

Catalogers can be referenced in two different ways:
- *by name*: the exact cataloger name (e.g. `java-pom-cataloger` or `java-archive-cataloger`)
- *by tag*: a tag that is associated with a cataloger (e.g. `java`)

Syft can take lists of references on the CLI or in the application configuration to define which catalogers to use.

You can **set** the list of catalogers to use explicitly with the `--override-default-catalogers` CLI flag, which accepts a comma-separated list of cataloger names or tags.

You can also **add** to, **remove** from, or **sub-select** catalogers within the default set by using the `--select-catalogers` CLI flag.
- To **sub-select** catalogers, provide a tag (e.g. `--select-catalogers TAG`). Catalogers are always selected from the default set of catalogers (e.g. `--select-catalogers java,go` will select all the `java` catalogers in the default set and all the `go` catalogers in the default set).
- To **add** a cataloger, prefix the cataloger name with `+` (e.g. `--select-catalogers +NAME`). Added catalogers will _always be added_ regardless of removals, filtering, or other defaults.
- To **remove** a cataloger, prefix the cataloger name or tag with `-` (e.g. `--select-catalogers -NAME_OR_TAG`). Catalogers are removed from the set of default catalogers after processing any sub-selections.
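
The prefix rules above can be sketched as a small parser. This is an illustrative Python sketch and not Syft's implementation; `parse_selection` is a hypothetical helper that only splits a comma-separated expression into its sub-select, add, and remove parts.

```python
from typing import List, Tuple


def parse_selection(expression: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a comma-separated selection expression by prefix.

    Returns (sub_select, add, remove):
      - no prefix: a tag to sub-select from the default set
      - "+" prefix: a cataloger name to add
      - "-" prefix: a cataloger name or tag to remove
    """
    sub_select: List[str] = []
    add: List[str] = []
    remove: List[str] = []
    for token in expression.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        if token.startswith("+"):
            add.append(token[1:])
        elif token.startswith("-"):
            remove.append(token[1:])
        else:
            sub_select.append(token)
    return sub_select, add, remove


if __name__ == "__main__":
    print(parse_selection("go,+sbom-cataloger"))
    print(parse_selection("-rpm"))
```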

These rules, combined with the dynamic default cataloger sets, approximate the following logic:

```
image_catalogers = all_catalogers AND catalogers_tagged("image")

directory_catalogers = all_catalogers AND catalogers_tagged("directory")

default_catalogers = image_catalogers OR directory_catalogers

sub_selected_catalogers = default_catalogers INTERSECT catalogers_tagged(TAG) [ UNION sub_selected_catalogers ... ]

base_catalogers = default_catalogers OR sub_selected_catalogers

final_set = (base_catalogers SUBTRACT removed_catalogers) UNION added_catalogers
```
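
The same logic can be written out with Python sets as a minimal sketch. The `CATALOGERS` table below is an assumption made for illustration: the cataloger names appear elsewhere in this README, but the tags assigned to each one here are guesses, not Syft's real tag data. The `final_set` function is hypothetical and models only `--select-catalogers` behavior, not `--override-default-catalogers`.

```python
from typing import Dict, Iterable, Set

# Assumed tag assignments, for illustration only.
CATALOGERS: Dict[str, Set[str]] = {
    "python-installed-package-cataloger": {"image", "directory", "python"},
    "python-package-cataloger": {"directory", "python"},
    "go-module-binary-cataloger": {"image", "directory", "go", "binary"},
    "go-module-file-cataloger": {"directory", "go"},
    "sbom-cataloger": {"image", "directory", "sbom"},
}


def catalogers_tagged(tag: str) -> Set[str]:
    """All cataloger names carrying the given tag."""
    return {name for name, tags in CATALOGERS.items() if tag in tags}


def resolve(reference: str) -> Set[str]:
    """Treat a reference as an exact name if one exists, else as a tag."""
    if reference in CATALOGERS:
        return {reference}
    return catalogers_tagged(reference)


def final_set(
    source_type: str,
    sub_select: Iterable[str] = (),
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> Set[str]:
    """Apply sub-selection, then removal, then addition."""
    default_catalogers = catalogers_tagged(source_type)  # "image" or "directory"

    sub_select = list(sub_select)
    if sub_select:
        # union of (default INTERSECT tagged(TAG)) over every requested tag
        base: Set[str] = set()
        for tag in sub_select:
            base |= default_catalogers & catalogers_tagged(tag)
    else:
        base = set(default_catalogers)

    removed: Set[str] = set()
    for reference in remove:
        removed |= resolve(reference)

    # additions are applied last, so they survive any removal
    return (base - removed) | set(add)


if __name__ == "__main__":
    print(sorted(final_set("image", sub_select=["python"])))
    print(sorted(final_set("directory", sub_select=["python"])))
    print(sorted(final_set("image", sub_select=["go"], add=["sbom-cataloger"])))
```

Applying additions after removals is what makes an added cataloger always present in the result, matching the rule that added catalogers are kept regardless of removals or filtering.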


#### Examples

Only scan for Python-related packages with catalogers appropriate for the source type (image or directory):
```bash
syft <some container image> --select-catalogers "python"
# results in the following catalogers being used:
# - python-installed-package-cataloger
```

Same command, but the set of catalogers changes based on what is being analyzed (in this case a directory):
```bash
syft <a directory> --select-catalogers "python"
# results in the following catalogers being used:
# - python-installed-package-cataloger
# - python-package-cataloger
```

Use the default set of catalogers and add a cataloger to the set:
```bash
syft ... --select-catalogers "+sbom-cataloger"
```

Use the default set of catalogers but remove any catalogers that deal with RPMs:
```bash
syft ... --select-catalogers "-rpm"
```

Only scan with catalogers that:
- are tagged with "go"
- always use the sbom-cataloger
- are appropriate for the source type (image or directory)

```bash
syft <some container image> --select-catalogers "go,+sbom-cataloger"
# results in the following catalogers being used:
# - go-module-binary-cataloger
# - sbom-cataloger
```

Scan with all catalogers that deal with binary analysis, regardless of the source type:
```bash
syft ... --override-default-catalogers "binary"
# results in the following catalogers being used:
# - binary-cataloger
# - cargo-auditable-binary-cataloger
# - dotnet-portable-executable-cataloger
# - go-module-binary-cataloger
```

Only scan with the specific `go-module-binary-cataloger` and `go-module-file-cataloger` catalogers:
```bash
syft ... --override-default-catalogers "go-module-binary-cataloger,go-module-file-cataloger"
```


### Excluding file paths

@@ -468,21 +513,18 @@ output: "syft-table"
# same as -q ; SYFT_QUIET env var
quiet: false

# same as --file; write output report to a file (default is to write to stdout)
file: ""

# enable/disable checking for application updates on startup
# same as SYFT_CHECK_FOR_APP_UPDATE env var
check-for-app-update: true

# allows users to specify which image source should be used to generate the sbom
# valid values are: registry, docker, podman
default-image-pull-source: ""
# maximum number of workers used to process the list of package catalogers in parallel
parallelism: 1

# a list of globs to exclude from scanning. same as --exclude ; for example:
# a list of globs to exclude from scanning, for example:
# exclude:
# - "/etc/**"
# - "./out/**/*.json"
# same as --exclude ; SYFT_EXCLUDE env var
exclude: []

# allows users to exclude synthetic binary packages from the sbom
@@ -493,6 +535,10 @@ exclude-binary-overlap-by-ownership: true
# same as --platform; SYFT_PLATFORM env var
platform: ""

# the search space to look for file and package data (options: all-layers, squashed)
# SYFT_SCOPE env var
scope: "squashed"

# set the list of package catalogers to use when generating the SBOM
# default = empty (cataloger set determined automatically by the source type [image or file/directory])
# catalogers:
@@ -533,7 +579,6 @@ platform: ""
# - spm-cataloger
catalogers:


# all format configuration
format:

@@ -586,9 +631,36 @@ format:
pretty: false


file:

metadata:
# select which files should be captured by the file-metadata cataloger and included in the SBOM.
# Options include:
# - "all-files": capture all files from the search space
# - "owned-files": capture only files owned by packages
# - "unowned-files": capture only files not owned by packages
# - "no-files", "": do not capture any files
# SYFT_FILE_METADATA_SELECTION env var
selection: "owned-files"

# the file digest algorithms to use when cataloging files (options: "md5", "sha1", "sha224", "sha256", "sha384", "sha512")
# SYFT_FILE_METADATA_DIGESTS env var
digests: ["sha256"]

# capture the contents of select files in the SBOM
content:
# skip searching a file entirely if it is above the given size (default = 1MB; unit = bytes)
# SYFT_FILE_CONTENT_SKIP_FILES_ABOVE_SIZE env var
skip-files-above-size: 1048576

# file globs for the cataloger to match on
# SYFT_FILE_CONTENT_GLOBS env var
globs: []


# cataloging packages is exposed through the packages and power-user subcommands
package:

# search within archives that do contain a file index to search against (zip)
# note: for now this only applies to the java package cataloger
# SYFT_PACKAGE_SEARCH_INDEXED_ARCHIVES env var
@@ -600,14 +672,10 @@ package:
# SYFT_PACKAGE_SEARCH_UNINDEXED_ARCHIVES env var
search-unindexed-archives: false

cataloger:
# enable/disable cataloging of packages
# SYFT_PACKAGE_CATALOGER_ENABLED env var
enabled: true
# allows users to exclude synthetic binary packages from the sbom
# these packages are removed if an overlap with a non-synthetic package is found
exclude-binary-overlap-by-ownership: true

# the search space to look for packages (options: all-layers, squashed)
# same as -s ; SYFT_PACKAGE_CATALOGER_SCOPE env var
scope: "squashed"

golang:
# search for go package licences in the GOPATH of the system running Syft, note that this is outside the
@@ -657,45 +725,11 @@ python:
guess-unpinned-requirements: false

javascript:
search-remote-licenses: false
npm-base-url: "https://registry.npmjs.org"

file-contents:
cataloger:
# enable/disable cataloging of file contents
# SYFT_FILE_CONTENTS_CATALOGER_ENABLED env var
enabled: true

# the search space to look for file contents (options: all-layers, squashed)
# SYFT_FILE_CONTENTS_CATALOGER_SCOPE env var
scope: "squashed"
search-remote-licenses: false
npm-base-url: "https://registry.npmjs.org"

# skip searching a file entirely if it is above the given size (default = 1MB; unit = bytes)
# SYFT_FILE_CONTENTS_SKIP_FILES_ABOVE_SIZE env var
skip-files-above-size: 1048576

# file globs for the cataloger to match on
# SYFT_FILE_CONTENTS_GLOBS env var
globs: []

file-metadata:
cataloger:
# enable/disable cataloging of file metadata
# SYFT_FILE_METADATA_CATALOGER_ENABLED env var
enabled: true

# the search space to look for file metadata (options: all-layers, squashed)
# SYFT_FILE_METADATA_CATALOGER_SCOPE env var
scope: "squashed"

# the file digest algorithms to use when cataloging files (options: "md5", "sha1", "sha224", "sha256", "sha384", "sha512")
# SYFT_FILE_METADATA_DIGESTS env var
digests: ["sha256"]

# maximum number of workers used to process the list of package catalogers in parallel
parallelism: 1

# options that apply to all scan sources
# configuration for the source that the SBOM is generated from (e.g. a file, directory, or container image)
source:
# alias name for the source
# SYFT_SOURCE_NAME env var; --source-name flag
@@ -710,6 +744,14 @@ source:
# the file digest algorithms to use on the scanned file (options: "md5", "sha1", "sha224", "sha256", "sha384", "sha512")
digests: ["sha256"]

image:

# allows users to specify which image source should be used to generate the sbom
# valid values are: registry, docker, podman
# SYFT_SOURCE_IMAGE_DEFAULT_PULL_SOURCE env var
default-pull-source: ""


# options when pulling directly from a registry via the "registry:" or "containerd:" scheme
registry:
# skip TLS verification when communicating with the registry