Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor source API #1846

Merged
merged 7 commits into from
Jun 30, 2023
Merged

Conversation

wagoodman
Copy link
Contributor

@wagoodman wagoodman commented May 25, 2023

Refactors the single uber-object of source.Source into multiple single responsibility objects (e.g. source.DirectorySource, source.StereoscopeImageSource, etc) that all fall under the same (new) source.Source abstraction:

type Source interface {
	artifact.Identifiable
	FileResolver(Scope) (file.Resolver, error)
	Describe() Description
	io.Closer
}

There are two ways to create a source object:

  1. directly
  2. via detection (considering scheme + a content analysis of the reference if it's a local file)

The first is new with this PR:

src, err := NewFromDirectory(
	DirectoryConfig{
		Path: "/somewhere/to/look/at",
	},
)

The second was a simplification of the existing approach:

detection, err := source.Detect("/somewhere/to/look/at", app.DefaultImagePullSource)
if err != nil {
	return err
}

src, err := detection.NewSource(
	&source.Alias{
		Name:    app.SourceName,
		Version: app.SourceVersion,
	},
	app.Registry.ToOptions(),
	app.Platform,
	app.Exclusions,
)
if src != nil {
	defer src.Close()
}

Open questions:

  1. Today we do NOT have the source.target section JSON output captured in the JSON schema -- should we start to do this?

Breaking changes:

  • JSON schema v9 bump (see [1]). Note: there are some structural changes to the source.target section in the JSON output (now called source.metadata), however, we have never made guarantees about this section of the JSON document.
  • The new source.Source interface replaces the existing object, which breaks the existing API.

Main changes:

  • Added new FileSource, DirectorySource, and StereoscopeImageSource concrete types that implement the new source.Source interface
  • There is a lot more testing around how the source ID is generated and what is used to define that ID for each source object.
  • Simplified much of the source.Input and detection approach to a source.Detect() call which you can create a source object from.
  • [1] Overall attempts to move the serialized source object to act like the package object: there is a top level declaration of name and version and nested metadata, and all metadata is mapped 1:1 in the output. This PR removes much of the "select a field and map it with special logic on encoding/decoding" for the source descriptions. Now it is a much simpler pass through for all fields on source.Description. To keep patterns in line with how packages are done, the target field in the JSON output of the source block has been renamed to metadata. This means a new JSON schema has been generated (v9).
  • Elevates source.Name and source.Version to be outside of the specific source metadata. Instead it is at the top level, next to ID, as this is common to all sources.
  • Adds a new file.ChrootContext for the root+cwd+base logic from the directory resolver (simplifies the dir resolver and allows for reuse of the chroot-like logic).
  • FileSource used to not respond to FilesByPath, however, now it does (the parent dir is indexed with all other files excluded now)

Auxiliary changes:

  • Testing snapshots for formats now store the redacted output instead of the original output
  • A new internal.AllSourceMetadataReflectTypes collection to aid in testing, ensuring completion testing for source.*Metadata types
  • Encapsulates the source name and version into an alias object

Deferred work:

Follow up PRs:

  • remove pkg metadata type reflect type helpers and use packagemetadata helpers with what was introduced with sourcemetadata. Related to Remove MetadataType from the core package struct #1735
  • unexport source models if possible
  • move JSON schema version to the syftjson package
  • Can we figure a better source ID for the directory source? (Today it's based on path, can we make it content driven?)

JSON Schema Diff for reviewers:

# $ diff ./schema/json/schema-8.0.1.json ./schema/json/schema-9.0.0.json
<   "$id": "https://github.com/anchore/syft/syft/formats/syftjson/model/document",
---
>   "$id": "anchore.io/schema/syft/json/9.0.0/document",
1851a1852,1857
>           "type": "string"
>         },
>         "name": {
>           "type": "string"
>         },
>         "version": {
1857c1863
<         "target": true
---
>         "metadata": true
1861a1868,1869
>         "name",
>         "version",
1863c1871
<         "target"
---
>         "metadata"

Addresses #558 (comment)

Closes: #1866

@wagoodman wagoodman changed the title refactor source API Refactor source API May 25, 2023
@wagoodman wagoodman self-assigned this May 25, 2023
@wagoodman wagoodman added the breaking-change Change is not backwards compatible label May 25, 2023
@github-actions
Copy link

github-actions bot commented May 25, 2023

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux%0Agoarch: amd64%0Apkg: github.com/anchore/syft/test/integration%0Acpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz%0A                                                          │ ./.tmp/benchmark-d8a15f2.txt │%0A                                                          │            sec/op            │%0AImagePackageCatalogers/alpmdb-cataloger-2                                    12.05m ± 1%25%0AImagePackageCatalogers/apkdb-cataloger-2                                     732.4µ ± 1%25%0AImagePackageCatalogers/binary-cataloger-2                                    207.0µ ± 2%25%0AImagePackageCatalogers/dpkgdb-cataloger-2                                    589.9µ ± 3%25%0AImagePackageCatalogers/dotnet-deps-cataloger-2                               1.228m ± 2%25%0AImagePackageCatalogers/go-module-binary-cataloger-2                          96.36µ ± 2%25%0AImagePackageCatalogers/java-cataloger-2                                      13.13m ± 1%25%0AImagePackageCatalogers/graalvm-native-image-cataloger-2                      94.82µ ± 3%25%0AImagePackageCatalogers/javascript-package-cataloger-2                        415.7µ ± 2%25%0AImagePackageCatalogers/nix-store-cataloger-2                                 280.3µ ± 7%25%0AImagePackageCatalogers/php-composer-installed-cataloger-2                    801.0µ ± 2%25%0AImagePackageCatalogers/portage-cataloger-2                                   465.5µ ± 3%25%0AImagePackageCatalogers/python-package-cataloger-2                            3.206m ± 1%25%0AImagePackageCatalogers/r-package-cataloger-2                                 201.7µ ± 1%25%0AImagePackageCatalogers/rpm-db-cataloger-2                                    529.9µ ± 1%25%0AImagePackageCatalogers/ruby-gemspec-cataloger-2                              916.1µ ± 1%25%0AImagePackageCatalogers/sbom-cataloger-2                                      119.2µ ± 1%25%0Ageomean                                                                      615.1µ%0A%0A                                                          │ ./.tmp/benchmark-d8a15f2.txt │%0A                                                          │             B/op             │%0AImagePackageCatalogers/alpmdb-cataloger-2                                   5.127Mi ± 0%25%0AImagePackageCatalogers/apkdb-cataloger-2                                    205.0Ki ± 0%25%0AImagePackageCatalogers/binary-cataloger-2                                   30.20Ki ± 0%25%0AImagePackageCatalogers/dpkgdb-cataloger-2                                   169.1Ki ± 0%25%0AImagePackageCatalogers/dotnet-deps-cataloger-2                              405.6Ki ± 0%25%0AImagePackageCatalogers/go-module-binary-cataloger-2                         9.906Ki ± 0%25%0AImagePackageCatalogers/java-cataloger-2                                     2.826Mi ± 0%25%0AImagePackageCatalogers/graalvm-native-image-cataloger-2                     8.594Ki ± 0%25%0AImagePackageCatalogers/javascript-package-cataloger-2                       100.9Ki ± 0%25%0AImagePackageCatalogers/nix-store-cataloger-2                                49.14Ki ± 0%25%0AImagePackageCatalogers/php-composer-installed-cataloger-2                   186.6Ki ± 0%25%0AImagePackageCatalogers/portage-cataloger-2                                  120.0Ki ± 0%25%0AImagePackageCatalogers/python-package-cataloger-2                           1.004Mi ± 0%25%0AImagePackageCatalogers/r-package-cataloger-2                                53.30Ki ± 0%25%0AImagePackageCatalogers/rpm-db-cataloger-2                                   180.9Ki ± 0%25%0AImagePackageCatalogers/ruby-gemspec-cataloger-2                             144.1Ki ± 0%25%0AImagePackageCatalogers/sbom-cataloger-2                                     14.20Ki ± 0%25%0Ageomean                                                                     132.8Ki%0A%0A                                                          │ ./.tmp/benchmark-d8a15f2.txt │%0A                                                          │          allocs/op           │%0AImagePackageCatalogers/alpmdb-cataloger-2                                    87.75k ± 0%25%0AImagePackageCatalogers/apkdb-cataloger-2                                     4.182k ± 0%25%0AImagePackageCatalogers/binary-cataloger-2                                     830.0 ± 0%25%0AImagePackageCatalogers/dpkgdb-cataloger-2                                    3.002k ± 0%25%0AImagePackageCatalogers/dotnet-deps-cataloger-2                               6.338k ± 0%25%0AImagePackageCatalogers/go-module-binary-cataloger-2                           281.0 ± 0%25%0AImagePackageCatalogers/java-cataloger-2                                      39.88k ± 0%25%0AImagePackageCatalogers/graalvm-native-image-cataloger-2                       228.0 ± 0%25%0AImagePackageCatalogers/javascript-package-cataloger-2                        1.404k ± 0%25%0AImagePackageCatalogers/nix-store-cataloger-2                                  895.0 ± 0%25%0AImagePackageCatalogers/php-composer-installed-cataloger-2                    4.079k ± 0%25%0AImagePackageCatalogers/portage-cataloger-2                                   2.269k ± 0%25%0AImagePackageCatalogers/python-package-cataloger-2                            16.44k ± 0%25%0AImagePackageCatalogers/r-package-cataloger-2                                  929.0 ± 0%25%0AImagePackageCatalogers/rpm-db-cataloger-2                                    3.989k ± 0%25%0AImagePackageCatalogers/ruby-gemspec-cataloger-2                              2.447k ± 0%25%0AImagePackageCatalogers/sbom-cataloger-2                                       394.0 ± 0%25%0Ageomean                                                                      2.583k

@wagoodman wagoodman force-pushed the source-api-refactor-on-location-refactor branch from 92b411b to 869d879 Compare June 16, 2023 20:35
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
@wagoodman wagoodman force-pushed the source-api-refactor-on-location-refactor branch from 869d879 to f33166d Compare June 16, 2023 20:42
@wagoodman wagoodman requested a review from a team June 16, 2023 20:43
@wagoodman wagoodman marked this pull request as ready for review June 16, 2023 20:43
@wagoodman
Copy link
Contributor Author

note from sync review: this is a good time to make all breaking changes now, can we finalize detection.NewSource now?

@wagoodman
Copy link
Contributor Author

from sync review:

  • are we using the file.Base? Should we allow this to be configuable for a file source?
  • should we name syftjson.model.source.type to metadatatype?

Copy link
Contributor

@kzantow kzantow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some initial feedback, overall looking like a 👍 change

syft/source/detection.go Outdated Show resolved Hide resolved
syft/source/detection.go Outdated Show resolved Hide resolved
syft/file/chroot_context.go Outdated Show resolved Hide resolved
syft/formats/internal/testutils/utils.go Outdated Show resolved Hide resolved
syft/formats/cyclonedxjson/encoder_test.go Outdated Show resolved Hide resolved
syft/formats/syftjson/to_format_model_test.go Outdated Show resolved Hide resolved
syft/internal/fileresolver/container_image_squash_test.go Outdated Show resolved Hide resolved
syft/source/description.go Show resolved Hide resolved
syft/source/directory_source.go Show resolved Hide resolved
// from the path provided with an attempt to prune a prefix if a base is given. Since the contents of the directory
// are not considered, there is no semantic meaning to the artifact ID -- this is why the alias is preferred without
// consideration for the path.
func deriveIDFromDirectory(cwd string, cfg DirectoryConfig) artifact.ID {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is CWD part of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like it was needed for the chroot object, but I'm not using it to determine the ID here. I've updated this to extract cwd entirely from this execution path.

wagoodman and others added 4 commits June 23, 2023 16:38
Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
…n-location-refactor

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman force-pushed the source-api-refactor-on-location-refactor branch from 7880d62 to 15fc598 Compare June 28, 2023 20:37
// scanning root changes (e.g. a user scans a directory, then moves it to a new location and scans again).
info = fmt.Sprintf("%s@%s", cfg.Alias.Name, cfg.Alias.Version)
} else {
log.Warn("no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to call out this change explicitly -- I still haven't been able to reconcile what a "good" ID is for a directory (that isn't too expensive to figure). But I've added this warning to at least let folks know that if you're doing a directory scan an are not setting --source-name or --source-version explicitly then this is most likely wrong. I didn't call out the specific options since this is the lib path and not in the cmd pkg.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an Info or Debug message? This will show for every directory scan otherwise, I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the problem with the directory source ID -- it's really not adequate for any use case where you're trying to cross correlate SBOMs historically since it's brittle and may overlap with entirely unrelated scans. It feels like it should be a warning for that reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did find that this wasn't showing up though, and I discovered why: we can't assume that a pointer to alias means that the alias is optional, we still need to check individual fields. I've updated the API to not take alias pointers in config objects (any spot on the exported API where they were function args it is still a pointer though). This allows for the zero value to be used in configs (or for the caller to fully specify like with the cmd package) and we need to check the name and version explicitly, which was already being done in some circumstances.

syft/formats/syftjson/to_format_model_test.go Outdated Show resolved Hide resolved
syft/internal/packagemetadata/generated.go Show resolved Hide resolved
"reflect"
)

func AllNames() []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I missed this in the sync review, but where the "incorrect" names being handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR doesn't go all the way with the packagemetadata, that's reserved for #1844 . In short, this isn't used yet (except for a test), only AllTypes() is used and is consistent with the sourcemetadata package approach. The value add for adding this in this PR was to add the completion test around packagemetadata.AllNames() to detect when there is drift with the ast analysis and the generated code... a test that will be extended to include the JSON name mapping in #1844 (as done in the sourcemetadata.AllNames() completion test).

// scanning root changes (e.g. a user scans a directory, then moves it to a new location and scans again).
info = fmt.Sprintf("%s@%s", cfg.Alias.Name, cfg.Alias.Version)
} else {
log.Warn("no explicit name and version provided for directory source, deriving artifact ID from the given path (which is not ideal)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an Info or Debug message? This will show for every directory scan otherwise, I think

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman enabled auto-merge (squash) June 30, 2023 14:15
@wagoodman wagoodman merged commit 4da3be8 into main Jun 30, 2023
@wagoodman wagoodman deleted the source-api-refactor-on-location-refactor branch June 30, 2023 14:19
spiffcs added a commit that referenced this pull request Jul 11, 2023
* main:
  feat: CLI flag for directory base (#1867)
  Fix CPE gen for k8s python client (#1921)
  chore: update iterations to protect against race (#1927)
  chore(deps): update bootstrap tools to latest versions (#1922)
  fix: Don't use the actual redis or grpc CPEs for gems (#1926)
  fix(install): return with right error code (#1915)
  Remove erroneous Java CPEs from generation (#1918)
  chore(deps): bump golang.org/x/net from 0.11.0 to 0.12.0 (#1916)
  Switch UI to bubbletea (#1888)
  fix: use filepath.EvalSymlinks if os.Readlink fails to evaluate the link (#1884)
  add file source digest support (#1914)
  chore(deps): update bootstrap tools to latest versions (#1908)
  chore(deps): bump golang.org/x/mod from 0.11.0 to 0.12.0 (#1912)
  chore(deps): bump golang.org/x/term from 0.9.0 to 0.10.0 (#1913)
  doc(readme): add installation section with scoop (#1909)
  Refactor source API (#1846)
  chore(deps): update bootstrap tools to latest versions (#1905)
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* refactor source API and syft json source block

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* update source detection and format test utils

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>

* generate list of all source metadata types

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* extract base and root normalization into helper functions

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* preserve syftjson model package name import ref

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* alias should not be a pointer

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

---------

Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change is not backwards compatible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit the source.New* set of functions
2 participants