-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add package dependency completness field #3402
base: main
Are you sure you want to change the base?
Conversation
bb1b51b
to
141d19a
Compare
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
141d19a
to
17abccb
Compare
@wagoodman I wanted to make sure I understand why the It seems like as we make a set of dependency relationships more complete, it passes from complete, to mixed, and back to complete, which is surprising:
It seems weird to me that a cataloger can effectively get promoted from complete to mixed, and then from mixed to complete, by adding relationship data. When I first read the descriptions above, I thought, "rust isn't complete, it's mixed, because it includes transitive dependencies." Am I understanding correctly here? |
Here each node in the cargo.lock is describing only direct dependencies so this is |
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
17abccb
to
ad2abfa
Compare
Cross posting #3010 (comment) so that it gets seen here. |
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Language pkg.Language `json:"language"` | ||
CPEs cpes `json:"cpes"` | ||
PURL string `json:"purl"` | ||
Dependencies pkg.DependencyCompleteness `json:"dependencies"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right name for this field? licenses
is the actual licenses, cpes
is the actual CPEs, locations
is the actual locations, but dependencies
is the completeness and directness of the dependencies. What about dependencyCompleteness
or something? I hate to make it that much longer, but I also hate to have dependencies
be the only plural noun on here that's no a slice of noun.
CPEs: cpes, | ||
PURL: p.PURL, | ||
Metadata: p.Metadata, | ||
Name: p.Name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting to see syft/format/...
for other formats. Do SPDX and CDX have anywhere to put dependency completeness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SPDX only in 3.x (which we don't support yet), for Cyclone there is support for it in the format spec for versions we support -- I'll update the PR to map appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this change get missed? I don't see a diff in the cyclonedx and cyclonedx json formatters, and I expected to. (I think it's also ok to do this as a follow up PR, since the current PR is large.)
PURL: portableExecutablePackageURL(name, version), | ||
Metadata: metadata, | ||
PURL: portableExecutablePackageURL(name, ver), | ||
// by nature PE metadata does not have any dependency information, thus we are forced to claim incomplete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify the difference between unknown and incomplete one more time:
- Dependency is known to be incomplete or known to be missing entirely -> Incomplete
- It is not knowable whether dependency data is complete -> unknown.
Is "PE file does not include dependency information, so this package's dependencies are known to be incomplete." a good summary of the comment here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basic definitions:
incomplete: the package does not have all of its direct dependencies resolved.
unknown: indicates when dependency resolution mechanism for this package is not well understood.
I've struggled through most of development on a good firm definition for unknown vs incomplete.
Is "PE file does not include dependency information, so this package's dependencies are known to be incomplete." a good summary of the comment here then?
It seems like one, but I'm stuck on "dependencies are known to be incomplete", which isn't quite true.
Do you think we should extend unknown
to really mean "no assertion" (for any reason), in which case may things listed as incomplete in the interim would switch to unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted the definitions based off of our offline discussions:
incomplete: indicates that the package is known to not have all of its direct dependencies listed.
unknown: indicates that the completeness of the dependencies cannot be considered positively complete or incomplete. This should be used when the dependency resolution mechanism is not well understood, the set of dependencies is unknowable, or no attempt has been made to resolve dependencies (no assertion).
That also means that I don't think we have a case yet where we would list 'incomplete'.
relationships = append(relationships, artifact.Relationship{ //nolint:gocritic // we intentionally want to use the reference to the package which will still be mutated and finalized to a value later | ||
// both the to and from packages may be mutated based on the resolved dependencies, so we need references to these values | ||
// to ensure the nodes used are consistent with the final state of the packages. | ||
// note: it is VITAL that these references are replaced with values by the caller of this function before using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These kind of comments always make me nervous. Is there something we can do to make sure this happens? What if we return something that represents instructions for building an []artifact.Relationships
in the caller, so that the caller can't forget and return a []artifact.Relationships
with unexpected types in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's what the finalizeRelationships
call at the final return of the cataloger (a chokepoint)
syft/syft/pkg/cataloger/java/parse_pom_xml.go
Lines 84 to 97 in 5b7ec60
return finalizePackages(pkgs), finalizeRelationships(relationships), errs | |
} | |
func finalizeRelationships(relationships []artifact.Relationship) []artifact.Relationship { | |
for i := range relationships { | |
if f, ok := relationships[i].From.(*pkg.Package); ok { | |
relationships[i].From = *f | |
} | |
if t, ok := relationships[i].To.(*pkg.Package); ok { | |
relationships[i].To = *t | |
} | |
} | |
return relationships | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that. I think I'd rather the type system remember that finalizeRelationships
needs to be called. For example, you could have a local struct unfinalizedRelationship
and return a slice of those, and then finalize relationship can take a slice of unfinalizedRelationship
and return a slice of a artifact.Relationship
. That way if someone tries to use this function without finalizing the relationships, the compiler won't let them.
syft/pkg/cataloger/rust/package.go
Outdated
@@ -46,6 +47,8 @@ func newPackageFromAudit(dep *rustaudit.Package, locations ...file.Location) pkg | |||
Language: pkg.Rust, | |||
Type: pkg.RustPkg, | |||
Locations: file.NewLocationSet(locations...), | |||
// no attempt is made by the parser function to resolve dependencies | |||
Dependencies: pkg.IncompleteDependencies, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rust auditable binary cataloger is a lot like the Go binary cataloger. I think this cataloger can be changed to emit a "main" package and CompleteWithIndirect. Or do we not trust the completeness of this metadata if it's present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying not to change the cataloger logic in this PR -- what you're describing would be a good follow up enhancement for this cataloger
Type Type `cyclonedx:"type"` // the package type (e.g. Npm, Yarn, Python, Rpm, Deb, etc) | ||
CPEs []cpe.CPE `hash:"ignore"` // all possible Common Platform Enumerators (note: this is NOT included in the definition of the ID since all fields on a CPE are derived from other fields) | ||
PURL string `hash:"ignore"` // the Package URL (see https://github.com/package-url/purl-spec) | ||
Dependencies DependencyCompleteness `hash:"ignore"` // the completeness of the dependency information for this package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I said this above, but I don't think Dependencies
is the right name for this field. Licenses
is of type LicenseSet
, which is a set of licenses. CPEs
is a slice of cpe.CPE
. Locations
is a LocationSet
. Dependencies
is an enum about how complete the dependencies are. It seems to break an established pattern.
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Locations: file.NewLocationSet(file.NewVirtualLocation("/hello-auditable", "/hello-auditable")), | ||
Language: pkg.Rust, | ||
Type: pkg.RustPkg, | ||
Dependencies: pkg.UnknownDependencyCompleteness, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to change now that #3500 is merged?
@@ -46,6 +47,8 @@ func newPackageFromAudit(dep *rustaudit.Package, locations ...file.Location) pkg | |||
Language: pkg.Rust, | |||
Type: pkg.RustPkg, | |||
Locations: file.NewLocationSet(locations...), | |||
// no attempt is made by the parser function to resolve dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the attempt is made now that #3500 is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll handle this on the rebase
@@ -9,17 +9,18 @@ import ( | |||
"github.com/anchore/syft/syft/pkg/cataloger/internal/pkgtest" | |||
) | |||
|
|||
func TestParsePackPackage(t *testing.T) { | |||
func xTestParsePackPackage(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is committing this x
intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch, that's a typo
Description
Adds a field that describes the completeness of a package's direct dependencies:
complete
: the package has all of its direct dependencies resolved and related to this package.incomplete
: indicates that the package is known to not have all of its direct dependencies listed. This is reserved for cases where we know there are a non-zero number of dependencies for a package, but we are not listing them intentionally or because we are unable to resolve them.complete-with-indirect
: a superset ofcomplete
-- indicates that the package has all of its direct dependencies resolved as well as some or all of indirect dependencies. What is notable about this is that direct and indirect dependencies are linked directly to this package and are not separable (you cannot distinguish between a direct and indirect dependency from the perspective of this package).unknown
: indicates that the completeness of the dependencies cannot be considered positively complete or incomplete. This should be used when the dependency resolution mechanism is not well understood, the set of dependencies is unknowable, or no attempt has been made to resolve dependencies (no assertion).In addition to adding this new field, all catalogers were updated to raise up accurate values for this field:
complete-with-indirect
but all dependencies getunknown
WithResolvingProcessors(wheelEggRelationships)
on the catalogerType of change
Checklist: