-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: 1944 - update purl generation to use a consistent groupID #2033
Conversation
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
f03e747
to
f8a6582
Compare
@westonsteimel I combined the work from 2032 and the single groupID answer code together in this branch - this does not alter any CPE generation or change how java packages are deduped - I'll be updating those in a separate PR I'm adding two commits here -
Syft Maindb crawler results : currently running:
^ I have to run these again but my ISP just went out and I don't want to pull this down over data 😬 Latest Comparison
|
Benchmark Test ResultsBenchmark results from the latest changes vs base branch
|
2ce6400
to
fe85967
Compare
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
fe85967
to
9f3157c
Compare
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
// 2. The group ID from the POM project | ||
// 3. The group ID from a select map of known group IDs | ||
// 3. The group ID from the Java manifest | ||
func GroupIDFromJavaMetadata(pkgName string, metadata pkg.JavaMetadata) (groupID string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these new functions probably don't belong in the cpe
package since they have been replicated to not affect the CPE generation logic today. My vote would be to put this under the java cataloger package (as near duplicate functions) and unexport all of the functions (I'd duplicate the looksLikeGroupID
function too... since there isn't a specific definition for this and the impl may change based on the use case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I parsed this comment correctly - check f1e05da
If there are some edits or things to move around from here we can do a quick sync on the package organization and exports/imports
parentKey := fmt.Sprintf("%s:%s:%s", groupID, parentPkg.Name, parentPkg.Version) | ||
pomProjectKey := fmt.Sprintf("%s:%s:%s", pomProperties.GroupID, pomProperties.ArtifactID, pomProperties.Version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be trying to correlate the parent name and the current pom properties artifact ID and changing the virtual path when this occurs. Can you leave a code comment to why this is important? (and this might look test worthy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with commentary and test updates in 06bd7b9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify: why is the parent package name and pom properties artifact ID being compared? (instead of artifact id vs artifact id or name vs name?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment explaining the significance and why we use the values we're using: 1bbb6c5
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay -- I meant to submit this a few days ago!
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Latest run: 1e4f54a
|
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Ratio when removing ArtifactID: 77a6fd4
Conclusion is we had less correct by about 10,000 with a larger sample size so I'm reverting the commit. |
05f7f67
to
c3b3e79
Compare
This reverts commit 77a6fd4. Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
c3b3e79
to
1bbb6c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the deep dive on this one @spiffcs 🙌
…ore#2033) Separate the logic for CPE and PURL generation. PURL generation needs a single answer for groupID based on a priority of discovering the field. CPE generation still uses multiple potential groupID to populate the candidate cpe. Improve GroupID detection. Currently syft does not use any hierarchy for GroupID detection and treats all sources as equal. It treats fields from the manifest file with priority. This change adds a hierarchy to the fields and returns a single answer based on that hierarchy. --------- Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com> Signed-off-by: Keith Zantow <kzantow@gmail.com> Co-authored-by: Keith Zantow <kzantow@gmail.com>
Summary
This PR aims to improve 3 related aspects of syft's java cataloging. Pulls in #2032
Separate the logic for CPE and PURL generation. PURL generation needs a single answer for groupID based on a priority of discovering the field. CPE generation still uses multiple potential groupID to populate the candidate cpe.
Improve GroupID detection. Currently syft does not use any hierarchy for GroupID detection and treats all sources as equal. It treats fields from the manifest file with priority. We should try to obtain a GroupID answer from Pom Properties, then the Pom Project and finally, if a GroupID has not been found, the Manifest. Manifest should not return an answer if one was found in PomProject, and PomProject should not return a GroupID answer if PomProperties found one.
Note: we've removed the
child matches parent by virtual path
case because the virtual path now has more uniqueness that depends on a packages metadata. The parent did not fulfill this requirement and thus a new package was created, virtual paths no longer match, and the two packages were not merged