-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-623 ci: Add a workflow for Trivy scan #14623
Conversation
Ticket title is 'Generic ticket for minor code cleanup and improvement' |
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14623/1/execution/node/1183/log |
.github/workflows/triviy.yml
Outdated
cp trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }}.txt report | ||
cp utils/trivy/.trivyignore report/trivyignore.txt | ||
|
||
- name: Upload the report to the GitHub artifactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: Upload the report to the GitHub artifactory | |
- name: Upload the report to the GitHub artifact store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Skip-test: true Skip-unit-test: true Skip-func-test: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-test: true Skip-unit-test: true Skip-func-test: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit 4ad656b. Skip-test: true Skip-unit-test: true Skip-func-test: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit f5105c5. Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-test: true Skip-unit-test: true Skip-func-test: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 We don't allow force-pushing once reviewing has started as it voids the ability to incrementally review only what was changed since previous reviews. You are essentially asking reviewers to start from scratch and re-review the entire PR again including code they have already reviewed which is not a good use of time. |
.github/workflows/triviy.yml
Outdated
schedule: | ||
- cron: '46 8 * * 0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a schedule here instead of just on landing? I.e.:
schedule: | |
- cron: '46 8 * * 0' | |
push: | |
branches: ["master"] |
This way we know exactly which landing may have caused a regression and don't have to bisect 24h worth landings to find the one that is causing a failure -- assuming anyone even notices the failed result of a scheduled test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two factors in Trivy scans:
- a new code that enables new use cases and new potential vulnerability areas
- new vulnerabilities detected in already used packages
I assume that the second bullet will be the root cause of the majority of incidents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the first bullet, then that is something that the developer introduced with their PR and so they need to resolve it there.
Even the second bullet does not represent cases we don't already experience. Any one of our linters for example can start flagging pre-existing code that it never used to simply because a new version of the linter is introduced and starts flagging existing constructs that it does not like any more. When this happens we have to fix the code it starts flagging. It fails PRs. Because if it doesn't it is ignored and landed and then everyone's PRs start flagging the offending (yet pre-existing) code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The risk is that random PR can be affected by a new vulnerability detected only because of the time coincidence.
You can make some changes in C-code, and your PR will be blocked because a new Java/python vulnerability will be detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the risk and we already accept that kind of risk and resolution process with our other checkers. We frequently see unrelated linting issues creep into PRs due to updated linting tools, spelling checkers, etc.
Those are either corrected in the PR that found them or somebody is assigned the task of getting master updated to the correction. But it's only because they block progress that they get fixed. Otherwise nobody takes on the task and distraction to address them if they don't halt progress. History has proven this to us.
PRs that are blocked wait for the update and rebase, or they can be force-landed if gatekeepers are willing.
Given that we accept this process for less serious issues like spell-checking and code linting, we must surely accept this process for actual security vulnerabilities that are detected!
.github/workflows/triviy.yml
Outdated
|
||
# generate trivy report only if no errors detected | ||
- name: Generate trivy report file extension | ||
# github.base_ref-based for PR and github.ref_name-based for schedule/workflow_dispatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using landing-time testing instead of scheduled resolve this either/or for github.base_ref
/github.ref_name
?
In either case, see below for a more concise solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want to block PR because of detected vulnerabilities, as the solution is to update the corrupted package to the proper version or, like what we did with Hadoop, create a suppression.
It will be great to add a status badge to monitor the Trivy scan as we have it done for other regular actions, when we have this PR landed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want to block PR because of detected vulnerabilities,
Why not? That is how we operate here. If some new "check" comes up as failed, it needs to be fixed before the PR can land. Otherwise we propagate that failure to everyone else's PR once they rebase on the branch that allowed the PR with the failure in it to land. This is huge waste of everyone else's time.
It will be great to add a status badge to monitor the Trivy scan as we have it done for other regular actions
Nobody monitors these badges. All of these checks behind these badges fail PRs (and landing branches even) when they start failing. That is the only way failures are noticed here.
Even scheduled (i.e. cron) failures will go unnoticed until they start showing up in PRs, by which time it now becomes somebody's job to start bisecting and otherwise hunting for the source of the failure. That somebody will most assuredly come from the SRE team, disrupting whatever tasks they had going on and adds zero value to their OKRs.
Failures need to be alerted and fixed when and where they happen and by whomever introduces them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the Trivy scan might fail not because of new code but because of a new vulnerability detected in some unrelated to the PR code.
There are two goals (from the SDLe process perspective):
- prevent any new vulnerability from being introduced by developers (rare case)
- detect new vulnerabilities that are detected and reported by subcomponent owners
The second
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the Trivy scan might fail not because of new code but because of a new vulnerability detected in some unrelated to the PR code.
Yes, that is understood. As described above, we already accept this process model with our other checks and validations.
The problem is that we do not have processes that support allowing new (even unrelated) issues to "slide" and land so that they are addressed later. Nobody ever owns fixing those and they continue to pile up until the pile is so big that it's a nightmare. So the only process that works for us is to have new -- even unrelated issues -- block progress, forcing somebody to own getting it fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few changes on updated code.
Is there any reason we are not using the https://github.com/aquasecurity/trivy-action github action, possibly/potentially with GitHub Code Scanning? |
Skip-test: true Skip-unit-test: true Skip-func-test: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
It is because of not signed merge commits and instruction that proposes the following sequence of commands:
|
The reason was/is to have easy local reproduction of the Trivy scan. |
In the future you can disregard issues with unsigned merge commits. They will disappear when the PR is landed.
Yes, it is unfortunate that that is the suggested solution for such a case. |
I think it is doubtful that anyone is going to try to reproduce the scan locally. Everyone does their work in CI here. We have many workflows that do not have easy local reproductions but ultimately the tools underlying them can be used locally. This seems to fit that same category. I think leveraging the work put into the action and it's ongoing maintenance (by somebody else) and the (i.e GitHub Code Scanning) reporting capabilities of using the action outweigh the slim possibilities that somebody might want to try to use the workflow code to try to reproduce locally. Having good and clear reporting (such as GitHub Code Scanning provides) should be the primary goal so that we make it easy for developers to find their errors without having to decode difficult to understand ad-hoc report outputs. |
Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SDL the driving force behind us running this scan?
It might be useful to suppress some (or all maybe even) of the ignores just so that we can see how detected vulnerabilities are presented to the user so that we can make sure it's useful/usable by developers. Feel free to do that in a fork of this PR if you don't want that churn in this PR.
utils/trivy/.trivyignore
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a lot of ignored CVEs. Are we allowed to ignore this many? Some of them are HIGH
even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Trivy is because of SDL
- It has been agreed with SDL people that we can ignore all Hadoop-related CVE
- We do not release any Java code so we can practically ignore all Java-related issues
- I want first to have the running workflow and next clean all required issues (in a separate PR)
- An example of a failed build with suppressed ignores: https://github.com/grom72/daos/actions/runs/10159867708/job/28095002601#step:8:31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example of a failed build with suppressed ignores: grom72/daos/actions/runs/10159867708/job/28095002601#step:8:31
Nice. Is there a reason we don't take advantage of the whole Code Scanning workflow in GitHub with this action?
https://github.com/aquasecurity/trivy-action#using-trivy-with-github-code-scanning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workflow supports two things:
- SDL evidence generation where I want to have also ignored issues documented
- Clear info in the log when something goes wrong
- Clear info in the log about ignored issues
Sarif scan added. Results of the run w/ expected failures are here (see first two issues):
https://github.com/daos-stack/daos/security/code-scanning?query=pr%3A14623+is%3Aclosed
Skip all functional tests Doc-only: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation. |
Doc-only: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Doc-only: true Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Doc-only: true This reverts commit bceb0ac. Required-githooks: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit 666e3b9. Doc-only: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true Doc-only: true Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Trivy scan is required for SLDe process.
Example execution:
https://github.com/daos-stack/daos/actions/runs/9613005998/job/26514814960?pr=14623
There is no need to run any Functional/Unit/NLT tests, as this PR introduces only a new GHA workflow that does not affect the source code in any way.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: