-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-623 ci: Add a workflow for Trivy scan #14623
Changes from 7 commits
ce5510d
156e9a6
6c3d436
c9da1db
c676df4
cf121a2
b7277dd
dd69b94
a661156
e02ca0f
660b542
5489392
666e3b9
bceb0ac
0744af2
3036508
857a1d4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
name: Trivy scan | ||
|
||
on: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: '46 8 * * 0' | ||
pull_request: | ||
branches: ["master", "release/**"] | ||
|
||
# Declare default permissions as nothing. | ||
permissions: {} | ||
|
||
jobs: | ||
trivy-scan: | ||
name: Trivy scan | ||
runs-on: ubuntu-22.04 | ||
|
||
steps: | ||
- name: Install trivy package | ||
run: | | ||
sudo apt-get install wget apt-transport-https gnupg lsb-release | ||
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo apt-key add - | ||
echo deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main | \ | ||
sudo tee -a /etc/apt/sources.list.d/trivy.list | ||
sudo apt-get update | ||
sudo apt-get install trivy | ||
|
||
- name: Checkout code | ||
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1 | ||
with: | ||
persist-credentials: false | ||
|
||
- name: Checkout latest trivy configuration | ||
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1 | ||
with: | ||
ref: grom72/trivy # 'grom72/trivy' to be changed to 'master' before the merge | ||
path: trivy | ||
persist-credentials: false | ||
|
||
- name: Update trivy configuration | ||
run: | | ||
cp -f -r ./trivy/utils/trivy ./utils | ||
rm -rf ./trivy | ||
|
||
- name: Scan with trivy | ||
run: | | ||
trivy fs -c utils/trivy/trivy.yaml -f table --dependency-tree \ | ||
--skip-files "src/client/java/hadoop-daos/pom.xml" \ | ||
--show-suppressed --exit-code 1 . | ||
|
||
# generate trivy report only if no errors detected | ||
- name: Generate trivy report file extension | ||
# github.base_ref-based for PR and github.ref_name-based for schedule/workflow_dispatch | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does using landing-time testing instead of scheduled resolve this either/or for In either case, see below for a more concise solution. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not want to block PR because of detected vulnerabilities, as the solution is to update the corrupted package to the proper version or, like what we did with Hadoop, create a suppression. It will be great to add a status badge to monitor the Trivy scan as we have it done for other regular actions, when we have this PR landed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Why not? That is how we operate here. If some new "check" comes up as failed, it needs to be fixed before the PR can land. Otherwise we propagate that failure to everyone else's PR once they rebase on the branch that allowed the PR with the failure in it to land. This is huge waste of everyone else's time.
Nobody monitors these badges. All of these checks behind these badges fail PRs (and landing branches even) when they start failing. That is the only way failures are noticed here. Even scheduled (i.e. cron) failures will go unnoticed until they start showing up in PRs, by which time it now becomes somebody's job to start bisecting and otherwise hunting for the source of the failure. That somebody will most assuredly come from the SRE team, disrupting whatever tasks they had going on and adds zero value to their OKRs. Failures need to be alerted and fixed when and where they happen and by whomever introduces them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that the Trivy scan might fail not because of new code but because of a new vulnerability detected in some unrelated to the PR code.
The second There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, that is understood. As described above, we already accept this process model with our other checks and validations. The problem is that we do not have processes that support allowing new (even unrelated) issues to "slide" and land so that they are addressed later. Nobody ever owns fixing those and they continue to pile up until the pile is so big that it's a nightmare. So the only process that works for us is to have new -- even unrelated issues -- block progress, forcing somebody to own getting it fixed. |
||
id: gen_extension | ||
run: | | ||
EXTENSION=$(echo "${{ github.base_ref }}" | sed -e 's/release\///' -e's/\//_/' ) | ||
if [ -z "${EXTENSION}" ]; then | ||
EXTENSION=$(echo "${{ github.ref_name }}" | sed -e 's/release\///' -e's/\//_/' ) | ||
fi | ||
brianjmurrell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
echo "EXTENSION=$EXTENSION" >> $GITHUB_OUTPUT | ||
|
||
- name: Generate trivy report | ||
run: | | ||
trivy fs -c utils/trivy/trivy.yaml \ | ||
--skip-files "src/client/java/hadoop-daos/pom.xml" \ | ||
--show-suppressed \ | ||
--output trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }}.txt . | ||
|
||
- name: Print trivy report | ||
run: cat trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }}.txt | ||
|
||
- name: Prepare the report to be uploaded to the GitHub artifact store | ||
run: | | ||
mkdir report | ||
cp trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }}.txt report | ||
cp utils/trivy/.trivyignore report/trivyignore.txt | ||
|
||
- name: Upload the report to the GitHub artifact store | ||
uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4.3.3 | ||
env: | ||
EXTENSION: ${{ steps.gen_extension.outputs.EXTENSION }} | ||
with: | ||
path: report/* | ||
name: trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }} |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like a lot of ignored CVEs. Are we allowed to ignore this many? Some of them are There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Nice. Is there a reason we don't take advantage of the whole Code Scanning workflow in GitHub with this action? https://github.com/aquasecurity/trivy-action#using-trivy-with-github-code-scanning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The workflow supports two things:
Sarif scan added. Results of the run w/ expected failures are here (see first two issues): |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
## Ignored hadoop related CVE | ||
## CVE-2023-52428,MEDIUM,,"Denial of Service in Connect2id Nimbus JOSE+JWT","com.nimbusds:nimbus-jose-jwt","9.8.1","9.37.2",https://avd.aquasec.com/nvd/cve-2023-52428 | ||
CVE-2023-52428 | ||
## CVE-2023-39410,HIGH,7.5,"apache-avro: Apache Avro Java SDK: Memory when deserializing untrusted data in Avro Java SDK","org.apache.avro:avro","1.7.7","1.11.3",https://avd.aquasec.com/nvd/cve-2023-39410 | ||
CVE-2023-39410 | ||
## CVE-2024-25710,HIGH,5.5,"commons-compress: Denial of service caused by an infinite loop for a corrupted DUMP file","org.apache.commons:commons-compress","1.21","1.26.0",https://avd.aquasec.com/nvd/cve-2024-25710 | ||
CVE-2024-25710 | ||
## CVE-2024-26308,HIGH,5.5,"commons-compress: OutOfMemoryError unpacking broken Pack200 file","org.apache.commons:commons-compress","1.21","1.26.0",https://avd.aquasec.com/nvd/cve-2024-26308 | ||
CVE-2024-26308 | ||
## CVE-2024-29131,MEDIUM,,"commons-configuration: StackOverflowError adding property in AbstractListDelimiterHandler.flattenIterator()","org.apache.commons:commons-configuration2","2.8.0","2.10.1",https://avd.aquasec.com/nvd/cve-2024-29131 | ||
CVE-2024-29131 | ||
## CVE-2024-29133,MEDIUM,,"commons-configuration: StackOverflowError calling ListDelimiterHandler.flatten(Object, int) with a cyclical object tree","org.apache.commons:commons-configuration2","2.8.0","2.10.1",https://avd.aquasec.com/nvd/cve-2024-29133 | ||
CVE-2024-29133 | ||
## CVE-2022-40150,HIGH,7.5,"jettison: memory exhaustion via user-supplied XML or JSON data","org.codehaus.jettison:jettison","1.1","1.5.2",https://avd.aquasec.com/nvd/cve-2022-40150 | ||
CVE-2022-40150 | ||
## CVE-2022-45685,HIGH,7.5,"jettison: stack overflow in JSONObject() allows attackers to cause a Denial of Service (DoS) via crafted JSON data","org.codehaus.jettison:jettison","1.1","1.5.2",https://avd.aquasec.com/nvd/cve-2022-45685 | ||
CVE-2022-45685 | ||
## CVE-2022-45693,HIGH,7.5,"jettison: If the value in map is the map's self, the new new JSONObject(map) cause StackOverflowError which may lead to dos","org.codehaus.jettison:jettison","1.1","1.5.2",https://avd.aquasec.com/nvd/cve-2022-45693 | ||
CVE-2022-45693 | ||
## CVE-2023-1436,HIGH,7.5,"jettison: Uncontrolled Recursion in JSONArray","org.codehaus.jettison:jettison","1.1","1.5.4",https://avd.aquasec.com/nvd/cve-2023-1436 | ||
CVE-2023-1436 | ||
## CVE-2022-40149,MEDIUM,7.5,"jettison: parser crash by stackoverflow","org.codehaus.jettison:jettison","1.1","1.5.1",https://avd.aquasec.com/nvd/cve-2022-40149 | ||
CVE-2022-40149 | ||
## CVE-2023-34455,HIGH,7.5,"snappy-java: Unchecked chunk length leads to DoS","org.xerial.snappy:snappy-java","1.1.8.2","1.1.10.1",https://avd.aquasec.com/nvd/cve-2023-34455 | ||
CVE-2023-34455 | ||
## CVE-2023-43642,HIGH,7.5,"snappy-java: Missing upper bound check on chunk length in snappy-java can lead to Denial of Service (DoS) impact","org.xerial.snappy:snappy-java","1.1.8.2","1.1.10.4",https://avd.aquasec.com/nvd/cve-2023-43642 | ||
CVE-2023-43642 | ||
## CVE-2023-34453,MEDIUM,7.5,"snappy-java: Integer overflow in shuffle leads to DoS","org.xerial.snappy:snappy-java","1.1.8.2","1.1.10.1",https://avd.aquasec.com/nvd/cve-2023-34453 | ||
CVE-2023-34453 | ||
## CVE-2023-34454,MEDIUM,7.5,"snappy-java: Integer overflow in compress leads to DoS","org.xerial.snappy:snappy-java","1.1.8.2","1.1.10.1",https://avd.aquasec.com/nvd/cve-2023-34454 | ||
CVE-2023-34454 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
{{ range . }} | ||
Trivy Vulnerability Scan Results ({{- .Target -}}) | ||
VulnerabilityID,Severity,CVSS Score,Title,Library,Vulnerable Version,Fixed Version,Information URL,Triage Information | ||
{{ range .Vulnerabilities }} | ||
{{- .VulnerabilityID }}, | ||
{{- .Severity }}, | ||
{{- range $key, $value := .CVSS }} | ||
{{- if (eq $key "nvd") }} | ||
{{- .V3Score -}} | ||
{{- end }} | ||
{{- end }}, | ||
{{- quote .Title }}, | ||
{{- quote .PkgName }}, | ||
{{- quote .InstalledVersion }}, | ||
{{- quote .FixedVersion }}, | ||
{{- .PrimaryURL }} | ||
{{ else -}} | ||
No vulnerabilities found at this time. | ||
{{ end }} | ||
Trivy Dependency Scan Results ({{ .Target }}) | ||
ID,Name,Version,Notes | ||
{{ range .Packages -}} | ||
{{- quote .ID }}, | ||
{{- quote .Name }}, | ||
{{- quote .Version }} | ||
{{ else -}} | ||
No dependencies found at this time. | ||
{{ end }} | ||
{{ end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a schedule here instead of just on landing? I.e.:
This way we know exactly which landing may have caused a regression and don't have to bisect 24h worth landings to find the one that is causing a failure -- assuming anyone even notices the failed result of a scheduled test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two factors in Trivy scans:
I assume that the second bullet will be the root cause of the majority of incidents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the first bullet, then that is something that the developer introduced with their PR and so they need to resolve it there.
Even the second bullet does not represent cases we don't already experience. Any one of our linters for example can start flagging pre-existing code that it never used to simply because a new version of the linter is introduced and starts flagging existing constructs that it does not like any more. When this happens we have to fix the code it starts flagging. It fails PRs. Because if it doesn't it is ignored and landed and then everyone's PRs start flagging the offending (yet pre-existing) code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The risk is that random PR can be affected by a new vulnerability detected only because of the time coincidence.
You can make some changes in C-code, and your PR will be blocked because a new Java/python vulnerability will be detected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the risk and we already accept that kind of risk and resolution process with our other checkers. We frequently see unrelated linting issues creep into PRs due to updated linting tools, spelling checkers, etc.
Those are either corrected in the PR that found them or somebody is assigned the task of getting master updated to the correction. But it's only because they block progress that they get fixed. Otherwise nobody takes on the task and distraction to address them if they don't halt progress. History has proven this to us.
PRs that are blocked wait for the update and rebase, or they can be force-landed if gatekeepers are willing.
Given that we accept this process for less serious issues like spell-checking and code linting, we must surely accept this process for actual security vulnerabilities that are detected!