Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-623 ci: Add a workflow for Trivy scan #14623

Merged
merged 17 commits into from
Aug 2, 2024
Merged

DAOS-623 ci: Add a workflow for Trivy scan #14623

merged 17 commits into from
Aug 2, 2024

Conversation

grom72
Copy link
Contributor

@grom72 grom72 commented Jun 21, 2024

Trivy scan is required for SLDe process.

Example execution:
https://github.com/daos-stack/daos/actions/runs/9613005998/job/26514814960?pr=14623

There is no need to run any Functional/Unit/NLT tests, as this PR introduces only a new GHA workflow that does not affect the source code in any way.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Copy link

Ticket title is 'Generic ticket for minor code cleanup and improvement'
Status is 'Resolved'
Labels: 'request_for_2.6,request_for_2.6.1'
https://daosio.atlassian.net/browse/DAOS-623

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14623/1/execution/node/1183/log

@grom72 grom72 marked this pull request as ready for review June 21, 2024 19:46
.github/workflows/triviy.yml Outdated Show resolved Hide resolved
.github/workflows/triviy.yml Outdated Show resolved Hide resolved
cp trivy-report-daos.${{ steps.gen_extension.outputs.EXTENSION }}.txt report
cp utils/trivy/.trivyignore report/trivyignore.txt

- name: Upload the report to the GitHub artifactory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- name: Upload the report to the GitHub artifactory
- name: Upload the report to the GitHub artifact store

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.github/workflows/triviy.yml Outdated Show resolved Hide resolved
grom72 added 7 commits June 25, 2024 16:53
Skip-test: true
Skip-unit-test: true
Skip-func-test: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-test: true
Skip-unit-test: true
Skip-func-test: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit 4ad656b.

Skip-test: true
Skip-unit-test: true
Skip-func-test: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit f5105c5.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Skip-test: true
Skip-unit-test: true
Skip-func-test: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@brianjmurrell
Copy link
Contributor

grom72 (Tomasz Gromadzki) force-pushed the grom72/trivy branch from 34df04c to b7277dd

@grom72 We don't allow force-pushing once reviewing has started as it voids the ability to incrementally review only what was changed since previous reviews. You are essentially asking reviewers to start from scratch and re-review the entire PR again including code they have already reviewed which is not a good use of time.

Comment on lines 5 to 6
schedule:
- cron: '46 8 * * 0'
Copy link
Contributor

@brianjmurrell brianjmurrell Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a schedule here instead of just on landing? I.e.:

Suggested change
schedule:
- cron: '46 8 * * 0'
push:
branches: ["master"]

This way we know exactly which landing may have caused a regression and don't have to bisect 24h worth landings to find the one that is causing a failure -- assuming anyone even notices the failed result of a scheduled test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two factors in Trivy scans:

  • a new code that enables new use cases and new potential vulnerability areas
  • new vulnerabilities detected in already used packages

I assume that the second bullet will be the root cause of the majority of incidents

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the first bullet, then that is something that the developer introduced with their PR and so they need to resolve it there.

Even the second bullet does not represent cases we don't already experience. Any one of our linters for example can start flagging pre-existing code that it never used to simply because a new version of the linter is introduced and starts flagging existing constructs that it does not like any more. When this happens we have to fix the code it starts flagging. It fails PRs. Because if it doesn't it is ignored and landed and then everyone's PRs start flagging the offending (yet pre-existing) code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The risk is that random PR can be affected by a new vulnerability detected only because of the time coincidence.
You can make some changes in C-code, and your PR will be blocked because a new Java/python vulnerability will be detected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the risk and we already accept that kind of risk and resolution process with our other checkers. We frequently see unrelated linting issues creep into PRs due to updated linting tools, spelling checkers, etc.

Those are either corrected in the PR that found them or somebody is assigned the task of getting master updated to the correction. But it's only because they block progress that they get fixed. Otherwise nobody takes on the task and distraction to address them if they don't halt progress. History has proven this to us.

PRs that are blocked wait for the update and rebase, or they can be force-landed if gatekeepers are willing.

Given that we accept this process for less serious issues like spell-checking and code linting, we must surely accept this process for actual security vulnerabilities that are detected!


# generate trivy report only if no errors detected
- name: Generate trivy report file extension
# github.base_ref-based for PR and github.ref_name-based for schedule/workflow_dispatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using landing-time testing instead of scheduled resolve this either/or for github.base_ref/github.ref_name?

In either case, see below for a more concise solution.

Copy link
Contributor Author

@grom72 grom72 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to block PR because of detected vulnerabilities, as the solution is to update the corrupted package to the proper version or, like what we did with Hadoop, create a suppression.

It will be great to add a status badge to monitor the Trivy scan as we have it done for other regular actions, when we have this PR landed

License
Coverity Scan Build Status
Build
Codespell
Doxygen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want to block PR because of detected vulnerabilities,

Why not? That is how we operate here. If some new "check" comes up as failed, it needs to be fixed before the PR can land. Otherwise we propagate that failure to everyone else's PR once they rebase on the branch that allowed the PR with the failure in it to land. This is huge waste of everyone else's time.

It will be great to add a status badge to monitor the Trivy scan as we have it done for other regular actions

Nobody monitors these badges. All of these checks behind these badges fail PRs (and landing branches even) when they start failing. That is the only way failures are noticed here.

Even scheduled (i.e. cron) failures will go unnoticed until they start showing up in PRs, by which time it now becomes somebody's job to start bisecting and otherwise hunting for the source of the failure. That somebody will most assuredly come from the SRE team, disrupting whatever tasks they had going on and adds zero value to their OKRs.

Failures need to be alerted and fixed when and where they happen and by whomever introduces them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the Trivy scan might fail not because of new code but because of a new vulnerability detected in some unrelated to the PR code.
There are two goals (from the SDLe process perspective):

  • prevent any new vulnerability from being introduced by developers (rare case)
  • detect new vulnerabilities that are detected and reported by subcomponent owners

The second

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the Trivy scan might fail not because of new code but because of a new vulnerability detected in some unrelated to the PR code.

Yes, that is understood. As described above, we already accept this process model with our other checks and validations.

The problem is that we do not have processes that support allowing new (even unrelated) issues to "slide" and land so that they are addressed later. Nobody ever owns fixing those and they continue to pile up until the pile is so big that it's a nightmare. So the only process that works for us is to have new -- even unrelated issues -- block progress, forcing somebody to own getting it fixed.

.github/workflows/triviy.yml Outdated Show resolved Hide resolved
.github/workflows/triviy.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few changes on updated code.

@brianjmurrell
Copy link
Contributor

Is there any reason we are not using the https://github.com/aquasecurity/trivy-action github action, possibly/potentially with GitHub Code Scanning?

Skip-test: true
Skip-unit-test: true
Skip-func-test: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72
Copy link
Contributor Author

grom72 commented Jun 25, 2024

grom72 (Tomasz Gromadzki) force-pushed the grom72/trivy branch from 34df04c to b7277dd

@grom72 We don't allow force-pushing once reviewing has started as it voids the ability to incrementally review only what was changed since previous reviews. You are essentially asking reviewers to start from scratch and re-review the entire PR again including code they have already reviewed which is not a good use of time.

It is because of not signed merge commits and instruction that proposes the following sequence of commands:

git rebase HEAD~7 --signoff
git push --force-with-lease origin grom72/trivy

@grom72
Copy link
Contributor Author

grom72 commented Jun 25, 2024

Is there any reason we are not using the https://github.com/aquasecurity/trivy-action github action, possibly/potentially with GitHub Code Scanning?

The reason was/is to have easy local reproduction of the Trivy scan.

@grom72 grom72 requested a review from brianjmurrell June 25, 2024 17:10
@brianjmurrell
Copy link
Contributor

It is because of not signed merge commits

In the future you can disregard issues with unsigned merge commits. They will disappear when the PR is landed.

and instruction that proposes the following sequence of commands:

git rebase HEAD~7 --signoff
git push --force-with-lease origin grom72/trivy

Yes, it is unfortunate that that is the suggested solution for such a case.

@brianjmurrell
Copy link
Contributor

Is there any reason we are not using the aquasecurity/trivy-action github action, possibly/potentially with GitHub Code Scanning?

The reason was/is to have easy local reproduction of the Trivy scan.

I think it is doubtful that anyone is going to try to reproduce the scan locally. Everyone does their work in CI here. We have many workflows that do not have easy local reproductions but ultimately the tools underlying them can be used locally. This seems to fit that same category.

I think leveraging the work put into the action and it's ongoing maintenance (by somebody else) and the (i.e GitHub Code Scanning) reporting capabilities of using the action outweigh the slim possibilities that somebody might want to try to use the workflow code to try to reproduce locally. Having good and clear reporting (such as GitHub Code Scanning provides) should be the primary goal so that we make it easy for developers to find their errors without having to decode difficult to understand ad-hoc report outputs.

@grom72 grom72 marked this pull request as draft June 26, 2024 15:20
grom72 added 2 commits July 25, 2024 23:54
Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 grom72 removed the request for review from ipoddubn July 25, 2024 22:32
@grom72
Copy link
Contributor Author

grom72 commented Jul 25, 2024

Is there any reason we are not using the https://github.com/aquasecurity/trivy-action github action, possibly/potentially with GitHub Code Scanning?

Done

@grom72 grom72 marked this pull request as ready for review July 25, 2024 22:33
Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SDL the driving force behind us running this scan?

It might be useful to suppress some (or all maybe even) of the ignores just so that we can see how detected vulnerabilities are presented to the user so that we can make sure it's useful/usable by developers. Feel free to do that in a fork of this PR if you don't want that churn in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of ignored CVEs. Are we allowed to ignore this many? Some of them are HIGH even.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Trivy is because of SDL
  2. It has been agreed with SDL people that we can ignore all Hadoop-related CVE
  3. We do not release any Java code so we can practically ignore all Java-related issues
  4. I want first to have the running workflow and next clean all required issues (in a separate PR)
  5. An example of a failed build with suppressed ignores: https://github.com/grom72/daos/actions/runs/10159867708/job/28095002601#step:8:31

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of a failed build with suppressed ignores: grom72/daos/actions/runs/10159867708/job/28095002601#step:8:31

Nice. Is there a reason we don't take advantage of the whole Code Scanning workflow in GitHub with this action?

https://github.com/aquasecurity/trivy-action#using-trivy-with-github-code-scanning

Copy link
Contributor Author

@grom72 grom72 Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow supports two things:

  • SDL evidence generation where I want to have also ignored issues documented
  • Clear info in the log when something goes wrong
  • Clear info in the log about ignored issues

Sarif scan added. Results of the run w/ expected failures are here (see first two issues):
https://github.com/daos-stack/daos/security/code-scanning?query=pr%3A14623+is%3Aclosed

@grom72 grom72 requested a review from wiliamhuang July 30, 2024 09:37
grom72 added 2 commits July 30, 2024 18:56
Skip all functional tests
Doc-only: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Doc-only: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@github-advanced-security
Copy link

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Doc-only: true
Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Doc-only: true

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
grom72 added 3 commits July 31, 2024 08:34
Doc-only: true

This reverts commit bceb0ac.

Required-githooks: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit 666e3b9.
Doc-only: true
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Required-githooks: true
Doc-only: true

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 grom72 requested a review from brianjmurrell July 31, 2024 07:06
@grom72 grom72 requested a review from a team August 2, 2024 10:49
@daltonbohning daltonbohning merged commit 846b57a into master Aug 2, 2024
40 checks passed
@daltonbohning daltonbohning deleted the grom72/trivy branch August 2, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants