Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New detected licenses appears for SPDX project with submodule #7231

Closed
Tracked by #6945
nnobelis opened this issue Jun 30, 2023 · 12 comments · Fixed by #7276
Closed
Tracked by #6945

New detected licenses appears for SPDX project with submodule #7231

nnobelis opened this issue Jun 30, 2023 · 12 comments · Fixed by #7276
Assignees
Labels
bug Issues that are considered to be bugs

Comments

@nnobelis
Copy link
Member

Our customer has a repository containing a project.spdx.yml file. In this file, Git submodules, pointing to third party repositories, are referenced as packages.

In this repository, the customer defines a license_finding_curations in a package_configurations for one of these referenced packages. Thus, it should be possible to prevent rule violation for code in the third party repositories.

Unfortunately, since the 20.06, there packages_configuration are not applied anymore and the customer get some rule violations.

A minimal project reproducing the setup:
https://github.com/nnobelis/spdx-project

Last successful build: version 47bd691..

@nnobelis nnobelis changed the title Packages configuration are not applied anymore Package configurations are not applied anymore Jun 30, 2023
@fviernau fviernau self-assigned this Jun 30, 2023
@fviernau fviernau added the bug Issues that are considered to be bugs label Jun 30, 2023
@fviernau
Copy link
Member

I've tried to reproduce this with ORT version cdecb9b7232c1e9c25a7f928d2a07e3851811987 and on my machine the license finding curation from the package configuration is applied. I did the follwing:

  1. clone the test repo via HTTP
  2. ort --debug analyze -i . -o .
  3. ort --info scan -i analyzer-result.yml -o .
  4. run the evaluator against ort-config rules
  5. compute the static HTML report for the evaluation result

Screenshot from 2023-06-30 14-38-21

@nnobelis
Copy link
Member Author

I think I finally understand what is happening here.

The problem is not that package configurations are not applied anymore, the problem is that a new violation appears and the package configuration only cover the "previous" one.

Also now there are two rules violations::

package_configurations:
- id: "SpdxDocumentFile::spdx-project:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-project.git"
    revision: "8ff8f3e99b06319a110e32b4c1593315cb6fbaf7"
  license_finding_curations:
  - path: "spdx-dependency/README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

and

package_configurations:
- id: "SpdxDocumentFile::spdx-dependency:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-dependency.git"
    revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
  license_finding_curations:
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

The second being what is already defined in the .ort.yml file (and what our customer has defined). I have no idea why there are two license_finding_curations in this how to fix hint.

Anyway, the bug is that new a violation appears, not that package configuration are not applied.
I will change the title of the issue.

@nnobelis nnobelis changed the title Package configurations are not applied anymore New rules violation appears for SPDX project with submodule Jun 30, 2023
@fviernau
Copy link
Member

fviernau commented Jul 3, 2023

Anyway, the bug is that new a violation appears, not that package configuration are not applied. I will change the title of the issue.

I believe we should not discuss this issue on the level of "rule violations", because it adds complexity IMO without need and the actual policy you use are not specified to me, so I cannot know what is causing them. @nnobelis can you please show the differences in the detected licenses for these two? So, to me the bug is still not yet clearly specified.

NOTE: IIUC the new rule violation is for a project not for a package. Package configurations are only for packages. For projects the path excludes and license finding curations from the ort.yml get applied.

@nnobelis nnobelis changed the title New rules violation appears for SPDX project with submodule New detected licenses appears for SPDX project with submodule Jul 3, 2023
@nnobelis
Copy link
Member Author

nnobelis commented Jul 3, 2023

@fviernau yes you are right, but I won't change the title of the ticket as you will see underneath.

Also, with the successful run on the 20.06 (customer said the breaking change started the 24.06), there was one detected License:

scan_results:
    'SpdxDocumentFile::spdx-dependency:':
    - provenance:
        vcs_info:
          type: "Git"
          url: "https://github.com/nnobelis/spdx-dependency.git"
          revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
          path: ""
        resolved_revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
      scanner:
        name: "ScanCode"
        version: "30.1.0"
        configuration: "--copyright --license --info --strip-root --timeout 300 --max-in-memory\
          \ 5000 --json-pp"
      summary:
        start_time: "2023-06-30T12:19:24.000603742Z"
        end_time: "2023-06-30T12:19:29.000479778Z"
        package_verification_code: ""
        licenses:
        - license: "GPL-3.0-only"
          location:
            path: "README.md"
            start_line: 4
            end_line: 4
          score: 100.0

This findings was addressed with the following package configuration:

package_configurations:
  - id: "SpdxDocumentFile::spdx-dependency:"
    vcs:
      type: "Git"
      url: "https://github.com/nnobelis/spdx-dependency.git"
    license_finding_curations:
      - path: "README.md"
        start_lines: "4"
        line_count: 1
        detected_license: "GPL-3.0-only"
        concluded_license: "NONE"
        reason: "INCORRECT"
        comment: "This is an example how to fix a detected license"

With latest ORT, there are no additional findings for this file in the scan-result.yml.

However I see a rule violation in the Webapp report:

How to fix

If this is a false-positive or ineffective finding, it can be fixed in your .ort.yml file:

---
package_configurations:
- id: "SpdxDocumentFile::spdx-project:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-project.git"
    revision: "8ff8f3e99b06319a110e32b4c1593315cb6fbaf7"
  license_finding_curations:
  - path: "spdx-dependency/README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

If I remove the .ort.yml package configuration, I have now two violations. This one and:

How to fix

If this is a false-positive or ineffective finding, it can be fixed in your .ort.yml file:

---
package_configurations:
- id: "SpdxDocumentFile::spdx-dependency:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-dependency.git"
    revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
  license_finding_curations:
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

Therefore IIUC, there is no regression in terms of license findings, The regression is how the Evaluator interprets the license findings to create the rules violations. What do you think ?

NOTE: IIUC the new rule violation is for a project not for a package. Package configurations are only for packages. For projects the path excludes and license finding curations from the ort.yml get applied.

This is right, but this rule violation for the project didn't appear before and, as said above, the license findings didn't change.

@fviernau
Copy link
Member

fviernau commented Jul 3, 2023

@nnobelis I believe we should first really understand the impact on the detected licenses of the code changes which are suspicious wrt to this bug. Have the detected licenses (exact findings) changed or not for "SpdxDocumentFile::spdx-dependency: and SpdxDocumentFile::spdx-project:. While it also make sense to have a look at the ScannerRun in OrtResult, I believe it is best to focus first on OrtResult.getScanResultForId().

So, please proof that the findings returned by OrtResult.getScanResultForId() have changed or haven't changed for

  1. SpdxDocumentFile::spdx-project:
  2. SpdxDocumentFile::spdx-dependency:

Once this is clear, we surely know whether the issue is caused by behavior change in scanner.

@nnobelis
Copy link
Member Author

nnobelis commented Jul 3, 2023

@fviernau

Here are the results of getScanResultForId for both packages

ORT version a8d1cc6:

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=ssh://git@github.com/nnobelis/spdx-project.git, revision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9, path=), resolvedRevision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T12:39:46.000322356Z, endTime=2023-07-03T12:41:46.000056533Z, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=spdx-dependency/README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=36bbd4db42c153961d82e5b93564389f79bb0129, path=), resolvedRevision=36bbd4db42c153961d82e5b93564389f79bb0129), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T12:41:37.000420794Z, endTime=2023-07-03T12:41:46.000056533Z, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ORT version fda27e2:

`ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=ssh://git@github.com/nnobelis/spdx-project.git, revision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9, path=), resolvedRevision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T13:19:46.000129005Z, endTime=2023-07-03T13:20:52.000284948Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=spdx-dependency/README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=36bbd4db42c153961d82e5b93564389f79bb0129, path=), resolvedRevision=36bbd4db42c153961d82e5b93564389f79bb0129), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T13:20:40.000888988Z, endTime=2023-07-03T13:20:52.000284948Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

So you see the results are the same. However:

  • I still get violations with ORT fda27e2, which is not the case of our customer. Therefore, I fear my test project does not capture the full scenario.
  • During scanning, I get the following warning Git submodule at 'spdx-dependency' not initialized. Cannot recursively list its submodules. I am in contact with @mnonnenmacher to try to solve this issue.

I will search further.

@nnobelis
Copy link
Member Author

nnobelis commented Jul 5, 2023

Some news:

  1. I updated the test project
  2. I managed to replicate the bug with the test project, in our infrastructure. An important factor is that we run --scanners ScanCode --project-scanners FossId, therefore there is a split between projects and packages. The bug originates from a finding from scancode, also for a package.
  3. I try to replicate locally with --scanners ScanCode --package-types PACKAGE and to evaluate getScanResultForId for both a working version (c05ec08) and a one with the bug (main). Unfortunaly, results are inconclusive as the function returns the same* results for both ORT versions:
    getScanResultForId(spdx-project) -> []
    getScanResultForId(spdx-dependency) -> [ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})]

*: there is just the removal of the packageVerificationCode property.

Now, I will try to test with the command --scanners ScanCode --project-scanners FossId locally.

@nnobelis
Copy link
Member Author

nnobelis commented Jul 5, 2023

So finally I managed to reproduce locally a discrepancy 🥳

Running --scanners ScanCode --project-scanners FossId.

6e68575 (Main) - bug :

getScanResultForId(spdx-project):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:49:44.621656Z, endTime=2023-07-05T11:50:11.000057Z, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:50:10.945823Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:50:10.945862Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:50:10.945865Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:50:10.989310Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_134946_origin, scanid=55921, serverurl=<url of fossid instance>})
ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=spdx-dependency/src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=spdx-dependency/src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})

getScanResultForId(spdx-dependency):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})
ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:49:44.621656Z, endTime=2023-07-05T11:50:11.000057Z, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:50:10.945823Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:50:10.945862Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:50:10.945865Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:50:10.989310Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_134946_origin, scanid=55921, serverurl=<url of fossid instance>})

c05ec08 - no bug:

getScanResultForId(spdx-project):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:58:04.331687Z, endTime=2023-07-05T11:58:30.881329Z, packageVerificationCode=, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:58:30.833402Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:58:30.833428Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:58:30.833430Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:58:30.874951Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_135806_delta, scanid=55922, serverurl=<url of fossid instance>})

getScanResultForId(spdx-dependency):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})

The FossID snippets have been removed from the results for briefness.
I can provide the Analyzer, Scan or Evaluator results if needed.

@fviernau
Copy link
Member

fviernau commented Jul 6, 2023

@nnobelis I only had a short look, but this now is highly likely now going into the right direction. Using different scanners for packages and projects is something I totally forgot to consider during the implementation. So, currently you can specify project and package scanners, and it would scan as specified - but the scan results for any package would always contain all scan results related to the respective provenance of the package.

So, this is the (edge?) case where a project A and a package B have some overlap in their (nested) provenance. E.g. at least one pair of provenance is identical. Can you confirm that this is the case? (e.g. you could look it up in the scan result's entries for ScannerRun.provenances)

@nnobelis
Copy link
Member Author

nnobelis commented Jul 6, 2023

Yes there is an overlap. Here is the ScannerRun.provenances for 6e68575 (Main):

  provenances:
  - id: "SpdxDocumentFile::spdx-dependency:"
    package_provenance:
      vcs_info:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-dependency.git"
        revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
        path: ""
      resolved_revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
  - id: "SpdxDocumentFile::spdx-project:"
    package_provenance:
      vcs_info:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-project.git"
        revision: "8fae7f2b0efa8addeadfd15041e3437d92ff6b71"
        path: ""
      resolved_revision: "8fae7f2b0efa8addeadfd15041e3437d92ff6b71"
    sub_repositories:
      spdx-dependency:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-dependency.git"
        revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
        path: ""

@fviernau
Copy link
Member

@nnobelis I've assigned the ticket to you as you'll be the one implementing the fix.

@mnonnenmacher
Copy link
Member

This topic was discussed in the core dev meeting today: The root cause of the issue is that the relation between package id and scanner is lost in the ORT result when different scanners are used for projects and packages. As a result, if there are packages and projects with the same provenance, they get the scan results from both project and package scanners assigned. The solution is to store for each package id which scanners have been run, then this information can be used to find the correct scan results for a package id later on.

fviernau added a commit that referenced this issue Jul 12, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance which
introduced another caller (in `scannerRun`) to above mentioned logic aka.
/ now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. The
same problem should also for the existing callers, in particular the one
in `ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this issue Jul 12, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance [2]
which introduced another caller (in `scannerRun`) to above mentioned logic
aka. / now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. The
same problem should also for the existing callers, in particular the one
in `ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this issue Jul 12, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance [2]
which introduced another caller (in `scannerRun`) to above mentioned logic
aka. / now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. It was
reproducible by scanning a package which has a cached package provenance
resolution result. The resolved provenance pointed to a no more existing
repository, so resolving the nested provenance failed, and thus the scan
result for that package were incomplete. An analog problem should also
exist for the existing callers, in particular the one in
`ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 12, 2023
When a project scanner is configured alongside a package scanner, the
scan results could be duplicated if the provenances of those packages are
overlapping (e.g. in the case of repositories with Git submodules).
This commit addresses the bug by constructing a map of packages with their
scanners and filtering the scan results using this map.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
fviernau added a commit that referenced this issue Jul 12, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance [2]
which introduced another caller (in `scannerRun`) to above mentioned logic
aka. / now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. It was
reproducible by scanning a package which has a cached package provenance
resolution result. The resolved provenance pointed to a no more existing
repository, so resolving the nested provenance failed, and thus the scan
result for that package were incomplete. An analog problem should also
exist for the existing callers, in particular the one in
`ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this issue Jul 12, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance [2]
which introduced another caller (in `scannerRun`) to above mentioned logic
aka. / now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. It was
reproducible by scanning a package which has a cached package provenance
resolution result. The resolved provenance pointed to a no more existing
repository, so resolving the nested provenance failed, and thus the scan
result for that package were incomplete. An analog problem should also
exist for the existing callers, in particular the one in
`ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this issue Jul 13, 2023
The logic for merging scan results has been made a little more strict
when it got factored out into a separate function by [1]. It's been
changed to throw an `IllegalStateException` in case no result is
associated with the empty path in the given `scanResultsByPath`.

Already before [1] there wasn't any guarantee that `scanResultsByPath`
would be "complete" in the sense that it would always contain at least
one scan result (per scanner) for each provenance which is part of the
respective nested provenance. In particular, it also wasn't guaranteed
that a scan result would be present for the root provenance. Incomplete
scan results just were merged regardlessly and silently / without the
propagation of any issue. So, [1] introduced a regression.

After [1] `ScannerRun` got changed to store results by provenance [2]
which introduced another caller (in `scannerRun`) to above mentioned logic
aka. / now called `mergeScanResultsByScanner()`. For this new caller it's
been observed that it may crash the entire scan command, see [3]. It was
reproducible by scanning a package which has a cached package provenance
resolution result. The resolved provenance pointed to a no more existing
repository, so resolving the nested provenance failed, and thus the scan
result for that package were incomplete. An analog problem should also
exist for the existing callers, in particular the one in
`ScanResultsStorage.write()`.

Fix the issue by reverting the added strictness of [1], effectively
switching back to tolerating incomplete scan results in the same way as
before [1].

Note: It would make sense to implement a better handling for incomplete
      scan results. At least an issue should be flagged, while it is
      debatable whether any (incomplete) findings or no findings should
      be returned. A proper solution might depend on knowing which
      scanners are supposed to be run for each package, which is not
      present yet but is planned to be added in context of fixing [4].

[1] 65c6612
[2] d6846d6
[3] #7263
[4] #7231

Fixes #7263.

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
nnobelis added a commit to boschglobal/oss-review-toolkit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes oss-review-toolkit#7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
mnonnenmacher pushed a commit that referenced this issue Jul 14, 2023
The scanner may use different set of scanners for scanning different
packages. The same holds for projects. `ScannerRun` does not contain that
information, and `ScannerRun.getScanResultsById()` just returns all scans
matching the package (nested) provenance, disregarding which scanners are
to be used for the particular package.
Extend the data model to contain that information and use it to adjust the
internals of `getScanResultsById()` to adhere to the set of scanners to be
used for the particular project or package.

Fixes #7231.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issues that are considered to be bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants