New detected licenses appears for SPDX project with submodule #7231

nnobelis · 2023-06-30T12:12:24Z

Our customer has a repository containing a project.spdx.yml file. In this file, Git submodules, pointing to third party repositories, are referenced as packages.

In this repository, the customer defines a license_finding_curations in a package_configurations for one of these referenced packages. Thus, it should be possible to prevent rule violation for code in the third party repositories.

Unfortunately, since the 20.06, there packages_configuration are not applied anymore and the customer get some rule violations.

A minimal project reproducing the setup:
https://github.com/nnobelis/spdx-project

Last successful build: version 47bd691..

The text was updated successfully, but these errors were encountered:

fviernau · 2023-06-30T12:39:53Z

I've tried to reproduce this with ORT version cdecb9b7232c1e9c25a7f928d2a07e3851811987 and on my machine the license finding curation from the package configuration is applied. I did the follwing:

clone the test repo via HTTP
ort --debug analyze -i . -o .
ort --info scan -i analyzer-result.yml -o .
run the evaluator against ort-config rules
compute the static HTML report for the evaluation result

nnobelis · 2023-06-30T13:33:05Z

I think I finally understand what is happening here.

The problem is not that package configurations are not applied anymore, the problem is that a new violation appears and the package configuration only cover the "previous" one.

Also now there are two rules violations::

package_configurations:
- id: "SpdxDocumentFile::spdx-project:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-project.git"
    revision: "8ff8f3e99b06319a110e32b4c1593315cb6fbaf7"
  license_finding_curations:
  - path: "spdx-dependency/README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

and

package_configurations:
- id: "SpdxDocumentFile::spdx-dependency:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-dependency.git"
    revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
  license_finding_curations:
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

The second being what is already defined in the .ort.yml file (and what our customer has defined). I have no idea why there are two license_finding_curations in this how to fix hint.

Anyway, the bug is that new a violation appears, not that package configuration are not applied.
I will change the title of the issue.

fviernau · 2023-07-03T06:35:31Z

Anyway, the bug is that new a violation appears, not that package configuration are not applied. I will change the title of the issue.

I believe we should not discuss this issue on the level of "rule violations", because it adds complexity IMO without need and the actual policy you use are not specified to me, so I cannot know what is causing them. @nnobelis can you please show the differences in the detected licenses for these two? So, to me the bug is still not yet clearly specified.

NOTE: IIUC the new rule violation is for a project not for a package. Package configurations are only for packages. For projects the path excludes and license finding curations from the ort.yml get applied.

nnobelis · 2023-07-03T08:14:34Z

@fviernau yes you are right, but I won't change the title of the ticket as you will see underneath.

Also, with the successful run on the 20.06 (customer said the breaking change started the 24.06), there was one detected License:

scan_results:
    'SpdxDocumentFile::spdx-dependency:':
    - provenance:
        vcs_info:
          type: "Git"
          url: "https://github.com/nnobelis/spdx-dependency.git"
          revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
          path: ""
        resolved_revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
      scanner:
        name: "ScanCode"
        version: "30.1.0"
        configuration: "--copyright --license --info --strip-root --timeout 300 --max-in-memory\
          \ 5000 --json-pp"
      summary:
        start_time: "2023-06-30T12:19:24.000603742Z"
        end_time: "2023-06-30T12:19:29.000479778Z"
        package_verification_code: ""
        licenses:
        - license: "GPL-3.0-only"
          location:
            path: "README.md"
            start_line: 4
            end_line: 4
          score: 100.0

This findings was addressed with the following package configuration:

package_configurations:
  - id: "SpdxDocumentFile::spdx-dependency:"
    vcs:
      type: "Git"
      url: "https://github.com/nnobelis/spdx-dependency.git"
    license_finding_curations:
      - path: "README.md"
        start_lines: "4"
        line_count: 1
        detected_license: "GPL-3.0-only"
        concluded_license: "NONE"
        reason: "INCORRECT"
        comment: "This is an example how to fix a detected license"

With latest ORT, there are no additional findings for this file in the scan-result.yml.

However I see a rule violation in the Webapp report:

How to fix

If this is a false-positive or ineffective finding, it can be fixed in your .ort.yml file:

---
package_configurations:
- id: "SpdxDocumentFile::spdx-project:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-project.git"
    revision: "8ff8f3e99b06319a110e32b4c1593315cb6fbaf7"
  license_finding_curations:
  - path: "spdx-dependency/README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

If I remove the .ort.yml package configuration, I have now two violations. This one and:

How to fix

If this is a false-positive or ineffective finding, it can be fixed in your .ort.yml file:

---
package_configurations:
- id: "SpdxDocumentFile::spdx-dependency:"
  vcs:
    type: "Git"
    url: "https://github.com/nnobelis/spdx-dependency.git"
    revision: "36bbd4db42c153961d82e5b93564389f79bb0129"
  license_finding_curations:
  - path: "README.md"
    start_lines: "4"
    line_count: 1
    detected_license: "GPL-3.0-only"
    concluded_license: <Insert correct license.>
    reason: <Choose one of the reason from the list [CODE, DATA_OF, DOCUMENTATION_OF, INCORRECT, NOT_DETECTED, REFERENCE].>
    comment: "<Describe the reason for the license finding curation.>"

Therefore IIUC, there is no regression in terms of license findings, The regression is how the Evaluator interprets the license findings to create the rules violations. What do you think ?

NOTE: IIUC the new rule violation is for a project not for a package. Package configurations are only for packages. For projects the path excludes and license finding curations from the ort.yml get applied.

This is right, but this rule violation for the project didn't appear before and, as said above, the license findings didn't change.

fviernau · 2023-07-03T10:52:22Z

@nnobelis I believe we should first really understand the impact on the detected licenses of the code changes which are suspicious wrt to this bug. Have the detected licenses (exact findings) changed or not for "SpdxDocumentFile::spdx-dependency: and SpdxDocumentFile::spdx-project:. While it also make sense to have a look at the ScannerRun in OrtResult, I believe it is best to focus first on OrtResult.getScanResultForId().

So, please proof that the findings returned by OrtResult.getScanResultForId() have changed or haven't changed for

SpdxDocumentFile::spdx-project:
SpdxDocumentFile::spdx-dependency:

Once this is clear, we surely know whether the issue is caused by behavior change in scanner.

nnobelis · 2023-07-03T13:39:44Z

@fviernau

Here are the results of getScanResultForId for both packages

ORT version a8d1cc6:

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=ssh://git@github.com/nnobelis/spdx-project.git, revision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9, path=), resolvedRevision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T12:39:46.000322356Z, endTime=2023-07-03T12:41:46.000056533Z, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=spdx-dependency/README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=36bbd4db42c153961d82e5b93564389f79bb0129, path=), resolvedRevision=36bbd4db42c153961d82e5b93564389f79bb0129), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T12:41:37.000420794Z, endTime=2023-07-03T12:41:46.000056533Z, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ORT version fda27e2:

`ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=ssh://git@github.com/nnobelis/spdx-project.git, revision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9, path=), resolvedRevision=bd2d4eb1a01e028f09dd131edb55a2055e1018b9), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T13:19:46.000129005Z, endTime=2023-07-03T13:20:52.000284948Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=spdx-dependency/README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=36bbd4db42c153961d82e5b93564389f79bb0129, path=), resolvedRevision=36bbd4db42c153961d82e5b93564389f79bb0129), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-03T13:20:40.000888988Z, endTime=2023-07-03T13:20:52.000284948Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=GPL-3.0-only, location=TextLocation(path=README.md, startLine=4, endLine=4), score=95.0)], copyrightFindings=[], snippetFindings=[], issues=[]), additionalData={})

So you see the results are the same. However:

I still get violations with ORT fda27e2, which is not the case of our customer. Therefore, I fear my test project does not capture the full scenario.
During scanning, I get the following warning Git submodule at 'spdx-dependency' not initialized. Cannot recursively list its submodules. I am in contact with @mnonnenmacher to try to solve this issue.

I will search further.

nnobelis · 2023-07-05T11:37:46Z

Some news:

I updated the test project
I managed to replicate the bug with the test project, in our infrastructure. An important factor is that we run --scanners ScanCode --project-scanners FossId, therefore there is a split between projects and packages. The bug originates from a finding from scancode, also for a package.
I try to replicate locally with --scanners ScanCode --package-types PACKAGE and to evaluate getScanResultForId for both a working version (c05ec08) and a one with the bug (main). Unfortunaly, results are inconclusive as the function returns the same* results for both ORT versions:
getScanResultForId(spdx-project) -> []
getScanResultForId(spdx-dependency) -> [ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})]

*: there is just the removal of the packageVerificationCode property.

Now, I will try to test with the command --scanners ScanCode --project-scanners FossId locally.

nnobelis · 2023-07-05T12:23:18Z

So finally I managed to reproduce locally a discrepancy 🥳

Running --scanners ScanCode --project-scanners FossId.

6e68575 (Main) - bug :

getScanResultForId(spdx-project):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:49:44.621656Z, endTime=2023-07-05T11:50:11.000057Z, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:50:10.945823Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:50:10.945862Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:50:10.945865Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:50:10.989310Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_134946_origin, scanid=55921, serverurl=<url of fossid instance>})
ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=spdx-dependency/src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=spdx-dependency/src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})

getScanResultForId(spdx-dependency):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})
ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:49:44.621656Z, endTime=2023-07-05T11:50:11.000057Z, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:50:10.945823Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:50:10.945862Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:50:10.945865Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:50:10.989310Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_134946_origin, scanid=55921, serverurl=<url of fossid instance>})

c05ec08 - no bug:

getScanResultForId(spdx-project):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-project.git, revision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71, path=), resolvedRevision=8fae7f2b0efa8addeadfd15041e3437d92ff6b71), scanner=ScannerDetails(name=FossId, version=2022.1.1, configuration=), summary=ScanSummary(startTime=2023-07-05T11:58:04.331687Z, endTime=2023-07-05T11:58:30.881329Z, packageVerificationCode=, licenseFindings=[], copyrightFindings=[], snippetFindings=[...], issues=[2023-07-05T11:58:30.833402Z [HINT]: FossId - Pending identification for 'src/internal.c'., 2023-07-05T11:58:30.833428Z [HINT]: FossId - Pending identification for 'src/sniffer.c'., 2023-07-05T11:58:30.833430Z [HINT]: FossId - Pending identification for 'src/statemap.h'., 2023-07-05T11:58:30.874951Z [HINT]: FossId - Failed to map license 'mpl11' as an SPDX expression.]), additionalData={scancode=bosch_nino2_spdx-project_spdx-project_main_20230705_135806_delta, scanid=55922, serverurl=<url of fossid instance>})

getScanResultForId(spdx-dependency):

ScanResult(provenance=RepositoryProvenance(vcsInfo=VcsInfo(type=Git, url=https://github.com/nnobelis/spdx-dependency.git, revision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7, path=), resolvedRevision=78c8a5b12ae2d0f613fff25f2d1eb746a768ece7), scanner=ScannerDetails(name=ScanCode, version=31.2.4, configuration=--copyright --license --info --strip-root --timeout 300 --json-pp), summary=ScanSummary(startTime=2023-07-05T10:51:03.000807680Z, endTime=2023-07-05T10:51:08.000618024Z, packageVerificationCode=, licenseFindings=[LicenseFinding(license=BSD-1-Clause, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=22, endLine=41), score=100.0)], copyrightFindings=[CopyrightFinding(statement=(c) 1995 - 2019 SEGGER Microcontroller GmbH, location=TextLocation(path=src/SEGGER_RTT_Conf.h, startLine=6, endLine=6))], snippetFindings=[], issues=[]), additionalData={})

The FossID snippets have been removed from the results for briefness.
I can provide the Analyzer, Scan or Evaluator results if needed.

fviernau · 2023-07-06T07:14:06Z

@nnobelis I only had a short look, but this now is highly likely now going into the right direction. Using different scanners for packages and projects is something I totally forgot to consider during the implementation. So, currently you can specify project and package scanners, and it would scan as specified - but the scan results for any package would always contain all scan results related to the respective provenance of the package.

So, this is the (edge?) case where a project A and a package B have some overlap in their (nested) provenance. E.g. at least one pair of provenance is identical. Can you confirm that this is the case? (e.g. you could look it up in the scan result's entries for ScannerRun.provenances)

nnobelis · 2023-07-06T07:50:19Z

Yes there is an overlap. Here is the ScannerRun.provenances for 6e68575 (Main):

  provenances:
  - id: "SpdxDocumentFile::spdx-dependency:"
    package_provenance:
      vcs_info:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-dependency.git"
        revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
        path: ""
      resolved_revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
  - id: "SpdxDocumentFile::spdx-project:"
    package_provenance:
      vcs_info:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-project.git"
        revision: "8fae7f2b0efa8addeadfd15041e3437d92ff6b71"
        path: ""
      resolved_revision: "8fae7f2b0efa8addeadfd15041e3437d92ff6b71"
    sub_repositories:
      spdx-dependency:
        type: "Git"
        url: "https://github.com/nnobelis/spdx-dependency.git"
        revision: "78c8a5b12ae2d0f613fff25f2d1eb746a768ece7"
        path: ""

fviernau · 2023-07-10T12:20:47Z

@nnobelis I've assigned the ticket to you as you'll be the one implementing the fix.

mnonnenmacher · 2023-07-10T19:17:34Z

This topic was discussed in the core dev meeting today: The root cause of the issue is that the relation between package id and scanner is lost in the ORT result when different scanners are used for projects and packages. As a result, if there are packages and projects with the same provenance, they get the scan results from both project and package scanners assigned. The solution is to store for each package id which scanners have been run, then this information can be used to find the correct scan results for a package id later on.

The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. The same problem should also for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>

The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. The same problem should also for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>

The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>

When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>

The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>

The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>

The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes #7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>

nnobelis changed the title ~~Packages configuration are not applied anymore~~ Package configurations are not applied anymore Jun 30, 2023

fviernau self-assigned this Jun 30, 2023

fviernau added the bug Issues that are considered to be bugs label Jun 30, 2023

nnobelis changed the title ~~Package configurations are not applied anymore~~ New rules violation appears for SPDX project with submodule Jun 30, 2023

nnobelis changed the title ~~New rules violation appears for SPDX project with submodule~~ New detected licenses appears for SPDX project with submodule Jul 3, 2023

fviernau mentioned this issue Jul 10, 2023

Report file findings in SpdxDocuments / support complete file listings in SBOMs #6945

Closed

11 tasks

fviernau assigned nnobelis and unassigned fviernau Jul 10, 2023

fviernau mentioned this issue Jul 12, 2023

fix(scanner): Do not crash when merging "incomplete" scan results #7277

Merged

fviernau mentioned this issue Jul 13, 2023

fix(scanner): Filter scan results by scanner name #7276

Merged

mnonnenmacher closed this as completed in #7276 Jul 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New detected licenses appears for SPDX project with submodule #7231

New detected licenses appears for SPDX project with submodule #7231

nnobelis commented Jun 30, 2023

fviernau commented Jun 30, 2023

nnobelis commented Jun 30, 2023

fviernau commented Jul 3, 2023 •

edited

Loading

nnobelis commented Jul 3, 2023

fviernau commented Jul 3, 2023 •

edited

Loading

nnobelis commented Jul 3, 2023

nnobelis commented Jul 5, 2023

nnobelis commented Jul 5, 2023 •

edited

Loading

fviernau commented Jul 6, 2023 •

edited

Loading

nnobelis commented Jul 6, 2023

fviernau commented Jul 10, 2023

mnonnenmacher commented Jul 10, 2023

New detected licenses appears for SPDX project with submodule #7231

New detected licenses appears for SPDX project with submodule #7231

Comments

nnobelis commented Jun 30, 2023

fviernau commented Jun 30, 2023

nnobelis commented Jun 30, 2023

fviernau commented Jul 3, 2023 • edited Loading

nnobelis commented Jul 3, 2023

fviernau commented Jul 3, 2023 • edited Loading

nnobelis commented Jul 3, 2023

nnobelis commented Jul 5, 2023

nnobelis commented Jul 5, 2023 • edited Loading

fviernau commented Jul 6, 2023 • edited Loading

nnobelis commented Jul 6, 2023

fviernau commented Jul 10, 2023

mnonnenmacher commented Jul 10, 2023

fviernau commented Jul 3, 2023 •

edited

Loading

fviernau commented Jul 3, 2023 •

edited

Loading

nnobelis commented Jul 5, 2023 •

edited

Loading

fviernau commented Jul 6, 2023 •

edited

Loading