-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New detected licenses appears for SPDX project with submodule #7231
Comments
I've tried to reproduce this with ORT version
|
I think I finally understand what is happening here. The problem is not that package configurations are not applied anymore, the problem is that a new violation appears and the package configuration only cover the "previous" one. Also now there are two rules violations::
and
The second being what is already defined in the Anyway, the bug is that new a violation appears, not that package configuration are not applied. |
I believe we should not discuss this issue on the level of "rule violations", because it adds complexity IMO without need and the actual policy you use are not specified to me, so I cannot know what is causing them. @nnobelis can you please show the differences in the detected licenses for these two? So, to me the bug is still not yet clearly specified. NOTE: IIUC the new rule violation is for a |
@fviernau yes you are right, but I won't change the title of the ticket as you will see underneath. Also, with the successful run on the 20.06 (customer said the breaking change started the 24.06), there was one detected License:
This findings was addressed with the following package configuration:
With latest ORT, there are no additional findings for this file in the However I see a rule violation in the Webapp report:
If I remove the
Therefore IIUC, there is no regression in terms of license findings, The regression is how the Evaluator interprets the license findings to create the rules violations. What do you think ?
This is right, but this rule violation for the project didn't appear before and, as said above, the license findings didn't change. |
@nnobelis I believe we should first really understand the impact on the detected licenses of the code changes which are suspicious wrt to this bug. Have the detected licenses (exact findings) changed or not for So, please proof that the findings returned by
Once this is clear, we surely know whether the issue is caused by behavior change in scanner. |
Here are the results of ORT version a8d1cc6:
ORT version fda27e2:
So you see the results are the same. However:
I will search further. |
Some news:
*: there is just the removal of the Now, I will try to test with the command |
So finally I managed to reproduce locally a discrepancy 🥳 Running 6e68575 (Main) - bug :
c05ec08 - no bug:
The FossID snippets have been removed from the results for briefness. |
@nnobelis I only had a short look, but this now is highly likely now going into the right direction. Using different scanners for packages and projects is something I totally forgot to consider during the implementation. So, currently you can specify project and package scanners, and it would scan as specified - but the scan results for any package would always contain all scan results related to the respective provenance of the package. So, this is the (edge?) case where a project A and a package B have some overlap in their (nested) provenance. E.g. at least one pair of provenance is identical. Can you confirm that this is the case? (e.g. you could look it up in the scan result's entries for |
Yes there is an overlap. Here is the
|
@nnobelis I've assigned the ticket to you as you'll be the one implementing the fix. |
This topic was discussed in the core dev meeting today: The root cause of the issue is that the relation between package id and scanner is lost in the ORT result when different scanners are used for projects and packages. As a result, if there are packages and projects with the same provenance, they get the scan results from both project and package scanners assigned. The solution is to store for each package id which scanners have been run, then this information can be used to find the correct scan results for a package id later on. |
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. The same problem should also for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. The same problem should also for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
When a project scanner is configured alongside a package scanner, the scan results could be duplicated if the provenances of those packages are overlapping (e.g. in the case of repositories with Git submodules). This commit addresses the bug by constructing a map of packages with their scanners and filtering the scan results using this map. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The logic for merging scan results has been made a little more strict when it got factored out into a separate function by [1]. It's been changed to throw an `IllegalStateException` in case no result is associated with the empty path in the given `scanResultsByPath`. Already before [1] there wasn't any guarantee that `scanResultsByPath` would be "complete" in the sense that it would always contain at least one scan result (per scanner) for each provenance which is part of the respective nested provenance. In particular, it also wasn't guaranteed that a scan result would be present for the root provenance. Incomplete scan results just were merged regardlessly and silently / without the propagation of any issue. So, [1] introduced a regression. After [1] `ScannerRun` got changed to store results by provenance [2] which introduced another caller (in `scannerRun`) to above mentioned logic aka. / now called `mergeScanResultsByScanner()`. For this new caller it's been observed that it may crash the entire scan command, see [3]. It was reproducible by scanning a package which has a cached package provenance resolution result. The resolved provenance pointed to a no more existing repository, so resolving the nested provenance failed, and thus the scan result for that package were incomplete. An analog problem should also exist for the existing callers, in particular the one in `ScanResultsStorage.write()`. Fix the issue by reverting the added strictness of [1], effectively switching back to tolerating incomplete scan results in the same way as before [1]. Note: It would make sense to implement a better handling for incomplete scan results. At least an issue should be flagged, while it is debatable whether any (incomplete) findings or no findings should be returned. A proper solution might depend on knowing which scanners are supposed to be run for each package, which is not present yet but is planned to be added in context of fixing [4]. [1] 65c6612 [2] d6846d6 [3] #7263 [4] #7231 Fixes #7263. Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes oss-review-toolkit#7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
The scanner may use different set of scanners for scanning different packages. The same holds for projects. `ScannerRun` does not contain that information, and `ScannerRun.getScanResultsById()` just returns all scans matching the package (nested) provenance, disregarding which scanners are to be used for the particular package. Extend the data model to contain that information and use it to adjust the internals of `getScanResultsById()` to adhere to the set of scanners to be used for the particular project or package. Fixes #7231. Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.io>
Our customer has a repository containing a
project.spdx.yml
file. In this file, Git submodules, pointing to third party repositories, are referenced as packages.In this repository, the customer defines a
license_finding_curations
in apackage_configurations
for one of these referenced packages. Thus, it should be possible to prevent rule violation for code in the third party repositories.Unfortunately, since the 20.06, there
packages_configuration
are not applied anymore and the customer get some rule violations.A minimal project reproducing the setup:
https://github.com/nnobelis/spdx-project
Last successful build: version 47bd691..
The text was updated successfully, but these errors were encountered: