Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add Additional Details to License Check #2442

Merged
merged 52 commits into from
Nov 28, 2022
Merged

Conversation

shissam
Copy link
Contributor

@shissam shissam commented Nov 9, 2022

What kind of change does this PR introduce?

This is a proposed feature enhancement to the License Check which servers two purposes

  • Provide specific details as to which license was detected

  • Provide the basis to add gradation to license analysis

  • PR title follows the guidelines defined in our pull request documentation

What is the current behavior?

License check looks in the repo for specific license files and some matching content. Score is either min or max depending if a file is found - regardless of the license content.

What is the new behavior (if this is a feature change)?**

  • introduce clients/licenses.go which embodies this github api (similar to clients/languages.go) - it would be plural as it may come down to the point where gitlab returns multiple licenses for a repo (I have a real-world example).
  • This github client would populate a structure like the following:
type License struct {
        Key      string // RepositoryLicense.GetLicense().GetKey()
        Name     string // RepositoryLicense.GetLicense().GetName()
        Path     string // RepositoryLicense.GetName()
        Size     int    // RepositoryLicense.GetSize()
        SPDXId   string // RepositoryLicense.GetLicense().GetSPDXID()
        Type     string // RepositoryLicense.GetType()
}

where (example from ossf/scorecard):

{
 Key:apache-2.0
 Name:Apache License 2.0
 Path:LICENSE
 Size:11356
 SPDXId:Apache-2.0 
Type:file
}
  • modify checks/raw/license.go to follow one of two possible paths:
    -- pathOne try the repoClient (as in c.RepoClient.ListLicenses()), if a license is reported by github USE that and return
    -- pathTwo if the repoClient returns nil, continue with the legacy logic and return what is always returned
  • there are a number of advantages to this approach:
    -- speedup: for github, my test of c.RepoClient.ListLicenses() returns in less than 5 seconds, when compared to other repos with many files (like torvalds/linux this check takes over a minute
    -- extensibility: this test with github is promising, and gitlab does appear to have a api for getting this information (https://docs.gitlab.com/ee/api/templates/licenses.html)
    -- confidence: using the github api for 'sensing' which specific license is in use I assume is higher (more review on their algorithm which might use NLP)--I tried a license on my site which changed the work Apache to Apacje in the complete Apache License 2.0 file and github still sensed it was APL-2.0 (very cool).
    -- standardization: I am encouraged by the use of the SPDXId in the github API, if it is true that gitlab using the same standard identifier for licenses - this would be great for the community
    -- usability: using scorecard to retrieve the specific license detected would a) make this information more readily available to the end-user; and b) would support develop pipelines that need to sense the license (by name) as a risk management activity without having to move to other means to acquire such license information
  • with this approach, propose changing checker.LicenseData{}:
type License struct {
        Key      string // from clients.License.Key
        Name     string // from clients.License.Name
        Size     int    // clients.License.Size (Size maybe not here but in type File struct
        SPDXId   string // clients.License.SPDXId
        Attribution     AttributionType // license sourced from LicenseRepoAPI or ScorecardEngine
}

// one file contains one license.
type LicenseFile struct {
        License License
        File    File
}

// LicenseData contains the raw results
// for the License check.
// Some repos may have more than one license.
type LicenseData struct {
        LicenseFiles []LicenseFile
}

Which issue(s) this PR fixes

Fixes #1369

With a more complete proposal at https://github.com/ossf/scorecard/issues/1369#issuecomment-1304831531

Special notes for your reviewer

I would like to have a discussion with the owners/maintainer about idea for gradating the evaluation, with additional information from the repo API, that may lead to greater confidence in scores. This PR would take a step in that direction.

Does this PR introduce a user-facing change?

format=raw would have the entire LicenseData found by scorecard in the file or the repo API
format=[json|default] would have the rational for any such decided on gradation.

For user-facing changes, please add a concise, human-readable release note to
the release-note

(In particular, describe what changes users might need to make in their
application as a result of this pull request.)


shissam and others added 30 commits October 20, 2022 16:01
* Examines and awards points for linked content (URLs / Emails)

* Examines and awards points for hints of disclosure and vulnerability practices

* Examines and awards points for hints of elaboration of timelines

Signed-off-by: Scott Hissam <shissam@gmail.com>
…valuation

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
…t length over the length of the linked content for urls and emails

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
…ecks.yaml for generate-docs

Signed-off-by: Scott Hissam <shissam@gmail.com>
…nts)

* replaced reason strings with log.Info & log.Warn (as seen in --show-details)

* internal assertion check for nil (*pinfo) and empty pfile

* internal switched to FileTypeText over FileTypeSource

* internal implement type SecurityPolicyInformationType/SecurityPolicyInformation revised SecurityPolicyData to support only one file

* revised expected unit-test results and revised unit-test to reflect the new SecurityPolicyData type

Signed-off-by: Scott Hissam <shissam@gmail.com>
…or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>
…or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>
…or one email(s) found; e2e tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>
…licy file to track hits by line number

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
…ity policy check

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
…o support the potential for multiple security policy files.

Signed-off-by: Scott Hissam <shissam@gmail.com>
…y files only after future improvements to aggregating scoring across such files are designed. For now the security policy behaves as originally designed to stop once one of the expected policy files are found in the repo

Signed-off-by: Scott Hissam <shissam@gmail.com>
…s and removed unneeded break statements in the code

Signed-off-by: Scott Hissam <shissam@gmail.com>
… filename from the code and introduced FileSize to checker.File type and removed the SecurityContentLength which was used to hold that information for the new security policy assessment

Signed-off-by: Scott Hissam <shissam@gmail.com>
…s found in the org level repos

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
* Reorganize

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Working commit

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Compile with local scorecard; go mod tidy

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add signing code

Heavily borrowed from https://github.com/grafeas/kritis/blob/master/cmd/kritis/signer/main.go

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Update deps

* Naming
* Makefile

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Edit license, add lint.yml

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* checks: go mod tidy, license

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Address PR comments

* Split into checker/signer files
* Naming convention

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* License, remove golangci.yml

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Address PR comments

* Use cobra

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add tests for root command

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Filter out checks that aren't needed for policy evaluation

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add `make` targets for attestor; submit coverage stats

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Improvements

* Use sclog instead of glog
* Remove unneeded subcommands
* Formatting

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Flags: Make note-name constant and fix messaging

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Remove SupportedRequestTypes

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* go mod tidy

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* go mod tidy, makefile

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Fix GH actions run

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

Signed-off-by: Raghav Kaul <raghavkaul@google.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>
…inated vulnerability disclosure guidelines

Signed-off-by: Scott Hissam <shissam@gmail.com>
… reworked some of the code comments, added map generation to TestLicense, added an additional mutex for the regex group identifier index, removed spurious prints, revised unit test accordingly, updated documentation.

Signed-off-by: Scott Hissam <shissam@gmail.com>
@shissam shissam temporarily deployed to integration-test November 27, 2022 19:56 Inactive
@github-actions
Copy link

Integration tests success for
[1f55b7a]
(https://github.com/ossf/scorecard/actions/runs/3559735104)

@shissam shissam temporarily deployed to integration-test November 28, 2022 14:28 Inactive
@github-actions
Copy link

Integration tests success for
[1ef1703]
(https://github.com/ossf/scorecard/actions/runs/3565703124)

checker/raw_result.go Outdated Show resolved Hide resolved
…tion constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <shissam@gmail.com>
auto-merge was automatically disabled November 28, 2022 17:54

Head branch was pushed to by a user without write access

@shissam shissam temporarily deployed to integration-test November 28, 2022 17:54 Inactive
@github-actions
Copy link

Integration tests success for
[c1b7373]
(https://github.com/ossf/scorecard/actions/runs/3567344961)

Copy link
Contributor

@laurentsimon laurentsimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for explaining everything in such details, and the amount of work you've put in this PR!

@laurentsimon laurentsimon merged commit 28b116f into ossf:main Nov 28, 2022
raghavkaul added a commit to raghavkaul/scorecard that referenced this pull request Feb 9, 2023
* ✨ Improved Security Policy Check (ossf#2137)

* Examines and awards points for linked content (URLs / Emails)

* Examines and awards points for hints of disclosure and vulnerability practices

* Examines and awards points for hints of elaboration of timelines

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Repaired Security Policy to correctly use linked content length for evaluation

Signed-off-by: Scott Hissam <shissam@gmail.com>

* gofmt'ed changes

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Repaired the case in the evaluation which was too sensitive to content length over the length of the linked content for urls and emails

Signed-off-by: Scott Hissam <shissam@gmail.com>

* added unit test cases for the new content-based Security Policy checks

Signed-off-by: Scott Hissam <shissam@gmail.com>

* reverted the direct (mistaken) change to checks.md and updated the checks.yaml for generate-docs

Signed-off-by: Scott Hissam <shissam@gmail.com>

* ✨ Improved Security Policy Check (ossf#2137) (revisted based on comments)

* replaced reason strings with log.Info & log.Warn (as seen in --show-details)

* internal assertion check for nil (*pinfo) and empty pfile

* internal switched to FileTypeText over FileTypeSource

* internal implement type SecurityPolicyInformationType/SecurityPolicyInformation revised SecurityPolicyData to support only one file

* revised expected unit-test results and revised unit-test to reflect the new SecurityPolicyData type

Signed-off-by: Scott Hissam <shissam@gmail.com>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; e2e tests update accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Addressed PR comments; added telemetry for policy hits in security policy file to track hits by line number

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Resolved merge conflict with checks.yaml

Signed-off-by: Scott Hissam <shissam@gmail.com>

* updated raw results to emit all the raw information for the new security policy check

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Resolved merge conflicts and lint errors with json_raw_results.go

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Addressed review comments to reorganize security policy data struct to support the potential for multiple security policy files.

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Added logic to the security policy to process multiple security policy files only after future improvements to aggregating scoring across such files are designed. For now the security policy behaves as originally designed to stop once one of the expected policy files are found in the repo

Signed-off-by: Scott Hissam <shissam@gmail.com>

* added comments regarding the capacity to support multiple policy files and removed unneeded break statements in the code

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Addressed review comments to remove the dependency on the path in the filename from the code and introduced FileSize to checker.File type and removed the SecurityContentLength which was used to hold that information for the new security policy assessment

Signed-off-by: Scott Hissam <shissam@gmail.com>

* restored reporting full security policy path and filename for policies found in the org level repos

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Resolved conflicts in checks.yaml for documentation

Signed-off-by: Scott Hissam <shissam@gmail.com>

* ✨ CLI for scorecard-attestor (ossf#2309)

* Reorganize

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Working commit

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Compile with local scorecard; go mod tidy

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add signing code

Heavily borrowed from https://github.com/grafeas/kritis/blob/master/cmd/kritis/signer/main.go

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Update deps

* Naming
* Makefile

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Edit license, add lint.yml

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* checks: go mod tidy, license

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Address PR comments

* Split into checker/signer files
* Naming convention

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* License, remove golangci.yml

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Address PR comments

* Use cobra

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add tests for root command

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Filter out checks that aren't needed for policy evaluation

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Add `make` targets for attestor; submit coverage stats

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Improvements

* Use sclog instead of glog
* Remove unneeded subcommands
* Formatting

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Flags: Make note-name constant and fix messaging

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Remove SupportedRequestTypes

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* go mod tidy

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* go mod tidy, makefile

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

* Fix GH actions run

Signed-off-by: Raghav Kaul <raghavkaul@google.com>

Signed-off-by: Raghav Kaul <raghavkaul@google.com>
Signed-off-by: Scott Hissam <shissam@gmail.com>

* removed whitespace before stanza for Run attestor e2e

Signed-off-by: Scott Hissam <shissam@gmail.com>

* resolved code review and doc review comments

Signed-off-by: Scott Hissam <shissam@gmail.com>

* repaired the link for the maintainer's guide for supporting the coordinated vulnerability disclosure guidelines

Signed-off-by: Scott Hissam <shissam@gmail.com>

* initial implementation of ossf#1369 (comment) to provide more license details

Signed-off-by: Scott Hissam <shissam@gmail.com>

* draft implementation to provide more information on license details

Signed-off-by: Scott Hissam <shissam@gmail.com>

* repaired a misspelling

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Changed to handle http errors with 404 not found as being a non-error for not being able to find a license

Signed-off-by: Scott Hissam <shissam@gmail.com>

* Return an error status similar to other gitlab checks

Signed-off-by: Scott Hissam <shissam@gmail.com>

* add new raw licenses data

Signed-off-by: Scott Hissam <shissam@gmail.com>

* updated e2e test as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <shissam@gmail.com>

* added numerous more test filenames and a shouldFail boolean as some filenames will fail that do not meet checks.md rules

Signed-off-by: Scott Hissam <shissam@gmail.com>

* license check now, primarily, uses the GH API for checking licenses

Signed-off-by: Scott Hissam <shissam@gmail.com>

* updated local checker as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <shissam@gmail.com>

* added draft license gradation for scoring, add a map to OSI and FSF licenses, added GH API for retrieving repo license, revamp license filename matching when not using a repo API for detecting license files.

Signed-off-by: Scott Hissam <shissam@gmail.com>

* repaired race condition for case insensitive map, improved regex matching, moved licenses to raw, raw now mimics GH API return values for key, name, etc., updated unit tests and raw results accordingly

Signed-off-by: Scott Hissam <shissam@gmail.com>

* completed disambiguation of SPDX Identifiers and filename extensions, reworked some of the code comments, added map generation to TestLicense, added an additional mutex for the regex group identifier index, removed spurious prints, revised unit test accordingly, updated documentation.

Signed-off-by: Scott Hissam <shissam@gmail.com>

* removed repo Key from LicenseInformation as unneeded, changed attribution constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <shissam@gmail.com>

Signed-off-by: Scott Hissam <shissam@gmail.com>
Signed-off-by: Raghav Kaul <raghavkaul@google.com>
Co-authored-by: raghavkaul <8695110+raghavkaul@users.noreply.github.com>
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Jul 9, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Jul 9, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Jul 9, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Aug 22, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Nov 18, 2023
Enable License check for local repositories that was disabled in the
PR ossf#2442
abhiseksanyal added a commit to lineaje-labs/scorecard that referenced this pull request Oct 2, 2024
Enable License check for local repositories that was disabled in the
PR ossf#2442
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add gradation to license analysis
3 participants