Feature: Change how scores are displayed #2466

Open
pnacht opened this issue Nov 18, 2022 · 9 comments
Labels
kind/enhancement New feature or request Stale

Comments

@pnacht
Contributor

pnacht commented Nov 18, 2022

Is your feature request related to a problem? Please describe.
There's a discrepancy between how good a given score actually is and how it feels. For example, a 7/10 feels like a passing grade at best, but it actually means a project is in the top ~10% of the most relevant projects (or the top ~1% of all projects).

A few maintainers have been surprised to hear that they're actually doing a good job when they get a good score.

twbs/bootstrap#37402 (comment):

That being said, 7.2 is not good enough either

numpy/numpy#22482 (comment):

The badge gives a number, 6.2 in our case. I'm not sure many people know how to interpret that number - it feels like a low score

Describe the solution you'd like
A score that feels as good as it actually is. My proposal would be to either replace or supplement the current final score (e.g. 7/10) with the corresponding quantile (top x%). The badge should also display the result as a quantile instead of (or alongside) the final score.

This would make everyone (maintainers and users) more accurately understand how solid a project's security posture is.

Even the top projects would have a better experience: I wager some users currently see urllib3's 9.3 and think "wow, that's pretty good, but still clearly needs to improve something!", when their actual understanding should be "wow, this is the most secure open-source project out there!"

Personally, I'd be in favor of the quantile simply supplementing the final score, precisely because (for example) urllib3 might be the most secure open-source project out there, but that missing 0.7 does also point out there's room for improvement. In simple terms:

  • the x/10 score should be more maintainer-facing, letting them know there's still work to be done
  • the quantile "score" should be more consumer-facing, letting them know how secure the project is, compared to its peers.

Additional context

A first issue may be that the histogram of project scores isn't very nuanced. It seems clear from the chart below that GitHub's defaults give projects a score of around 4.5/10 (charts obtained via the public BigQuery data), so the ~1 million projects analyzed by Scorecard can basically be categorized as either "did something to improve their security" (and are therefore in the "top ~1%" of projects) or "did something to weaken their security" (and are therefore in the "bottom ~1%" of projects).

[Figure: quantile plot for all projects analyzed by Scorecard]

However, if we focus on "important" projects, the chart becomes much more useful:

[Figure: quantile plot for the most relevant projects analyzed by Scorecard]

Naturally, this chart is heavily influenced by how we define "important". For the chart above, I defined it as projects with a criticality_score > 0.5. This choice was completely arbitrary, and just so happens to include ~10,000 projects. Whether this cutoff is appropriate or whether criticality_score is the best tool is naturally something that can (should!) be discussed as well.

It is also worth mentioning that this curve is an almost perfect sigmoid, and therefore calculating the quantile would be quite straightforward, though the equation parameters may need to be updated over time (hopefully due to improving scores across the open-source ecosystem!):

[Figure: comparison of the relevant projects' quantile plot and an estimated sigmoid]

(the vertical axis goes from -25 to 125 because the estimated curve goes slightly above 100 and below 0, but that should be easy to clamp)
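
For illustration, here is a minimal sketch (in Go) of how a fitted sigmoid could be turned into a "top x%" figure. The midpoint and steepness values below are made-up placeholders, not parameters actually fit to the BigQuery data, and the clamp keeps the estimate inside [0, 100] even if a fitted curve over- or undershoots slightly:

```go
package main

import (
	"fmt"
	"math"
)

// Placeholder parameters for the fitted logistic curve; real values would be
// fit against the BigQuery quantile data for "important" projects and would
// need refreshing over time as ecosystem scores shift.
const (
	midpoint  = 5.4 // hypothetical score at the 50th percentile
	steepness = 1.3 // hypothetical slope of the logistic at the midpoint
)

// percentile estimates the percentage of relevant projects scoring at or
// below the given aggregate score, clamped to [0, 100].
func percentile(score float64) float64 {
	p := 100 / (1 + math.Exp(-steepness*(score-midpoint)))
	return math.Min(100, math.Max(0, p))
}

func main() {
	for _, s := range []float64{4.5, 6.2, 7.2, 9.3} {
		fmt.Printf("%.1f/10 -> top %.0f%%\n", s, 100-percentile(s))
	}
}
```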

@pnacht pnacht added the kind/enhancement New feature or request label Nov 18, 2022
@laurentsimon
Contributor

I like the idea. @spencerschrock @azeemsgoogle @naveensrinivasan wdyt?

@di
Member

di commented Aug 17, 2023

Since completely replacing the X/10 score might be disruptive, we might want to explore supplementing these scores with a percentile, like:

  • 7/10 (90th percentile for this check)

@spencerschrock
Contributor

Since completely replacing the X/10 score might be disruptive, we might want to explore supplementing these scores with a percentile, like:

  • 7/10 (90th percentile for this check)

Is this for the badge, the results viewer, or the results themselves?

@di
Member

di commented Aug 18, 2023

I'd say anywhere we display an X/10 score, we should do this as well -- we should file separate issues for the results viewer/badge as necessary.

@pnacht
Contributor Author

pnacht commented Aug 21, 2023

I'm not sure how valuable quantiles are for individual checks, especially given how many checks are "binary" (0 or 10). I also suspect (without looking at any data) that the distributions will be heavily skewed/distorted, which might lead to less nuanced quantiles (i.e. only top 1% or top 99% quantiles).

In my initial proposal, I was actually only thinking of having quantiles for the final score, where we have a pretty reasonable ("normal-ish") distribution.

But yes, I'd then show these quantiles everywhere: the CLI output, the viewer, the badge.
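
As a rough sketch, assuming a percentile estimate like the sigmoid fit above, the supplemented display could come from a single helper shared by the CLI output, the viewer, and the badge text; the function name and wording here are only illustrative:

```go
// formatScore renders the aggregate score alongside its estimated quantile,
// e.g. "7.2 / 10 (top 8% of relevant projects)". percentile is the estimated
// percentage of relevant projects at or below this score; the format string
// is a placeholder, not a proposed final wording.
func formatScore(score, percentile float64) string {
	return fmt.Sprintf("%.1f / 10 (top %.0f%% of relevant projects)", score, 100-percentile)
}
```

The badge and CLI could then share this one string, while the bare x/10 stays available for maintainers who want to see the remaining gap.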

@github-actions

This issue is stale because it has been open for 60 days with no activity.

@github-actions

This issue is stale because it has been open for 60 days with no activity.

@raghavkaul
Contributor

The OpenSSF Best Practices badge uses "Passing", "Silver", and "Gold", which is easy to see at a glance. Libraries must pass all criteria at a level before moving on to the next level. A similar scheme for Scorecard might be: pass X probes for Silver, X + Y for Gold, etc.
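
A minimal sketch of what such a tier mapping could look like; the probe-count thresholds below are arbitrary placeholders for discussion, not proposed cutoffs:

```go
// Tier is a hypothetical badge level, loosely modeled on the OpenSSF Best
// Practices badge levels mentioned above.
type Tier string

const (
	TierInProgress Tier = "In Progress"
	TierPassing    Tier = "Passing"
	TierSilver     Tier = "Silver"
	TierGold       Tier = "Gold"
)

// tierFor maps the number of passing probes to a badge level. The thresholds
// are placeholders; the actual X and X + Y would need to be agreed on.
func tierFor(probesPassed int) Tier {
	switch {
	case probesPassed >= 25:
		return TierGold
	case probesPassed >= 18:
		return TierSilver
	case probesPassed >= 10:
		return TierPassing
	default:
		return TierInProgress
	}
}
```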

@github-actions

This issue has been marked stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the Stale label May 14, 2024