Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(linter): implement noSecrets #3823

Merged
merged 39 commits into from
Sep 9, 2024
Merged

Conversation

SaadBazaz
Copy link
Contributor

@SaadBazaz SaadBazaz commented Sep 8, 2024

Summary

Possibly closes #3822 . I need to use this rule in some projects, but the lack of it in biome prevents adoption. This rule is great for security and sanitation.

The rule searches for potential secrets/keys in code.

Inspired by no-secrets/no-secrets in eslint.

I've dropped TODOs in the code for further improvement.

⚠️ NOTE: This is not a replacement for using your 🧠 when building. While this rule is helpful, it's not infallible. Always review your code carefully and consider implementing additional security measures like pre-commit hooks or automated secret scanning in your CI/CD pipeline, such as GitGuardian or enable GitHub protections.

Test Plan

I've tested locally using:

cargo t quick_test

and

just test-lintrule noSecrets

@github-actions github-actions bot added A-Project Area: project A-Linter Area: linter L-JavaScript Language: JavaScript and super languages A-Diagnostic Area: diagnostocis labels Sep 8, 2024
@SaadBazaz SaadBazaz requested a review from dyc3 September 8, 2024 12:05
Copy link

codspeed-hq bot commented Sep 8, 2024

CodSpeed Performance Report

Merging #3823 will not alter performance

Comparing SaadBazaz:feat/no-secrets (efbb8fd) with main (4bc409d)

Summary

✅ 107 untouched benchmarks

@github-actions github-actions bot added the A-CLI Area: CLI label Sep 8, 2024
@SaadBazaz
Copy link
Contributor Author

I need some help to resolve the merge conflicts.

@SaadBazaz SaadBazaz changed the title feat(linter): implement noSecrets initial feat(linter): implement noSecrets Sep 8, 2024
@dyc3
Copy link
Contributor

dyc3 commented Sep 8, 2024

The files with conflicts are all generated, so IIRC you should be able rebase on main and then just gen-all to run all the codegen.

@Conaclos
Copy link
Member

Conaclos commented Sep 8, 2024

The impact on the benchmark seems to be significant. I think the use of so many regexes is one of the causes. We should try to find a way to reduce this overhead.
One idea: we could exclude all strings that have less than X characters where X is the minimum of characters required to match at least one of the regex. This could exclude many small strings.

@SaadBazaz
Copy link
Contributor Author

The impact on the benchmark seems to be significant. I think the use of so many regexes is one of the causes. We should try to find a way to reduce this overhead. One idea: we could exclude all strings that have less than X characters where X is the minimum of characters required to match at least one of the regex. This could exclude many small strings.

Great idea! Although I feel like the benchmarks haven't been updated in a while (last run was ~3 hours ago, the code's been refactored majorly since then)

@github-actions github-actions bot added the L-CSS Language: CSS label Sep 9, 2024
@github-actions github-actions bot removed the L-CSS Language: CSS label Sep 9, 2024
@SaadBazaz
Copy link
Contributor Author

The more valuable things here are quick return heuristics. Excluding strings with spaces seems to be a good one, but it might cause some false negatives (eg. google service accounts are json, which may not be minified).

I agree... Eventually (in further passes) this would be implemented on comments as well... So we definitely need better, smarter heuristics :D

@zohairhadi
Copy link

would love to have this sooner

Copy link
Contributor

@dyc3 dyc3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Would you mind adding a changelog entry for this? (and make sure to fix any ci failures)

@github-actions github-actions bot added the A-Changelog Area: changelog label Sep 9, 2024
@SaadBazaz
Copy link
Contributor Author

Looks good to me. Would you mind adding a changelog entry for this? (and make sure to fix any ci failures)

Fixed, formatted, linted and changelog updated. Ready for merge. 🦾

@SaadBazaz
Copy link
Contributor Author

(and make sure to fix any ci failures)

Getting performance failure. The difference seems minor, however, it's on React benchmark which seems like a popular one. What should be the plan-of-action here? @dyc3

@Conaclos
Copy link
Member

Conaclos commented Sep 9, 2024

Still have some perf issues...

The rule (as any other new rules) will be in the nursery group.
It is ok if we have a small regression. We can still improve it in a future PR.

Do you think we should check for spaces in a string, and ignore those strings? Those would most likely be sentences anyway.

I am unsure if it could make a big difference. We could still try it in the future.

Some other ideas:

  • We could create an enum with a variant for each secret type.
    A function could return a slice of potential secret based on the length of the string.
    This could avoid iterating over all secrets and checking for min length.
    Here we return the possible secrets based on the string length.
  • Some secrets could be checked by a hand-made implementation by avoiding completely regexes.

@ematipico
Copy link
Member

(and make sure to fix any ci failures)

Getting performance failure. The difference seems minor, however, it's on React benchmark which seems like a popular one. What should be the plan-of-action here? @dyc3

That's a flaky test that isn't part of the analyzer, it's the parser. The last run didn't show any regression in the analyzer

@SaadBazaz
Copy link
Contributor Author

SaadBazaz commented Sep 9, 2024

Still have some perf issues...

The rule (as any other new rules) will be in the nursery rule. It is ok if we have a small regression. We can still improve it in a future PR.

Do you think we should check for spaces in a string, and ignore those strings? Those would most likely be sentences anyway.

I am unsure if it could make a big difference. We could still try it in the future.

Some other ideas:

  • We could an enum with a variant for each secret type.
    A function could return a slice of potential secret based on the length of the string.
    This could avoid iterating over all secrets and checking for min length.
    Here we return the possible secrets based on the string length.
  • Some secrets could be checked by a hand-made implementation by avoiding completely regexes.

Another idea to add on:

  • StartWith and EndWith enums, for faster string checks (e.g. in the case of RSA Private Keys, etc).

I think we can take a lot of inspiration from gitleaks.

Could even parse just this file: https://github.com/gitleaks/gitleaks/blob/master/config/gitleaks.toml

@dyc3 dyc3 merged commit a66e450 into biomejs:main Sep 9, 2024
12 checks passed
@SaadBazaz
Copy link
Contributor Author

SaadBazaz commented Sep 10, 2024

Seems like we can speed-up regex by enabling nightly build with SIMD. However, I am not aware of how that'll impact other modules.
rust-lang/regex#350

With faster regex, we can eventually support many more useful regexes and make biome a possible alternative to gitleaks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-CLI Area: CLI A-Diagnostic Area: diagnostocis A-Linter Area: linter A-Project Area: project L-JavaScript Language: JavaScript and super languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants