Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add external_data function to Gator Test command #2789

Closed

Conversation

fardin01
Copy link

@fardin01 fardin01 commented May 24, 2023

Signed-off-by: Fardin Khanjani fardin.khanjani@tradeshift.com

What this PR does / why we need it:
gator test should be able to handle external data providers.

gator test returns undefined function external_data if external_data is being used in a ConstraintTemplate. This means that policies that query an external data source cannot be tested except for manual tests against a live cluster.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #2659

Special notes for your reviewer:

@fardin01
Copy link
Author

@maxsmythe @sozercan Can I please get some initial feedback here? I'll add tests afterward.

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 27.77% and project coverage change: -0.16 ⚠️

Comparison is base (9da7226) 53.56% compared to head (fdffa22) 53.41%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2789      +/-   ##
==========================================
- Coverage   53.56%   53.41%   -0.16%     
==========================================
  Files         128      128              
  Lines       11411    11462      +51     
==========================================
+ Hits         6112     6122      +10     
- Misses       4822     4861      +39     
- Partials      477      479       +2     
Flag Coverage Δ
unittests 53.41% <27.77%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/gator/reader/filereader.go 0.00% <0.00%> (ø)
pkg/gator/test/test.go 55.41% <30.95%> (-8.94%) ⬇️
cmd/gator/test/test.go 32.71% <50.00%> (+1.28%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@acpana acpana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR! Excited to see this take shape. Before acting on any feedback, please wait on max/ sertac to see what other comments they may have

Comment on lines 77 to 78
Cmd.Flags().BoolVarP(&flagEnableExternalData, "external-data", "e", false, "enable external_data function in Constraint Templates.")
Cmd.Flags().StringArrayVarP(&flagExternalDataProviders, "external-data-providers", "", []string{}, "a file containing External Data Provider. Can be specified multiple times.")
Copy link
Contributor

@acpana acpana May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the existence of external-data-providers implies that external-data is true so we could get away with just using the latter flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, probably don't need two options in test.Opts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for using a different flag to source providers vs. the rest of the cluster config? Avoiding letting untrusted config reach out to arbitrary HTTP endpoints?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific reason. :) I agree with Alex's suggestion that the existence of external-data-providers implies that external-data is true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think security is a good consideration as far as separating out the providers folder (at least potentially). And, if we do have a providers folder, that can probably take the place of an enabled/disabled flag. Maybe users who have a hard requirement of "absolutely no reaching out to external services" might take comfort in an explicit flag though.

@sozercan thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sozercan Kind reminder. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on having an explicit flag to enable/disable external data. though I prefer something like enable-external-data

@fardin01
Copy link
Author

@maxsmythe @sozercan A kind reminder. :)

Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in review

Comment on lines 77 to 78
Cmd.Flags().BoolVarP(&flagEnableExternalData, "external-data", "e", false, "enable external_data function in Constraint Templates.")
Cmd.Flags().StringArrayVarP(&flagExternalDataProviders, "external-data-providers", "", []string{}, "a file containing External Data Provider. Can be specified multiple times.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for using a different flag to source providers vs. the rest of the cluster config? Avoiding letting untrusted config reach out to arbitrary HTTP endpoints?

// this has to happen before creating the client, because rego.AddExternalDataProviderCache arg to the driver must have
// a ProviderCache already set up
providerCache := frameworksexternaldata.NewCache()
for _, obj := range objs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If having a separate flag is for security purposes, we'd also need to load in the data separately, currently providers defined in any file would be ingested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should figure out our stance vis-a-vis security in the other comment (link below). Then this comment will either be irrelevant, or very relevant.

#2789 (comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxsmythe Could you please give me some pointers on how to implement this?

Should ReadSources() function contain something like

s, err = readExternalDataSourceFiles(externalDataProviderFiles)
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to load the external data provider config from its dedicated directory, then we should change this line to skip over external data provider config, and then either

  1. Reuse readFiles function like: s, err = readFiles(externalDataProviderFiles)
  2. Create a new function like: readExternalDataSourceFiles(externalDataProviderFiles).

and then append the result to sources. Am I on the right path?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxsmythe @ritazh Please take another look. I'll add the tests afterward.

@sozercan
Copy link
Member

Sorry for the delay. +1 on @acpana and @maxsmythe's comments.

@fardin01
Copy link
Author

In terms of unit tests, what do we do about the external_data() function request and response?

@sozercan
Copy link
Member

sozercan commented Jun 13, 2023

@fardin01

for go http unit tests, we might be able to use https://pkg.go.dev/net/http/httptest

for mock rego tests, see #2826

@fardin01 fardin01 force-pushed the external_data_gator branch 3 times, most recently from 83e9946 to e322225 Compare June 21, 2023 18:23
@acpana
Copy link
Contributor

acpana commented Jul 17, 2023

@fardin01 thanks again your PR here and for your contributions.

I'm curious if you still had the bandwidth to follow up here.

I think there's still the issue of figuring out the right semantics for the flags here @maxsmythe @sozercan and testing IIUC.

@fardin01
Copy link
Author

@acpana No problem! I do have the bandwidth, just been waiting for input before I spend time on it.

We are actually going to go live soon with Gator CLI built off of this PR (to give our developers a tool to verify their changes before deploying), so we are very eager for this PR to be merged and released. 😃

@maxsmythe
Copy link
Contributor

To be clear, I'd prefer having a separate flag for specifying provider configs (for security purposes), and provider configs should be loaded from a different path from the rest of the code. This is to make it possible to avoid loading malicious configurations.

@ritazh
Copy link
Member

ritazh commented Jul 19, 2023

Let's also make sure to add the new flag(s) to docs.

@fardin01 fardin01 force-pushed the external_data_gator branch 3 times, most recently from 5bd79e2 to 4e07859 Compare July 27, 2023 18:55
Signed-off-by: Fardin Khanjani <fardin.khanjani@tradeshift.com>
@ritazh
Copy link
Member

ritazh commented Aug 16, 2023

@fardin01 Let us know if this is ready for review!

@fardin01
Copy link
Author

@fardin01 Let us know if this is ready for review!

@ritazh Unfortunately I will not be working on this anymore. 😞

@stale
Copy link

stale bot commented Oct 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 15, 2023
@stale stale bot closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

undefined function external_data
6 participants