-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NetworkPolicy Recommendation on Snowflake Backend #137
Conversation
fe283bf
to
de60b62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial high-level comments, let me know what you think:
- kubernetes.zip should not be checked-in. Can we version it (i.e., add the K8s version to the file name) and upload it to antrea.io?
- I am wondering if we really need the "create-udfs" command here. I think it will be easier if everything can be configured by the "onboard" command at first.
- I think the user should not have to provide
--udf-dir
and I think that the CLI binary should also be completely self-contained (i.e., not assume that the Github repository has been cloned). As a result, I believe that we should embed the UDF code into the antrea-sf binary (usinggo:embed
). At the end of "onboarding", the CLI tool will upload the code to the Snowflake stage and create the function object.
Thanks Antonin. |
If they modify a UDF (or if we modify a UDF across releases), the idea is to run |
8213ce4
to
8f45932
Compare
// stackName, stateBackendURL, secretsProviderURL, region, workdir are not provided here | ||
// because we only uses snowflake client in this command. | ||
mgr := infra.NewManager(logger, "", "", "", "", warehouseName, "", verbose) | ||
rows, err := mgr.RunUdf(ctx, query, databaseName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think you should find a way to avoid using the infra manager here. Does RunUdf
really need to be a method of the manager? You could use the Snowflake client directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I was thinking reuseing RunUdf
function in the future for other udf commands. Moved it under infra/udfs
package now.
snowflake/pkg/utils/utils.go
Outdated
) | ||
|
||
// Download a file from the given url to the current directory | ||
func DownloadFile(url string, filename string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function should use a Context
object. Maybe do some unification with
theia/snowflake/pkg/infra/manager.go
Line 92 in 5a15bb8
func downloadAndUntar(ctx context.Context, logger logr.Logger, url string, dir string) error { |
downloadAndUntar
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, merged it into downloadAndUntar
function
"time" | ||
) | ||
|
||
func ParseTimestamp(t string, now time.Time, defaultT ...time.Time) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at usage of this function, I am not seeing defaultT
being used anywhere. IMO, using a variadic function for this (passing an optional default value to the function) is also a bit strange. Something like this makes more sense IMO:
func ParseTimestamp(t string, now time.Time) (string, error) {
return ParseTimestampWithDefault(t, now, now)
}
func ParseTimestampWithDefault(t string, now time.Time, defaultT time.Time) (string, error) {
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out, I was copying this function from our gitlab repo. Remove the defaultT
argument for now since we don't have any usecases.
snowflake/pkg/infra/udfs.go
Outdated
@@ -0,0 +1,79 @@ | |||
// Copyright 2022 Antrea Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this needs to be under pkg/infra
at all. How about pkg/udfs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, moved it there.
snowflake/main.go
Outdated
|
||
// Embed the udfs directory here because go:embed doesn't support embeding in subpackages | ||
|
||
//go:embed udf/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we could rename the directory to udfs/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
snowflake/main.go
Outdated
|
||
func main() { | ||
infra.UdfFs = udfFs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still avoid doing something like this. It can be confusing to change the value of a global variable which is in a different package. Maybe check out what I did for database migrations as a reference: https://github.com/antrea-io/theia/tree/main/snowflake/database
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Antonin, I created udfs.go in a separate package like database migrations and also found some utils function like WriteMigrationsToDisk
could be shared here, changed it to WriteEmbedDirToDisk
and moved to utils
package.
snowflake/pkg/utils/utils.go
Outdated
"github.com/go-logr/logr" | ||
) | ||
|
||
func DownloadAndUntar(ctx context.Context, logger logr.Logger, url string, dir string, filename string, untar bool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to have 2 different functions: Download
and DownloadAndUntar
, and have the second one call the first one for its implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make senses, addressed.
2b32bc9
to
10e73e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some small comments, otherwise LGTM
snowflake/pkg/snowflake/snowflake.go
Outdated
if !result { | ||
_, err := c.db.ExecContext(multi_statement_context, query) | ||
return nil, err | ||
} else { | ||
rows, err := c.db.QueryContext(multi_statement_context, query) | ||
return rows, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we could have 2 different functions for the 2 different cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, changed.
snowflake/pkg/snowflake/snowflake.go
Outdated
|
||
func (c *client) UseSchema(ctx context.Context, name string) error { | ||
query := fmt.Sprintf("USE SCHEMA %s", name) | ||
c.logger.Info("Snowflake query", "query", query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the verbosity values for the logs don't seem consistent (see V(2)
above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed.
snowflake/Makefile
Outdated
@@ -5,6 +5,7 @@ all: bin | |||
|
|||
.PHONY: bin | |||
bin: | |||
make -C udfs/udfs/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it doesn't make sense to have this in the bin
target
You should define a new target and add it as a requirement for the all
target so that it is built by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks for pointing out.
snowflake/pkg/infra/manager.go
Outdated
@@ -37,6 +36,8 @@ import ( | |||
|
|||
"antrea.io/theia/snowflake/database" | |||
sf "antrea.io/theia/snowflake/pkg/snowflake" | |||
utils "antrea.io/theia/snowflake/pkg/utils" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: renaming the package doesn't seem necessary here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed.
snowflake/pkg/utils/utils.go
Outdated
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Quan pointed out in the past that utils
was too generic for a package name. So I wonder if we could move these functions to pkg/utils/file/file.go
(package can then be imported as fileutils
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, changed.
snowflake/pkg/utils/utils.go
Outdated
return nil | ||
} | ||
|
||
func WriteEmbedDirToDisk(ctx context.Context, logger logr.Logger, fsys fs.FS, embedPath string, dest string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that there is nothing specific to embed in this function implementation, I suggest you rename it to writeFSDirToDisk
and rename embedPath
to fsysPath
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed.
In this commit, we add the NetworkPolicy Recommendation Application on the Snowflake backend. NetworkPolicy Recommendation is implemented as Snowflake UDFs and could be running on Snowflake warehouses. Users could start it using command policy-recommendation in theia-sf CLI tool. NetworkPolicy Recommendation UDFs are written in Python and stored under udf/ directory. Signed-off-by: Yongming Ding <dyongming@vmware.com>
Signed-off-by: Yongming Ding <dyongming@vmware.com>
Signed-off-by: Yongming Ding <dyongming@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you are done with code changes, please make sure you run the whole thing end-to-end before we merge
"time" | ||
) | ||
|
||
func ParseTimestamp(t string, now time.Time) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could add some table-driven unit tests for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, added.
Signed-off-by: Yongming Ding <dyongming@vmware.com>
Sure, I just tried e2e with Antrea Flow exporter & Aggregator locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
policyRecommendationCmd.Flags().Int("isolationMethod", 1, `Network isolation preference. Currently we have 3 options: | ||
1: Recommending allow ANP/ACNP policies, with default deny rules only on Pods which have an allow rule applied | ||
2: Recommending allow ANP/ACNP policies, with default deny rules for whole cluster | ||
3: Recommending allow K8s NetworkPolicies only`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use String instead of Int for isolationMethod to make it more readable as we do for policy-type in ClickHouse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, addressed.
fmt.Fprintf(&queryBuilder, `AND | ||
flowEndSeconds >= '%s' | ||
`, endTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but could you just check if this should be flowEndSeconds < '%s'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! You're right
snowflake/pkg/infra/manager.go
Outdated
@@ -488,3 +416,118 @@ func (m *Manager) Offboard(ctx context.Context) error { | |||
_, err := m.run(ctx, true) | |||
return err | |||
} | |||
|
|||
func createUdfs(ctx context.Context, logger logr.Logger, databaseName string, warehouseName string, workdir string) error { | |||
logger.Info("creating UDFs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about whether we have a convention on whether to capital the first letter for logger.Info
? I see both in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer capitaling the first letter, let me fix that.
Signed-off-by: Yongming Ding <dyongming@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/theia-test-e2e |
In this commit, we add the NetworkPolicy Recommendation Application on the Snowflake backend. NetworkPolicy Recommendation is implemented as Snowflake UDFs and could be running on Snowflake warehouses.
We add two commands in theia-sf CLI tool: create-udfs and policy-recommendation. create-udfs is used to upload and create UDFs on Snowflake database then user can call policy-recommendation command to run NetworkPolicy Recommendation application.
NetworkPolicy Recommendation UDFs are written in Python and stored under udf/ directory.
Signed-off-by: Yongming Ding dyongming@vmware.com