Introduce Snowflake backend for Theia #112
Conversation
partial reviews
to specify credentials that can be used by the AWS Go SDK. Either configure the
~/.aws/credentials file or set the required environment variables.

You can also export the `AWS_REGION` environment variable, set to your preferred
Suggested change:
- You can also export the `AWS_REGION` environment variable, set to your preferred
+ You can also export the `AWS_REGION` environment variable, and set it to your preferred
This is fine I think; "set" is also the past participle of the verb "set".
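(For context, a minimal sketch — not code from this PR — of how the AWS SDK for Go v2 resolves these credentials and the `AWS_REGION` environment variable:)

```go
// A minimal sketch (not code from this PR) showing how the AWS SDK for Go v2
// resolves credentials and region from the standard sources:
// ~/.aws/credentials, AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, and AWS_REGION.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/config"
)

func main() {
	// LoadDefaultConfig walks the default chain: environment variables
	// first, then the shared credentials / config files.
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatalf("unable to load AWS configuration: %v", err)
	}
	// cfg.Region is populated from AWS_REGION (or the shared config file).
	fmt.Printf("resolved region: %s\n", cfg.Region)
}
```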
snowflake/cmd/createKmsKey.go
Outdated
var createKmsKeyCmd = &cobra.Command{
	Use:   "create-kms-key",
	Short: "Create an AWS KMS key",
	Long: `This command creates a new KMS key in your AWS account. They key
Suggested change:
- Long: `This command creates a new KMS key in your AWS account. They key
+ Long: `This command creates a new KMS key in your AWS account. This key
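(For illustration, creating a KMS key with the AWS SDK for Go v2 could look roughly like the sketch below; this is a hypothetical example — the description string is made up — not the PR's actual implementation:)

```go
// Hypothetical sketch of creating a KMS key with the AWS SDK for Go v2; the
// description string is made up and this is not the PR's actual implementation.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/kms"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatalf("unable to load AWS configuration: %v", err)
	}
	client := kms.NewFromConfig(cfg)
	out, err := client.CreateKey(context.TODO(), &kms.CreateKeyInput{
		Description: aws.String("Theia infrastructure state encryption key"),
	})
	if err != nil {
		log.Fatalf("failed to create KMS key: %v", err)
	}
	// The returned key ID could then be exported as THEIA_SF_KMS_KEY_ID.
	fmt.Printf("created key: %s\n", aws.ToString(out.KeyMetadata.KeyId))
}
```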
snowflake/cmd/offboard.go
Outdated
offboardCmd.Flags().String("bucket-prefix", "antrea-flows-infra", "prefix to use to store infra state")
offboardCmd.Flags().String("bucket-region", "", "region where infra bucket is defined; if omitted, we will try to get the region from AWS")
offboardCmd.Flags().String("key-id", GetEnv("THEIA_SF_KMS_KEY_ID", ""), "Kms Key ID")
offboardCmd.Flags().String("key-region", "", "Kms Key region")
Suggested change:
- offboardCmd.Flags().String("key-region", "", "Kms Key region")
+ offboardCmd.Flags().String("key-region", "", "Kms key region")
to be consistent with elsewhere
snowflake/cmd/onboard.go
Outdated
onboardCmd.Flags().String("bucket-prefix", "antrea-flows-infra", "prefix to use to store infra state")
onboardCmd.Flags().String("bucket-region", "", "region where infra bucket is defined; if omitted, we will try to get the region from AWS")
onboardCmd.Flags().String("key-id", GetEnv("THEIA_SF_KMS_KEY_ID", ""), "Kms key ID")
onboardCmd.Flags().String("key-region", "", "Kms Key region")
Suggested change:
- onboardCmd.Flags().String("key-region", "", "Kms Key region")
+ onboardCmd.Flags().String("key-region", "", "Kms key region")
to be consistent with elsewhere
snowflake/cmd/receiveSqsMessage.go
Outdated
Snowflake account as expected.

To receive a message without deleting it (message will remain in the queue and
become available to consummers again after a short time interval):
Suggested change:
- become available to consummers again after a short time interval):
+ become available to consumers again after a short time interval):
if len(output.Messages) == 0 {
	return nil
}
message := output.Messages[0]
It looks like we can only see the top message using this command. Shall we add an option to display all messages?
Honestly, even though this is not difficult to do, I am not sure it is needed. We can see if this is requested in the future.
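(For reference, if such an option were added later, retrieving more than one message with the AWS SDK for Go v2 could look roughly like this sketch; the queue URL is a placeholder and this is not code from the PR:)

```go
// Sketch of retrieving up to 10 messages in one ReceiveMessage call (the SQS
// API maximum); the queue URL is a placeholder and this is not code from the PR.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatalf("unable to load AWS configuration: %v", err)
	}
	client := sqs.NewFromConfig(cfg)
	queueURL := "https://sqs.us-west-2.amazonaws.com/123456789012/my-queue" // placeholder
	output, err := client.ReceiveMessage(context.TODO(), &sqs.ReceiveMessageInput{
		QueueUrl:            aws.String(queueURL),
		MaxNumberOfMessages: 10, // SQS returns at most 10 messages per call
	})
	if err != nil {
		log.Fatalf("failed to receive messages: %v", err)
	}
	for _, message := range output.Messages {
		fmt.Println(aws.ToString(message.Body))
	}
}
```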
For dashboard needs, can we add a few fields to the `pods` view:

```sql
CREATE VIEW IF NOT EXISTS pods (
    flowStartSeconds,
    flowEndSeconds,
    packetDeltaCount,
    octetDeltaCount,
    reversePacketDeltaCount,
    reverseOctetDeltaCount,
    sourcePodName,
    sourcePodNamespace,
    source,
    destinationPodName,
    destinationPodNamespace,
    destination,
    throughput,
    reverseThroughput,
    clusterUUID,
    flowType,
    timeInserted
) AS SELECT
    flowStartSeconds,
    flowEndSeconds,
    packetDeltaCount,
    octetDeltaCount,
    reversePacketDeltaCount,
    reverseOctetDeltaCount,
    sourcePodName,
    sourcePodNamespace,
    sourcePodNamespace || '/' || sourcePodName || ':' || sourceTransportPort,
    destinationPodName,
    destinationPodNamespace,
    destinationPodNamespace || '/' || destinationPodName || ':' || destinationTransportPort,
    throughput,
    reverseThroughput,
    clusterUUID,
    flowType,
    timeInserted
FROM flows;
```

and add another `policies` view:

```sql
CREATE VIEW IF NOT EXISTS policies AS SELECT
    flowEndSeconds,
    octetDeltaCount,
    reverseOctetDeltaCount,
    egressNetworkPolicyName,
    egressNetworkPolicyNamespace,
    egressNetworkPolicyRuleAction,
    ingressNetworkPolicyName,
    ingressNetworkPolicyNamespace,
    ingressNetworkPolicyRuleAction,
    sourcePodName,
    sourcePodNamespace,
    sourceTransportPort,
    destinationIP,
    destinationPodName,
    destinationPodNamespace,
    destinationTransportPort,
    destinationServicePortName,
    destinationServicePort,
    throughput,
    flowType
FROM flows;
```

Thanks!
@heanlan I need a clarification for one of the changes to the `pods` view.
In ClickHouse, the schema of the view does not concatenate any fields; we do the concatenation in the queries, because we use the same view in the pod-to-pod, pod-to-service, and pod-to-external dashboards, and the fields that need to be concatenated differ according to the dashboard. Currently in Snowflake, I didn't build the pod-to-service and pod-to-external dashboards, and I saw you have already created concatenated fields.
@heanlan if this is only useful for one dashboard and you can do it in the query, it may not make sense to concatenate the port in the view at the moment. I concatenate namespace + name in the view because it is useful for filtering IMO, independently of the dashboard.
@heanlan could you review the SQL migrations changes? Could you also confirm that you don't need `timeInserted`? Let me know if you think we can remove it.
This PR adds 4 Grafana dashboards with Snowflake as the datasource: Home Dashboard, Flow Records Dashboard, Pod-to-Pod Flows Dashboard, and Network-Policy Flows Dashboard. These dashboards are basically the same as the ClickHouse dashboards in their contents, differing only in query syntax. Currently, the Snowflake dashboards are without the ad-hoc filters, as the datasource plugin does not offer the support. We can revisit this later if we decide to work with Grafana in the long term. In the deployment instructions, we suggest users deploy Grafana with Docker, so they do not need to install Grafana locally, and it makes custom plugin installation simple. This PR should be merged after antrea-io#112 and antrea-io#118. The documentation shall also be merged into the README.md introduced in antrea-io#112. Signed-off-by: heanlan <hanlan@vmware.com>
Sorry, I missed this one.
don't need `timeInserted` in `policies`
don't need `timeInserted` in `pods`
Force-pushed from 3304f8f to aa20659.
@heanlan thanks for confirming, I removed `timeInserted`.
snowflake/database/migrations/000003_create_policies_view.up.sql
Outdated
Force-pushed from aa20659 to f550077.
Sorry for adding comments incrementally. I believe this should be the last comment on the schema part.
Force-pushed from f550077 to dadddb7.
No worries, it's my bad for having these typos.
LGTM
LGTM
Snowflake can be used as the data store for Theia, and to run applications such as policy recommendation (using Snowflake virtual warehouses). Users need to bring their own Snowflake and AWS accounts.

In this first commit, we add a new CLI tool (theia-sf), which can be used to create the necessary cloud resources in Snowflake and AWS. The CLI is built using Pulumi, an "infrastructure-as-code" tool, which is itself built using Terraform providers. No third-party tool needs to be installed directly by users, as theia-sf will install dependencies itself in a temporary directory.

When running "theia-sf onboard", which will provision infrastructure, users need to:

* configure AWS and Snowflake credentials
* provide an S3 bucket to store infrastructure state
* provide a KMS key to encrypt infrastructure state (notably secrets)

The advantage of using Pulumi to provision cloud resources is the potential ability to support upgrades easily: resources can be created, updated or deleted as needed whenever "theia-sf onboard" runs.

Pulumi is used to create a Snowflake database and schema. However, it is not used to create objects (tables, views, etc.) within that schema. For that we use a database migration tool. The rationale is that the Pulumi / Terraform provider for Snowflake doesn't support table schema updates very well, and our current solution is more flexible. Once again, this should enable us to support upgrades easily.

Signed-off-by: Antonin Bas <abas@vmware.com>
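(The migration tool is not named in this description; one such tool with a Snowflake database driver is golang-migrate, and applying migrations with it could look roughly like the sketch below. The DSN and migrations path are placeholders, and this is one plausible approach rather than the PR's exact code.)

```go
// Sketch of applying SQL migrations with golang-migrate, which ships a
// Snowflake database driver; the DSN and migrations path are placeholders,
// and this is one plausible approach rather than the PR's exact code.
package main

import (
	"errors"
	"log"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/snowflake" // Snowflake driver
	_ "github.com/golang-migrate/migrate/v4/source/file"        // file:// source
)

func main() {
	// Placeholder DSN; real credentials would come from the environment.
	dsn := "snowflake://user:password@account/database/schema"
	m, err := migrate.New("file://database/migrations", dsn)
	if err != nil {
		log.Fatalf("failed to initialize migrations: %v", err)
	}
	// Up applies all pending migrations; ErrNoChange simply means the
	// schema is already up to date.
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		log.Fatalf("failed to apply migrations: %v", err)
	}
}
```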
/theia-test-e2e
/theia-test-e2e