[Activity dashboard] Pull PyPI downloads data nightly #664
Below is the code to generate the Snowflake data; as part of this issue, I will need to inject this SQL code directly into the backend codebase.
1. Add workspace variables in Terraform: happy-sci-imaging -> Workspaces -> Variables. We want to use this process to add the sensitive Snowflake credentials. Note: no additional lambda will be created. This PR will be captured in the
@klai95 One minor update. Change... This classifies any GCP installs as CI usage.
I have added it; @potating-potato will help apply the change on https://github.com/chanzuckerberg/sci-imaging-infra/blob/28ecdd11f35edc431bba52420b3b9220b1e9dd82/terraform/modules/happy-napari-hub/main.tf#L31-L60 in the
Pinpointed the necessary files (
Based on discussions from our code pairing session, @klai95 and I decided to use Secrets Manager to store the credentials for Snowflake. The reason for this is to increase the security of the credentials by preventing them from being readable as an environment variable of the lambda. This will be achieved by having the lambda fetch the credentials from Secrets Manager before making the Snowflake query. Steps we have identified to implement this:

1. Create a new secret to allow for storing credentials. Currently we are planning on creating the secret manually with
We could also explore adding this using Terraform in the future, as we could leverage the Terraform variables for it. This would allow for having a single place to manage secrets.

2. Manually add the secret name to the existing environment config. The environment config is currently being maintained as
This is based on the assumption that

3. Accessing the secret's details in Terraform. We can access the secret's details from the env-config secret that is already being fetched and decoded. We can assign it to the locals in here.

4. Updating the backend lambda access policy. To allow the lambda to fetch secret values from Secrets Manager, we have to update the existing execution role to add a policy allowing GetSecretValue from the lambda. The current execution role definition can be found here. The proposed policy addition:

Reference to IAM policy documentation:

5. Add the secret name to the lambda environment variables. To avoid hard-coding the secret name and to allow for environment-specific secrets, we could use the environment variables on the lambda. We have to update the Terraform here to add that. We can use

6. Fetching credentials from Secrets Manager in code. Before making the call to Snowflake as referenced here, we can call Secrets Manager to fetch the username and password. Reference to the API: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/secretsmanager.html#SecretsManager.Client.get_secret_value
We could have the Secrets Manager calls be a separate class, to allow it to be used in other use cases in the future. Proposed method signature:
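Step 6 above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `SNOWFLAKE_SECRET_NAME` environment variable, the JSON shape of the secret (`username`/`password` keys), and both function names are hypothetical assumptions.

```python
import json
import os


def parse_snowflake_secret(secret_string):
    """Parse a SecretString payload into (username, password).

    Assumes the secret value is a JSON object with "username" and
    "password" keys -- this shape is an assumption, not confirmed here.
    """
    data = json.loads(secret_string)
    return data["username"], data["password"]


def get_snowflake_credentials(secret_name=None):
    """Fetch Snowflake credentials from AWS Secrets Manager.

    SNOWFLAKE_SECRET_NAME would be set on the lambda via Terraform
    (step 5); the variable name is a placeholder for illustration.
    """
    import boto3  # deferred so the parsing helper stays importable without boto3

    secret_name = secret_name or os.environ["SNOWFLAKE_SECRET_NAME"]
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return parse_snowflake_secret(response["SecretString"])
```

Per the note above, these calls could instead live on a small class so other use cases can reuse them; plain functions are shown here only for brevity.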
@manasaV3 It's definitely worth verifying, but I don't believe those are being maintained manually. I believe all values are set in Terraform once and then generated in AWS. Would it be possible to follow the current process of setting credentials via Terraform, and then propose a change as a separate issue? I'm hesitant to have different variables set in different places, and want to keep this ticket as narrowly scoped as possible. Our infra team can also help with setting up the variables according to our current process if needed.
This is good to know. My reasoning for suggesting the use of Secrets Manager in code over Terraform was that it didn't feel like best practice to have the DB credentials easily accessible. But I do see your point about having variables sourced from different places. Also, not having clarity on the sourcing of the env-config secret might take some more time to unblock. P.S. As we are going the route of having the credentials surface in the lambda env variables, we should ensure that the Snowflake user is specific to our team's workflows, and that rotating its password at a later point wouldn't impact any other workflows.
Broke out the credentials piece into its own issue - #695 - so that we can focus specifically on the python/cron job piece in this ticket! |
In #638, we introduced API endpoints that connect to PyPI downloads data. As part of that work, we did a one-time pull of data from Snowflake. Now, let's make that a recurring cron job that runs once every 24 hours.
Additional details
Earlier notes from Kevin: Snowflake and S3 in our existing infrastructure enable automatic updates. Furthermore, additional data sources can be integrated seamlessly should we ever need to pull from new external sources.
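The nightly pull described above could be sketched like this, with a lambda triggered once every 24 hours (e.g. by an EventBridge schedule). Everything here is a hypothetical illustration: the table and column names (`pypi.downloads`, `file_project`, `timestamp`), the env variable names, and the plugin list are placeholders; the real SQL from #638 would be injected instead.

```python
def build_pypi_downloads_query(plugin_names, days=1):
    """Build a SQL query for PyPI download counts over the last N days.

    Table/column names below are placeholders, not the real schema.
    """
    quoted = ", ".join("'{}'".format(name.replace("'", "''")) for name in plugin_names)
    return (
        "SELECT file_project, COUNT(*) AS num_downloads "
        "FROM pypi.downloads "
        f"WHERE file_project IN ({quoted}) "
        f"AND timestamp > DATEADD(day, -{days}, CURRENT_TIMESTAMP()) "
        "GROUP BY file_project"
    )


def handler(event, context):
    """Entry point for the scheduled lambda (sketch only).

    Credentials are assumed to arrive via environment variables, per the
    Terraform discussion above; snowflake.connector must be packaged
    with the lambda, and the S3 write is left as a comment.
    """
    import os

    import snowflake.connector  # third-party, bundled with the deployment

    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
    )
    try:
        cursor = conn.cursor()
        cursor.execute(build_pypi_downloads_query(["napari"]))
        rows = cursor.fetchall()
        # ...write `rows` to S3 for the activity dashboard to read...
        return {"row_count": len(rows)}
    finally:
        conn.close()
```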