[SNOW-82] Initialize dynamic table: projectsetting_latest #87

Merged
jaymedina merged 1 commit into dev from snow-82-projectsettings_latest on Nov 22, 2024

jaymedina commented Nov 20, 2024

problem

About the data: A Project Setting object on Synapse holds a setting that is scoped to a single project. The projectsettingsnapshots table records every state of a given project setting object: when it was created and by whom, how it was last changed and by whom, and other metadata such as the setting type and the project the setting belongs to.

The latest table for projectsettingsnapshots does not exist in Snowflake yet. We should introduce a dynamic table that reflects the latest state of each project setting configuration on Synapse (i.e. whether it has been created, updated, or deleted, along with all metadata describing that change).

solution

A new Version script introduces the dynamic table projectsetting_latest.
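For readers who want to see what a "latest" table computes, here is a minimal sketch of the idea, run against an in-memory SQLite database rather than Snowflake. It is not the script from this PR: the column names (id, change_type, change_timestamp) and the sample rows are assumptions made for illustration, and the real dynamic table may select the latest row differently.

```python
import sqlite3

# Hypothetical, simplified snapshot rows: (id, change_type, change_timestamp).
# The real projectsettingsnapshots table has more columns; these names are assumptions.
SNAPSHOTS = [
    (1, "CREATE", "2024-01-01"),
    (1, "UPDATE", "2024-03-01"),
    (2, "CREATE", "2024-02-01"),
    (2, "DELETE", "2024-02-15"),
    (3, "CREATE", "2024-04-01"),
]


def latest_per_id(rows):
    """Return one row per id: the snapshot with the greatest change_timestamp.

    If two snapshots of the same id share the maximum timestamp, both are
    returned; a real implementation would need a tie-breaker.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots (id INTEGER, change_type TEXT, change_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT s.id, s.change_type, s.change_timestamp
        FROM snapshots AS s
        JOIN (
            SELECT id, MAX(change_timestamp) AS max_ts
            FROM snapshots
            GROUP BY id
        ) AS m
          ON s.id = m.id AND s.change_timestamp = m.max_ts
        ORDER BY s.id
        """
    ).fetchall()
    conn.close()
    return result


latest = latest_per_id(SNAPSHOTS)
```

With the sample rows above, each of the three ids keeps only its most recent snapshot, including the one whose latest state is a delete.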

testing

There are 2,282 rows in the final table: one for the latest state of each individual project setting configuration.

image

From the snapshots table, we can confirm that there are 2,282 unique IDs

image

This query confirms that all IDs are accounted for.

image

And this query confirms that the latest state of each ID is the one reflected in the final table.

image
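The queries themselves are only in the screenshots, so as a rough illustration, the checks described above could be expressed like this in plain Python. The function name, the (id, change_timestamp) row shape, and the sample data are all hypothetical; this is a sketch of the logic, not the SQL that was run.

```python
def validate_latest(snapshots, latest):
    """Compare a latest table against its snapshots.

    Both arguments are lists of (id, change_timestamp) tuples; the column
    names are hypothetical stand-ins for the real schema.
    """
    snapshot_ids = {sid for sid, _ in snapshots}
    latest_ids = [sid for sid, _ in latest]

    # Newest timestamp seen per id in the snapshots.
    newest = {}
    for sid, ts in snapshots:
        if sid not in newest or ts > newest[sid]:
            newest[sid] = ts

    return {
        # The latest table holds at most one row per id.
        "one_row_per_id": len(latest_ids) == len(set(latest_ids)),
        # Row count in latest equals the number of unique ids in snapshots.
        "row_count_matches_unique_ids": len(latest_ids) == len(snapshot_ids),
        # No id is missing from latest, and none is extra.
        "all_ids_accounted_for": set(latest_ids) == snapshot_ids,
        # Every row in latest carries the newest timestamp for its id.
        "latest_state_reflected": all(newest.get(sid) == ts for sid, ts in latest),
    }


snapshots = [(1, "2024-01-01"), (1, "2024-03-01"), (2, "2024-02-01")]
good = validate_latest(snapshots, [(1, "2024-03-01"), (2, "2024-02-01")])
stale = validate_latest(snapshots, [(1, "2024-01-01"), (2, "2024-02-01")])
```

Here `good` passes every check, while `stale` passes the count checks but fails `latest_state_reflected`, which is why the row-count comparison alone is not enough.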

Quality Gate failed

Failed conditions
B Maintainability Rating on New Code (required ≥ A)

jaymedina marked this pull request as ready for review November 20, 2024 20:20
jaymedina requested a review from a team as a code owner November 20, 2024 20:20

thomasyu888 left a comment

🔥 LGTM! will defer to Phil for a final review

@jaymedina one thing we may want to consider is doing some processing on the "locations" array column, but that's not strictly necessary.

jaymedina commented Nov 21, 2024

@thomasyu888 I agree we could do some pre-processing. I wonder if we'd want that in separate dynamic tables like projectsetting_prepro, which would be a cleaned-up version of the snapshots tables and the source we derive the latest tables from.

So something like:

snapshots -> prepro -> latest

But I think this is a larger conversation related to the medallion architecture and I could make a ticket to write up a design doc for it.
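To make the snapshots -> prepro -> latest idea concrete, here is a small sketch in Python. Everything in it is assumed for illustration: the row shape, the change_timestamp column, and the notion that "locations" arrives as a JSON-encoded string that prepro parses into a list. The real tables would be dynamic tables in Snowflake, not Python functions.

```python
import json


def prepro(snapshots):
    """Cleaned copy of each snapshot row: parse the JSON-encoded
    locations string (an assumed encoding) into a Python list."""
    return [{**row, "locations": json.loads(row["locations"])} for row in snapshots]


def latest(rows):
    """Keep, per id, the row with the greatest change_timestamp."""
    newest = {}
    for row in rows:
        current = newest.get(row["id"])
        if current is None or row["change_timestamp"] > current["change_timestamp"]:
            newest[row["id"]] = row
    return [newest[key] for key in sorted(newest)]


# Hypothetical snapshot rows.
snapshots = [
    {"id": 1, "change_timestamp": "2024-01-01", "locations": "[101]"},
    {"id": 1, "change_timestamp": "2024-03-01", "locations": "[101, 102]"},
    {"id": 2, "change_timestamp": "2024-02-01", "locations": "[]"},
]

# snapshots -> prepro -> latest
result = latest(prepro(snapshots))
```

Note that in this ordering prepro touches every snapshot row, including rows that latest later discards.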

cc @philerooski

thomasyu888 commented

@jaymedina just adding my 2 cents. That's a good idea, but since the latest tables are mostly the silver layer tables, that's where this kind of transform could happen. The difference is whether we transform every row as it's appended to the raw snapshot tables (ETL), or transform just the N=1 rows we actually need (ELT).

Whether we actually do that transform also depends on how we will use the data in that column. If we won't use it as an array, it may not be worthwhile to transform.

philerooski left a comment

Looks good to me.

re: pivoting out array values

"Whether we actually do that transform also depends on how we will use the data in that column. If we won't use it as an array, it may not be worthwhile to transform."

I think this sentiment is spot on. Storage location settings are a relatively niche piece of project metadata, and pivoting out arrays is a messy operation: each value in the array becomes its own record, either in this table or in a separate table where id is a foreign key. IMO it's best to reserve an operation like this for gold layer tables where we know we will need quick access to storage location setting values.
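As a rough illustration of what pivoting out the array would produce, here is a sketch in Python. The row shape and the sample location values are hypothetical, and in Snowflake this would be done in SQL rather than in application code.

```python
def pivot_locations(rows):
    """Explode each row's locations list into (id, location) records,
    where id acts as a foreign key back to the original row."""
    return [(row["id"], location) for row in rows for location in row["locations"]]


# Hypothetical latest rows with an array-valued locations column.
latest_rows = [
    {"id": 1, "locations": [101, 102]},
    {"id": 2, "locations": []},
    {"id": 3, "locations": [103]},
]

pivoted = pivot_locations(latest_rows)
```

Two of the messy parts show up even in this toy case: id 1 now spans two records, and id 2, whose array is empty, has no record at all in the pivoted output.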

jaymedina merged commit 775492b into dev Nov 22, 2024
2 of 3 checks passed
jaymedina deleted the snow-82-projectsettings_latest branch November 22, 2024 16:22