[SNOW-82] Initialize dynamic table: projectsetting_latest
#87
🔥 LGTM! will defer to Phil for a final review
@jaymedina one thing we may want to consider is doing some processing on the array column for the "locations" but that's also not necessary.
@thomasyu888 I agree we could do some pre-processing and I wonder if we'd want that to be in separate dynamic tables like So something like:
But I think this is a larger conversation related to the medallion architecture, and I could make a ticket to write up a design doc for it. cc @philerooski
@jaymedina just adding in my two cents. That's a good idea, but since the latest tables are mostly the silver layer tables, that's where this kind of transform could happen. The difference being: we transform every row (ETL) when it's appended into the raw snapshot tables, or we transform just the N=1 (transform what you need) (ELT). Whether we actually do that transform also depends on how we will use the data in that column. If we won't use it as an array, it may not be worthwhile to transform.
Looks good to me.
re: pivoting out array values
Whether we actually do that transform also depends on how we will use the data in that column. If we won't use it as an array, it may not be worthwhile to transform.
I think this sentiment is spot on. Storage location settings are a relatively niche piece of project metadata, and pivoting out arrays is a messy operation: each value in the array will become its own record, either in this table or in a separate table where id is a foreign key. IMO it's best to reserve an operation like this for gold layer tables where we know we will need quick access to storage location setting values.
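To make the cost of pivoting concrete, here is a minimal Python sketch of what "each value in the array becomes its own record" means. The row shape, the `locations` key, and the `flatten_locations` helper are assumptions for illustration only; they are not taken from the actual table schema or from this PR.

```python
# Hypothetical project setting rows; "locations" stands in for the array column.
settings = [
    {"id": 101, "locations": [7, 8, 9]},
    {"id": 102, "locations": [7]},
    {"id": 103, "locations": []},  # empty array: produces no child records
]


def flatten_locations(rows):
    """Return one record per array element, keyed back to the parent by id."""
    flattened = []
    for row in rows:
        for position, location in enumerate(row.get("locations") or []):
            flattened.append(
                {"id": row["id"], "position": position, "location": location}
            )
    return flattened


flattened = flatten_locations(settings)
```

Three parent rows become four child records here, and the setting with an empty array disappears from the flattened output unless it is handled separately. That is part of why the operation is messy.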
problem
About the data: The Project Setting object on Synapse is a project-based setting. The projectsettingsnapshots table contains records of all the states of a given project setting object. This includes when it was created, who created it, how it was last changed and by whom, as well as other metadata such as the setting type and which project the setting is for.
The latest table for projectsettingsnapshots does not exist in Snowflake. We should introduce a dynamic table that reflects the latest state of a given project setting configuration on Synapse (i.e. whether it has been created, updated, or deleted, and all metadata that reflects this change).
solution
A new Version script is created to introduce the dynamic table projectsetting_latest.
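As an illustration of what "latest state" means, the selection can be sketched in Python as keeping, for each id, the snapshot with the most recent change. This is not the SQL in the Version script. The column names `change_timestamp`, `snapshot_timestamp`, and `change_type` are assumptions, as is the tie-break on snapshot time.

```python
from datetime import datetime

# Hypothetical snapshot rows; column names are assumed, not the real schema.
snapshots = [
    {"id": 1, "change_type": "CREATE",
     "change_timestamp": datetime(2024, 1, 1), "snapshot_timestamp": datetime(2024, 1, 1)},
    {"id": 1, "change_type": "UPDATE",
     "change_timestamp": datetime(2024, 3, 1), "snapshot_timestamp": datetime(2024, 3, 1)},
    {"id": 2, "change_type": "CREATE",
     "change_timestamp": datetime(2024, 2, 1), "snapshot_timestamp": datetime(2024, 2, 1)},
]


def latest_per_id(rows):
    """Keep one row per id: the one with the greatest (change, snapshot) time."""
    latest = {}
    for row in rows:
        order = (row["change_timestamp"], row["snapshot_timestamp"])
        current = latest.get(row["id"])
        if current is None or order > (
            current["change_timestamp"], current["snapshot_timestamp"]
        ):
            latest[row["id"]] = row
    return latest


latest = latest_per_id(snapshots)
```

The result has exactly one row per distinct id, which is the property the testing section checks.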
testing
There are 2,282 rows in the final table, one for the latest state of each individual project setting configuration.
From the snapshots table, we can confirm that there are 2,282 unique IDs
This query ensures that all IDs are accounted for
And this query ensures that the LATEST state is indeed the one that is reflected in the final table
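The two checks can be expressed as small set comparisons. The Python sketch below mirrors their intent on toy data; it does not reproduce the queries used in this PR. The `change_timestamp` column and both helper names are assumptions.

```python
from datetime import datetime

# Hypothetical data: all snapshots, and the rows a "latest" table would hold.
snapshot_rows = [
    {"id": 1, "change_timestamp": datetime(2024, 1, 1)},
    {"id": 1, "change_timestamp": datetime(2024, 3, 1)},
    {"id": 2, "change_timestamp": datetime(2024, 2, 1)},
]
latest_rows = [
    {"id": 1, "change_timestamp": datetime(2024, 3, 1)},
    {"id": 2, "change_timestamp": datetime(2024, 2, 1)},
]


def missing_ids(snapshots, latest):
    """IDs present in the snapshots but absent from the latest table."""
    return {r["id"] for r in snapshots} - {r["id"] for r in latest}


def stale_ids(snapshots, latest):
    """IDs whose latest-table row is not the newest snapshot for that id."""
    newest = {}
    for r in snapshots:
        ts = r["change_timestamp"]
        if r["id"] not in newest or ts > newest[r["id"]]:
            newest[r["id"]] = ts
    return {r["id"] for r in latest if r["change_timestamp"] != newest.get(r["id"])}


missing = missing_ids(snapshot_rows, latest_rows)
stale = stale_ids(snapshot_rows, latest_rows)
```

Both sets being empty corresponds to the outcome described above: every id is accounted for, and each row in the final table is the most recent state.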