[SNOW-175] Convert node_latest table -> dynamic table #89
🔥 Awesome work here - I'll leave it to Phil for a final review. I'm excited about the transition to dynamic tables!
@jaymedina / @philerooski I wouldn't merge this just yet: once it is merged, the highest versioned (V) script would be recognized as 2.26, and the work you're currently fixing would be skipped, Phil. Done here: https://github.com/Sage-Bionetworks/snowflake/actions/runs/12266497862
🔥 LGTM! I'll leave it to Phil for the final review so he can learn about all the changes.
Testing is a bit nuanced: when we clone the databases, the cloned tasks are in an automatically suspended state. We don't necessarily want to resume them, but it's something we have to keep in mind.
I'll add this to our CI/CD design doc so we can explore potential test cases.
problem
Currently the node_latest table is a regular table that gets updated using tasks and streams. We can simplify our data warehouse by turning this into a dynamic table that updates itself without requiring tasks & streams.

solution
Following the instructions of this SOP:
- [...] node_latest table in case of issues
- [...] node_latest table so we can create a dynamic table with the same name
- [...] node_latest dynamic table that introduces some post-processing of the RAW data
- [...] node_latest table

testing
1. Comparing the results between the new and old node_latest tables
I found a discrepancy between the results from the new node_latest dynamic table query and the current node_latest table. There are 1,413,032 rows missing from the new node_latest dynamic table.
As suspected, the discrepancies are due to the added time window of 14 days in the dynamic table query.
✅
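To illustrate how a 14-day window alone can account for rows going missing, here is a minimal sketch using an in-memory SQLite database as a stand-in for Snowflake. The table names `snapshots`, `old_latest`, and `new_latest`, the column `snapshot_date`, and the sample rows are all hypothetical; this is not the query from the PR.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in rows: (id, version, snapshot_date). Not real data.
SNAPSHOTS = [
    (1, 1, "2024-11-01"),
    (1, 2, "2024-12-05"),
    (2, 1, "2024-10-15"),  # last snapshot falls outside the 14-day window
    (3, 1, "2024-12-08"),
]
AS_OF = date(2024, 12, 10)  # fixed "today" so the result is reproducible
CUTOFF = (AS_OF - timedelta(days=14)).isoformat()  # '2024-11-26'

# Latest snapshot per id, keeping only snapshots on or after :cutoff.
LATEST_SQL = """
    SELECT id, version, snapshot_date
    FROM (
        SELECT id, version, snapshot_date,
               ROW_NUMBER() OVER (
                   PARTITION BY id ORDER BY snapshot_date DESC
               ) AS rn
        FROM snapshots
        WHERE snapshot_date >= :cutoff
    )
    WHERE rn = 1
"""


def build_tables(conn):
    """Create the snapshots table plus an unwindowed and a windowed 'latest' table."""
    conn.execute(
        "CREATE TABLE snapshots (id INTEGER, version INTEGER, snapshot_date TEXT)"
    )
    conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", SNAPSHOTS)
    # old_latest uses a cutoff early enough to keep every snapshot.
    for name, cutoff in (("old_latest", "0000-01-01"), ("new_latest", CUTOFF)):
        conn.execute(
            f"CREATE TABLE {name} (id INTEGER, version INTEGER, snapshot_date TEXT)"
        )
        conn.execute(f"INSERT INTO {name} {LATEST_SQL}", {"cutoff": cutoff})


def rows_missing_from_new(conn):
    """Anti-join: rows present in old_latest but absent from new_latest."""
    return conn.execute(
        """
        SELECT o.id, o.snapshot_date
        FROM old_latest o
        LEFT JOIN new_latest n ON n.id = o.id
        WHERE n.id IS NULL
        ORDER BY o.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_tables(conn)
missing = rows_missing_from_new(conn)
print(missing)
# Every missing row should have a last snapshot older than the cutoff.
print(all(snapshot_date < CUTOFF for _, snapshot_date in missing))
```

If any missing row had a snapshot date inside the window, the discrepancy would need another explanation.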

2. Confirm there are no duplicate ids
✅
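A duplicate check of this kind can be sketched as a `GROUP BY ... HAVING` query. The example below runs against an in-memory SQLite table with made-up rows; the helper name `find_duplicate_ids` and the `version` column are assumptions for illustration, not the check that was actually run.

```python
import sqlite3


def find_duplicate_ids(conn, table="node_latest"):
    """Return (id, row_count) for every id that appears more than once.

    The table name is interpolated directly, so it must be a trusted identifier.
    """
    query = (
        f"SELECT id, COUNT(*) AS n FROM {table} "
        "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_latest (id INTEGER, version INTEGER)")
conn.executemany(
    "INSERT INTO node_latest VALUES (?, ?)", [(1, 2), (2, 1), (3, 4)]
)

clean_result = find_duplicate_ids(conn)  # no id repeats yet

# Introduce a duplicate on purpose to show what a failure looks like.
conn.execute("INSERT INTO node_latest VALUES (2, 5)")
dirty_result = find_duplicate_ids(conn)

print(clean_result)
print(dirty_result)
```

An empty result means the check passes; any returned row names an offending id and how many times it occurs.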

3. Confirm that all non-deleted nodes are accounted for
For this test I created a temp table that shows the IDs missing from the latest table that exist in the snapshots table. From visual inspection, it looks like these IDs are missing either because their change_type is set to DELETE, or because their last snapshot was taken outside of the 2-week window, in which case they are misrepresentations of the current state of the nodes.
✅
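The visual inspection could be backed by a query that labels each missing ID with a reason. The sketch below uses an in-memory SQLite database and a plain query rather than a temp table; the `snapshot_date` column, the cutoff date, and the sample rows are hypothetical, and only `id` and `change_type` come from the description above.

```python
import sqlite3

CUTOFF = "2024-11-26"  # hypothetical start of the 2-week window

# Hypothetical stand-in rows: (id, change_type, snapshot_date).
SNAPSHOTS = [
    (1, "CREATE", "2024-12-01"),
    (1, "UPDATE", "2024-12-06"),
    (2, "CREATE", "2024-12-02"),
    (2, "DELETE", "2024-12-07"),  # node was deleted
    (3, "CREATE", "2024-10-01"),  # last snapshot predates the window
    (4, "CREATE", "2024-12-09"),
]

# For every id in snapshots that is absent from node_latest, look at its
# most recent snapshot and classify why it is missing.
CLASSIFY_SQL = """
    WITH last_snapshot AS (
        SELECT id, change_type, snapshot_date,
               ROW_NUMBER() OVER (
                   PARTITION BY id ORDER BY snapshot_date DESC
               ) AS rn
        FROM snapshots
    )
    SELECT s.id,
           CASE
               WHEN s.change_type = 'DELETE' THEN 'deleted'
               WHEN s.snapshot_date < :cutoff THEN 'outside_window'
               ELSE 'unexplained'
           END AS reason
    FROM last_snapshot s
    LEFT JOIN node_latest l ON l.id = s.id
    WHERE s.rn = 1 AND l.id IS NULL
    ORDER BY s.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots (id INTEGER, change_type TEXT, snapshot_date TEXT)"
)
conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", SNAPSHOTS)
conn.execute("CREATE TABLE node_latest (id INTEGER)")
conn.executemany("INSERT INTO node_latest VALUES (?)", [(1,), (4,)])

missing_reasons = conn.execute(CLASSIFY_SQL, {"cutoff": CUTOFF}).fetchall()
unexplained = [row for row in missing_reasons if row[1] == "unexplained"]

print(missing_reasons)
print(unexplained)
```

Under these assumptions the test passes when `unexplained` is empty, which replaces scanning the missing IDs by eye.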

4. Confirm that the LATEST version of each node is represented in the new node_latest
✅
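One way to express this check is to compare each row in the latest table against the most recent snapshot for the same id and report any row that lags behind. This is a sketch on an in-memory SQLite database; using a `snapshot_date` column to define "latest", the helper name `find_stale_rows`, and the sample rows are all assumptions.

```python
import sqlite3

# Rows in node_latest whose snapshot is older than the newest snapshot
# recorded for the same id.
STALE_SQL = """
    SELECT l.id, l.snapshot_date, m.max_date
    FROM node_latest l
    JOIN (
        SELECT id, MAX(snapshot_date) AS max_date
        FROM snapshots
        GROUP BY id
    ) m ON m.id = l.id
    WHERE l.snapshot_date < m.max_date
    ORDER BY l.id
"""


def find_stale_rows(conn):
    """Return (id, stored_date, newest_date) for rows that are not the latest."""
    return conn.execute(STALE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots (id INTEGER, version INTEGER, snapshot_date TEXT)"
)
conn.execute(
    "CREATE TABLE node_latest (id INTEGER, version INTEGER, snapshot_date TEXT)"
)
# Hypothetical stand-in rows, not real data.
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [(1, 1, "2024-12-01"), (1, 2, "2024-12-06"), (2, 1, "2024-12-03")],
)
conn.executemany(
    "INSERT INTO node_latest VALUES (?, ?, ?)",
    [(1, 2, "2024-12-06"), (2, 1, "2024-12-03")],
)

fresh_result = find_stale_rows(conn)  # every row is the newest snapshot

# Roll node 1 back to an older snapshot to show what a failure looks like.
conn.execute(
    "UPDATE node_latest SET version = 1, snapshot_date = '2024-12-01' WHERE id = 1"
)
stale_result = find_stale_rows(conn)

print(fresh_result)
print(stale_result)
```

An empty result means every node in the table is at its most recent snapshot.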
