Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SNOW-175] Convert node_latest table -> dynamic table #89

Merged
merged 9 commits into from
Dec 12, 2024

Conversation

jaymedina
Copy link
Contributor

@jaymedina jaymedina commented Dec 5, 2024

problem

Currently the node_latest table is a regular table that gets updated using tasks and streams. We can simplify our data warehouse by turning this into a dynamic table that updates itself without requiring tasks & streams.

solution

Following the instructions of this SOP:

  • Add a script to create a backup of the original node_latest table in case of issues
  • Add a script to drop the original node_latest table so we can create a dynamic table with the same name
  • Create node_latest dynamic table that introduces some post-processing of the RAW data
  • Add a script to delete the scheduled task + stream that were for the original node_latest table

testing

1. Comparing the results between the new and old node_latest tables

I found a discrepancy between the results from the new node_latest dynamic table query and the current node_latest table. There are 1,413,032 rows missing from the new node_latest dynamic table:

image

As suspected, the discrepancies are due to the added time window of 14 days in the dynamic table query:


image

2. Confirm there are no duplicate ids


image

3. Confirm that all non-deleted nodes are accounted for

For this test I created a temp table that shows the IDs missing from the latest table that exist in the snapshots table. From visual inspection, it looks like these IDs are missing either because their change_type is set to DELETE, or because their last snapshot was taken outside of the 2-week window, in which case they are misrepresentations of the current state of the nodes.


image

4. Confirm that the LATEST version of each node is represented in the new node_latest


image

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Awesome work here - I'll leave it to Phil for a final review. I'm excited about the transition to dynamic tables!

@jaymedina jaymedina marked this pull request as ready for review December 10, 2024 16:52
@jaymedina jaymedina requested a review from a team as a code owner December 10, 2024 16:52
@thomasyu888
Copy link
Member

thomasyu888 commented Dec 10, 2024

@jaymedina / @philerooski I wouldn't merge this just yet, because if it is merged then the highest V scripts would be recognized as 2.26 and skip over the work you're currently working to fix, Phil.

Done here: https://github.com/Sage-Bionetworks/snowflake/actions/runs/12266497862

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 LGTM! I'll leave it to Phil for the final review so he can learn about all the changes.

For testing, it's a bit nuanced since when we clone the databases, the cloned tasks are in an auto suspended state. We don't necessarily want to resume them, but just something we have to keep in mind.

@jaymedina
Copy link
Contributor Author

For testing, it's a bit nuanced

I'll add this to our CI/CD design doc so we can explore potential test cases.

@jaymedina jaymedina merged commit 6cfeb86 into dev Dec 12, 2024
2 of 3 checks passed
@jaymedina jaymedina deleted the snow-175-node-latest-dynamic-table branch December 12, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants