-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SNOW-253] update table comments for userprofile_latest, file_latest, and teammember_latest dynamic tables #158
Conversation
…mber_latest dynamic tables
Add line between queries.
Reorder the sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity and conciseness, I think we can remove some implementation details around how the most recent snapshot was obtained, since:
- It's not usually relevant to analysts
- Developers can look at the table definition to get a more precise definition of how the latest snapshot was obtained.
- We can reframe some of this information in a more useful way, e.g., rather than describing which columns we used to deduplicate the table, we can state that these columns are the table index.
Particularly for data types which capture delete events, it doesn't matter whether these are the most recent snapshots from the past 30 or 300 days, and actually I think specifying this can be confusing, since we go on to explain that within this more or less arbitrary time period we only keep the most recent snapshot. Here's how I think we could streamline the comments:
-- The information about NULL values would be more useful elsewhere (like as a column comment)
ALTER DYNAMIC TABLE USERPROFILE_LATEST SET COMMENT = 'This dynamic table, indexed by the ID
column, contains the most recent snapshot of user profiles from the past 14 days. The latest snapshots snapshot are is determined by the most recent CHANGE_TIMESTAMP and SNAPSHOT_TIMESTAMP. If multiple snapshots exist within this period, only the latest snapshot for each ID is retained. Since snapshotting does not capture DELETE changes events, records may still be retained if a user was deleted within the past 14 days. For easier querying, NULL values in the LOCATION, COMPANY, POSITION, and INDUSTRY columns are replaced with empty strings.';
-- updated "files" to "file handles" since this is a common point of confusion among analysts
ALTER DYNAMIC TABLE FILE_LATEST SET COMMENT = 'This dynamic table, indexed by the ID
column, contains the most recent snapshot of file handles from the past 30 days. The latest snapshots are determined by the most recent CHANGE_TIMESTAMP and SNAPSHOT_TIMESTAMP. If multiple snapshots exist within this period, only the latest snapshot for each ID is retained.';
ALTER DYNAMIC TABLE TEAMMEMBER_LATEST SET COMMENT = 'This dynamic table, indexed by the TEAM_ID
and MEMBER_ID
columns, contains the most recent snapshot of team members from the past 14 days. The latest snapshots are determined by the most recent CHANGE_TIMESTAMP and SNAPSHOT_TIMESTAMP. If multiple snapshots exist within this period, only the latest snapshot for each TEAM_ID and MEMBER_ID pair is retained. Since snapshotting does not capture DELETE changes events, records may still be retained if a team member was deleted within the past 14 days.';
lmk what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this work 🔥 Deferring to Phil for final review
I see your point, but I think specifying the time window is important for clarity. While it's true that the logic applies regardless of whether we're looking at 30 or 300 days, the time window we define is not arbitrary—it determines the retention period for snapshots. By explicitly stating it, we make it clear how long records might persist, which is especially relevant when explaining why deleted records could still appear. Without this context, it might be unclear how long outdated snapshots are retained, which could lead to confusion rather than prevent it. |
I think your objection here is around the non-capture of delete events for some data types? That's absolutely relevant -- we should keep that information as I did in my suggestions:
But what's less relevant is including that time-window information about snapshots more generally. Actually, by including that information:
You have to go on to clarify that the table doesn't contain the most recent snapshots of user profiles from the past 14 days, but actually just the most recent snapshot.
It's a more confusing and verbose way of expressing the same thing:
|
The time window thing is for file_latest table. Thanks for clarifying. I forgot snapshots were taken independent of changes. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work and discussion!
Problem:
As requested in SNOW-177, we’d like to update table comments to indicate how we deduplicate and transform data in dynamic tables.
Solution:
Altered table comments for these tables.