-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SNOW-147] Filter out deleted ACLs from ACL_LATEST
#139
Conversation
* convert file latest to dynamic table
* Create data warehouse analyst database roles * grant read privileges to data warehouse analyst roles * grant future read privileges to analyst roles
* add filter to file_latest and add comments * Update V2.38.0__reintroduce_file_latest_dynamic_table.sql Add NOT IS_PREVIEW filter * Update V2.38.0__reintroduce_file_latest_dynamic_table.sql move not is_preview filter
* Add Bishoy to users.sql * Update grants.sql
* Create proxy admin database role * Grant proxy admin role ownership of *ALL_ADMIN roles * Transfer ownership of current and future internamespace objects * address PR comments * bump version to avoid conflict * grant execute managed task privilege to proxy admin
|
Marking this as RFR pending conversation here. |
FROM DATABASE ROLE SYNAPSE_DATA_WAREHOUSE_DEV.SYNAPSE_STAGE_READ | ||
FROM DATABASE ROLE SYNAPSE_DATA_WAREHOUSE_DEV.SYNAPSE_STAGE_READ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like commits unrelated to your work are now showing up in the diff :/ Is this what Phil was eluding to in the discussion from #138 (comment) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, just saw this, yes this is related to that discussion. Some duplicate commits from dev
were dropped as Phil was cleaning up the branches, and it's showing up as a new commit being introduced to dev
from here. I made a new branch just to start from a clean slate.
CREATE OR REPLACE DYNAMIC TABLE ACL_LATEST | ||
TARGET_LAG = '1 day' | ||
WAREHOUSE = compute_xsmall | ||
AS | ||
WITH dedup_acl AS ( | ||
SELECT | ||
*, | ||
parse_json(resource_access) as acl, | ||
FROM {{database_name}}.SYNAPSE_RAW.ACLSNAPSHOTS --noqa: TMP | ||
WHERE | ||
SNAPSHOT_DATE >= CURRENT_TIMESTAMP - INTERVAL '14 days' | ||
QUALIFY | ||
ROW_NUMBER() OVER ( | ||
PARTITION BY OWNER_ID | ||
ORDER BY CHANGE_TIMESTAMP DESC, SNAPSHOT_TIMESTAMP DESC | ||
) = 1 | ||
), | ||
dedup_acl_expanded AS ( | ||
SELECT | ||
CHANGE_TIMESTAMP, | ||
CHANGE_TYPE, | ||
CREATED_ON, | ||
OWNER_ID, | ||
OWNER_TYPE, | ||
SNAPSHOT_DATE, | ||
SNAPSHOT_TIMESTAMP, | ||
COALESCE( | ||
array_sort(value:"accesstype"::variant), | ||
array_sort(value:"accessType"::variant), | ||
array_sort(value:"accesstype#1"::variant), | ||
array_sort(value:"accesstype#2"::variant), | ||
array_sort(value:"accesstype#3"::variant) | ||
) AS access_type, | ||
COALESCE( | ||
value:"principalId"::number, | ||
value:"principalid"::number, | ||
value:"principalid#1"::number | ||
) AS principal_id | ||
FROM | ||
dedup_acl, | ||
LATERAL FLATTEN(acl, outer=>TRUE) | ||
) | ||
SELECT | ||
* | ||
FROM | ||
dedup_acl_expanded; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of this query makes sense! I was able to run it and I saw the results that I was expecting.
Closing this due to too many merge conflicts between this branch and the new |
problem
aclsnapshots
does not capture deleted ACLs, and there was an issue in the way the latest ACLs were retrieved. Both are addressed in the solution below.solution
ACL_LATEST
dynamic tableACL_LATEST
dynamic table, refactoredsnapshot_date
window to filter out for only the latest ACLs, which should minimize the risk of including deleted ACLsOriginal Process
Original Data Structure
In this example, principal ID 200 gets
READ
privileges removed in 2025.Data Structure (After Unpacking & Deduplicating)
Notice how unpacking before deduplicating, and defining a duplicate as a repeated entry of
owner_id
access_type
andprinicpal_id
, lets the ACL with pid 200 sneak into the final latest table, even though their permissions were removed in 1-1-2025.New Process
Data Structure (After Deduplicating & Unpacking)
When we first deduplicate only based on the
owner_id
, and then unpack theaccess_type
andprincipal_id
, only the ACL from 1-1-2025 is kept in the latest table.testing
For testing, I created a temporary table for `ACL_LATEST` that I ran all the queries against.
Next, I queried for the discrepancies between the temporary table and the current `acl_latest`
Lastly, I would arbitrarily select entities (`owner_id`) from the previous query and compare the ACL results between the new solution table and the current ACL_LATEST table, against the entity itself on Synapse.
Here are some example results that confirm the results from `temp_acl_latest` are more accurate
acl_latest
new sol:

original sol has 3 ACLs that were removed:

Leveraging NODE_LATEST/TEAM_LATEST for further verification
I looked to see what nodes might be missing from the new solution, and found that all the ones listed are nodes that don't exist anymore (a separate problem for
node_latest
).Likeswise, I did the same for teams and saw that MOST of the missing IDs are because they don't exist in the original
aclsnapshots
table OR because of the 14 day cutoff in our new solution (see below for the ones that ARE in aclsnapshots, but don't make it through the 14 day cutoff):