-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-29] Add support for row-level security #8699
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8699 +/- ##
=======================================
Coverage 59.13% 59.13%
=======================================
Files 372 372
Lines 11938 11938
Branches 2925 2925
=======================================
Hits 7059 7059
Misses 4699 4699
Partials 180 180 Continue to review full report at Codecov.
|
Great initiative! I looked at the PR for this SIP and a few things came to mind: To address the issue you raise in the docs where belonging to two departments results in zero rows ( {
"dept": "dept_id = 'Finance'"
} "Risk" role: {
"dept": "dept_id = 'Risk'"
} "30 days" role: {
"history": "report_date >= current_timestamp() - 30"
} "90 days" role: {
"history": "report_date >= current_timestamp() - 90"
} Now, if a user belonged to all four groups, this would result in the following WHERE clause: ((dept_id = 'Finance') OR (dept_id = 'Risk')) AND
((report_date >= current_timestamp() - 30) OR (report_date >= current_timestamp() - 90)) In this case the user would see 90 days of history, and also all rows for Finance AND Risk. Also, I think it would be good to be able to define a default WHERE clause if a user doesn't belong to any of the specified groups. |
Hey - thanks for looking it over! I wanted to point that out in the docs for clarity, I think everything you've mentioned can be mostly accomplished as-is, by taking a slightly
Which leads me to realize that there was a bug in my initial implementation - filters are now
My worry is that we'd be increasing complexity, in the code, as well as in ease-of-understanding on the I would like to eventually add (probably as a separate SIP?) arbitrary user attributes that |
I think it is important to support belonging to multiple roles early on. Think AD/LDAP in a corporate setting; not uncommon to belong to hundreds of groups. Regarding implementation, I would propose just adding a column "role_based_filters" or similar to the tables table with the metadata: {
"dept": {
"default": "false",
"roles": {
"finance": "dept_id = 1",
"risk": "dept_id = 2"
}
},
"duration": {
"default": "report_date >= current_timestamp() - 1",
"roles": {
"finance": "report_date >= current_timestamp() - 30"
}
}
} In this example, users that don't belong to any group would get a WHERE clause that returns zero rows due to the "false" clause ( One could later add the same column to the charts table, making it possible to introduce the same functionality on a per chart basis. With regards to the filter statements, I would propose using the same filter format that's currently used for |
@villebro I've been looking through your suggestions and I believe that there are a number of potential bugs / unintended consequences:
For someone that has Role A and Role B, ORing together at a key-level would give the user access to not only reports on Freedonia Apple exports and Ruritania Orange exports (as you might expect) but also Ruritania Apple exports and Freedonia Orange exports (which may not be expected/desired in a lot of use cases).
In answer to the above, I would suggest one of the following two approaches: A. Accept the pull request as-is, continuing to force users to explicitly define the permissions for any intersection cases (since there aren't any defaults that will work in all scenarios and that would lead to predictable behaviour for the user) B. Modify the current pull request to allow for roles to be combined by ORing together the entire clause in brackets, e.g.: With a preference for A. |
Thanks for the feedback, very good to have an exhaustive discussion prior to committing to any approach. I think there might be a misunderstanding about how the ORing approach should work. The basic idea is this:
In my example, the following WHERE clauses would be generated:
For your example case, providing that these filter groups were made with the same key (as I understand they should), the following WHERE clause would be generated: In the case of having a default duration of 30 days and specifying a more restrictive filter group for 1 day, the default would not be applied, i.e. the user would only see one day's worth of data. Of course, if the user belonged to two restrictive groups, 15 days and 1 days, the more permissive role of 15 days would in practice apply. However, this seems like a logical error in how the roles are assigned to users. The proposal to start by rolling out the backend functionality was merely a proposal to keep the PRs as small as possible and easier to review/develop. However, I'm sure they can be done together, assuming the person working on the PR is proficient in both the frontend and backend aspects of the codebase. |
I think that I'm following. So in your framework, my example would look something like:
I believe that functionally this is very similar to the current PR, with the following difference:
Is that accurate? How would you envision making this available to end-users in the UI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not too far from where this should be, but I'm thinking there are some important things to consider here. I think the many-to-many to role is important.
Another point is who can edit these rules and making sure security works well out of the box. Only admins should be able to edit these...
@justin-barton yes, precisely like that. I refined my original proposal somewhat, and concluded that the default value should be present in the OR clause. For instance, if you would want to give access to everyone for pear sales in Megalomania, the key-value pair My main motivation here is to ensure we can construct arbitrarily complex role based filters, and after this last amendment I believe this is pretty solid. So the basic premise is:
which I feel should be a pretty good trade-off between complexity and versatility. @mistercrunch @dpgaspar do you feel this should be done using the FAB model, or as a column under table? I was thinking one could create a React component for editing the groups/roles, that in turn leverages the existing |
@mistercrunch thanks! RLS filters should now be many-to-many to roles. I've moved the logic to Here's a screenshot of the updated RLS filters UI supporting multiple roles. I agree completely about including user attributes - that'll be such a neat and useful feature. |
Thanks @altef for quickly addressing the last change requests and your patience with the prolonged review process! |
:returns: A list of IDs. | ||
""" | ||
ids = [f.id for f in self.get_rls_filters(table)] | ||
ids.sort() # Combinations rather than permutations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems these IDs are sorted to satisfy the cache consistency. Rather than storing these as an ordered list why not return these as a set?
:rtype: List[str] | ||
""" | ||
return [ | ||
text("({})".format(template_processor.process_template(f.clause))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use an f
string here as opposed to .format()
?
Hi, I want to post our current processing ideas, but there are still two points that have not been realized. |
@altef Can U give some advice, the difference between the two treatment methods, as well as the feasibility, expectation. |
@altef I have a few questions: |
Hi @AaronCH5, I'm not completely certain what you're asking, but one difference is that in this pull request (which has been merged) instead of specifying a column and a value you specify a clause that will be added to the query. So in your example above where you specify the column name Since you can write whatever you want in there, this allows you to go pretty deep down the filter hole. You can use multiple columns, or reference other tables as necessary. For example, it would allow you to move complex permissions into its own table should you so desire:
The actual processing code where the filter clause gets added to the query is in https://github.com/apache/incubator-superset/blob/master/superset/connectors/sqla/models.py#L867 The cache key code in viz.py you referenced (https://github.com/apache/incubator-superset/blob/master/superset/viz.py#L394) adds the row level security IDs to the cache key, in order to differentiate cached query results for queries that would otherwise be the same but for different RLS rules. |
Hi @altef . First of all, thank you for your prompt answer. I can understand that your processing method has a higher degree of freedom and is more convenient; but I have not understood what you said 【in order to differentiate cached query results for queries that would otherwise be the same but for different RLS rules】. I tried your pull code in the local environment before, but found that filtering is not effective. In addition, I also want to know which version of the transformation you are based on, because I find your individual changes The file does not exist(eg:superset/app.py) in the version I am using (0.28.1).
|
Hi, Grouping works for me, using 0.36.0rc1 Might be worth mentioning JINJA scripting works as well! Thanks for this feature altef! |
@durchgedreht As far as I know they should be applied on-the-fly, and shouldn't require a re-login. Maybe take that with a grain of salt though; I don't currently have a Superset environment running so I can't test it to verify. Also, thanks! :D |
@AaronCH5 It looks from a few posts past that you're using Druid. The filtering occurs in |
@altef Thank you very much for your confusion, I will try again. Thanks |
Hi All, We are looking for a solution in which we need to show a particular account_id data to a client. So when client1 logs in then he should be able to view his data only and when client2 logs in he should be able to view his data only. Is there any workaround in apache superset for the same as 0.36 branch is yet to be released. Thanks |
@altef @villebro @justin-barton @mistercrunch Thank you for the update! It's a nice feature and we find it very use useful. Recently I was looking through the code and found a small issue with SELECT statement in the get_rls_filters() func. So I made this PR #9365, could you pls take a look? |
@axelet hey, thanks for that! It looks good to me; I must have introduced that in one of the query format changes. |
Yeah, thanks for having a look :) |
Hi, you can try to look at this(#9320), it may be helpful to you, and it will take effect after configuration. |
Do these filters get applied dynamically to existing dashboards? For example, say I have transaction data for 100 tenants and I create a dashboard with a single big number chart that simply sums the transaction amount. Ideally, I want to expose (view access only) this single dashboard to all the tenants and rest assured that the big number chart each tenant sees in the dashboard is the sum of transactions made only by the viewing tenant. |
Hey, @asen-aura . I believe that filters are applied on the query level. So, yes, your tenants will be able to see only their data, as well as all the aggregations will be done only for the tenant data. The feature basically adds WHERE clause to SQL query. Regarding dashboards question all the filters are applied for tables or slices (charts). So, if you change the filters for the dashboard's table or chart it will dynamically be applied to dashboard (as soon as you refresh the dashboard). |
@altef Hi, I have a question. At present, RLS is only applicable to Charts production, or it is also applicable to SQLLab. I am looking forward to your clarification. |
hey @Mhs-Aaron, I've never actually had reason to use SQLLab so I'm not clear on how it works. I've mostly disabled permissions to it on my instance. The RLS clause gets added in the |
@altef Thank you very much for your timely feedback, I would like to know what kind of user is appropriate for the SQLLab menu assignment, if it is open to business people, then the same RLS problem will be involved.I wonder if you have given any thought to this point. |
@Mhs-Aaron Hi, regarding RLS filters I don't think they have effect on Sqllab. SQLLab is more like an admin tool imo. But things could change since I visited the code last time |
@altef A very useful feature, so thank you for implementing. I do have a question that I am confused about. The documentation and parts of the code reference "table and roles", however, in the UI it seems that it is not actually a database table that needs to be configured in the RLS rule definition, but actually it is datasets that must be specified. Is that a true or am I misunderstanding how it works? The reason that I ask is in a scenario where I have a dozen different dashboards that each use their own unique datasets, but which all include the same common database table "users" as part of their query, I would like the same RLS rule to apply to all of these dashboards, it would be great if I could specify the table "users" in the rule, since that single table is common across all of the many datasets. However, it would seem from my review of the UI/code that it is not the actual database table name that is defined in the RLS rule, but that all dozen datasets would have to be specified, which is not very scalable since that would mean that as new datasets are added, then the rule would have to be continually modified. Am I understanding this feature correctly? Or is there some other way to do what I would like to do by specify a single common table that is used within multiple dashboard datasets to ensure the RLS is enforced across all of them? Thank you in advance. |
@shenrie If I recall correctly (and this may be out-of-date), it's on a table but not by table name - a table entry in superset, which has a reference to a specific database. I've had luck doing that sort of thing from outside superset. For example, if I had a script that added the database and tables to superset, I might have it set the RLS rules for any table called data_group_id IN (
SELECT data_group_id FROM users_to_data_groups
WHERE user_id={{ current_user_id() }}
) |
CATEGORY
Choose one
SUMMARY
Many BI applications, particularly in multi-tenancy scenarios, require support for row-level security. That is, the ability to show different slices of a table to users based on some user attribute. To accomplish this, I've added a new model to describe row level security filters, which references a Table and a Role. So when adding a row level security filter, you specify a particular Role and Table.
When querying that table, the applicable filters are added to the query. I've modified the query function here to add any relevant to the WHERE clause.
As well, I've added a UI for managing the row level security filters. And for convenience, added it as a related view for tables.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before:
After:
Row level security filters added to the Security menu:
Row level security list
Row level security interface
That allows for the management of the RowLevelSecurityFilters model. Additionally, for convenience, I've added it as a related view for tables.
After logging in with a user assigned to that role, I can still supply additional filters:
The generated SQL includes the additional filters AND the clause supplied by the row level security filter(s).
TEST PLAN
Everything seems to be working as expected on my end, but a few things should be done to verify the changes.
Gamma
role, as well as the role you've just created.Row level security filter
and assign it the table and row.ADDITIONAL INFORMATION
REVIEWERS