[SIP-29] Add support for row-level security #8699

Merged
merged 49 commits into from
Feb 22, 2020

Conversation

altef
Contributor

@altef altef commented Nov 30, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Many BI applications, particularly in multi-tenancy scenarios, require support for row-level security. That is, the ability to show different slices of a table to users based on some user attribute. To accomplish this, I've added a new model to describe row level security filters, which references a Table and a Role. So when adding a row level security filter, you specify a particular Role and Table.

When querying that table, the applicable filters are added to the query. I've modified the query function here to add any relevant filters to the WHERE clause.
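As a rough illustration of the idea (not the PR's actual code, which operates on a SQLAlchemy query), the sketch below ANDs a user's own filters with whatever row level security clauses apply to their roles. `RLS_FILTERS` and `build_where` are hypothetical names made up for this example, as are the table and role names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-in for the RLS filter model: (table, role) -> clause.
RLS_FILTERS: Dict[Tuple[str, str], str] = {
    ("sales", "finance"): "dept_id = 'Finance'",
    ("sales", "emea"): "region = 'EMEA'",
}


def build_where(table: str, user_roles: Set[str], user_filters: List[str]) -> str:
    """AND the user's own filters with every RLS clause that applies to them."""
    rls = [
        clause
        for (tbl, role), clause in sorted(RLS_FILTERS.items())
        if tbl == table and role in user_roles
    ]
    # Each clause is wrapped in parentheses so an embedded OR cannot leak out.
    parts = [f"({c})" for c in user_filters + rls]
    return " AND ".join(parts) if parts else "1 = 1"


where = build_where("sales", {"finance"}, ["year = 2019"])
print(f"SELECT * FROM sales WHERE {where}")
```

A user without any matching role gets no extra clause in this sketch, which mirrors the "no change for unrestricted users" expectation in the test plan.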

I've also added a UI for managing the row level security filters and, for convenience, added it as a related view for tables.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before:

menu-before

After:

  1. Row level security filters added to the Security menu:
    menu

  2. Row level security list
    list

  3. Row level security interface
    edit

That allows for the management of the RowLevelSecurityFilters model. Additionally, for convenience, I've added it as a related view for tables.

table-related-view

After logging in with a user assigned to that role, I can still supply additional filters:

additional-filters

The generated SQL includes the additional filters AND the clause supplied by the row level security filter(s).

generated-sql

TEST PLAN

Everything seems to be working as expected on my end, but a few things should be done to verify the changes.

  1. Set up a limited user
    1. Create a new role and give it access to a table.
    2. Create a new user and assign it the Gamma role, as well as the role you've just created.
    3. Create a Row level security filter and assign it the table and role.
  2. Ensure that the table is still working as expected for you (there should be no change)
  3. Create a simple dashboard built on that table
  4. Verify the limited user is only seeing the filtered data
    1. Log in as the limited user
    2. Check the table to ensure the filter is being applied.
    3. Check the dashboard to ensure the data is being limited.

ADDITIONAL INFORMATION

  • Has associated issue: #8644
  • Changes UI
  • Requires DB Migration.
    • (To add a new model.)
  • Confirm DB Migration upgrade and downgrade tested.
    • I tested it, but presumably this doesn't apply to me?
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@codecov-io

codecov-io commented Nov 30, 2019

Codecov Report

Merging #8699 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8699   +/-   ##
=======================================
  Coverage   59.13%   59.13%           
=======================================
  Files         372      372           
  Lines       11938    11938           
  Branches     2925     2925           
=======================================
  Hits         7059     7059           
  Misses       4699     4699           
  Partials      180      180

Powered by Codecov. Last update a131a72...1451c05. Read the comment docs.

@altef altef changed the title [WIP][SIP-29] Add support for row-level security [SIP-29] Add support for row-level security Nov 30, 2019
@villebro
Member

Great initiative! I looked at the PR for this SIP and a few things came to mind:

To address the issue you raise in the docs where belonging to two departments results in zero rows (WHERE dept = 1 AND dept = 2), I propose changing the Clause field to JSON, putting each clause behind a key, which are later ORed together. Example: There are four RLS filters specified for a table: two for departments ("Finance", "Risk") and two for time history ("30 days", "90 days").
"Finance" role:

{
    "dept": "dept_id = 'Finance'"
}

"Risk" role:

{
    "dept": "dept_id = 'Risk'"
}

"30 days" role:

{
    "history": "report_date >= current_timestamp() - 30"
}

"90 days" role:

{
    "history": "report_date >= current_timestamp() - 90"
}

Now, if a user belonged to all four groups, this would result in the following WHERE clause:

((dept_id = 'Finance') OR (dept_id = 'Risk')) AND 
((report_date >= current_timestamp() - 30) OR (report_date >= current_timestamp() - 90))

In this case the user would see 90 days of history, and also all rows for Finance AND Risk.

Also, I think it would be good to be able to define a default WHERE clause if a user doesn't belong to any of the specified groups.
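The combination rule proposed here (OR within a key, AND across keys) can be sketched in a few lines. This is only an illustration of the proposal, not code from the PR; `ROLE_FILTERS` and `keyed_where` are hypothetical names, and the clauses are copied from the four example roles above.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Role -> {key: clause}, taken from the four example roles above.
ROLE_FILTERS: Dict[str, Dict[str, str]] = {
    "Finance": {"dept": "dept_id = 'Finance'"},
    "Risk": {"dept": "dept_id = 'Risk'"},
    "30 days": {"history": "report_date >= current_timestamp() - 30"},
    "90 days": {"history": "report_date >= current_timestamp() - 90"},
}


def keyed_where(roles: Iterable[str]) -> str:
    """OR clauses that share a key, then AND the per-key groups together."""
    by_key = defaultdict(list)
    for role in roles:
        for key, clause in ROLE_FILTERS.get(role, {}).items():
            by_key[key].append(f"({clause})")
    groups = ["(" + " OR ".join(clauses) + ")" for clauses in by_key.values()]
    return " AND ".join(groups)


print(keyed_where(["Finance", "Risk", "30 days", "90 days"]))
```

Note that a user with none of these roles gets an empty string in this sketch, which is exactly the gap the suggested default WHERE clause would have to fill.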

@altef
Contributor Author

altef commented Nov 30, 2019

Hey - thanks for looking it over! I wanted to point that out in the docs for clarity,
and viewed it as more of something the user should be aware of than as an issue. I can definitely
see a certain use-case where it may not be ideal, but am of the mindset that it's better
to show no data (which can be easily detected and fixed), should the user introduce a
conflict in the filters, than to be overly permissive and allow a user to see data they otherwise
shouldn't be able to see. Should that occur, it may not be readily apparent.

I think everything you've mentioned can be mostly accomplished as-is, by taking a slightly
different approach to the role/filter composition.

  1. Filter clauses can contain OR. You could simply make a filter with the
    clause dept_id = 'Finance' OR dept_id = 'Risk' and assign it to a particular role. Obviously
    this is not ideal if you want to assign a user to more than one department by assigning them
    two roles. What I'm not sure of is how often this needs to come up. If someone has access to
    both Finance and Risk, are they really in each department, or are they some
    higher-up who should have greater access? The current method, I hope, is analogous to Superset's "Restricting access to a subset of data sources" recommendation.

Which leads me to realize that there was a bug in my initial implementation - filters are now
wrapped in parentheses in case one has an OR in it.

  2. To reference the "Restricting access to a subset of data sources" recommendation again, it's recommended
    to assign a limited user to the gamma role, and to make a role giving them access to the
    datasource in question. In the case of creating a default filter, you could just assign a
    filter to that role. However, it is not exactly what you have in mind, since it wouldn't
    be overwritten by additional filters.
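The parenthesization fix mentioned above matters because AND binds more tightly than OR in SQL. The following self-contained check uses an in-memory SQLite table with made-up rows (the table `t` and its columns are assumptions for illustration only) to show how an unwrapped clause containing OR changes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dept_id TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Finance", 2018), ("Finance", 2019), ("Risk", 2018), ("Risk", 2019), ("HR", 2019)],
)

rls_clause = "dept_id = 'Finance' OR dept_id = 'Risk'"
user_filter = "year = 2019"


def count(where: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {where}").fetchone()[0]


# Unwrapped: parsed as Finance OR (Risk AND 2019), so the 2018 Finance row slips in.
unwrapped = count(f"{rls_clause} AND {user_filter}")
# Wrapped: each clause is evaluated as a unit before the AND.
wrapped = count(f"({rls_clause}) AND ({user_filter})")
print(unwrapped, wrapped)
```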

My worry is that we'd be increasing complexity, in the code, as well as in ease-of-understanding on the
user's part, for minimal gains. However, it could be that I'm taking the wrong approach here.
Is this something others would find useful?

I would like to eventually add (probably as a separate SIP?) arbitrary user attributes that
can then be referenced in the row level security filters. But I'd want to talk to someone
about that, as I have many questions in that regard.

@villebro
Member

villebro commented Dec 1, 2019

I think it is important to support belonging to multiple roles early on. Think AD/LDAP in a corporate setting; not uncommon to belong to hundreds of groups. Regarding implementation, I would propose just adding a column "role_based_filters" or similar to the tables table with the metadata:

{
  "dept": {
    "default": "false",
    "roles": {
      "finance": "dept_id = 1",
      "risk": "dept_id = 2"
    }
  },
  "duration": {
    "default": "report_date >= current_timestamp() - 1",
    "roles": {
      "finance": "report_date >= current_timestamp() - 30"
    }
  }
}

In this example, users that don't belong to any group would get a WHERE clause that returns zero rows due to the "false" clause (SELECT col FROM table WHERE FALSE -> no rows), and by default only the last day's data would be available. If the user belongs to the "risk" Role, they would see only "dept_id = 2" for the last day (default clause for "duration"), whereas "finance" would see "dept_id = 1" for the last 30 days. Belonging to both would return data for both departments with 30 days of data.

One could later add the same column to the charts table, making it possible to introduce the same functionality on a per chart basis. With regards to the filter statements, I would propose using the same filter format that's currently used for adhoc_filters, which would enable us to leverage existing React components that allow for a much more user friendly means to add filters. To introduce the functionality, I would break the SIP into two parts; first introducing the backend functionality, i.e. adding the new column to table, making it possible to edit the filters by poking at the table metadata, and later adding proper UI functionality for editing the metadata.

@justin-barton

@villebro I've been looking through your suggestions and I believe that there are a number of potential bugs / unintended consequences:

  1. When someone belongs to multiple groups, setting the default behaviour to OR together all the permissions seems like a dangerous assumption. This errs on the side of being maximally permissive, which is usually not advisable. The current setup forces the user to define what happens at intersections by creating another role for any combined roles and explicitly setting the clauses.

  2. Even if we were to OR together permissions, using a key-based JSON approach would likely not be the right one. By way of example, let's say that on table exports:

Role A has filters: country='Freedonia' AND item='Apples'
Role B has filters: country='Ruritania' AND item='Oranges'

For someone that has Role A and Role B, ORing together at a key-level would give the user access to not only reports on Freedonia Apple exports and Ruritania Orange exports (as you might expect) but also Ruritania Apple exports and Freedonia Orange exports (which may not be expected/desired in a lot of use cases).

  3. Having users directly manage large JSON objects to control row-based security is unduly cumbersome and inaccessible to some users, so a UI / SQL approach is preferable.

  4. The concept of defaults (and how to use them) again requires assumptions that will not fit a lot of use cases. For example, your proposed setup only works where the default is more restrictive than all of the roles. If there were a case where the default was to show last 15 days (report_date >= current_timestamp() - 15) but one of the roles was to restrict to only seeing the most recent day's data (report_date >= current_timestamp() - 1) then your setup would actually escalate permissions.

  5. The proposal to break the SIP into two parts, the first being only a back-end change to tables, means that most users will not have use of row-level security when this is released, which defeats the purpose of the SIP.

In answer to the above, I would suggest one of the following two approaches:

A. Accept the pull request as-is, continuing to force users to explicitly define the permissions for any intersection cases (since there aren't any defaults that will work in all scenarios and that would lead to predictable behaviour for the user)

B. Modify the current pull request to allow for roles to be combined by ORing together the entire clause in brackets, e.g.:
Role A & Role B: WHERE (country='Freedonia' AND item='Apples') OR (country='Ruritania' AND item='Oranges')

With a preference for A.
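The concern in point 2 and the alternative in option B can be compared on a four-row table. This is a minimal sketch using in-memory SQLite, and it assumes the key-level scheme was configured with one key per column (country and item); the table contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exports (country TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO exports VALUES (?, ?)",
    [
        ("Freedonia", "Apples"),
        ("Freedonia", "Oranges"),
        ("Ruritania", "Apples"),
        ("Ruritania", "Oranges"),
    ],
)


def rows(where: str) -> list:
    sql = f"SELECT country, item FROM exports WHERE {where} ORDER BY country, item"
    return conn.execute(sql).fetchall()


# ORing per column (one key per column) yields the full cross product.
per_column = rows(
    "(country = 'Freedonia' OR country = 'Ruritania') "
    "AND (item = 'Apples' OR item = 'Oranges')"
)
# ORing whole role clauses (option B) keeps each role's pairing intact.
per_role = rows(
    "(country = 'Freedonia' AND item = 'Apples') "
    "OR (country = 'Ruritania' AND item = 'Oranges')"
)
print(len(per_column), per_role)
```

With per-column keys all four rows are visible, while ORing whole clauses returns only the two pairings the roles actually grant.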

@villebro
Member

villebro commented Dec 1, 2019

Thanks for the feedback, very good to have an exhaustive discussion prior to committing to any approach.

I think there might be a misunderstanding about how the ORing approach should work. The basic idea is this:

  1. AND across different keys
  2. OR within keys
  3. apply default if user doesn't belong to any roles within a certain key

In my example, the following WHERE clauses would be generated:

  1. doesn't belong to any groups: ((false)) AND ((report_date >= current_timestamp() - 1))
  2. belongs to "finance": ((dept_id = 1)) AND ((report_date >= current_timestamp() - 30))
  3. belongs to "risk": ((dept_id = 2)) AND ((report_date >= current_timestamp() - 1))
  4. belongs to "finance" and "risk": ((dept_id = 1) OR (dept_id = 2)) AND ((report_date >= current_timestamp() - 30))
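The three rules and the four resulting clauses can be reproduced with a short sketch. It is one possible reading of the proposal, not code from the PR; `FILTERS` mirrors the "role_based_filters" metadata from the earlier comment, and `where_for` is a hypothetical helper name.

```python
from typing import Dict, Set

# The "role_based_filters" metadata from the proposal above.
FILTERS: Dict[str, dict] = {
    "dept": {
        "default": "false",
        "roles": {"finance": "dept_id = 1", "risk": "dept_id = 2"},
    },
    "duration": {
        "default": "report_date >= current_timestamp() - 1",
        "roles": {"finance": "report_date >= current_timestamp() - 30"},
    },
}


def where_for(user_roles: Set[str]) -> str:
    groups = []
    for spec in FILTERS.values():
        # OR within a key; fall back to the default when no role matches.
        clauses = [c for role, c in spec["roles"].items() if role in user_roles]
        if not clauses:
            clauses = [spec["default"]]
        groups.append("(" + " OR ".join(f"({c})" for c in clauses) + ")")
    # AND across keys.
    return " AND ".join(groups)


for roles in (set(), {"finance"}, {"risk"}, {"finance", "risk"}):
    print(sorted(roles), "->", where_for(roles))
```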

For your example case, providing that these filter groups were made with the same key (as I understand they should), the following WHERE clause would be generated: ((country='Freedonia' AND item='Apples') OR (country='Ruritania' AND item='Oranges')), i.e. the user would not see Ruritania Apple exports.

In the case of having a default duration of 30 days and specifying a more restrictive filter group for 1 day, the default would not be applied, i.e. the user would only see one day's worth of data. Of course, if the user belonged to two restrictive groups, 15 days and 1 day, the more permissive role of 15 days would in practice apply. However, this seems like a logical error in how the roles are assigned to users.

The proposal to start by rolling out the backend functionality was merely a proposal to keep the PRs as small as possible and easier to review/develop. However, I'm sure they can be done together, assuming the person working on the PR is proficient in both the frontend and backend aspects of the codebase.

@justin-barton

I think that I'm following. So in your framework, my example would look something like:

{
  "somekey": {
    "default": "false",
    "roles": {
      "freedonia-orchards": "country='Freedonia' AND item='Apples'",
      "ruritania-groves": "country='Ruritania' AND item='Oranges'"
    }
  }
}

I believe that functionally this is very similar to the current PR, with the following difference:

  • In the current PR, combinations of roles must be created as new roles and the filter clauses explicitly defined
  • In your proposed setup the logic for how the clauses within roles should be combined is explicitly defined by the user at the time of creation in the AND/OR hierarchy of the JSON structure

Is that accurate? How would you envision making this available to end-users in the UI?

Member

@mistercrunch mistercrunch left a comment

This is not too far from where this should be, but I'm thinking there are some important things to consider here. I think the many-to-many to role is important.

Another point is who can edit these rules and making sure security works well out of the box. Only admins should be able to edit these...

@villebro
Member

villebro commented Dec 4, 2019

@justin-barton yes, precisely like that. I refined my original proposal somewhat, and concluded that the default value should be present in the OR clause. For instance, if you would want to give access to everyone for pear sales in Megalomania, the key-value pair "default": "country='Megalomania' AND item='Pears'" should be present under "somekey" to ensure users in the ruritania-groves Role aren't left out.

My main motivation here is to ensure we can construct arbitrarily complex role based filters, and after this last amendment I believe this is pretty solid. So the basic premise is:

  • Default value for key -> what everyone should see
  • Additional keys -> increased security
  • Additional roles within a key -> relaxed security,

which I feel should be a pretty good trade-off between complexity and versatility.

@mistercrunch @dpgaspar do you feel this should be done using the FAB model, or as a column under table? I was thinking one could create a React component for editing the groups/roles, that in turn leverages the existing AdhocFilterEditPopover React component.

@altef
Contributor Author

altef commented Dec 4, 2019

@mistercrunch thanks! RLS filters should now be many-to-many to roles. I've moved the logic to get_sqla_query, using SQLA expression toolkit's text.

Here's a screenshot of the updated RLS filters UI supporting multiple roles.
image

I agree completely about including user attributes - that'll be such a neat and useful feature.

@villebro
Member

Thanks @altef for quickly addressing the last change requests and your patience with the prolonged review process!

@villebro villebro merged commit dee16de into apache:master Feb 22, 2020
:returns: A list of IDs.
"""
ids = [f.id for f in self.get_rls_filters(table)]
ids.sort() # Combinations rather than permutations
Member

It seems these IDs are sorted to satisfy the cache consistency. Rather than storing these as an ordered list why not return these as a set?
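One practical wrinkle with returning a set is that a set has no stable order and is not JSON-serializable, so anything that hashes the IDs still needs a canonical form. The sketch below is hypothetical (the `cache_key` function and its payload layout are assumptions, not Superset's actual cache key code) and shows that sorting at hashing time lets callers pass either a list or a set.

```python
import hashlib
import json
from typing import Iterable


def cache_key(query: str, rls_ids: Iterable[int]) -> str:
    """Hash the query together with the RLS filter IDs, ignoring their order."""
    # sorted() accepts lists and sets alike and returns a JSON-friendly list.
    payload = {"query": query, "rls": sorted(rls_ids)}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


same = cache_key("SELECT 1", [3, 1, 2]) == cache_key("SELECT 1", {1, 2, 3})
print(same)
```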

:rtype: List[str]
"""
return [
text("({})".format(template_processor.process_template(f.clause)))
Member

Could we use an f string here as opposed to .format()?
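For reference, the two spellings produce the same string; the f-string is simply shorter. The `clause` value below is an arbitrary example and the template-processing step from the snippet under review is left out.

```python
clause = "dept_id = 'Finance' OR dept_id = 'Risk'"

with_format = "({})".format(clause)
with_fstring = f"({clause})"

print(with_fstring)
```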

@zpi1223

zpi1223 commented Mar 12, 2020

Hi, I want to share our current approach, although two points have not been implemented yet.
The interface is similar to @altef's, and the main processing is filtering when the chart is drawn (covering both the user's regular Role and the new row-level permission). The operation interface is as follows:
1.image
2.image
3.image
The query interface after authorization is as follows:
4.image

@zpi1223

zpi1223 commented Mar 12, 2020

@altef Could you give some advice on the difference between the two approaches, as well as their feasibility and what to expect?

@zpi1223

zpi1223 commented Mar 12, 2020

@altef I have a few questions:

  1. Which version did you modify (we used 0.28.1)?
  2. Your processing code is placed in cache_key in viz.py; how do you ensure that the filtering takes effect (I found that normal interaction processing happens only in query_obj in viz.py)?
    image

@altef
Contributor Author

altef commented Mar 12, 2020

Hi @AaronCH5, I'm not completely certain what you're asking, but one difference is that in this pull request (which has been merged) instead of specifying a column and a value you specify a clause that will be added to the query. So in your example above where you specify the column name avg_circulate_duration and the value toB, you would instead just write something like avg_circulate_duration='toB'.

Since you can write whatever you want in there, this allows you to go pretty deep down the filter hole. You can use multiple columns, or reference other tables as necessary. For example, it would allow you to move complex permissions into their own table should you so desire:

avg_circulate_duration IN (SELECT allowed_avg_circulate_duration FROM custompermissions WHERE user_id='{{current_user_id()}}')

The actual processing code where the filter clause gets added to the query is in https://github.com/apache/incubator-superset/blob/master/superset/connectors/sqla/models.py#L867

The cache key code in viz.py you referenced (https://github.com/apache/incubator-superset/blob/master/superset/viz.py#L394) adds the row level security IDs to the cache key, in order to differentiate cached query results for queries that would otherwise be the same but for different RLS rules.
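To make the caching point concrete: if two users run the same chart query under different RLS rules, their results must not share a cache entry. The sketch below is a toy model with a plain dictionary as the cache; `make_key` and the filter IDs are hypothetical and do not reflect how viz.py actually builds its key.

```python
from typing import Dict, List, Tuple

CacheKey = Tuple[str, Tuple[int, ...]]
cache: Dict[CacheKey, str] = {}


def make_key(query: str, rls_ids: List[int]) -> CacheKey:
    # The key depends on which RLS filters apply, not on their order.
    return (query, tuple(sorted(rls_ids)))


query = "SELECT SUM(amount) FROM sales"
cache[make_key(query, [7])] = "result restricted by filter 7"
cache[make_key(query, [9, 7])] = "result restricted by filters 7 and 9"

# Same chart query, different RLS rules: two separate cache entries.
print(len(cache))
```

Without the IDs in the key, the second user would be served the first user's cached (and differently filtered) result.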

@zpi1223

zpi1223 commented Mar 13, 2020

Hi @altef. First of all, thank you for your prompt answer. I understand that your approach offers more freedom and is more convenient, but I have not understood what you meant by "in order to differentiate cached query results for queries that would otherwise be the same but for different RLS rules". I tried the code from your pull request in my local environment earlier, but found that the filtering did not take effect.

In addition, I would also like to know which version your changes are based on, because some of the files you changed (e.g. superset/app.py) do not exist in the version I am using (0.28.1).

image
image
image
Note: In the last image, nothing was added to the filter below.
I can't see what I configured incorrectly; please correct me. Thanks.

@durchgedreht

Hi,
one question (that might be related): Are the filters applied on the fly to users who are already logged in, or is a re-login needed?

Grouping works for me, using 0.36.0rc1
https://github.com/apache/incubator-superset/releases

Might be worth mentioning JINJA scripting works as well! Thanks for this feature altef!

@altef
Contributor Author

altef commented Mar 13, 2020

@durchgedreht As far as I know they should be applied on-the-fly, and shouldn't require a re-login. Maybe take that with a grain of salt though; I don't currently have a Superset environment running so I can't test it to verify. Also, thanks! :D

@altef
Contributor Author

altef commented Mar 13, 2020

@AaronCH5 It looks from a few posts back that you're using Druid. The filtering occurs in sqla/models.py so it probably isn't applicable to Druid (I'm not familiar with Druid). Looking briefly at the druid/models.py file, there appears to be a function called run_query - maybe you could add the filtering code there? Again, I'm not familiar with Druid so I could be way off in that regard.

@zpi1223

zpi1223 commented Mar 16, 2020

@altef Thank you very much for clearing up my confusion; I will try again. Thanks

@sahiljain001

Hi All,

We are looking for a solution in which each client sees only the data for their own account_id. So when client1 logs in, they should be able to view only their own data, and likewise for client2. Is there any workaround in Apache Superset for this, given that the 0.36 branch is yet to be released?

Thanks

@axelet
Contributor

axelet commented Mar 25, 2020

@altef @villebro @justin-barton @mistercrunch Thank you for the update! It's a nice feature and we find it very useful.

Recently I was looking through the code and found a small issue with SELECT statement in the get_rls_filters() func. So I made this PR #9365, could you pls take a look?

@altef
Contributor Author

altef commented Mar 25, 2020

@axelet hey, thanks for that! It looks good to me; I must have introduced that in one of the query format changes.

@axelet
Contributor

axelet commented Mar 27, 2020

Yeah, thanks for having a look :)

@zpi1223

zpi1223 commented Mar 27, 2020

@sahiljain001 Hi, you can take a look at #9320; it may be helpful to you, and it takes effect after configuration.

@asen-aura

Do these filters get applied dynamically to existing dashboards? For example, say I have transaction data for 100 tenants and I create a dashboard with a single big number chart that simply sums the transaction amount. Ideally, I want to expose (view access only) this single dashboard to all the tenants and rest assured that the big number chart each tenant sees in the dashboard is the sum of transactions made only by the viewing tenant.

@axelet
Contributor

axelet commented Apr 14, 2020

Hey, @asen-aura. I believe the filters are applied at the query level. So yes, your tenants will only be able to see their own data, and all aggregations will be computed over that tenant's data only.

The feature basically adds a WHERE clause to the SQL query.
For example:
SELECT * FROM your_table;
transforms to
SELECT * FROM your_table
WHERE tenant = 'foo';

Regarding the dashboards question: all the filters are applied to tables or slices (charts). So if you change the filters for a dashboard's table or chart, they will be applied to the dashboard dynamically (as soon as you refresh it).
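The big-number scenario from the question can be checked against a toy table. The sketch below uses in-memory SQLite with invented rows; the `transactions` table, its columns, and the `big_number` helper are assumptions for illustration, and the clause is appended by string formatting only to mimic the effect described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (tenant TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("foo", 10.0), ("foo", 5.0), ("bar", 100.0)],
)

chart_sql = "SELECT SUM(amount) FROM transactions"


def big_number(rls_clause: str) -> float:
    # The RLS clause filters rows before aggregation, so SUM only sees allowed rows.
    return conn.execute(f"{chart_sql} WHERE ({rls_clause})").fetchone()[0]


print(big_number("tenant = 'foo'"), big_number("tenant = 'bar'"))
```

The same chart definition therefore yields a different total per tenant, because the filter is applied before the aggregation runs.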

@zpi1223

zpi1223 commented Dec 25, 2020

@altef Hi, I have a question. At present, is RLS only applicable to charts, or is it also applicable to SQLLab? I am looking forward to your clarification.

@altef
Contributor Author

altef commented Dec 25, 2020

hey @Mhs-Aaron, I've never actually had reason to use SQLLab so I'm not clear on how it works. I've mostly disabled permissions to it on my instance. The RLS clause gets added in the get_sqla_query() function and I'm not sure if SQLLab makes use of that function or not (though I somewhat doubt it).

@zpi1223

zpi1223 commented Dec 25, 2020

@altef Thank you very much for your timely feedback. I would like to know what kind of user the SQLLab menu should be assigned to; if it is opened up to business users, the same RLS problem arises. I wonder if you have given any thought to this point.
In addition, there is a new feature I am interested in: multi-table join queries that produce a single chart.

@axelet
Contributor

axelet commented Dec 25, 2020

@Mhs-Aaron Hi, regarding RLS filters, I don't think they have any effect on SQLLab. SQLLab is more like an admin tool imo. But things could have changed since I last looked at the code.

@shenrie

shenrie commented May 20, 2022

@altef A very useful feature, so thank you for implementing it. I do have a question that I am confused about. The documentation and parts of the code reference "table and roles"; however, in the UI it seems that it is not actually a database table that needs to be configured in the RLS rule definition, but rather datasets that must be specified. Is that true, or am I misunderstanding how it works?

The reason that I ask is in a scenario where I have a dozen different dashboards that each use their own unique datasets, but which all include the same common database table "users" as part of their query, I would like the same RLS rule to apply to all of these dashboards, it would be great if I could specify the table "users" in the rule, since that single table is common across all of the many datasets. However, it would seem from my review of the UI/code that it is not the actual database table name that is defined in the RLS rule, but that all dozen datasets would have to be specified, which is not very scalable since that would mean that as new datasets are added, then the rule would have to be continually modified.

Am I understanding this feature correctly? Or is there some other way to do what I would like to do, by specifying a single common table that is used within multiple dashboard datasets, to ensure the RLS is enforced across all of them?

Thank you in advance.

@altef
Contributor Author

altef commented May 21, 2022

@shenrie If I recall correctly (and this may be out-of-date), it's on a table but not by table name - a table entry in superset, which has a reference to a specific database.

I've had luck doing that sort of thing from outside Superset. For example, if I had a script that added the database and tables to Superset, I might have it set the RLS rules for any table called users. This of course doesn't help at all if you're adding them manually. I settled on a simple, consistent rule which could be set and never adjusted, restricting data by user ID based on a convention I maintain on the DB side. It was in the vein of:

data_group_id IN (
	SELECT data_group_id FROM users_to_data_groups 
	WHERE user_id={{ current_user_id() }}
)
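The effect of such a mapping-table rule can be tried out with in-memory SQLite. In this sketch the `orders` table and all rows are invented, and a bound parameter stands in for the `{{ current_user_id() }}` template call, since no template engine is involved here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users_to_data_groups (user_id INTEGER, data_group_id INTEGER);
    CREATE TABLE orders (id INTEGER, data_group_id INTEGER);
    INSERT INTO users_to_data_groups VALUES (1, 10), (1, 20), (2, 20);
    INSERT INTO orders VALUES (100, 10), (101, 20), (102, 30);
    """
)

RLS_CLAUSE = """data_group_id IN (
    SELECT data_group_id FROM users_to_data_groups WHERE user_id = ?
)"""


def visible_orders(user_id: int) -> list:
    # A bound parameter stands in for the {{ current_user_id() }} template call.
    sql = f"SELECT id FROM orders WHERE ({RLS_CLAUSE}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, (user_id,))]


print(visible_orders(1), visible_orders(2), visible_orders(3))
```

Because the rule only references the mapping table, granting or revoking access becomes a data change on the database side rather than an edit to the RLS rule.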

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:db-migration PRs that require a DB migration size/L 🚢 0.36.0