Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(security): Updating assert logic #10034

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Jun 10, 2020

SUMMARY

This PR updates the security manager assertion logic making it more flexible for deployments to configure their security manager. Currently there exists two types of methods:

  1. can_access_*, e.g. can_access_datasource, which returns a boolean in response to whether the user can access said resource.
  2. assert_*_permission, e.g. assert_datasource_permission, which returns None but raises an exception if the user cannot access said resource.

The issue with this approach is that some methods like access_datasource have all the logic and assert_datasource_permission calls can_access_datasource and raises an exception if the response is False, whereas others like can_access_table have no assert equivalent.

Additional context as to why a user cannot access said resource is lost when the logic resides in the can_access_* and thus the preferred approach is to have the logic in the assert_*_permission method and thus the can_access_* methods (which are mostly just convenience methods) can take the form,

def can_access_datasource(self, datasource) -> bool:
    try:
        self.assert_datasource_permission(datasource)
    except SupersetSecurityException:
        return False

    return True

Note I’ve only translated a few of the can_access_* methods. I think in the future it could make sense that more of these methods could leverage this pattern depending on how deployments with custom security managers would want to proceed.

Secondly the term assert seems strange as one would expect it to raise an AssertionError if the assertion fails, whereas these methods raise a SupersetSecurityException, hence I opted for the requests approach (which uses raise_for_status) to call these methods raise_for_access and thus it seems clearer than an exception may be raised (and thus should be handled if necessary). I used the term access rather than permission for consistency with the can_access_* methods. I also added a wrapper function to various classes so rather than,

query_context = QueryContext(**json.loads(request.form["query_context"]))
security_manager.assert_query_context_permission(query_context)

the logic is now,

query_context = QueryContext(**json.loads(request.form["query_context"]))
query_context.raise_for_access()

Finally rather than having a slew of raise_for_*_access which can add to the confusion (and introduce unnecessary complexity) if a deployment needs to override these, i.e., they intrinsically need to know how these interact, I defined a single raise_for_access method which contains all the logic regardless of the resource type (table, datasource, visualization, etc.). This can be helpful say if a user cannot access to an outright datasource but due to column level security can access the datasource in the context of a visualization (where the columns are defined).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI and added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch from 9f9c526 to 4eb4e21 Compare June 10, 2020 21:22
@codecov-commenter
Copy link

codecov-commenter commented Jun 10, 2020

Codecov Report

Merging #10034 into master will increase coverage by 4.84%.
The diff coverage is 92.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10034      +/-   ##
==========================================
+ Coverage   64.08%   68.93%   +4.84%     
==========================================
  Files         584      584              
  Lines       31054    31077      +23     
  Branches     3180     3180              
==========================================
+ Hits        19901    21422    +1521     
+ Misses      10975     9546    -1429     
+ Partials      178      109      -69     
Flag Coverage Δ
#cypress 53.92% <ø> (?)
#javascript 59.48% <ø> (ø)
#python 67.40% <92.15%> (+0.06%) ⬆️
Impacted Files Coverage Δ
superset/viz_sip38.py 0.00% <0.00%> (ø)
superset/views/api.py 66.66% <50.00%> (ø)
superset/views/core.py 75.39% <75.00%> (ø)
superset/charts/api.py 80.31% <100.00%> (ø)
superset/common/query_context.py 79.62% <100.00%> (+0.25%) ⬆️
superset/connectors/base/models.py 86.57% <100.00%> (+0.14%) ⬆️
superset/security/manager.py 91.17% <100.00%> (+2.13%) ⬆️
superset/viz.py 57.11% <100.00%> (+0.05%) ⬆️
superset/connectors/sqla/models.py 89.10% <0.00%> (+0.14%) ⬆️
...rontend/src/SqlLab/components/AceEditorWrapper.tsx 56.98% <0.00%> (+1.07%) ⬆️
... and 147 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9532bff...06efb7d. Read the comment docs.

if not self.datasource_access(datasource):
raise SupersetSecurityException(
self.get_datasource_access_error_object(datasource),
from superset import db
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should be any assertion logic to ensure that the right combination of parameters are defined. Note except for database and table these are all mutually exclusive. I did consider having an argument database_and_table which would be an Optional[Tuple["Database, "Table']] but I wasn't sold on the idea and thus opted for the additional docstring context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optional approach here would be to define DatasourceMixin which provides a datasource, and add it to BaseViz, QueryContext and Datasource. in this case we could simplify the signature of this method by combining viz, datasource and query_context into one single datasource: DatasourceMixin. The same could be done to query and database (DatabaseMixin). This could help clarify what the diffefent arguments mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tempting to think of something like use a **kwargs then use a dict to route the named parameters to smaller methods that would implement the security assertion logic. A bit crazy, but has it's upsides, cleaner signature, forced breaking logic into smaller chunks, easy to extend.

Not 100% sure if it's doable,

@@ -2656,8 +2668,7 @@ def fetch_datasource_metadata(self) -> FlaskResponse:
if not datasource:
return json_error_response(DATASOURCE_MISSING_ERR)

# Check permission for datasource
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annexed self explanatory comment.

@john-bodley john-bodley marked this pull request as ready for review June 10, 2020 21:27
@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch 3 times, most recently from 94a0fc7 to 7c05667 Compare June 11, 2020 01:25
Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one other question: does this handle queries run within SQL Lab too? I'm assume that's what the "datasource" part does, but wanted to confirm

)
try:
self.raise_for_access(datasource=datasource)
except SupersetSecurityException:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a different exception thrown, then this fails open. Is that secure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 in theory (by construction) the raise_for_access method should only throw a SupersetSecurityException exception. Generally catching scoped (as opposed to general) exceptions is preferred.

@john-bodley
Copy link
Member Author

john-bodley commented Jun 11, 2020

@etr2460 SQL Lab uses the rejected_tables method which calls can_access_datasource (renamed to can_access_table in #10031) which leverages this the raise_for_access method.

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch from 7c05667 to 06efb7d Compare June 11, 2020 22:16
@john-bodley john-bodley requested a review from dpgaspar June 15, 2020 16:31
@john-bodley
Copy link
Member Author

john-bodley commented Jun 15, 2020

@dpgaspar and @villebro I was wondering whether you had any updated thoughts regarding this approach.

Note I do believe that long term many constructs of the existing security manager (from a data access perspective) will need to change as the constructs of database, schema, datasource access is somewhat regimented especially within the world of row level access (exists within Apache Superset) and column level access (exists at Airbnb in the form of metric level access).

@@ -450,7 +450,7 @@ def data(self) -> Response:
except KeyError:
return self.response_400(message="Request is incorrect")
try:
security_manager.assert_query_context_permission(query_context)
query_context.raise_for_access()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not changed in this PR, but this try/catch seems kinda weird, I would've expected this api to be wrapped in the handle_api_exception decorator that i believe automatically generates the proper json error response from an exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 I think this merely highlights the inconsistencies with how backend errors are handled. This is detailed in SIP-41.

Copy link
Member

@dpgaspar dpgaspar Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 #9529 kind of addresses that, it's a POC for SIP-41, uses flask error handling

Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good, although i'm not sure if @villebro or @dpgaspar still needed to take a look

@etr2460 etr2460 added the v0.37 label Jun 19, 2020
@@ -232,7 +233,8 @@ def can_access_all_databases(self) -> bool:

def can_access_database(self, database: "Database") -> bool:
"""
Return True if the user can access the Superset database, False otherwise.
Return True if the user can access all the tables within the Superset database,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though there was merit in being more explicit about what it means to be able to access a database, schema, etc. It's not overly apparent without the comment.

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch 4 times, most recently from b4ae299 to 997f1fd Compare June 20, 2020 06:05
@villebro
Copy link
Member

@john-bodley this needs a rebase

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch from 997f1fd to 3c5f3ef Compare June 21, 2020 05:21
@john-bodley
Copy link
Member Author

john-bodley commented Jun 21, 2020

@villebro I've rebased. Note in the second commit I also extended the logic to replace the rejected_tables method. I think this provides clarity/simplicity in the views but possibly adds complexity to the raise_for_access method.

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch 2 times, most recently from 634943c to 95fe0ec Compare June 21, 2020 05:39
@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch 2 times, most recently from 1aef994 to bfb0965 Compare June 21, 2020 21:58
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My only concern is the fairly loaded method raise_for_access method, which in current form feels slightly difficult to approach for first-timers, and was slightly difficult to follow with many branching if-statements. I'm not sure how you feel about the mixin approach, but it could help remove some complexity in this method, although would arguably introduce some bloat elsewhere.

if not self.datasource_access(datasource):
raise SupersetSecurityException(
self.get_datasource_access_error_object(datasource),
from superset import db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optional approach here would be to define DatasourceMixin which provides a datasource, and add it to BaseViz, QueryContext and Datasource. in this case we could simplify the signature of this method by combining viz, datasource and query_context into one single datasource: DatasourceMixin. The same could be done to query and database (DatabaseMixin). This could help clarify what the diffefent arguments mean.

@john-bodley
Copy link
Member Author

john-bodley commented Jun 22, 2020

@villebro I agree that the raise_for_access method is fairly loaded, though in general I like the standardization it has provided in terms of checking for access from the perspective of a query, viz, etc. It also has reduced a significant amount of the spaghetti security logic, i.e., pervious assert_datasource_permission , can_access_datasource, can_access_table, rejected_tables, etc. whereas now there's just the raise_for_access and a convenience method.

The mixin is an interesting idea, though I sense it mightn’t simply the logic in raise_for_access as one would need to do a number of isinstance checks to work out whether the DatasourceMixin is a query context, viz, etc to pull out the appropriate attributes. Also this doesn’t address the issue of the Table argument as there’s no mixin related to that.

@dpgaspar
Copy link
Member

@etr2460, @john-bodley

I'm aware of this change, will take a look today

if not self.datasource_access(datasource):
raise SupersetSecurityException(
self.get_datasource_access_error_object(datasource),
from superset import db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tempting to think of something like use a **kwargs then use a dict to route the named parameters to smaller methods that would implement the security assertion logic. A bit crazy, but has it's upsides, cleaner signature, forced breaking logic into smaller chunks, easy to extend.

Not 100% sure if it's doable,


if not (schema_perm and self.can_access("schema_access", schema_perm)):
datasources = SqlaTable.query_datasources_by_name(
db.session, database, table_.table, schema=table_.schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the from superset import db
We could replace it by self.get_session

@john-bodley john-bodley force-pushed the john-bodley--security-manager-assert branch from bfb0965 to 24cf5fd Compare June 22, 2020 20:31
@john-bodley john-bodley merged commit aefef9c into apache:master Jun 24, 2020
@john-bodley john-bodley deleted the john-bodley--security-manager-assert branch June 24, 2020 03:49
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Jun 24, 2020
* chore(security): Updating assert logic

* Deprecating rejected_tables

Co-authored-by: John Bodley <john.bodley@airbnb.com>
(cherry picked from commit aefef9c)
@john-bodley john-bodley mentioned this pull request Jun 24, 2020
6 tasks
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
* chore(security): Updating assert logic

* Deprecating rejected_tables

Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v0.37 🚢 0.37.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants