fix(parsedquery): resolve conflict when aliases and tables have the same name #21535

victorarbuesmallada · 2022-09-21T08:52:18Z

SUMMARY

We discovered that in queries that have aliases with the same name as tables, that removes said tables from the list returned in the ParsedQuery tables method. This affects the way we prevent users to access to certain tables, since they can circumvent the restricted access by retrieving data through subqueries that have the same alias as the name of the involved table.
In essence, this won't work:
select * from forbidden_table
This will work:
select * from (select * from forbidden_table) forbidden_table

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

A normal query shows that the table can't be accessed:

A query with a forbidden table on a subquery:

TESTING INSTRUCTIONS

Create a user that has only certain tables permission and try to perform a query with a forbidden table inside a subquery with the alias the same as the table name.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

victorarbuesmallada · 2022-09-21T08:57:11Z

Related to this issue: #21207

codecov · 2022-09-21T09:57:52Z

Codecov Report

Merging #21535 (a3a44b2) into master (4d12e37) will decrease coverage by 1.33%.
The diff coverage is 42.00%.

@@            Coverage Diff             @@
##           master   #21535      +/-   ##
==========================================
- Coverage   66.68%   65.34%   -1.34%     
==========================================
  Files        1793     1794       +1     
  Lines       68538    68687     +149     
  Branches     7282     7282              
==========================================
- Hits        45702    44881     -821     
- Misses      20974    21944     +970     
  Partials     1862     1862

Flag	Coverage Δ
hive	`52.84% <38.00%> (-0.24%)`	⬇️
mysql	`78.11% <42.00%> (-0.10%)`	⬇️
postgres	`78.16% <42.00%> (-0.11%)`	⬇️
presto	`52.74% <38.00%> (-0.24%)`	⬇️
python	`78.58% <42.00%> (-2.83%)`	⬇️
sqlite	`?`
unit	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
superset/sql_parse.py	`62.20% <42.00%> (-34.87%)`	⬇️
superset/tables/schemas.py	`0.00% <0.00%> (-100.00%)`	⬇️
superset/columns/schemas.py	`0.00% <0.00%> (-100.00%)`	⬇️
...set/advanced_data_type/plugins/internet_address.py	`16.32% <0.00%> (-79.60%)`	⬇️
superset/utils/pandas_postprocessing/boxplot.py	`20.51% <0.00%> (-79.49%)`	⬇️
superset/charts/post_processing.py	`11.76% <0.00%> (-77.95%)`	⬇️
...perset/advanced_data_type/plugins/internet_port.py	`18.75% <0.00%> (-77.09%)`	⬇️
superset/utils/pandas_postprocessing/rolling.py	`21.87% <0.00%> (-68.75%)`	⬇️
...perset/utils/pandas_postprocessing/contribution.py	`34.61% <0.00%> (-65.39%)`	⬇️
superset/utils/pandas_postprocessing/resample.py	`37.50% <0.00%> (-62.50%)`	⬇️
... and 66 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

giftig · 2023-04-20T11:34:51Z

Hi @zhaoyongjie @john-bodley @michael-s-molina I'm looking to upgrade to the superset 2.1 release but we've had to make a number of small patches and security fixes to our fork of the 2.0 release patch issues such as this one, which was contributed here by a former member of our team.

I'm trying to figure out if this patch is still needed or if a similar fix has made it into the 2.1 release as there hasn't been any activity on the issue or PR here. Are you able to confirm whether the issue described here has been resolved?

We'd also be interested in any guidance you can offer in how we can avoid straying too far from what's being worked on by the open source team. What we'd really like to do is work with your team to find sensible ways of discussing and contributing some of the features and short-term fixes we've made back to the project for the benefit of both our organisation and the wider community. As we've been working at a fairly fast pace we've unfortunately diverged to get patches and customisations in quick and avoid waiting for releases, and we haven't previously allowed time to reach out to your team to see what we can give back other than by raising a couple of PRs like this one. We'll try to reach out via your slack and discuss a way forward, but wanted to add a heads up here in case you can advise who best to speak to and how best to get in touch.

Apologies for tagging you directly, I took your names from the top recent contributor list, and saw @zhaoyongjie previously had eyes on this PR 👀

michael-s-molina · 2023-04-20T11:52:46Z

Hi @giftig.

I'm trying to figure out if this patch is still needed or if a similar fix has made it into the 2.1 release as there hasn't been any activity on the issue or PR here. Are you able to confirm whether the issue described here has been resolved?

@eschutho will be the best person for this one as she was the release manager for 2.1

We'd also be interested in any guidance you can offer in how we can avoid straying too far from what's being worked on by the open source team. What we'd really like to do is work with your team to find sensible ways of discussing and contributing some of the features and short-term fixes we've made back to the project for the benefit of both our organisation and the wider community. As we've been working at a fairly fast pace we've unfortunately diverged to get patches and customisations in quick and avoid waiting for releases, and we haven't previously allowed time to reach out to your team to see what we can give back other than by raising a couple of PRs like this one. We'll try to reach out via your slack and discuss a way forward, but wanted to add a heads up here in case you can advise who best to speak to and how best to get in touch.

This is really nice. Contributions are always welcome. The best person to help you here is @rusackas who established a similar model with CCCS.

giftig · 2023-04-25T15:18:57Z

Hi @eschutho would you be able to weigh in on the above?

eschutho · 2023-04-28T22:37:18Z

Hi @eschutho would you be able to weigh in on the above?

HI @giftig, thanks for the PR. I didn't read through it thoroughly so excuse me if I missed anything in the implementation, but at first glance, I do know that we added rls support to subqueries in 2.1.0. What version are you currently on?

Also I agree on the suggestion to connect with @rusackas on contributing back and working more closely with the team here. Evan will be a great resource for getting you plugged in.

rusackas · 2023-04-28T22:42:23Z

@giftig happy to chat! I don't see you on Slack or see an email on your GitHub profile, so feel free to DM me on Slack if you join, and we can take it from there!

giftig · 2023-05-02T08:34:58Z

@eschutho thanks, I'll take a look at the updates around there and try to retest the issue we were having here. We're currently upgrading to the 2.1 release (from 2.0.0), and merging in the changes we had on our fork in order to do the upgrade. Unfortunately with some of those changes, like this one, I don't have an especially good idea of what the problem was, how the fix works, or whether the 2.1 release has independently fixed it, so I thought I'd check here and if this is fixed in the 2.1 release we can close this PR and it's one fewer commit to try to merge in.

I put this one at the end of my list to test while I checked with you so I'll try to verify the problem described here against 2.1 and if it's already fixed we can close this. Thanks to pointing me to the relevant changes in 2.1.

Thanks @rusackas, I believe our product owner has already reached out to you and we have a call scheduled this week, so we'll speak shortly and I'll sign up to your slack as well.

giftig · 2023-05-04T13:53:46Z

Hi, I've just run into this issue in 2.1 so I can confirm that the issue still exists. I'll attempt to rebase this patch onto the latest state of the project and raise it as a new PR. I'd like to flag that this is a serious security issue around bypassing datasource access permissions, though, so it'd be good to get some eyes on this issue from your side if possible.

As @victorarbuesmallada has raised here, currently in superset a user with no permissions to access forbidden_table can do so regardless using a query like

SELECT * FROM (SELECT * FROM forbidden_table) AS forbidden_table;

giftig · 2023-05-05T08:59:04Z

@rusackas do you know who would be best to take a look at what's being done here? I see in git blame that several names have made major contributions to the SQL parse logic in this file. Obviously this is some of the most complex / low-level code so I'd like to get opinions on whether this fix is the "right" fix and what we can do to get this merged.

Surprisingly I was able to cherry-pick this into 2.1 without any conflicts and am just retesting that the fix still works on our fork; it doesn't seem like the sql parse logic has changed much at all between 2.0 and 2.1. Hopefully this will make it easier to get this looked at and merged. Happy to make changes where needed and submit a new PR as Victor is no longer in our team and unlikely to be active here, and I notice your build is complaining about lack of test coverage in some parts / I'm sure someone on your team will have the context to come up with improvements for this fix.

villebro

Left a first round of comments

villebro · 2023-05-09T12:35:18Z

superset/sql_parse.py

+    class TableTuple(NamedTuple):
+        table: Table
+        level: int
+
+    class AliasTuple(NamedTuple):
+        name: str
+        level: int


Nit: while I'm having a hard time coming up with a better name for these, I'm not a big fan of Hungarian notation. An alternative could be TableWithLevel and AliasWithLevel to clarify the contents rather than the type.

As a fellow hater of Hungarian notation, I'd like to stress I'm not the original author :D I'll change these to something nicer.

villebro · 2023-05-09T12:38:50Z

superset/sql_parse.py

-            }
+                result = self._extract_from_token(statement)
+                if result:
+                    (table_tuples, alias_name_tuples) = result


Continuing my naming rant.. 😄 Here we coud probably just call then tables and aliases, and then later extract the table_names/table_levels from them etc.

villebro · 2023-05-09T12:45:35Z

tests/unit_tests/sql_parse_tests.py

+"""
+        )
+        == {Table("q2"), Table("src")}
+    )


are we really sure that the q2 reference in the q1 CTE will in fact reference the table q2 in the database? I did the following test on my devenv which has examples data and the superset metadata in the same database, and

with q1 as ( select distinct gender from birth_names), birth_names as ( select * from ab_role) select * from (select gender from q1) a

produces the following error:

sqlite error: no such column: gender

This would indicate to me that it was looking for gender in ab_role, not the actual birth_names table. So to me it appears as if only src is in fact only referenced here, right?

I just tried something similar but it's definitely referencing the database table, not the alias which came after it:

postgres=# CREATE TABLE hodor(age INT, name VARCHAR); CREATE TABLE postgres=# INSERT INTO hodor(age, name) VALUES (35, 'Hodor'); INSERT 0 1 postgres=# CREATE TABLE potatoes(count INT); CREATE TABLE postgres=# INSERT INTO potatoes(count) VALUES (100); INSERT 0 1 postgres=# WITH result AS (SELECT * FROM hodor), hodor AS (SELECT * FROM potatoes) SELECT * FROM result; age | name -----+------- 35 | Hodor (1 row)

Not sure what the point of the extra CTE in the final SELECT was here, but it didn't change the result for me either:

postgres=# WITH result AS (SELECT * FROM hodor), hodor AS (SELECT * FROM potatoes) SELECT * FROM (SELECT * FROM result) final; age | name -----+------- 35 | Hodor (1 row) postgres=# WITH result AS (SELECT * FROM hodor), hodor AS (SELECT * FROM potatoes) SELECT * FROM (SELECT age FROM result) final; age ----- 35 (1 row)

Having said that, we're using different databases. It wouldn't surprise me at all if postgres and sqlite handled CTEs slightly differently. If that's the case this endeavour is slightly doomed though; we'd need a lot more logic here to handle various different databases.

Arguably it's impossible to maintain this without somehow asking the database for the table list instead of trying to work it out ourselves. Unfortunately for the db my team really cares about, trino, that's not readily possible either.

I just tried it with sqlite and indeed it behaves differently:

sqlite> CREATE TABLE hodor(age INT, name VARCHAR); sqlite> INSERT INTO hodor(age, name) VALUES (35, 'Hodor'); sqlite> CREATE TABLE potatoes(count INT); sqlite> INSERT INTO potatoes(count) VALUES (100); sqlite> WITH result AS (SELECT * FROM hodor), hodor AS (SELECT * FROM potatoes) SELECT * FROM result; ...> ...> ...> 100

Ok good to know, this is likely one of those cases where different dbs behave differently. To support both use cases we could probably add a flag on BaseEngineSpec that specifies which variant the db in question supports. Something like cte_forward_alias_reference that we'd set to False for SQLite and True for Postgres (we should probably do some additional testing to see which is the more common flavor before setting that on BaseEngineSpec).

betodealmeida · 2023-05-09T22:34:33Z

Not in the scope of this PR, but in the neat future it might be good to switch to sqloxide to check for tables. We're already using it were performance is important, but it's also more robust in determining the accessed tables (from my experience).

giftig · 2023-05-15T13:26:25Z

@betodealmeida my team had similar ideas but suggested sqlglot, for similar reasons. I haven't got much experience with either tool so not sure which one would do the job better, but I think it's clear from the conversation above about different results with different databases that the issue can't be fixed without a major change to the approach, e.g. tackling this with a more robust tool for the job like sqlglot or sqloxide.

I'm in two minds about this PR now as it does have value to my team: we're very readily falling afoul of this particular issue as one of our jinja templates uses the alias-overriding-table pattern described in the issue, so we've kept it in our fork so that users can't escalate their permissions purely by accident. However, given @villebro 's comment on the test I'm not certain this doesn't worsen the experience for some users / backends by restricting legitimate access, and if that's a poor trade off in that sense. And what's clear is that this PR won't fix the real issue, we'd need a more fundamental change in approach to make this work as intended.

villebro · 2023-05-15T14:59:14Z

@betodealmeida my team had similar ideas but suggested sqlglot, for similar reasons. I haven't got much experience with either tool so not sure which one would do the job better, but I think it's clear from the conversation above about different results with different databases that the issue can't be fixed without a major change to the approach, e.g. tackling this with a more robust tool for the job like sqlglot or sqloxide.

I'm in two minds about this PR now as it does have value to my team: we're very readily falling afoul of this particular issue as one of our jinja templates uses the alias-overriding-table pattern described in the issue, so we've kept it in our fork so that users can't escalate their permissions purely by accident. However, given @villebro 's comment on the test I'm not certain this doesn't worsen the experience for some users / backends by restricting legitimate access, and if that's a poor trade off in that sense. And what's clear is that this PR won't fix the real issue, we'd need a more fundamental change in approach to make this work as intended.

@giftig @betodealmeida Incidentally, I've also been playing around with sqlglot lately and been very impressed with it (check the benchmark results which are pretty impressive). However I can't comment on which is better/worse, but my gut feeling is that if either makes it possible to deprecate sqlparse I think we should consider switching (having said that, it seems like 0.4.2-0.4.4 have come out with a pretty healthy cadence after a long hiatus).

tobymao · 2023-05-16T05:51:57Z

creator of sqlglot here. happy to chat. sqlglot has very advanced capabilities like multi dialect support, scope traversal, column qualification, and other optimizations.

not exactly sure what you're looking to do but you can also do column level lineage, so you can determine if any select is selecting from any source.

giftig · 2023-12-19T10:41:18Z

Going to close this one as it was always a bit too specific to my team's problem and the change has some issues which are complex to solve. We've moved in a different direction to solve the underlying issue we were seeing.

Might still be an idea to discuss the polyglot approach to generally improving the logic / performance here in future though; sounds like a good candidate for a SIP.

resolve conflict when aliases and tables have the same name

7115cc2

pull-request-size bot added the size/L label Sep 21, 2022

victorarbuesmallada mentioned this pull request Sep 21, 2022

ParsedQuery to return tables within subselects #21207

Closed

Merge branch 'master' into parse-subqueries

f2f98b9

fix: cleanup

42a52b6

victorarbuesmallada force-pushed the parse-subqueries branch from e205f72 to 42a52b6 Compare September 21, 2022 09:10

victorarbuesmallada force-pushed the parse-subqueries branch 6 times, most recently from 81c34cf to 0a92c17 Compare September 22, 2022 13:47

fix failing test

5a507bb

victorarbuesmallada force-pushed the parse-subqueries branch from 0a92c17 to 5a507bb Compare September 22, 2022 15:35

fix test assertion

a3a44b2

rusackas requested review from dpgaspar, villebro and eschutho May 8, 2023 21:56

villebro reviewed May 9, 2023

View reviewed changes

giftig closed this Dec 19, 2023

betodealmeida mentioned this pull request Jan 12, 2024

fix(sqlparse): improve table parsing #26476

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(parsedquery): resolve conflict when aliases and tables have the same name #21535

fix(parsedquery): resolve conflict when aliases and tables have the same name #21535

victorarbuesmallada commented Sep 21, 2022 •

edited by betodealmeida

Loading

victorarbuesmallada commented Sep 21, 2022

codecov bot commented Sep 21, 2022 •

edited

Loading

giftig commented Apr 20, 2023 •

edited

Loading

michael-s-molina commented Apr 20, 2023

giftig commented Apr 25, 2023

eschutho commented Apr 28, 2023

rusackas commented Apr 28, 2023

giftig commented May 2, 2023 •

edited

Loading

giftig commented May 4, 2023 •

edited

Loading

giftig commented May 5, 2023

villebro left a comment

villebro May 9, 2023

giftig May 9, 2023

villebro May 9, 2023

villebro May 9, 2023 •

edited

Loading

giftig May 9, 2023

giftig May 9, 2023

giftig May 9, 2023 •

edited

Loading

giftig May 9, 2023 •

edited

Loading

villebro May 15, 2023

betodealmeida commented May 9, 2023

giftig commented May 15, 2023

villebro commented May 15, 2023

tobymao commented May 16, 2023

giftig commented Dec 19, 2023

fix(parsedquery): resolve conflict when aliases and tables have the same name #21535

fix(parsedquery): resolve conflict when aliases and tables have the same name #21535

Conversation

victorarbuesmallada commented Sep 21, 2022 • edited by betodealmeida Loading

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

victorarbuesmallada commented Sep 21, 2022

codecov bot commented Sep 21, 2022 • edited Loading

Codecov Report

giftig commented Apr 20, 2023 • edited Loading

michael-s-molina commented Apr 20, 2023

giftig commented Apr 25, 2023

eschutho commented Apr 28, 2023

rusackas commented Apr 28, 2023

giftig commented May 2, 2023 • edited Loading

giftig commented May 4, 2023 • edited Loading

giftig commented May 5, 2023

villebro left a comment

Choose a reason for hiding this comment

villebro May 9, 2023

Choose a reason for hiding this comment

giftig May 9, 2023

Choose a reason for hiding this comment

villebro May 9, 2023

Choose a reason for hiding this comment

villebro May 9, 2023 • edited Loading

Choose a reason for hiding this comment

giftig May 9, 2023

Choose a reason for hiding this comment

giftig May 9, 2023

Choose a reason for hiding this comment

giftig May 9, 2023 • edited Loading

Choose a reason for hiding this comment

giftig May 9, 2023 • edited Loading

Choose a reason for hiding this comment

villebro May 15, 2023

Choose a reason for hiding this comment

betodealmeida commented May 9, 2023

giftig commented May 15, 2023

villebro commented May 15, 2023

tobymao commented May 16, 2023

giftig commented Dec 19, 2023

victorarbuesmallada commented Sep 21, 2022 •

edited by betodealmeida

Loading

codecov bot commented Sep 21, 2022 •

edited

Loading

giftig commented Apr 20, 2023 •

edited

Loading

giftig commented May 2, 2023 •

edited

Loading

giftig commented May 4, 2023 •

edited

Loading

villebro May 9, 2023 •

edited

Loading

giftig May 9, 2023 •

edited

Loading

giftig May 9, 2023 •

edited

Loading