Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ensure table uniqueness on update #15909

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Jul 27, 2021

SUMMARY

The DAO model which @dpgaspar, @villebro, et al. added goes a long way to making #5449 obsolete, i.e., it has the necessary existence checks when creating or updating a table.

There are however at least two issues,

  1. The Datasource Editor still uses the legacy /datasource/save/ endpoint (per here) rather than the DAO model and so the existence logic is not invoked on update.

  2. The legacy CRUD Editor does not use the DAO model.

meaning it is currently possible to rename a dataset resulting in multiple datasets with the same schema and table name.

Though these issues should be resolved once we fully deprecate the CRUD editor and migrate over to the DAO model, I felt it was prudent to add an interim check which enforces uniqueness. Given then various vectors I thought the simplest and safest path was to leverage the approach in #5449 and add a SQLAlchemy before update listener. Note although the UX may not be ideal, my sense is it is suffice as an interim measure.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Editing/Updating via the Datasource Editor

Screen Shot 2021-07-27 at 8 48 11 AM

Editing/Updating via the legacy CRUD Editor

Though this UX is not ideal, i.e., it routes you to explorer, the dataset is not updated and there is messaging indicating that the dataset already exists.

Screen Shot 2021-07-27 at 8 48 36 AM

TESTING INSTRUCTIONS

CI and manual testing.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley force-pushed the john-bodley--table-uniqueness branch from 74a6352 to 915c28c Compare July 27, 2021 15:37
@john-bodley john-bodley marked this pull request as ready for review July 27, 2021 16:04
@john-bodley john-bodley requested a review from a team as a code owner July 27, 2021 16:04
@john-bodley john-bodley force-pushed the john-bodley--table-uniqueness branch 9 times, most recently from b8a102c to 3735c81 Compare July 27, 2021 22:43
@@ -254,13 +245,31 @@ def get_slice(
return slc

@staticmethod
def get_table_by_name(name: str) -> SqlaTable:
return get_table_by_name(name)
def get_table(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup to ensure consistency.



class TestDatasource(SupersetTestCase):
def setUp(self):
self.original_attrs = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really have no idea why this logic is here and the before_update method exposed its fragility. Rather than using instance variables to save state and restore state, using a sub-transaction and rolling back seems considerably cleaner.

@@ -168,25 +158,21 @@ def test_save(self):
else:
self.assertEqual(resp[k], datasource_post[k])

def save_datasource_from_dict(self, datasource_dict):
Copy link
Member Author

@john-bodley john-bodley Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. The local variable was never used but instead the global variable was. Seems like a mistake.

data = dict(data=json.dumps(datasource_post))
resp = self.get_json_resp("/datasource/save/", data)
return resp

@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how this test worked without first loading the example data.


datasource_post = get_datasource_post()
datasource_post["id"] = tbl.id
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! This was updating the global variable.

@@ -42,11 +42,6 @@
from tests.integration_tests.test_app import app


def get_table_by_name(name: str) -> SqlaTable:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be redefined. Adhering to the DRY principle.

@john-bodley john-bodley force-pushed the john-bodley--table-uniqueness branch 2 times, most recently from 0a7a81e to 9175b6d Compare July 27, 2021 23:17
@@ -351,13 +345,6 @@ def test_import_2_slices_for_same_table(self):
self.assert_slice_equals(slc_2, imported_slc_2)
self.assertEqual(imported_slc_2.datasource.perm, imported_slc_2.perm)

def test_import_slices_for_non_existent_table(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the purpose of this test was. If the table doesn't exist the get_table(...) precursor method with throw.

@john-bodley john-bodley force-pushed the john-bodley--table-uniqueness branch 2 times, most recently from 9a8e25c to f76f2b5 Compare July 28, 2021 00:02
@codecov
Copy link

codecov bot commented Jul 28, 2021

Codecov Report

Merging #15909 (444ac0d) into master (39db6a7) will increase coverage by 0.19%.
The diff coverage is 54.54%.

❗ Current head 444ac0d differs from pull request most recent head c08f541. Consider uploading reports for the commit c08f541 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15909      +/-   ##
==========================================
+ Coverage   76.69%   76.88%   +0.19%     
==========================================
  Files         995      995              
  Lines       52774    52797      +23     
  Branches     6691     6695       +4     
==========================================
+ Hits        40475    40594     +119     
+ Misses      12074    11978      -96     
  Partials      225      225              
Flag Coverage Δ
hive 81.32% <93.33%> (?)
mysql 81.61% <93.33%> (+<0.01%) ⬆️
postgres 81.63% <93.33%> (+<0.01%) ⬆️
presto 81.42% <93.33%> (?)
python 82.15% <93.33%> (+0.42%) ⬆️
sqlite 81.28% <93.33%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...frontend/src/dashboard/components/Header/index.jsx 63.74% <14.28%> (-1.74%) ⬇️
...tend/src/explore/components/ExploreChartHeader.jsx 56.33% <14.28%> (-3.96%) ⬇️
.../ReportModal/HeaderReportActionsDropdown/index.tsx 25.71% <33.33%> (+0.71%) ⬆️
superset/connectors/sqla/models.py 89.73% <92.85%> (+1.73%) ⬆️
...perset-frontend/src/addSlice/AddSliceContainer.tsx 78.78% <100.00%> (+0.66%) ⬆️
superset/models/slice.py 86.18% <100.00%> (ø)
superset-frontend/src/reports/actions/reports.js 10.86% <0.00%> (-10.87%) ⬇️
superset-frontend/src/components/Select/Select.tsx 73.05% <0.00%> (-1.37%) ⬇️
superset/common/query_context.py 90.74% <0.00%> (-0.47%) ⬇️
superset/utils/core.py 88.30% <0.00%> (+0.12%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 39db6a7...c08f541. Read the comment docs.

@john-bodley john-bodley force-pushed the john-bodley--table-uniqueness branch from f76f2b5 to cade3cb Compare July 28, 2021 00:57
# not exist in the migrations, but is required by `import_from_dict` to ensure the
# correct filters are applied in order to identify uniqueness.
#
# The reason it does not physically exist is MySQL, PostgreSQL have a different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar nit: ...is MySQL and PostgreSQL

Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with one nit. thanks for cleaning up this edge case

@@ -53,9 +53,9 @@
logger = logging.getLogger(__name__)


class Slice(
class Slice( # pylint: disable=too-many-instance-attributes,too-many-public-methods
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why pylint was complaining on this file as it was unchanged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also faced this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. :-P

@john-bodley john-bodley merged commit c0615c5 into apache:master Aug 2, 2021
@john-bodley john-bodley deleted the john-bodley--table-uniqueness branch August 2, 2021 19:45
@exemplary-citizen exemplary-citizen mentioned this pull request Aug 3, 2021
8 tasks
opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
* fix: Ensure table uniqueness on update

* Update models.py

* Update slice.py

* Update datasource_tests.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
cccs-RyanS pushed a commit to CybercentreCanada/superset that referenced this pull request Dec 17, 2021
* fix: Ensure table uniqueness on update

* Update models.py

* Update slice.py

* Update datasource_tests.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
* fix: Ensure table uniqueness on update

* Update models.py

* Update slice.py

* Update datasource_tests.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
* fix: Ensure table uniqueness on update

* Update models.py

* Update slice.py

* Update datasource_tests.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.3.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 1.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants