Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: alter sql columns to long text #7463

Merged

Conversation

khtruong
Copy link
Contributor

@khtruong khtruong commented May 7, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

There were issues executing long queries because we saved the strings to text columns in the query table. The queries would be truncated based on the storage size of the text column (i.e. Text in mysql can hold 65k characters - 64 KB). I have added a migration script to update the columns to hold longer strings.

TEST PLAN

Tested that the columns were long text columns in a mysql table after migration.

REVIEWERS

@betodealmeida @xtinec @DiggidyDave

def upgrade():
with op.batch_alter_table('query') as batch_op:
batch_op.alter_column(
'sql', existing_type=sa.Text, type_=sa.Text(2 ** 32 - 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, hooray for CI... postgresql is failing to migrate, looks like it has a 1GB limit on text. https://www.postgresql.org/docs/9.1/datatype-character.html

unclear to me if that GB is of the 2^30 or 10^9 variety... the latter/smaller one is probably enough for most people 🤣

batch_op.alter_column(
'executed_sql', existing_type=sa.Text, type_=sa.Text(length=text_len))
except:
# Many databases do not have a length on text objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer having a MySQL-specific if block then a try silencing all errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually after reading https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Text I'm not sure either way. It's probably ok as is since catching a more specific exception might be hard because driver-specific...

Copy link
Contributor

@DiggidyDave DiggidyDave May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel uncomfortable leaving a migration in an indeterminate state thought... this is a bit ugly but since there is divergent behavior between dbs for this column type, what about:

try:
   do_upgrade(with_limit=True)
except:
  do_upgrade(with_limit=False)

# if it fails after that, ok...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, are you suggesting that if it errors out then it didn't wan't/need the migration in the first place? That makes sense to me I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if it errors out, it doesn't need the migration in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought for the record: If we get rid of the batch_alter_table, it will fail after the first one and we will skip it gracefully. I think that's what we want. Not sure we need the batch here. A failure will fail as a batch anyway. Some really weird degenerate case would be if it is a mysql db and gets through 1 or 2 then the db crashes leaving one unmigrated. But that is not a catastrophic bad state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up making the migration script only update the query table for mysql dbs. I know that it is very specific to Lyft's situation but I felt that my original implementation was not a good solution since every db has different requirements and data storage sizes for Text columns. So I don't think it is a good idea for me to try to set an arbitrary size and I wouldn't be surprised if it messed up for some other dbs.

@codecov-io
Copy link

Codecov Report

Merging #7463 into lyft-release-sp8 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@                Coverage Diff                @@
##           lyft-release-sp8    #7463   +/-   ##
=================================================
  Coverage             64.95%   64.95%           
=================================================
  Files                   433      433           
  Lines                 21357    21357           
  Branches               2352     2352           
=================================================
  Hits                  13873    13873           
  Misses                 7356     7356           
  Partials                128      128

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae409d6...35d41f5. Read the comment docs.

Copy link
Contributor

@DiggidyDave DiggidyDave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is targeting the one db type that we KNOW has an issue, and is avoiding others that we know might actively reject the migration, this seems sane to me.

DiggidyDave pushed a commit to lyft/incubator-superset that referenced this pull request May 8, 2019
Merge tip of lyft-release-sp8
DiggidyDave pushed a commit to lyft/incubator-superset that referenced this pull request May 8, 2019
Merge lyft-develop to lyft-master
@betodealmeida
Copy link
Member

+1

@betodealmeida betodealmeida merged commit 7bfe7bc into apache:lyft-release-sp8 May 8, 2019
khtruong added a commit to lyft/incubator-superset that referenced this pull request May 8, 2019
Merge lyft-release-sp8@7bfe7bc to master
betodealmeida pushed a commit that referenced this pull request May 8, 2019
Merge lyft-release-sp8@7bfe7bc to master
DiggidyDave added a commit to lyft/incubator-superset that referenced this pull request May 8, 2019
DiggidyDave added a commit to lyft/incubator-superset that referenced this pull request May 8, 2019
xtinec pushed a commit that referenced this pull request May 10, 2019
* filter out all nan series (#7313)

* improve not rich tooltip (#7345)

* Create issue_label_bot.yaml (#7341)

* fix: do not save colors without a color scheme (#7347)

* [wtforms] Strip leading/trailing whitespace (#7084)

* [schema] Updating the datasources schema (#5451)

* limit tables/views returned if schema is not provided (#7358)

* limit tables/views returned if schema is not provided

* fix typo

* improve code performance

* handle the case when table name or view name does not present a schema

* Add type anno (#7342)

* Updated local dev instructions to include missing step

* First pass at type annotations

* [schema] Updating the base column schema (#5452)

* Update 937d04c16b64_update_datasources.py (#7361)

* Feature flag for client cache (#7348)

* Feature flag for client cache

* Fix integration test

* Revert "Fix integration test"

This reverts commit 58434ab.

* Feature flag for client cache

* Fix integration tests

* Add feature flag to config.py

* Add another feature check

* Fix more integration tests

* Fix raw HTML in SliceAdder (#7338)

* remove backendSync.json (#7331)

* [bubbles] issue when using duplicated metrics (#7087)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359)

* SUPERSET-8: Update text in docs copyright footer (#7360)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04

* SUPERSET-8: Extra text in docs copyright footer

* [schema] Adding commits and removing unnecessary foreign-key definitions (#7371)

*  Store last selected dashboard in sessionStorage (#7181)

* Store last selected dashboard in sessionStorage

* Fix tests

* [schema] Updating the base metric schema (#5453)

* Fix NoneType bug & fill the test recipients with original recipients if empty (#7365)

* Added living goods as among the users of Superset (#7407)

* Added living goods as among the users of Superset

Living Goods is a non profit organisation with operation in africa and the middle east. We work in community health use data heavily on day to day. Superset is our platform of choice for dashboards.

* Update README.md

* [dashboard] allow user re-order top-level tabs (#7390)

* [SQL Lab] Increase timeout threshold for offline check (#7411)

* Bump FAB to 2.0.0 (#7323)

* Bump FAB to 2.0.0

* [tests] whitelist SecurityApi login and refresh endpoints

* [style] Fix, C812 missing trailing commas

* [security] Remove SUPERSET_UPDATE_PERMS flag

Registering sources needs to be performed after the views are
initialized on UPDATE_PERMS=False configuration

* [docs] New, FAB_UPDATE_PERMS and flask fab cli

* [docs] Fix, db upgrade needs to come first, create-admin needs a db

* [cli] New, superset init bootstraps all permissions for FAB and Superset

* [style] Fix, flakes

* [annotations] Improves UX on annotation validation, start_dttm, end_dttm (#7326)

* Setting renderTrigger on label_colors (#7410)

* Refactor out controlUtils.js module + unit tests (#7350)

* [WiP]refactor out a controlUtils.js file

* unit tests

* add missing license

* Addressing comments

* feature: see Presto row and array data types (#7413)

* Merge lastest from master into lyft-release-sp8 (#7405)

* filter out all nan series (#7313)

* improve not rich tooltip (#7345)

* Create issue_label_bot.yaml (#7341)

* fix: do not save colors without a color scheme (#7347)

* [wtforms] Strip leading/trailing whitespace (#7084)

* [schema] Updating the datasources schema (#5451)

* limit tables/views returned if schema is not provided (#7358)

* limit tables/views returned if schema is not provided

* fix typo

* improve code performance

* handle the case when table name or view name does not present a schema

* Add type anno (#7342)

* Updated local dev instructions to include missing step

* First pass at type annotations

* [schema] Updating the base column schema (#5452)

* Update 937d04c16b64_update_datasources.py (#7361)

* Feature flag for client cache (#7348)

* Feature flag for client cache

* Fix integration test

* Revert "Fix integration test"

This reverts commit 58434ab.

* Feature flag for client cache

* Fix integration tests

* Add feature flag to config.py

* Add another feature check

* Fix more integration tests

* Fix raw HTML in SliceAdder (#7338)

* remove backendSync.json (#7331)

* [bubbles] issue when using duplicated metrics (#7087)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359)

* SUPERSET-8: Update text in docs copyright footer (#7360)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04

* SUPERSET-8: Extra text in docs copyright footer

* [schema] Adding commits and removing unnecessary foreign-key definitions (#7371)

*  Store last selected dashboard in sessionStorage (#7181)

* Store last selected dashboard in sessionStorage

* Fix tests

* [schema] Updating the base metric schema (#5453)

* Fix NoneType bug & fill the test recipients with original recipients if empty (#7365)

* feat: see Presto row and array data types (#7391)

* feat: see Presto row and array data types

* fix: address PR comments

* fix: lint and build issues

* fix: add types

* add stronger type hints where possible

* fix: lint issues and add select_star func in Hive

* add missing pkg init

* fix: build issues

* fix: pylint issues

* fix: use logging instead of print

* Removed --console-log and superset runserver (#7421)

* Fixes dashboard export button missing download and #7353 (#7427)

* Added additional German translations to string file (#6604)

* Added additional German translations to string file

Updates to German translation files as per directions

* Removed messages.json

* [fix] Fixing SQL parsing issue (#7374)

* add chinese translate (#7402)

* Quick fix to address deadlock issue (#7434)

* feat: view presto row objects in data grid (#7445)

* Merge lastest from master into lyft-release-sp8 (#7405)

* filter out all nan series (#7313)

* improve not rich tooltip (#7345)

* Create issue_label_bot.yaml (#7341)

* fix: do not save colors without a color scheme (#7347)

* [wtforms] Strip leading/trailing whitespace (#7084)

* [schema] Updating the datasources schema (#5451)

* limit tables/views returned if schema is not provided (#7358)

* limit tables/views returned if schema is not provided

* fix typo

* improve code performance

* handle the case when table name or view name does not present a schema

* Add type anno (#7342)

* Updated local dev instructions to include missing step

* First pass at type annotations

* [schema] Updating the base column schema (#5452)

* Update 937d04c16b64_update_datasources.py (#7361)

* Feature flag for client cache (#7348)

* Feature flag for client cache

* Fix integration test

* Revert "Fix integration test"

This reverts commit 58434ab.

* Feature flag for client cache

* Fix integration tests

* Add feature flag to config.py

* Add another feature check

* Fix more integration tests

* Fix raw HTML in SliceAdder (#7338)

* remove backendSync.json (#7331)

* [bubbles] issue when using duplicated metrics (#7087)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359)

* SUPERSET-8: Update text in docs copyright footer (#7360)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04

* SUPERSET-8: Extra text in docs copyright footer

* [schema] Adding commits and removing unnecessary foreign-key definitions (#7371)

*  Store last selected dashboard in sessionStorage (#7181)

* Store last selected dashboard in sessionStorage

* Fix tests

* [schema] Updating the base metric schema (#5453)

* Fix NoneType bug & fill the test recipients with original recipients if empty (#7365)

* feat: see Presto row and array data types (#7391)

* feat: see Presto row and array data types

* fix: address PR comments

* fix: lint and build issues

* fix: add types

* Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415)

* add stronger type hints where possible

* fix: lint issues and add select_star func in Hive

* add missing pkg init

* fix: build issues

* fix: pylint issues

* fix: use logging instead of print

* feat: view presto row objects in data grid

* fix: address feedback

* fix: spacing

* feat: Scheduling queries from SQL Lab (#7416) (#7446)

* Merge lastest from master into lyft-release-sp8 (#7405)

* filter out all nan series (#7313)

* improve not rich tooltip (#7345)

* Create issue_label_bot.yaml (#7341)

* fix: do not save colors without a color scheme (#7347)

* [wtforms] Strip leading/trailing whitespace (#7084)

* [schema] Updating the datasources schema (#5451)

* limit tables/views returned if schema is not provided (#7358)

* limit tables/views returned if schema is not provided

* fix typo

* improve code performance

* handle the case when table name or view name does not present a schema

* Add type anno (#7342)

* Updated local dev instructions to include missing step

* First pass at type annotations

* [schema] Updating the base column schema (#5452)

* Update 937d04c16b64_update_datasources.py (#7361)

* Feature flag for client cache (#7348)

* Feature flag for client cache

* Fix integration test

* Revert "Fix integration test"

This reverts commit 58434ab.

* Feature flag for client cache

* Fix integration tests

* Add feature flag to config.py

* Add another feature check

* Fix more integration tests

* Fix raw HTML in SliceAdder (#7338)

* remove backendSync.json (#7331)

* [bubbles] issue when using duplicated metrics (#7087)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04 (#7359)

* SUPERSET-8: Update text in docs copyright footer (#7360)

* SUPERSET-7: Docker compose config version breaks on Ubuntu 16.04

* SUPERSET-8: Extra text in docs copyright footer

* [schema] Adding commits and removing unnecessary foreign-key definitions (#7371)

*  Store last selected dashboard in sessionStorage (#7181)

* Store last selected dashboard in sessionStorage

* Fix tests

* [schema] Updating the base metric schema (#5453)

* Fix NoneType bug & fill the test recipients with original recipients if empty (#7365)

* feat: see Presto row and array data types (#7391)

* feat: see Presto row and array data types

* fix: address PR comments

* fix: lint and build issues

* fix: add types

* Incorporate feedback from initial PR (prematurely merged to lyft-release-sp8) (#7415)

* add stronger type hints where possible

* fix: lint issues and add select_star func in Hive

* add missing pkg init

* fix: build issues

* fix: pylint issues

* fix: use logging instead of print

* feat: view presto row objects in data grid

* fix: address feedback

* fix: spacing

* Workaround for no results returned (#7442)

* feat: view presto row objects in data grid (#7436)

* feat: view presto row objects in data grid

* fix: address feedback

* fix: spacing

* feat: Scheduling queries from SQL Lab (#7416)

* Lightweight pipelines POC

* Add docs

* Minor fixes

* Remove Lyft URL

* Use enum

* Minor fix

* Fix unit tests

* Mark props as required

* feat: Add `validate_sql_json` endpoint for checking that a given sql query is valid for the chosen database (#7422) (#7462)

merge from lyft-release-sp8 to master

* Adds missing metric sum__SP_RUR_TOTL (#7452)

* Late import for optional lib pyhive (#7471)

* Late import for optional lib pyhive

* fix

* fix: calendar heatmap examples (#7375)

Fixing a set of examples that trip on ValueError vs TypeError

* bugfix: Improve support for special characters in schema and table names (#7297)

* Bugfix to SQL Lab to support tables and schemas with characters that require quoting

* Remove debugging prints

* Add uri encoding to secondary tables call

* Quote schema names for presto

* Quote selected_schema on Snowflake, MySQL and Hive

* Remove redundant parens

* Add python unit tests

* Add js unit test

* Fix flake8 linting error

* [dashboard] After update filter, trigger new queries when charts are visible (#7233)

* trigger query when chart is visible

* add integration test

* fix: alter sql columns to long text #7463 (#7476)

Merge lyft-release-sp8@7bfe7bc to master

* Refactor ConsoleLog (#7428)

* Revised Chinese translation (#7464)

* add chinese translate

* edit chinese translation

* druid connector: avoid using 'dimensions' for scan queries (#7377)

After the following PyDruid change (contained in version 0.5.2)
the Superset Histogram charts rendered with Druid data are
broken:

druid-io/pydruid@0a59a70

Bump the pydruid requirements accordingly in setup.py

Issue: #7368
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants