-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: speed up uuid column generation #11209
Conversation
5c1155b
to
dcd4c79
Compare
Codecov Report
@@ Coverage Diff @@
## master #11209 +/- ##
==========================================
- Coverage 65.65% 61.47% -4.18%
==========================================
Files 829 829
Lines 39213 39244 +31
Branches 3593 3593
==========================================
- Hits 25744 24126 -1618
- Misses 13357 14937 +1580
- Partials 112 181 +69
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
50428bc
to
d76af43
Compare
@@ -71,20 +74,31 @@ class ImportMixin: | |||
|
|||
models["dashboards"].position_json = sa.Column(utils.MediumText()) | |||
|
|||
default_batch_size = int(os.environ.get("BATCH_SIZE", 200)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the bigger the better. I tested many different numbers (100, 200, 500, 1000, 2000) and 200 seems to be working the best (but obviously this will depend on the machine so providing a way to override it via env variables).
with op.batch_alter_table(model) as batch_op: | ||
batch_op.drop_constraint(f"uq_{table_name}_uuid") | ||
with op.batch_alter_table(model.__tablename__) as batch_op: | ||
batch_op.drop_constraint(f"uq_{table_name}_uuid", type_="unique") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MySQL will throw an error if type_
is not specified.
c63c1af
to
96e3637
Compare
5090b41
to
9162f28
Compare
9162f28
to
a2a86e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, Jessie! Thanks for improving the perf and making the migration more resilient!
got a bit too late to this pr, thanks @ktmud for the fix! |
SUMMARY
We have tens of thousands of dashboards, more than 200k slices, and 1.3 million table columns in our Superset deployment. It takes forever to run the db migration for #11098 . This PR tries to speed up the migration process by
I also tried to utilize
ThreadPoolExecutor
to parallelize the db operations and Python uuid generation, but it doesn't seem to help much.Also fixes certain API errors while downgrading in MySQL and added the option to adjust batch size from command line.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TEST PLAN
Make a copy of your very large database, then try:
ADDITIONAL INFORMATION