Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the efficiency of DbLog migration for Django #2949

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 29, 2019

Fixes #2539

The old implementation was using Django's ORM to get the count and rows
of certain rows in the DbLog table. However, this was generating
extremely inefficient queries that made the migration run for days on
big databases. A similar problem was encountered with the original
SqlAlchemy implementation. That was solved in an earlier commit
80a0aa9117f399a5385a30ba0ce3e2afc7016a4e, which created a more
efficient query. We now replace the Django ORM solution with the same
custom SQL query as used for SqlAlchemy. This should significantly
improve the efficiency of this migration.

@sphuber
Copy link
Contributor Author

sphuber commented May 29, 2019

I tested this on the mat2D database and the DbLog migrations now runs in roughly 10 minutes. Good enough I would say.

The old implementation was using Django's ORM to get the count and rows
of certain rows in the `DbLog` table. However, this was generating
extremely inefficient queries that made the migration run for days on
big databases. A similar problem was encountered with the original
SqlAlchemy implementation. That was solved in an earlier commit
`80a0aa9117f399a5385a30ba0ce3e2afc7016a4e`, which created a more
efficient query. We now replace the Django ORM solution with the same
custom SQL query as used for SqlAlchemy. This should significantly
improve the efficiency of this migration.
@sphuber sphuber force-pushed the fix_2539_efficiency_dblog_migration_django branch from 8ca4f75 to 01b2eba Compare May 29, 2019 15:37
@sphuber sphuber changed the title [BLOCKED]Improve the efficiency of DbLog migration for Django Improve the efficiency of DbLog migration for Django May 29, 2019
@giovannipizzi
Copy link
Member

@sphuber just double checking that you also verified that the DbLogs are still there and correct in the mat2D, before merging

@sphuber
Copy link
Contributor Author

sphuber commented May 29, 2019

I did, but it is hard to say. The unit tests should cover this. I forgot to note the number of rows in the original database for the DbLog table (did do this for a bunch of other tables). After migration there are still 220.000 logs remaining and some have been written to the two JSON dumps. But cannot say for certain that everything is there. Open to suggestions

@giovannipizzi
Copy link
Member

I think it's ok then (maybe just run a few verdi commands to get logs and check the they work). I agree that tests should ideally already cover this

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 72.713% when pulling 01b2eba on sphuber:fix_2539_efficiency_dblog_migration_django into c4eb8a0 on aiidateam:develop.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 72.713% when pulling 01b2eba on sphuber:fix_2539_efficiency_dblog_migration_django into c4eb8a0 on aiidateam:develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 72.713% when pulling 01b2eba on sphuber:fix_2539_efficiency_dblog_migration_django into c4eb8a0 on aiidateam:develop.

@sphuber sphuber merged commit 6da891f into aiidateam:develop May 29, 2019
@sphuber sphuber deleted the fix_2539_efficiency_dblog_migration_django branch May 29, 2019 20:40
@szoupanos
Copy link
Contributor

Thanks Seb! I also believe that if the tests pass and you did a basic check that everything is as it should, it should be OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implementation of DbLog migration is extremely slow
4 participants