
Schema refresh review fixes. #931

Merged
1 commit merged into master from schema-refresh-improvement-review on Apr 4, 2019

Conversation

@jezdez commented on Apr 4, 2019

Just a minor follow-up to #930.

@jezdez requested a review from emtwo on April 4, 2019, 12:19
@@ -55,7 +55,7 @@ class NotSupported(Exception):
 class BaseQueryRunner(object):
     noop_query = None
     configuration_properties = None
-    data_sample_query = None
+    sample_query = None
@jezdez (Author) commented:

This is just to stay consistent with the noop_query name.

@@ -232,12 +232,14 @@ def cleanup_query_results():
     models.db.session.commit()
     logger.info("Deleted %d unused query results.", deleted_count)


@jezdez (Author) commented:

Let's stick with PEP 8 and use two blank lines between functions.
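For reference, PEP 8 asks for two blank lines around top-level definitions, i.e. roughly:

def cleanup_query_results():
    ...


def cleanup_data_in_table(table_model):
    ...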

 def cleanup_data_in_table(table_model):
     removed_metadata = table_model.query.filter(
-        table_model.exists == False,
+        table_model.exists.is_(False),
@jezdez (Author) commented:

Also something that flake8 reported in my editor.
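(For context: flake8's E712 check flags comparisons spelled as == False; on a SQLAlchemy column the same condition can be written with the is_() operator, as in the hunk above:)

# E712: comparison to False should be "cond is False" or "not cond"
# On a SQLAlchemy column, use the is_() operator in the filter instead:
removed_metadata = table_model.query.filter(
    table_model.exists.is_(False),
)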

    if is_old_data:
        table_model.query.filter(
            table_model.id == removed_metadata_row.id,
        ).delete()
@jezdez (Author) commented:

As mentioned on IRC, this function should do the cleanup completely on the db side, to avoid having to fetch the data just to delete it afterwards. In my experience, such cleanup functions tend to break with race conditions once the list of things to delete exceeds memory or runtime limits.
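A rough sketch of what a fully db-side cleanup could look like; the updated_at column and the 30-day threshold are hypothetical stand-ins for the is_old_data check:

import datetime

from redash import models


def cleanup_data_in_table(table_model):
    # Do the whole cleanup in one bulk DELETE on the database side instead
    # of fetching the stale rows into Python and deleting them afterwards.
    age_limit = datetime.datetime.utcnow() - datetime.timedelta(days=30)  # hypothetical threshold
    deleted_count = table_model.query.filter(
        table_model.exists.is_(False),
        table_model.updated_at < age_limit,  # hypothetical stand-in for is_old_data
    ).delete(synchronize_session=False)
    models.db.session.commit()
    return deleted_count

Query.delete(synchronize_session=False) emits a single DELETE statement matching the filter and skips syncing the in-memory session, so the rows never have to be loaded first.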

    # Update all persisted tables that exist to reflect this.
    persisted_tables = TableMetadata.query.filter(
        TableMetadata.name.in_(tuple(existing_tables_set)),
@jezdez (Author) commented:

No need to convert the set to a tuple for IN queries.
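As far as I can tell, in_() accepts any iterable of literal values, so the set can be passed straight through (names as in the hunk above):

persisted_tables = TableMetadata.query.filter(
    TableMetadata.name.in_(existing_tables_set),  # no tuple() conversion needed
    TableMetadata.data_source_id == ds.id,
).all()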

        TableMetadata.data_source_id == ds.id,
    ).all()

    for j, table in enumerate(all_existing_persisted_tables):
@jezdez (Author) commented:

No enumerate needed here.
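i.e. since the index j is never used, the loop can iterate over the list directly:

for table in all_existing_persisted_tables:
    # ... same loop body, no index variable needed ...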

-    existing_columns_set = set()
 
+    # Clear the set for the next round
+    existing_columns_set.clear()
@jezdez (Author) commented:

I think clearing the set was the purpose of this, right?
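(For what it's worth, the two spellings only differ if something else still holds a reference to the set: clear() empties the existing object in place, while = set() rebinds the name to a fresh object:)

existing_columns_set = {"id", "name"}
other_ref = existing_columns_set         # second reference to the same set object

existing_columns_set.clear()             # empties it in place; other_ref sees the empty set too
existing_columns_set = set()             # rebinds the name; other_ref keeps the old object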

@emtwo left a comment:

Thanks Jannis!

@emtwo merged commit 5d03b54 into master on Apr 4, 2019
@jezdez deleted the schema-refresh-improvement-review branch on April 15, 2019, 14:44