-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moved the feed data for the lookup tables from csv file into src/conf… #67
Moved the feed data for the lookup tables from csv file into src/conf… #67
Conversation
…ig file. Added custom tests for alembic migration
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
closes #66 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good to me, just had the one comment/question on the feeds/seed data in config.py
src/config.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this was discussed already in the other PRs or with Le, but should the individual feed lists go into their corresponding migration scripts to keep the seed data with the creation of the table the data is going into? The config class seems to be specific to env configurations we pull from either env vars or .env files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it inside its individual scripts. That's the way I initially had it, but then I needed to access the same data for testing. So, I wanted to have them just in one place.
I can go ahead and move it inside the individual scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thought it might be that because of the comment in there. It's great that the tests check that the expected seed data is in tables so hopefully the tests can access the lists in the migrations (which is a little annoying to import each especially since the module names are horrific) or at the worst copy the list to the test testing the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still have them inside the individual scripts for data integrity, considering the small data set. For testing purposes, you can test just a small sample of the data; or create test resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lchen-2101 Got it, yeah makes more sense.
…sions/seed.py; renamed filenames and functions names that had 'feed' to 'seed'
db_revisions/seed.py
Outdated
@@ -0,0 +1,119 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We run into the same problem having this file rather than having the csv's. These should live within the scripts themselves; the purpose of the alembic scripts with versions is that each time the upgrade is ran, the same data is populated; we have decide to update the seed data, and we run the script in a different environment, then the 2 environments' data would become out of sync. Let's have these data within each script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if it was a big dataset, we should do the csv route, but version the file appropriately. i.e. address_state_7b6ff51002b5.csv
; but since these are small, keeping them within the alembic script itself should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
data = get_feed_data_from_file("federal_regulator") | ||
|
||
op.bulk_insert(FederalRegulatorDao.__table__, data) | ||
op.bulk_insert(FederalRegulatorDao.__table__, federal_regulator_seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we talked about keeping the table names hardcoded? same kind of data integrity concerns where as we iterate, if we decided to switch the table for the dao, the new script would succeed in the old environment at creating the new table; but it would fail in the new environment as the table would already exist after running the previous version table creation script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the data from seed.py
into the individual scripts, otherwise it creates the same problem as the csv files; where it becomes possible for two environments to have different data.
branch_labels: Union[str, Sequence[str], None] = None | ||
depends_on: Union[str, Sequence[str], None] = None | ||
|
||
address_state_table = Base.metadata.tables.get("address_state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken, this is still tied to the dao, correct? so if the dao has been updated to say "address_state_new" (hypothetical, do not actually do this); this would fail, correct?
Here's a way to decouple the two:
meta = MetaData()
meta.reflect(op.get_bind())
table = Table("address_state", meta)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into it.
meta = MetaData() | ||
meta.reflect(op.get_bind()) | ||
table = Table("federal_regulator", meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guessing u r in the process of replacing these with the get_table_by_name
in utils? looks good otherwise, so ready to merge once the last changes are implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ig file. Added custom tests for alembic migration