Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moved the feed data for the lookup tables from csv file into src/conf… #67

Conversation

nargis-sultani
Copy link
Contributor

…ig file. Added custom tests for alembic migration

…ig file. Added custom tests for alembic migration
Copy link

github-actions bot commented Dec 21, 2023

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src
  config.py
Project Total  

This report was generated by python-coverage-comment-action

@nargis-sultani
Copy link
Contributor Author

closes #66

Copy link
Contributor

@jcadam14 jcadam14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me, just had the one comment/question on the feeds/seed data in config.py

src/config.py Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this was discussed already in the other PRs or with Le, but should the individual feed lists go into their corresponding migration scripts to keep the seed data with the creation of the table the data is going into? The config class seems to be specific to env configurations we pull from either env vars or .env files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it inside its individual scripts. That's the way I initially had it, but then I needed to access the same data for testing. So, I wanted to have them just in one place.

I can go ahead and move it inside the individual scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thought it might be that because of the comment in there. It's great that the tests check that the expected seed data is in tables so hopefully the tests can access the lists in the migrations (which is a little annoying to import each especially since the module names are horrific) or at the worst copy the list to the test testing the table.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still have them inside the individual scripts for data integrity, considering the small data set. For testing purposes, you can test just a small sample of the data; or create test resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lchen-2101 Got it, yeah makes more sense.

Nargis Sultani added 2 commits December 21, 2023 15:26
…sions/seed.py; renamed filenames and functions names that had 'feed' to 'seed'
@@ -0,0 +1,119 @@
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run into the same problem having this file rather than having the csv's. These should live within the scripts themselves; the purpose of the alembic scripts with versions is that each time the upgrade is ran, the same data is populated; we have decide to update the seed data, and we run the script in a different environment, then the 2 environments' data would become out of sync. Let's have these data within each script.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it was a big dataset, we should do the csv route, but version the file appropriately. i.e. address_state_7b6ff51002b5.csv; but since these are small, keeping them within the alembic script itself should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

data = get_feed_data_from_file("federal_regulator")

op.bulk_insert(FederalRegulatorDao.__table__, data)
op.bulk_insert(FederalRegulatorDao.__table__, federal_regulator_seed)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about keeping the table names hardcoded? same kind of data integrity concerns where as we iterate, if we decided to switch the table for the dao, the new script would succeed in the old environment at creating the new table; but it would fail in the new environment as the table would already exist after running the previous version table creation script.

Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the data from seed.py into the individual scripts, otherwise it creates the same problem as the csv files; where it becomes possible for two environments to have different data.

branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None

address_state_table = Base.metadata.tables.get("address_state")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this is still tied to the dao, correct? so if the dao has been updated to say "address_state_new" (hypothetical, do not actually do this); this would fail, correct?

Here's a way to decouple the two:

meta = MetaData()
meta.reflect(op.get_bind())
table = Table("address_state", meta)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into it.

Comment on lines 31 to 33
meta = MetaData()
meta.reflect(op.get_bind())
table = Table("federal_regulator", meta)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guessing u r in the process of replacing these with the get_table_by_name in utils? looks good otherwise, so ready to merge once the last changes are implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
Thanks

Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lchen-2101 lchen-2101 merged commit 9004692 into main Jan 4, 2024
3 checks passed
@lchen-2101 lchen-2101 deleted the feature/66_add_more_alembic_custom_tests_and_add_seed_data_inside_script branch January 4, 2024 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more custom tests for Alembic and add seed data in the scripts for the lookup tables
3 participants