Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moved the feed data for the lookup tables from csv file into src/conf… #67

Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Empty file added db_revisions/__init__.py
Empty file.
58 changes: 0 additions & 58 deletions db_revisions/feed/address_state.csv

This file was deleted.

8 changes: 0 additions & 8 deletions db_revisions/feed/federal_regulator.csv

This file was deleted.

19 changes: 0 additions & 19 deletions db_revisions/feed/hmda_institution_type.csv

This file was deleted.

14 changes: 0 additions & 14 deletions db_revisions/feed/sbl_institution_type.csv

This file was deleted.

119 changes: 119 additions & 0 deletions db_revisions/seed.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run into the same problem having this file rather than having the csv's. These should live within the scripts themselves; the purpose of the alembic scripts with versions is that each time the upgrade is ran, the same data is populated; we have decide to update the seed data, and we run the script in a different environment, then the 2 environments' data would become out of sync. Let's have these data within each script.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it was a big dataset, we should do the csv route, but version the file appropriately. i.e. address_state_7b6ff51002b5.csv; but since these are small, keeping them within the alembic script itself should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

START seed data for the lookup tables:
address_state
federal_regulator
hmda_institution_type
sbl_institution_type
These are accessed in db_revisions/versions/* scripts and in test/migrations/test_lookup_tables_data_seed .
"""

address_state_seed = [
{"code": "AL", "name": "Alabama"},
{"code": "AK", "name": "Alaska"},
{"code": "AZ", "name": "Arizona"},
{"code": "AR", "name": "Arkansas"},
{"code": "CA", "name": "California"},
{"code": "CO", "name": "Colorado"},
{"code": "CT", "name": "Connecticut"},
{"code": "DE", "name": "Delaware"},
{"code": "FL", "name": "Florida"},
{"code": "GA", "name": "Georgia"},
{"code": "HI", "name": "Hawaii"},
{"code": "ID", "name": "Idaho"},
{"code": "IL", "name": "Illinois"},
{"code": "IN", "name": "Indiana"},
{"code": "IA", "name": "Iowa"},
{"code": "KS", "name": "Kansas"},
{"code": "KY", "name": "Kentucky"},
{"code": "LA", "name": "Louisiana"},
{"code": "ME", "name": "Maine"},
{"code": "MD", "name": "Maryland"},
{"code": "MA", "name": "Massachusetts"},
{"code": "MI", "name": "Michigan"},
{"code": "MN", "name": "Minnesota"},
{"code": "MS", "name": "Mississippi"},
{"code": "MO", "name": "Missouri"},
{"code": "MT", "name": "Montana"},
{"code": "NE", "name": "Nebraska"},
{"code": "NV", "name": "Nevada"},
{"code": "NH", "name": "New Hampshire"},
{"code": "NJ", "name": "New Jersey"},
{"code": "NM", "name": "New Mexico"},
{"code": "NY", "name": "New York"},
{"code": "NC", "name": "North Carolina"},
{"code": "ND", "name": "North Dakota"},
{"code": "OH", "name": "Ohio"},
{"code": "OK", "name": "Oklahoma"},
{"code": "OR", "name": "Oregon"},
{"code": "PA", "name": "Pennsylvania"},
{"code": "RI", "name": "Rhode Island"},
{"code": "SC", "name": "South Carolina"},
{"code": "SD", "name": "South Dakota"},
{"code": "TN", "name": "Tennessee"},
{"code": "TX", "name": "Texas"},
{"code": "UT", "name": "Utah"},
{"code": "VT", "name": "Vermont"},
{"code": "VA", "name": "Virginia"},
{"code": "WA", "name": "Washington"},
{"code": "WV", "name": "West Virginia"},
{"code": "WI", "name": "Wisconsin"},
{"code": "WY", "name": "Wyoming"},
{"code": "DC", "name": "District of Columbia"},
{"code": "AS", "name": "American Samoa"},
{"code": "GU", "name": "Guam"},
{"code": "MP", "name": "Northern Mariana Islands"},
{"code": "PR", "name": "Puerto Rico"},
{"code": "UM", "name": "United States Minor Outlying Islands"},
{"code": "VI", "name": "Virgin Islands, U.S."},
]

federal_regulator_seed = [
{"id": "FCA", "name": "Farm Credit Administration"},
{"id": "FDIC", "name": "Federal Deposit Insurance Corporation"},
{"id": "FHFA", "name": "Federal Housing Finance Agency"},
{"id": "FRS", "name": "Federal Reserve System"},
{"id": "NCUA", "name": "National Credit Union Administration"},
{"id": "OCC", "name": "Office of the Comptroller of the Currency"},
{"id": "OTS", "name": "Office of Thrift Supervision (only valid until July 21, 2011)"},
]

sbl_institution_type_seed = [
{"id": "1", "name": "Bank or savings association."},
{"id": "2", "name": "Minority depository institution."},
{"id": "3", "name": "Credit union."},
{"id": "4", "name": "Nondepository institution."},
{"id": "5", "name": "Community development financial institution (CDFI)."},
{"id": "6", "name": "Other nonprofit financial institution."},
{"id": "7", "name": "Farm Credit System institution."},
{"id": "8", "name": "Government lender."},
{"id": "9", "name": "Commercial finance company."},
{"id": "10", "name": "Equipment finance company."},
{"id": "11", "name": "Industrial loan company."},
{"id": "12", "name": "Online lender."},
{"id": "13", "name": "Other"},
]

hmda_institution_type_seed = [
{"id": "1", "name": "National Bank (OCC supervised)"},
{"id": "2", "name": "State Member Bank (FRS Supervised)"},
{"id": "3", "name": "State non-member bank (FDIC supervised)"},
{"id": "4", "name": "State Chartered Thrift (FDIC supervised)"},
{"id": "5", "name": "Federal Chartered Thrift (OCC supervised)"},
{"id": "6", "name": "Credit Union (NCUA supervised)"},
{"id": "7", "name": "Federal Branch or Agency of Foreign Banking Organization (FBO)"},
{"id": "8", "name": "Branch or Agency of FBO (FRS supervised)"},
{"id": "9", "name": "MBS of national Bank (OCC supervised)"},
{"id": "10", "name": "MBS of state member bank (FRS supervised)"},
{"id": "11", "name": "MBS of state non-member bank (FDIC supervised)"},
{"id": "12", "name": "MBS of Bank Holding Company (BHC) (FRS supervised)"},
{"id": "13", "name": "MBS of credit union (NCUA supervised)"},
{"id": "14", "name": "independent MBS, no depository affiliation"},
{"id": "15", "name": "MBS of Savings and Loan Holding Co"},
{"id": "16", "name": "MBS of state chartered Thrift"},
{"id": "17", "name": "MBS of federally chartered thrift (OCC supervised)"},
{"id": "18", "name": "Affiliate of depository institution. MBS is in the same ownership org as a depository."},
]

"""
END seed data for the lookup tables:
"""
13 changes: 0 additions & 13 deletions db_revisions/utils.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
from alembic import op
from sqlalchemy import engine_from_config
import sqlalchemy
from csv import DictReader
import os


def table_exists(table_name):
Expand All @@ -13,14 +11,3 @@ def table_exists(table_name):
inspector = sqlalchemy.inspect(engine)
tables = inspector.get_table_names()
return table_name in tables


def get_feed_data_from_file(table_name):
file_dir = os.path.dirname(os.path.realpath(__file__))
data_file_path = f"{file_dir}/feed/%s.csv" % table_name
data_file = open(data_file_path, "r")
reader = DictReader(data_file, delimiter="|")
output_list = list()
for dictionary in reader:
output_list.append(dictionary)
return output_list
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,9 @@ def downgrade() -> None:
op.drop_constraint(
constraint_name="fk_federal_regulator_financial_institutions", table_name="financial_institutions"
)
op.drop_constraint(constraint_name="fk_address_state_financial_institutions", table_name="financial_institutions")
op.drop_constraint(
constraint_name="fk_address_state_code_financial_institutions", table_name="financial_institutions"
)
op.drop_constraint(
constraint_name="fk_hmda_institution_type_financial_institutions", table_name="financial_institutions"
)
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""Feed Federal Regulator table
"""Seed Federal Regulator table

Revision ID: 26a742d97ad9
Revises: 7b6ff51002b5
Expand All @@ -7,8 +7,8 @@
"""
from typing import Sequence, Union
from alembic import op
from db_revisions.utils import get_feed_data_from_file
from entities.models import FederalRegulatorDao
from db_revisions.seed import federal_regulator_seed


# revision identifiers, used by Alembic.
Expand All @@ -19,9 +19,7 @@


def upgrade() -> None:
data = get_feed_data_from_file("federal_regulator")

op.bulk_insert(FederalRegulatorDao.__table__, data)
op.bulk_insert(FederalRegulatorDao.__table__, federal_regulator_seed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about keeping the table names hardcoded? same kind of data integrity concerns where as we iterate, if we decided to switch the table for the dao, the new script would succeed in the old environment at creating the new table; but it would fail in the new environment as the table would already exist after running the previous version table creation script.


def downgrade() -> None:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""Feed Address State table
"""Seed Address State table

Revision ID: 7b6ff51002b5
Revises: 045aa502e050
Expand All @@ -7,8 +7,8 @@
"""
from typing import Sequence, Union
from alembic import op
from db_revisions.utils import get_feed_data_from_file
from entities.models import AddressStateDao
from db_revisions.seed import address_state_seed


# revision identifiers, used by Alembic.
Expand All @@ -19,9 +19,7 @@


def upgrade() -> None:
data = get_feed_data_from_file("address_state")

op.bulk_insert(AddressStateDao.__table__, data)
op.bulk_insert(AddressStateDao.__table__, address_state_seed)


def downgrade() -> None:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""Feed SBL Institution Type table
"""Seed SBL Institution Type table

Revision ID: a41281b1e109
Revises: f4ff7d1aa6df
Expand All @@ -7,8 +7,8 @@
"""
from typing import Sequence, Union
from alembic import op
from db_revisions.utils import get_feed_data_from_file
from entities.models import SBLInstitutionTypeDao
from db_revisions.seed import sbl_institution_type_seed

# revision identifiers, used by Alembic.
revision: str = "a41281b1e109"
Expand All @@ -18,9 +18,7 @@


def upgrade() -> None:
data = get_feed_data_from_file("sbl_institution_type")

op.bulk_insert(SBLInstitutionTypeDao.__table__, data)
op.bulk_insert(SBLInstitutionTypeDao.__table__, sbl_institution_type_seed)


def downgrade() -> None:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""Feed HMDA Institution Type table
"""Seed HMDA Institution Type table

Revision ID: f4ff7d1aa6df
Revises: 26a742d97ad9
Expand All @@ -7,8 +7,8 @@
"""
from typing import Sequence, Union
from alembic import op
from db_revisions.utils import get_feed_data_from_file
from entities.models import HMDAInstitutionTypeDao
from db_revisions.seed import hmda_institution_type_seed


# revision identifiers, used by Alembic.
Expand All @@ -19,9 +19,7 @@


def upgrade() -> None:
data = get_feed_data_from_file("hmda_institution_type")

op.bulk_insert(HMDAInstitutionTypeDao.__table__, data)
op.bulk_insert(HMDAInstitutionTypeDao.__table__, hmda_institution_type_seed)


def downgrade() -> None:
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ pytest-alembic = "^0.10.7"

[tool.pytest.ini_options]
asyncio_mode = "auto"
pythonpath = ["src"]
pythonpath = ["src","db_revisions"]
addopts = [
"--cov-report=term-missing",
"--cov-branch",
Expand Down
1 change: 1 addition & 0 deletions src/config.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this was discussed already in the other PRs or with Le, but should the individual feed lists go into their corresponding migration scripts to keep the seed data with the creation of the table the data is going into? The config class seems to be specific to env configurations we pull from either env vars or .env files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it inside its individual scripts. That's the way I initially had it, but then I needed to access the same data for testing. So, I wanted to have them just in one place.

I can go ahead and move it inside the individual scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thought it might be that because of the comment in there. It's great that the tests check that the expected seed data is in tables so hopefully the tests can access the lists in the migrations (which is a little annoying to import each especially since the module names are horrific) or at the worst copy the list to the test testing the table.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still have them inside the individual scripts for data integrity, considering the small data set. For testing purposes, you can test just a small sample of the data; or create test resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lchen-2101 Got it, yeah makes more sense.

Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from pydantic.types import SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict


JWT_OPTS_PREFIX = "jwt_opts_"

env_files_to_load = [".env"]
Expand Down
Loading
Loading