docs: how to create initial data scripts
fyliu committed Mar 23, 2023
1 parent 03837ce commit 93b50d7
Showing 2 changed files with 144 additions and 0 deletions.
143 changes: 143 additions & 0 deletions docs/how-to/create-initial-data-migrations.md
@@ -0,0 +1,143 @@
# Create initial data scripts

## Overview

The goal is to convert our initial data into scripts that can be loaded into the database when the backend is set up for the first time.

These are the steps:

1. Export the data into JSON
1. Generate a Python script from the JSON data
1. Create a Django migration

### Prerequisites

The initial data exists in a Google spreadsheet, such as [this one for People Depot][pd-data-spreadsheet]. Each sheet should be named after the model its data corresponds to, such as `SOC Major - Data`, so that we can identify the model from the sheet name.

The sheet should be formatted like so:

- The first row contains the model's field names. They must match the names in the model exactly.
- Rows 2 to n contain the initial data for the model that we want to turn into a script.


## Convert the data into JSON

1. Export the data from the Google [spreadsheet][pd-data-spreadsheet]
    1. Find the sheet in the document containing the data to export. Let's use the `SOC Major - Data` sheet as our example.
    1. Make sure that the first row (column names) is frozen. If it isn't, freeze it by selecting the first row in the sheet, then Menu > View > Freeze > Up to row 1
    1. Export to JSON. Menu > Export JSON > Export JSON for this sheet
1. Save the JSON into a file
    1. Select and copy all the JSON text
    1. Paste it into a new file and save it as `[ModelNameInPascalCase]_export.json` under `app/core/initial_data/`
    1. The Pascal case matters in the next step, which generates a Python script to insert the data. The name must match the model's class name for this to work.
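
The naming convention above can be sketched in a few lines of Python. This is only an illustration: `names_from_export` is a hypothetical helper, not part of `scripts/convert.py`, and the lowercase `<model>_seed.py` pattern is inferred from the `socmajor_seed.py` example used later in this guide.

```python
from pathlib import Path


def names_from_export(path: str) -> tuple[str, str]:
    """Return (model class name, seed script filename) for an export file.

    Hypothetical helper: assumes the file is named <ModelName>_export.json.
    """
    stem = Path(path).stem  # e.g. "SOCMajor_export"
    suffix = "_export"
    if not stem.endswith(suffix) or len(stem) == len(suffix):
        raise ValueError(f"expected <ModelName>_export.json, got {path!r}")
    model_name = stem[: -len(suffix)]
    # Assumption: the seed script is the lowercased model name plus "_seed.py"
    return model_name, f"{model_name.lower()}_seed.py"


print(names_from_export("core/initial_data/SOCMajor_export.json"))
```

A filename such as `socmajor_export.json` would yield `socmajor` as the model name, which would not match the `SOCMajor` class. That is why the Pascal case has to be preserved.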

:::{admonition} **Potential data issue**
:class: caution
There was a problem with the JSON exporter where it omitted the underscore in `occ_code`. It should be fixed now, but watch for other column name problems and fix them in the [Google Apps script][apps-script] in the [spreadsheet][pd-data-spreadsheet]. If a column name is wrong, you will find out when the data insertion fails.
:::
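
Column name problems like the one described above can be caught before the insertion step with a quick check of the exported keys. The following is a minimal sketch, assuming the export is a JSON array of objects keyed by column name. `find_key_problems` is a hypothetical helper, and both the `title` field and the sample rows are made up for illustration.

```python
import json


def find_key_problems(json_text: str, expected_fields: set[str]) -> list[str]:
    """List mismatches between exported keys and the expected field names."""
    rows = json.loads(json_text)
    problems = []
    for index, row in enumerate(rows):
        keys = set(row)
        for missing in sorted(expected_fields - keys):
            problems.append(f"row {index}: missing {missing!r}")
        for extra in sorted(keys - expected_fields):
            problems.append(f"row {index}: unexpected {extra!r}")
    return problems


# Made-up sample: the second row reproduces the dropped-underscore problem.
sample = '[{"occ_code": "11-0000", "title": "A"}, {"occcode": "13-0000", "title": "B"}]'
for problem in find_key_problems(sample, {"occ_code", "title"}):
    print(problem)
```

An empty result means every row has exactly the expected keys.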

## Convert JSON into Python

1. Make sure the backend is running

    ```bash
    ./scripts/buildrun.sh
    ```

1. Go to the project root and run this command

    ```bash
    docker-compose exec web python scripts/convert.py core/initial_data/SOCMajor_export.json
    ```

1. Check that there's a new file called `app/core/scripts/socmajor_seed.py` and that it looks correct
    1. You can run the script to verify it, but you will need to remove the inserted data afterwards if you want to restore the database to its previous state
    1. Run this command to run the script

        ```bash
        docker-compose exec web python manage.py runscript socmajor_seed
        ```

    1. To remove the data, go into the database and delete all rows from `core_socmajor`

        ```bash
        docker-compose exec web python manage.py dbshell
        -- now we have a shell to the db
        -- see if all the seed data got inserted
        select count(*) from core_socmajor;
        -- shows 22 rows
        delete from core_socmajor;
        -- DELETE 22
        select count(*) from core_socmajor;
        -- shows 0 rows
        -- ctrl-d to exit dbshell
        ```
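
The source of `scripts/convert.py` is not shown in this guide. As a rough illustration of the kind of conversion it performs, the sketch below renders a seed script from exported JSON. Everything here is an assumption for illustration: `render_seed_script` is a hypothetical function, the `core.models` import path and the use of `objects.create` are guesses at what a generated script might contain, and the `title` field is made up. The only detail taken from this guide is that the seed script exposes a `run()` function.

```python
import json


def render_seed_script(model_name: str, json_text: str) -> str:
    """Render the source of a seed script that inserts one row per record."""
    rows = json.loads(json_text)
    lines = [
        f"from core.models import {model_name}",  # assumed import path
        "",
        "",
        "def run():",
    ]
    for row in rows:
        # repr() of a dict of JSON scalars is a valid Python literal
        lines.append(f"    {model_name}.objects.create(**{row!r})")
    if not rows:
        lines.append("    pass")  # keep run() syntactically valid when empty
    return "\n".join(lines) + "\n"


# Made-up sample record for illustration only.
sample = '[{"occ_code": "11-0000", "title": "Management Occupations"}]'
print(render_seed_script("SOCMajor", sample))
```

When checking that the real generated file "looks correct", the things to look for are similar: one insert per spreadsheet row, and field names that match the model.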
## Create a Django migration

::::{dropdown} No need for this extra step

:::{admonition} Danger
:class: danger
**Do not** do this as part of creating the initial data script
:::

This portion of the documentation is kept for historical purposes, in case we ever decide to run the scripts in migration files.

1. Create a blank migration file (for the core app, because all our models are in there)

    ```bash
    docker-compose exec web python manage.py makemigrations --empty core --name socmajor_initial_data
    ```

1. Call our script from the migration file

    ```python
    from django.db import migrations


    def add_data(apps, schema_editor):
        # forward migration: run the generated seed script
        from ..scripts import socmajor_seed

        socmajor_seed.run()


    def delete_data(apps, schema_editor):
        # reverse migration: empty the table
        SOCMajor = apps.get_model("core", "SOCMajor")
        SOCMajor.objects.all().delete()


    class Migration(migrations.Migration):
        dependencies = [
            ("core", "0007_socmajor"),
        ]

        operations = [migrations.RunPython(add_data, delete_data)]
    ```

    1. We pass 2 arguments to `RunPython`: functions for the forward and reverse migrations
    1. `add_data` calls the seed script
    1. `delete_data` empties the table

1. Verify the migration works

    ```bash
    # apply the new migration
    docker-compose exec web python manage.py migrate core

    # reverse to the previous migration (best to go back just 1 from the current count;
    # 0007 is the dependency of the example migration above)
    docker-compose exec web python manage.py migrate core 0007

    # forward to the latest migration
    docker-compose exec web python manage.py migrate core
    ```
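
The forward and reverse functions are meant to undo each other: applying the migration and then reversing it should leave the table as it was. The sketch below shows that round trip without Django, using the standard library's `sqlite3` as a stand-in database. The two-column table layout and the sample rows are assumptions for illustration only, and these `add_data` and `delete_data` functions take a plain connection rather than the `apps` and `schema_editor` arguments that Django passes.

```python
import sqlite3

# Made-up sample rows; the real seed data comes from the spreadsheet.
ROWS = [
    ("11-0000", "Management Occupations"),
    ("13-0000", "Business and Financial Operations Occupations"),
]


def add_data(conn: sqlite3.Connection) -> None:
    """Forward step: insert the seed rows."""
    conn.executemany(
        "INSERT INTO core_socmajor (occ_code, title) VALUES (?, ?)", ROWS
    )


def delete_data(conn: sqlite3.Connection) -> None:
    """Reverse step: empty the table."""
    conn.execute("DELETE FROM core_socmajor")


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM core_socmajor").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_socmajor (occ_code TEXT, title TEXT)")

add_data(conn)
after_forward = count_rows(conn)
delete_data(conn)
after_reverse = count_rows(conn)
print(after_forward, after_reverse)
```

Note that a reverse step that empties the whole table only restores the earlier state if the table was empty before the forward step ran, which is the case for initial data.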
::::
[pd-data-spreadsheet]: https://docs.google.com/spreadsheets/d/1x_zZ8JLS2hO-zG0jUocOJmX16jh-DF5dccrd_OEGNZ0/
[apps-script]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/#:~:text=To%20do%20this,%20click%20Extensions,save%20your%20work%20so%20far.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -28,6 +28,7 @@ Welcome to PeopleDepot's documentation!

CONTRIBUTING
how-to/add-model-and-api-endpoints
how-to/create-initial-data-migrations

.. toctree::
:maxdepth: 2