Initial data guide 36 #141

Closed
wants to merge 4 commits into from
Empty file added app/core/scripts/__init__.py
Empty file.
80 changes: 80 additions & 0 deletions app/scripts/json_data_to_python.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# convert a spreadsheet-exported json into a script that can be used in a django migration
# get path and read json file
# get model path
# write a python script that creates and saves the model instances

# example:
# docker-compose exec web python scripts/json_data_to_python.py core/initial_data/UserStatus_export.json
# to apply the seed script:
# docker-compose exec web python manage.py runscript userstatus_seed
import json
import sys
from pathlib import Path


def get_modelname(path):
    """Extract model name from file path

    Assumes the name portion before the first underscore is the model name
    """
    filename = Path(path).name
    return filename.split("_")[0]


def to_key_eq_value_str(line):
    """Convert dictionary to string of key=value pairs, separated by commas"""
    # repr() quotes strings and escapes embedded quotes, so the result stays valid python
    values = [f"{key}={value!r}" for key, value in line.items()]
    return ", ".join(values)


def convert(file_path):
    """Convert a valid file of json objects into a python script which can insert the data into django

    file_path is in a subdirectory of a django app. Suggested format is
    <appname>/initial_data/<ModelName>_export.json. file_path ends in a filename
    in the format <ModelName>_export.json where the <ModelName> matches the one
    defined in the django project.

    The python script will be saved to <appname>/scripts/<modelname>_seed.py
    """
    json_file_path = Path(file_path)

    with json_file_path.open() as json_file:
        model_all = json.load(json_file)
    root = Path.cwd()
    model_name = get_modelname(file_path)
    app_name = json_file_path.parents[1].name

    # import from the app the data file lives in, matching the output location below
    output = f"from {app_name}.models import {model_name}\n\n\n"
    output += "def run():\n\n"
    for model_dict in model_all:
        values = to_key_eq_value_str(model_dict)
        python_lines = f"    status = {model_name}({values})\n"
        python_lines += "    status.save()\n"
        output += python_lines

    output_filename = model_name.lower() + "_seed.py"
    destination = root / app_name / "scripts" / output_filename
    with destination.open(mode="w") as outfile:
        outfile.write(output)


if __name__ == "__main__":
    try:
        json_file_path = sys.argv[1]
    except IndexError:
        raise SystemExit(f"Usage: {sys.argv[0]} <input json file>")

    convert(json_file_path)
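
For reviewers who want to see the kind of output this script aims for without running it in the container, here is a minimal in-memory sketch. It follows the same overall shape as `convert()` (an import line, a `run()` function, one constructor call plus `save()` per row), but `build_seed_source` is a hypothetical helper that is not part of this PR, the sample row and its `title` field are made up, and values are rendered with `repr()` so that quotes inside a value stay valid Python.

```python
import json


def build_seed_source(model_name, rows, app_name="core"):
    """Return the text of a seed script for the given rows (hypothetical helper)."""
    lines = [f"from {app_name}.models import {model_name}", "", "", "def run():"]
    if not rows:
        # keep run() syntactically valid when there is no data
        lines.append("    pass")
    for row in rows:
        # repr() renders each value as a Python literal, escaping embedded quotes
        kwargs = ", ".join(f"{key}={value!r}" for key, value in row.items())
        lines.append(f"    {model_name}({kwargs}).save()")
    return "\n".join(lines) + "\n"


# Made-up sample in the shape the exporter is assumed to produce: a list of flat objects
sample = json.loads('[{"occ_code": "11-0000", "title": "Management"}]')
source = build_seed_source("SOCMajor", sample)
print(source)

# The generated text should at least be syntactically valid Python
compile(source, "socmajor_seed.py", "exec")
```

Compiling the generated text is a cheap sanity check; it does not prove that the field names exist on the model.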
78 changes: 78 additions & 0 deletions docs/how-to/create-initial-data-migrations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Create initial data scripts

## Overview

The goal is to convert our initial data into scripts that can be loaded into the database when the backend is set up for the first time.

These are the steps:

1. Export the data into JSON
1. Generate a python script from the JSON data

### Prerequisites

The initial data exists in a Google spreadsheet, such as [this one for People Depot][pd-data-spreadsheet]. There should be individual sheets named after the model names the data correspond to, such as `SOC Major - Data`. The sheet name is useful for us to identify the model it corresponds to.

The sheet should be formatted like so:

- the first row contains the field names of the model. The names must match exactly
- rows 2 to n are the initial data for the model we want to turn into a script.
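
To make the sheet format concrete: the exported JSON is assumed here to be an array of flat objects, one per data row, keyed by the header row (this is how the conversion script iterates over the data). The sketch below lists the column names found in such an export so they can be compared with the model by eye. The sample rows, the `title` field, and the `column_names` helper are made up for illustration.

```python
import json


def column_names(rows):
    """Return the sorted set of keys used across all rows of an exported sheet."""
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of objects")
    names = set()
    for number, row in enumerate(rows, start=2):  # spreadsheet row 1 is the header
        if not isinstance(row, dict):
            raise ValueError(f"row {number}: expected a JSON object")
        names.update(row)
    return sorted(names)


# Made-up export for a sheet whose header row is: occ_code | title
exported = json.loads("""
[
  {"occ_code": "11-0000", "title": "Management Occupations"},
  {"occ_code": "13-0000", "title": "Business and Financial Operations Occupations"}
]
""")
print(column_names(exported))
```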

## Convert the data into JSON

1. Export the data from the Google [spreadsheet][pd-data-spreadsheet]
1. Find the sheet in the document containing the data to export. Let's use the `SOC Major - Data` sheet as our example.
1. Make sure that the first row (column names) is frozen. If it isn't, freeze it by selecting the first row in the sheet, then Menu > View > Freeze > Up to row 1
1. Export to JSON. Menu > Export JSON > Export JSON for this sheet
Member

As part of note-taking:

At this step, I encountered "Authorization Required":
[image]

This brings me to "Choose an Account":
[image]

Upon selecting my account, I get "Google hasn't verified this app":
[image]

I click Advanced, then "Go to Untitled project (unsafe)":
[image]

I can then give authorization when prompted "Untitled project wants to access your Google Account":
[image]

After authorization, I perform the same actions again, and now I get the exported JSON:
[image]

This is cool, but a few comments / observations:

  1. How much of the above process should be documented?
  2. What would be a better way to handle the "Google hasn't verified this app" pop-up?
  3. What is the benefit of this custom script?

For example, I'm not sure how Export JSON became a menu item in the Google Sheet "PD: Table and field explanations". I am aware of Extensions, though, and after a brief search on the Google Workspace Marketplace (Extensions > Add-ons > Get Add-ons) I found the Export Sheet Data add-on, which it seems we can customize to do the same thing.

[image]

[image]

[image]

Being able to export to a file is neat as well. If we name the sheets in PascalCase to start with, we might be able to customize it further to export all sheets (tables) at once.

Member Author

Thanks for the documentation and the great comments/questions. But the answers are not going to be great.

  1. How much of the above process should be documented?

    • Documentation is great, but this is normal Google Apps Script behavior, so it's probably not up to us (as a project) to host this documentation. Maybe someone has documentation that we can link to, or we can put it in a general hfla developer docs place.
    • The other thing is that this is a temporary solution that we'll get rid of soon after this is merged. We won't need/want this script at all. But this is a working solution and I want to get something working before doing the optimizing step.
  2. What would be a better way to handle this "Google hasn't verified this app" pop-up?

    • There is no better way to do this. This is normal, and the warning serves to tell users to examine the script closely before running it.
  3. What is the benefit of this custom script?

    • Good question. The benefit is that it works to get us JSON from the spreadsheet. Actually, Bonnie found it, so it's "free" for me. The drawback is that it's a manual step, and copying the JSON into a file is also manual. That's why it only serves as a stopgap solution, and because there's a much better method, I didn't spend time looking for other similar things. The one you found does go one step further than the current script, so I would have gone with that if I had known.

1. Save the JSON into a file
Member

The current steps do not indicate that there is a dialog box. It also reads better with sub-steps, and the last step is really a note. If these changes are made for lines 27-30, it would look like this:

  1. Export to JSON. Menu > Export JSON > Export JSON for this sheet. A dialog box with JSON will appear.

    1. Select all the JSON and copy it into the clipboard.
      ==> Add this line.
  2. Save the JSON as [ModelNameInPascalCase]_export.json under app/core/initial_data/

    ==> Change to add NOTE:. It is not a step.
    NOTE: The Pascal case is important in the next step to generate a python script to insert the data. It must match the model's class name for this to work.

1. Select and copy all the JSON text
1. Paste it into a new file and save it as [ModelNameInPascalCase]_export.json under app/core/initial_data/
1. The Pascal case matters in the next step, which generates a python script to insert the data. The name must match the model's class name for this to work.
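
To illustrate why the file name matters, here is a small sketch of how a model name, app name, and output location can be derived from the export path. It mirrors the rules described in the conversion script's docstrings (model name is the text before the first underscore, the app is the directory two levels above the file); `plan_output` itself is a hypothetical helper, not a function in this PR.

```python
from pathlib import Path


def plan_output(json_path):
    """Return (model_name, app_name, seed_script_path) derived from the export path."""
    path = Path(json_path)
    model_name = path.name.split("_")[0]  # text before the first underscore
    app_name = path.parents[1].name       # directory two levels above the file
    seed_path = Path(app_name) / "scripts" / f"{model_name.lower()}_seed.py"
    return model_name, app_name, seed_path


model, app, seed = plan_output("core/initial_data/SOCMajor_export.json")
print(model, app, seed.as_posix())
```

Under the same rule, a snake_case name such as `user_status_export.json` would be cut at the first underscore and yield `user`, which is one reason to stick to Pascal case.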

**Potential data issue**
There was a problem with the JSON exporter where it omitted the underscore in `occ_code`. It should be fixed now, but it's good to watch for other column name problems and fix them in the [Google Apps script][apps-script] in the [spreadsheet][pd-data-spreadsheet]. If there is a problem, you will find out when the data insertion fails.
fyliu marked this conversation as resolved.
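
Column name problems like the dropped underscore can also be caught before the insertion fails. The sketch below is one possible check, not something this PR provides: it compares the keys from an export against a list of model field names and uses the standard library's `difflib.get_close_matches` to suggest the likely intended name. The field list is an assumption for illustration; only `occ_code` is named in this guide.

```python
import difflib


def suggest_fixes(row_keys, field_names):
    """Map each unknown key to the closest model field name (None if nothing is close)."""
    suggestions = {}
    for key in row_keys:
        if key in field_names:
            continue
        close = difflib.get_close_matches(key, field_names, n=1)
        suggestions[key] = close[0] if close else None
    return suggestions


# Field names are assumed for illustration; only occ_code is named in this guide
fields = ["id", "occ_code", "title"]
print(suggest_fixes(["occcode", "title"], fields))
```

An empty result means every exported column name matches a known field.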

## Convert JSON into python

1. Make sure the backend is running

```bash
./scripts/buildrun.sh
```

1. Go to the project root and run this command

```bash
docker-compose exec web python scripts/convert.py core/initial_data/SOCMajor_export.json
Member

I'm not seeing the convert.py.

Member

I found that we can update this part: docker-compose exec web python scripts/json_data_to_python.py core/initial_data/SOCMajor_export.json

Member Author

Yes, that's what the convert script became. The docs need updating.

Member

Has the document been updated?

```

1. Check that there's a new file called `app/core/scripts/socmajor_seed.py` and that it looks correct
1. You can run it to verify, but you will need to remove the data afterwards if you care about restoring the database state
1. Run this command to run the script

```bash
docker-compose exec web python manage.py runscript socmajor_seed
Member

In order to test this further, we'll need to have the SOCMajor model created.

Member Author
@fyliu Jul 24, 2023

Yes, or some other model that has initial data. Models like SOCMajor have lots of initial data, which demonstrates why we chose these specific methods to generate the initial data.

I did create the model locally to test that this works. I also did the user_status model and it was okay too.

Member

Can you change to use ProgramArea (or other table which exists)?

```
Member

The spreadsheet uses "id" while the base abstract model has uuid. IMO the simplest fix is to change uuid to id. Anyone who has already created a db would need to recreate it - I can provide steps on how to do that. Alternatively, instructions could be added to manually change this.


Member

Not related to the doc: IMO it would be better for id to be populated in the spreadsheet for all tables, in case we add joins to that data in the seed data. If you agree, I can make that change.

1. To remove the data, go into the database and delete all rows from `core_socmajor`

```bash
docker-compose exec web python manage.py dbshell

-- now we have a shell to the db

-- see if all the seed data got inserted
select count(*) from core_socmajor;
-- shows 22 rows

delete from core_socmajor;
-- DELETE 22

select count(*) from core_socmajor;
-- shows 0 rows

-- ctrl-d to exit dbshell
```

[pd-data-spreadsheet]: https://docs.google.com/spreadsheets/d/1x_zZ8JLS2hO-zG0jUocOJmX16jh-DF5dccrd_OEGNZ0/
[apps-script]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/#:~:text=To%20do%20this,%20click%20Extensions,save%20your%20work%20so%20far.
Member

Instructions are needed on how to include the generated script, either as part of a seed data script or in a migration. I like the option of including it in migration scripts, and I figured out how to do this.
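
As one possible shape for the migration option (not necessarily the approach the reviewer has in mind), a data migration can call seed functions through Django's `migrations.RunPython` operation. The sketch below shows a forward and a reverse function. The file name, the `SEED_ROWS` contents, the `title` field, and the dependency placeholder are all assumptions for illustration, and the `Migration` wiring is left as comments because it needs Django and a real previous migration name.

```python
# Sketch of a data migration module, e.g. core/migrations/<number>_socmajor_seed.py (hypothetical name)

SEED_ROWS = [
    # Made-up row; in practice the rows would come from the exported JSON
    {"occ_code": "11-0000", "title": "Management Occupations"},
]


def load_seed(apps, schema_editor):
    """Insert the seed rows using the historical model provided by the migration."""
    SOCMajor = apps.get_model("core", "SOCMajor")
    for row in SEED_ROWS:
        SOCMajor.objects.create(**row)


def unload_seed(apps, schema_editor):
    """Reverse step: delete only the rows this migration inserted."""
    SOCMajor = apps.get_model("core", "SOCMajor")
    codes = [row["occ_code"] for row in SEED_ROWS]
    SOCMajor.objects.filter(occ_code__in=codes).delete()


# Wiring inside the migration file (requires Django; the dependency is a placeholder):
#
# from django.db import migrations
#
# class Migration(migrations.Migration):
#     dependencies = [("core", "<previous migration>")]
#     operations = [migrations.RunPython(load_seed, unload_seed)]
```

Providing the reverse function keeps the migration reversible, which would replace the manual `delete from core_socmajor` step when rolling back.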

5 changes: 5 additions & 0 deletions docs/tools/exportjson.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Export JSON Apps script

We're using an externally developed Apps Script to export the initial data from a Google spreadsheet to JSON as a step in creating a runnable script.

The updated script is in the vendor directory in the project, along with a link to installation instructions.
5 changes: 5 additions & 0 deletions vendor/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Vendor section

This is code that was not developed as part of this project. It is tracked here because we have improved or customized it for our needs.

Keep in mind that this code may not fall under the same software license as the rest of the project. Changes to this code should be made in commits independent of any project code.
19 changes: 19 additions & 0 deletions vendor/pamelafox/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# ExportJSON script

## Functionality

This is a Google Apps script that's meant to be run inside a Google spreadsheet to export data in JSON format. It can export a single sheet or all sheets.

## Usage

See the [blog post][blog-post].

## Original source

This script was first imported from a [github gist](https://gist.githubusercontent.com/pamelafox/1878143/raw/6c23f71231ce1fa09be2d515f317ffe70e4b19aa/exportjson.js). It was referenced from a [blog post][blog-post].

## Changes (most recent last)

- Fix handling of underscore in column names

[blog-post]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/