Initial data guide 36 #141
Changes from all commits
8dda7c4
5f569ec
c609754
bb4ae18
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# convert a spreadsheet-exported json into a script that can be used in a django migration | ||
# get path and read json file | ||
# get model path | ||
# write new json with model and fields | ||
|
||
import json | ||
|
||
# example: | ||
# docker-compose exec web python core/scripts/convert.py core/fixtures/userstatus_export.json | ||
# to apply the seed script: | ||
# docker-compose exec web python manage.py runscript userstatus_seed | ||
import sys | ||
from pathlib import Path | ||
|
||
|
||
def get_modelname(path): | ||
    """Extract model name from file path | ||
|
||
    Assumes the name portion before the first underscore is the model name | ||
    """ | ||
    filename = Path(path).name | ||
    return filename.split("_")[0] | ||
|
||
|
||
def to_key_eq_value_str(line): | ||
    """Convert dictionary to string of key=value pairs, separated by commas""" | ||
    # print(line) | ||
    values = [] | ||
    for key, value in line.items(): | ||
        # repr() escapes quotes inside the value so the generated code stays valid | ||
        values.append(f"{key}={str(value)!r}") | ||
|
||
    # print(values) | ||
    return ", ".join(values) | ||
|
||
|
||
def convert(file_path): | ||
    """Convert a file of valid JSON objects into a python script which can insert the data into django | ||
|
||
    file_path is in a subdirectory of a django app. Suggested format is | ||
    <appname>/initial_data/<ModelName>_export.json. file_path ends in a filename | ||
    in the format <ModelName>_export.json where the <ModelName> matches the one | ||
    defined in the django project. | ||
|
||
    The python script will be saved to <appname>/scripts/<modelname>_seed.py | ||
    """ | ||
    json_file_path = Path(file_path) | ||
|
||
    with json_file_path.open() as json_file: | ||
        model_all = json.load(json_file) | ||
    root = Path.cwd() | ||
    model_name = get_modelname(file_path) | ||
    app_name = json_file_path.parents[1].name | ||
|
||
    # import from the app the JSON file lives in instead of hardcoding "core" | ||
    output = f"from {app_name}.models import {model_name}\n\n\n" | ||
    output += "def run():\n\n" | ||
    for model_dict in model_all: | ||
        values = to_key_eq_value_str(model_dict) | ||
        python_lines = f"    status = {model_name}({values})\n" | ||
        python_lines += "    status.save()\n" | ||
        # print(python_lines) | ||
        output += python_lines | ||
|
||
    # print(output) | ||
|
||
    output_filename = model_name.lower() + "_seed.py" | ||
    # print(output_filename) | ||
    destination = Path(root) / app_name / "scripts" / output_filename | ||
    # print(dst) | ||
    with Path(destination).open(mode="w") as outfile: | ||
        outfile.write(output) | ||
|
||
|
||
if __name__ == "__main__": | ||
    try: | ||
        json_file_path = sys.argv[1] | ||
    except IndexError: | ||
        raise SystemExit(f"Usage: {sys.argv[0]} <input json file>") | ||
    # print(json_file_path) | ||
|
||
    convert(json_file_path) |
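To make the shape of the generated seed script concrete, here is a minimal, self-contained sketch of the generation step. It is not the script from this PR: `to_kwargs` and `build_seed_script` are hypothetical names, the `title` field and the sample record are made up, and values are rendered with `repr()` so that quotes inside a value stay valid Python.

```python
import json


def to_kwargs(record):
    """Render a dict as keyword arguments; repr() keeps quotes in values escaped."""
    return ", ".join(f"{key}={str(value)!r}" for key, value in record.items())


def build_seed_script(model_name, records, app_name="core"):
    """Build the text of a seed script with one create-and-save pair per record."""
    lines = [f"from {app_name}.models import {model_name}", "", "", "def run():"]
    for record in records:
        lines.append(f"    status = {model_name}({to_kwargs(record)})")
        lines.append("    status.save()")
    return "\n".join(lines) + "\n"


# Hypothetical exported record; only occ_code is a field name mentioned in the guide.
records = json.loads('[{"occ_code": "11-0000", "title": "Management"}]')
print(build_seed_script("SOCMajor", records))
```

The printed text is what would be written to `<appname>/scripts/<modelname>_seed.py`; it only runs inside a Django project where the named model exists.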
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# Create initial data scripts | ||
|
||
## Overview | ||
|
||
The goal is to convert our initial data into scripts that can be loaded into the database when the backend is set up for the first time. | ||
|
||
These are the steps: | ||
|
||
1. Export the data into JSON | ||
1. Generate a python script from the JSON data | ||
|
||
### Prerequisites | ||
|
||
The initial data exists in a Google spreadsheet, such as [this one for People Depot][pd-data-spreadsheet]. Each model's data should be in its own sheet named after that model, such as `SOC Major - Data`. The sheet name tells us which model the data corresponds to. | ||
|
||
The sheet should be formatted like so: | ||
|
||
- the first row contains the field names of the model. The names must match the model's field names exactly | ||
- rows 2 to n are the initial data for the model we want to turn into a script. | ||
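As a rough illustration of the sheet layout described above, the sketch below pairs a header row with the data rows to produce the kind of list of JSON objects the export step is expected to yield. The column name `title` and the sample values are assumptions for illustration (only `occ_code` appears in this guide), and `rows_to_objects` is a hypothetical helper, not part of the export script.

```python
import json


def rows_to_objects(rows):
    """Pair each data row with the header row to build one dict per record."""
    header, *data_rows = rows
    return [dict(zip(header, row)) for row in data_rows]


# Row 1 holds the model's field names; rows 2..n hold the data.
# The field names and values below are made up for illustration.
sheet = [
    ["occ_code", "title"],
    ["11-0000", "Management"],
    ["13-0000", "Business and Financial Operations"],
]

objects = rows_to_objects(sheet)
print(json.dumps(objects, indent=2))
```

Because the header cells become the JSON keys, any mismatch between a header cell and the model's field name carries straight through to the generated script.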
|
||
## Convert the data into JSON | ||
|
||
1. Export the data from the Google [spreadsheet][pd-data-spreadsheet] | ||
1. Find the sheet in the document containing the data to export. Let's use the `SOC Major - Data` sheet as our example. | ||
1. Make sure that the first row (column names) is frozen. Otherwise, freeze it by selecting the first row in the sheet, then Menu > View > Freeze > Up to row 1 | ||
1. Export to JSON. Menu > Export JSON > Export JSON for this sheet | ||
1. Save the JSON into a file | ||
Review comment: The current steps do not indicate that there is a dialog box. They would read better with sub-steps, and the last step is a note. If these changes are made for lines 27-30, it would look like this:
|
||
1. Select and copy all the JSON text | ||
1. Paste it into a new file and save it as [ModelNameInPascalCase]_export.json under app/core/initial_data/ | ||
1. The Pascal case is important in the next step to generate a python script to insert the data. It must match the model's class name for this to work. | ||
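The naming rule above can be sketched as follows. This mirrors the path handling of the convert script in this PR as I read it, but `describe_paths` is a hypothetical helper written for illustration only.

```python
from pathlib import Path


def describe_paths(file_path):
    """Return the names derived from <appname>/initial_data/<ModelName>_export.json."""
    json_path = Path(file_path)
    # Everything before the first underscore is treated as the model class name.
    model_name = json_path.name.split("_")[0]
    # The app is the directory two levels above the file.
    app_name = json_path.parents[1].name
    script_path = Path(app_name) / "scripts" / f"{model_name.lower()}_seed.py"
    return model_name, app_name, script_path


model_name, app_name, script_path = describe_paths(
    "core/initial_data/SOCMajor_export.json"
)
print(model_name, app_name, script_path.as_posix())
```

The model name is used verbatim in the generated import, which is why the file name prefix must match the model's class name, while the output file name is lowercased.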
|
||
**Potential data issue** | ||
There was a problem with the JSON exporter where it omitted the underscore in `occ_code`. It should be fixed now but it's good to pay attention to other column name problems and fix them in the [Google Apps script][apps-script] in the [spreadsheet][pd-data-spreadsheet]. You will find out when the data insertion fails if there's a problem. | ||
fyliu marked this conversation as resolved.
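One way to catch column name problems before the insertion fails is to compare the exported keys with the field names you expect. The sketch below is an assumption-laden illustration: `find_key_problems` is a hypothetical helper, and the expected field list (`occ_code`, `title`) is made up rather than read from the real model.

```python
import json


def find_key_problems(records, expected_fields):
    """Return (row_number, unexpected_keys, missing_keys) for each bad record."""
    expected = set(expected_fields)
    problems = []
    for row_number, record in enumerate(records, start=1):
        keys = set(record)
        unexpected = sorted(keys - expected)
        missing = sorted(expected - keys)
        if unexpected or missing:
            problems.append((row_number, unexpected, missing))
    return problems


# Hypothetical export in which the underscore was dropped from occ_code.
exported = json.loads('[{"occcode": "11-0000", "title": "Management"}]')
problems = find_key_problems(exported, expected_fields=["occ_code", "title"])
print(problems)
```

An empty list means every record has exactly the expected keys.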
|
||
|
||
## Convert JSON into python | ||
|
||
1. Make sure the backend is running | ||
|
||
```bash | ||
./scripts/buildrun.sh | ||
``` | ||
|
||
1. Go to the project root and run this command | ||
|
||
```bash | ||
docker-compose exec web python scripts/convert.py core/initial_data/SOCMajor_export.json | ||
Review comment: I'm not seeing the
Review comment: I found that we can update this part:
Review comment: Yes, that's what the convert script became. The docs need updating.
Review comment: Has the document been updated? |
||
``` | ||
|
||
1. Check that there's a new file called `app/core/scripts/socmajor_seed.py` and that it looks correct | ||
1. You can run it to verify, but will need to remove that data if you care about restoring the database state | ||
1. Run this command to run the script | ||
|
||
```bash | ||
docker-compose exec web python manage.py runscript socmajor_seed | ||
Review comment: In order to test this further, we'll need to have the
Review comment: Yes, or some other model that has initial data. Ones like I did create the model locally to test that this works. I also did the user_status model and it was okay too.
Review comment: Can you change to use ProgramArea (or other table which exists)? |
||
``` | ||
Review comment: The spreadsheet uses "id" while the base abstract model has uuid. IMO simplest is to change uuid to id. Anyone who had created a db would need to recreate - I can provide steps on how to do that. Alternatively, instructions could be added to manually change this. |
||
|
||
Review comment: Not related to doc: IMO it would be better for id to be populated in the spreadsheet for all tables in case we add joins to that data in the seed data. If you agree, I can make that change. |
||
1. To remove the data, go into the database and delete all rows from `core_socmajor` | ||
|
||
```bash | ||
docker-compose exec web python manage.py dbshell | ||
|
||
# now we have a shell to the db | ||
|
||
# see if all the seed data got inserted | ||
select count(*) from core_socmajor; | ||
# shows 22 rows | ||
|
||
delete from core_socmajor; | ||
# DELETE 22 | ||
|
||
select count(*) from core_socmajor; | ||
# shows 0 rows | ||
|
||
# ctrl-d to exit dbshell | ||
``` | ||
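The count, delete, count sequence above can also be tried outside the project. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the project's database, which is an assumption; the table name comes from this guide, while the columns and rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for the project's database (an assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE core_socmajor (occ_code TEXT, title TEXT)")
connection.executemany(
    "INSERT INTO core_socmajor VALUES (?, ?)",
    [("11-0000", "Management"), ("13-0000", "Business and Financial Operations")],
)


def count_rows(conn):
    """Return the number of rows currently in core_socmajor."""
    return conn.execute("SELECT COUNT(*) FROM core_socmajor").fetchone()[0]


before = count_rows(connection)  # rows inserted by the "seed" step
connection.execute("DELETE FROM core_socmajor")
after = count_rows(connection)  # table is empty again
print(before, after)
```

The same check, a non-zero count after seeding and zero after the delete, is what the `dbshell` session above verifies by hand.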
|
||
[pd-data-spreadsheet]: https://docs.google.com/spreadsheets/d/1x_zZ8JLS2hO-zG0jUocOJmX16jh-DF5dccrd_OEGNZ0/ | ||
[apps-script]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/#:~:text=To%20do%20this,%20click%20Extensions,save%20your%20work%20so%20far. | ||
Review comment: Instructions are needed on how to include the generated script either as part of a seed data script or include in migration. I like the option of including in migration scripts and I figured out how to do this. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Export JSON Apps script | ||
|
||
We're using an externally-developed apps script to export the initial data from a google spreadsheet to JSON as a step in creating a runnable script. | ||
|
||
The updated script is in the vendor directory in the project, along with a link with installation instructions. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Vendor section | ||
|
||
This is code that was not developed as part of this project. It is tracked here because we have improved or customized it for our needs. | ||
|
||
Keep in mind that this code may not fall under the same software license as the rest of the project. Changes to vendor code should be made in commits independent of any project code. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# ExportJSON script | ||
|
||
## Functionality | ||
|
||
This is a Google Apps script that's meant to be run inside a Google spreadsheet to export data in JSON format. It can export a single sheet or all sheets. | ||
|
||
## Usage | ||
|
||
See the [blog post][blog-post]. | ||
|
||
## Original source | ||
|
||
This script was first imported from a [github gist](https://gist.githubusercontent.com/pamelafox/1878143/raw/6c23f71231ce1fa09be2d515f317ffe70e4b19aa/exportjson.js). It was referenced from a [blog post][blog-post]. | ||
|
||
## Changes (most recent last) | ||
|
||
- Fix handling of underscore in column names | ||
|
||
[blog-post]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/ |
Review comment: As part of note-taking: at this step, I encountered `Authorization Required`, which then brings me to `Choose an Account`, which then, upon selecting my account, brings me to `Google hasn't verified this app`. By clicking on `Advanced`, then going to `Go to Untitled project (unsafe)`, I can then give authorization when prompted "Untitled project wants to access your Google Account". After authorization, I perform the same actions again, and now I get the `Exported JSON`.

This is cool, but a few comments / observations:
- How much of the above process should be documented?
- What would be a better way to handle this `Google hasn't verified this app` pop-up?
- What is the benefit of this custom script?

For example, I'm not sure how `Export JSON` is a Menu item for Google Sheet `PD: Table and field explanations`, but I'm aware of the `Extensions`, and I found this `Export Sheet Data` Add-on after a brief search on the Google Workspace Marketplace (`Extensions > Add-ons > Get Add-ons`), and it seems we can customize it to do the same thing. Being able to export to file is neat as well, and if we name the Sheets in PascalCase to start, and we customize further, we might be able to customize it to do more and export all Sheets (tables) at once.
Review comment: Thanks for the documentation and great comments/questions. But the answers are not going to be great.
How much of the above process should be documented?
What would be a better way to handle this Google hasn't verified this app pop-up?
What is the benefit of this custom script?