Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial data guide 36 #141

Closed
wants to merge 4 commits into from
Closed

Conversation

fyliu
Copy link
Member

@fyliu fyliu commented Mar 23, 2023

Add guide to create initial data scripts.

A team member should be able to create data insertion scripts using this guide.

@fyliu
Copy link
Member Author

fyliu commented Mar 23, 2023

I'm leaving this as draft until the sphinx PR is merged. This documentation is using the sphinx structure with an added entry in the index page.

@fyliu fyliu force-pushed the initial-data-guide-36 branch from 93b50d7 to 819ca2c Compare March 24, 2023 04:08
@fyliu fyliu force-pushed the initial-data-guide-36 branch from 819ca2c to ddfef5f Compare April 6, 2023 22:23
@fyliu fyliu marked this pull request as ready for review April 6, 2023 22:25
@fyliu fyliu force-pushed the initial-data-guide-36 branch from ddfef5f to bb4ae18 Compare April 6, 2023 22:29
@fyliu
Copy link
Member Author

fyliu commented Apr 6, 2023

I removed everything sphinx-specific from the commits in this merge. This is ready for review.

@joey-ma
Copy link
Member

joey-ma commented Apr 7, 2023

This is cool! I can try to create the SOCMajor model next, and we can resume reviewing this PR later.

@fyliu fyliu marked this pull request as draft April 7, 2023 23:14
@fyliu
Copy link
Member Author

fyliu commented Apr 10, 2023

@yoyoyojoe no need to implement the table from the guide although you can. Issue #24 is also good and is in the prioritized backlog. It should work for all tables although I only tried SOCMajor and UserStatusType. I don't think guides need to be perfect. Their purpose is to get developers most of the way there.

@fyliu fyliu mentioned this pull request Apr 10, 2023
6 tasks
@fyliu
Copy link
Member Author

fyliu commented Apr 12, 2023

@yoyoyojoe It turns out there's a blocker for #24. I ran into more problems trying to work on that. I think it'll be a while before this PR is review-able.

@fyliu fyliu mentioned this pull request Apr 14, 2023
19 tasks
Copy link
Member

@joey-ma joey-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by Fang, we're converting this to Draft and will put this in the Ice Box for the time being.

1. Export the data from the Google [spreadsheet][pd-data-spreadsheet]
1. Find the sheet in the document containing the data to export. Let's use the `SOC Major - Data` data as our example. It's
1. Make sure that the first row (column names) is frozen. Otherwise, freeze it by selecting the first row in the sheet, then Menu > View > Freeze > Up to row 1
1. Export to JSON. Menu > Export JSON > Export JSON for this sheet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of note-taking:

at this step, I encountered Authorization Required:
image

which then brings me to Choose an Account:
image

which then, upon selecting my account, brings me to Google hasn't verified this app:
image

by clicking on Advanced, then going to Go to Untitled project (unsafe):
image

I can then give authorization when prompted "Untitled project wants to access your Google Account":
image

after authorization, I perform the same actions again, and now I get the Exported JSON:
image

This is cool, but a few comments / observations:

  1. How much of the above process should be documented?
  2. What would be a better way to handle this Google hasn't verified this app pop-up?
  3. What is the benefit of this custom script?

For example, I'm not sure how Export JSON is a Menu item for Google Sheet PD: Table and field explanations, but I'm aware of the Extensions, and I found this Export Sheet Data Add-on after a brief search on the Google Workspace Marketplace (Extensions > Add-ons > Get Add-ons), and it seems we can customize it to do the same thing.

image

image

image

Being able to export to file is neat as well, and if we name the Sheets in PascalCase to start, and we customize further, we might be able to customize it to do more and export all Sheets (tables) at once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation and great comments/questions. But the answers are not going to be great.

  1. How much of the above process should be documented?

    • Documentation is great, but this is normal general google apps scripts behavior so it's probably not up to us (as a project) to host this documentation. Maybe someone has the documentation that we can link to or we can put it in a hfla general developer docs place.
    • The other thing is that this is a temporary solution that we'll get rid of soon after this is merged. We won't need/want this script at all. But this is a working solution and I want to get something working before doing the optimizing step.
  2. What would be a better way to handle this Google hasn't verified this app pop-up?

    • There is no better way to do this. This is normal and the warning serves to tell users to examine the script closely before running it.
  3. What is the benefit of this custom script?

    • Good question. The benefit it that it works to get us JSON from the spreadsheet. Actually, Bonnie found it so it's "free" for me. The drawback is that it's a manual step and copying the JSON into a file is also manual. That's why it only serves as a stop gap solution, and because there's a much better method, I didn't spend time to look for other similar things. The one you found does go one step further than the current script, so I would have gone with that if I had known.

docs/how-to/create-initial-data-migrations.md Show resolved Hide resolved
1. Go to the project root and run this command

```bash
docker-compose exec web python scripts/convert.py core/initial_data/SOCMajor_export.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing the convert.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that we can update this part: docker-compose exec web python scripts/json_data_to_python.py core/initial_data/SOCMajor_export.json

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what the convert script became. The docs need updating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the document been updated?

1. Run this command to run the script

```bash
docker-compose exec web python manage.py runscript socmajor_seed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to test this further, we'll need to have the SOCMajor model created.

Copy link
Member Author

@fyliu fyliu Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, or some other model that has initial data. Ones like SOCMajor have lots of initial data which demonstrates why certain choices were made on the specific methods we chose here to generate the initial data.

I did create the model locally to test that this works. I also did the user_status model and it was okay too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change to use ProgramArea (or other table which exists)?

1. Go to the project root and run this command

```bash
docker-compose exec web python scripts/convert.py core/initial_data/SOCMajor_export.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has the document been updated?

1. Run this command to run the script

```bash
docker-compose exec web python manage.py runscript socmajor_seed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change to use ProgramArea (or other table which exists)?


```bash
docker-compose exec web python manage.py runscript socmajor_seed
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spreadsheet uses "id" while the base abstract model has uuid. IMO simplest is to change uuid to id. Anyone who had created a db would need to recreate - I can provide steps on how to do that. Alternatively, instructions could be added to manually change this.

```bash
docker-compose exec web python manage.py runscript socmajor_seed
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to doc: IMO it would be better for id to be populated in the spreadsheet for all tables in case we add joins to that data in the seed data. If you agree, I can make that change.

1. Find the sheet in the document containing the data to export. Let's use the `SOC Major - Data` data as our example. It's
1. Make sure that the first row (column names) is frozen. Otherwise, freeze it by selecting the first row in the sheet, then Menu > View > Freeze > Up to row 1
1. Export to JSON. Menu > Export JSON > Export JSON for this sheet
1. Save the JSON into a file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current steps do not indicate there is a dialog box, it reads better with sub steps, the last step is a note. If these changes are made for lines 27-30 it would look like this:

  1. Export to JSON. Menu > Export JSON > Export JSON for this sheet. A dialog box with JSON will appear.

    1. Select all the JSON and copy into the clipboard.
      ==> Add this line,
  2. Save the JSON as [ModelNameInPascalCase]_export.json under app/core/initial_data/

    ==> Change to add NOTE:. It is not a step.
    NOTE: The Pascal case is important in the next step to generate a python script to insert the data. It must match the model's class name for this to work.

```

[pd-data-spreadsheet]: https://docs.google.com/spreadsheets/d/1x_zZ8JLS2hO-zG0jUocOJmX16jh-DF5dccrd_OEGNZ0/
[apps-script]: https://thenewstack.io/how-to-convert-google-spreadsheet-to-json-formatted-text/#:~:text=To%20do%20this,%20click%20Extensions,save%20your%20work%20so%20far.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instructions are needed on how to include the generated script either as part of a seed data script or include in migration. I like the option of including in migration scripts and I figured out how to do this.

@ethanstrominger
Copy link
Member

Created a new pull request with additional changes from ethanstrominger initial-data-migration-36 to peopledepot main

@fyliu
Copy link
Member Author

fyliu commented Nov 27, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅Done
Development

Successfully merging this pull request may close these issues.

Document how to create initial data for tables
3 participants