Implement the transformation for attachment files #3322

Closed
2 tasks
chouinar opened this issue Dec 19, 2024 · 0 comments · Fixed by #3486

Summary

In a prior ticket (#3271) we set up the structure for the transformation logic, but didn't yet copy the files; this ticket is where we want to do that.

The logic will vary depending on whether the operation is an insert, update, or delete, and on whether the opportunity is a draft, as follows:

Insert

We need to stream the bytes from the staging table into the configured s3 location described above. The way we do that streaming should be able to reuse the same idea described in the testing setup above.

We need the opportunity record to determine whether the file should go to the private bucket where we store drafts.

If there are multiple files, they’ll all be uploaded to the same location: s3://<whichever_bucket>/opportunities/<opportunity_id>/attachments/
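
A minimal sketch of the insert case, assuming boto3 and hypothetical attribute names (`file_lobzip`, `file_name`, `is_draft`); the real staging columns and bucket names come from configuration, not this example:

```python
import boto3

s3_client = boto3.client("s3")

def build_attachment_key(opportunity_id: int, file_name: str) -> str:
    # Matches the layout above: opportunities/<opportunity_id>/attachments/<file_name>
    return f"opportunities/{opportunity_id}/attachments/{file_name}"

def insert_attachment(staging_attachment, opportunity, public_bucket: str, draft_bucket: str) -> str:
    # Drafts go to the private bucket; everything else goes to the public one.
    bucket = draft_bucket if opportunity.is_draft else public_bucket
    key = build_attachment_key(opportunity.opportunity_id, staging_attachment.file_name)

    # Stream the raw bytes held on the staging row straight into s3,
    # rather than round-tripping through a temporary file on disk.
    s3_client.put_object(Bucket=bucket, Key=key, Body=staging_attachment.file_lobzip)

    return f"s3://{bucket}/{key}"
```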

Update

Same as insert, except when the s3 path would change. For example, if the file name changed as part of the update, we’d want to change it in s3 as well. For this, we should insert the new file and then delete the old one. No other changes are necessary.
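
A sketch of that ordering, reusing `insert_attachment` from the sketch above and the `delete_s3_object` helper sketched under Delete below; `file_location` is a hypothetical column holding the previously stored s3 URI:

```python
def update_attachment(staging_attachment, opportunity, existing_attachment,
                      public_bucket: str, draft_bucket: str) -> str:
    # Upload to the (possibly new) location first...
    new_location = insert_attachment(
        staging_attachment, opportunity, public_bucket, draft_bucket
    )

    # ...then clean up the old object only if the path actually changed.
    old_location = existing_attachment.file_location
    if old_location and old_location != new_location:
        delete_s3_object(old_location)

    return new_location
```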

Delete

Delete requires deleting the files from s3, much like in the update case (just without a follow-up insert). We should make sure the delete approach is easily reused.
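
One way to keep that reusable is a small standalone helper that takes an s3 URI, so the update, delete, and opportunity-level cases can all call it; a sketch assuming the `s3://bucket/key` format used above:

```python
def delete_s3_object(s3_uri: str) -> None:
    # Split "s3://bucket/key/parts..." into bucket and key, then delete the object.
    bucket, _, key = s3_uri.removeprefix("s3://").partition("/")
    s3_client.delete_object(Bucket=bucket, Key=key)
```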

Opportunity Update

If is_draft on an opportunity changes from True to False, we want to move all of the attachments to the public bucket. Because opportunities get processed before attachments, we may move something and then later do another update; that is perfectly fine.
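
A sketch of that move, assuming the ORM exposes attachments via an `opportunity_attachments` relationship with a `file_location` column; both names are illustrative, not the actual schema:

```python
def publish_opportunity_attachments(opportunity, public_bucket: str) -> None:
    for attachment in opportunity.opportunity_attachments:
        old_bucket, _, key = attachment.file_location.removeprefix("s3://").partition("/")
        if old_bucket == public_bucket:
            # Already public (e.g. moved by an earlier run), nothing to do.
            continue

        # Copy into the public bucket, then remove the private copy.
        s3_client.copy_object(
            Bucket=public_bucket,
            Key=key,
            CopySource={"Bucket": old_bucket, "Key": key},
        )
        s3_client.delete_object(Bucket=old_bucket, Key=key)
        attachment.file_location = f"s3://{public_bucket}/{key}"
```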

Opportunity delete

When we delete an ORM object, we automatically handle recursively deleting all the related models, but that wouldn’t cascade into s3. If an opportunity needs to be deleted, we want to first delete all of its attachments from s3, and then delete the opportunity.
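
A sketch of that ordering, again with the hypothetical `opportunity_attachments` / `file_location` names and the `delete_s3_object` helper from above:

```python
def delete_opportunity(db_session, opportunity) -> None:
    # s3 first: the database cascade won't reach the files.
    for attachment in opportunity.opportunity_attachments:
        delete_s3_object(attachment.file_location)

    # The ORM cascade then handles deleting the related attachment rows.
    db_session.delete(opportunity)
```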

Acceptance criteria

  • Logic added
  • Very thorough tests added, likely a few dozen
@chouinar chouinar changed the title from "Implement the transformation for" to "Implement the transformation for attachment files" Dec 19, 2024
@chouinar chouinar self-assigned this Jan 6, 2025
@mxk0 mxk0 moved this from Icebox to Todo in Simpler.Grants.gov Product Backlog Jan 7, 2025
@chouinar chouinar moved this from Todo to In Progress in Simpler.Grants.gov Product Backlog Jan 9, 2025
@chouinar chouinar moved this from In Progress to In Review in Simpler.Grants.gov Product Backlog Jan 13, 2025
chouinar added a commit that referenced this issue Jan 17, 2025
## Summary
Fixes #3322

### Time to review: __10 mins__

## Changes proposed
- A lot of file utilities (used in this PR) for handling reading/writing/naming files on s3
- Utility for setting up s3 file paths for the attachments
- Logic to handle inserts/updates/deletes of attachments and the files that need to move around on s3

## Context for reviewers
There are some scenarios I haven't accounted for yet when the opportunity itself is modified (deleted / is no longer a draft). I originally wanted to handle this in a single PR, but I'll split that out, as this one was already getting too big.

See the ticket for details on the scenarios we need to handle.

## Additional information
Testing this is a bit tedious - there is a lot that needs to be set up exactly to test it.

I'd recommend nuking anything you already have with `make
volume-recreate`

Set the env var to enable the job to run (add
`TRANSFORM_ORACLE_DATA_ENABLE_OPPORTUNITY_ATTACHMENT=1` to override.env)

Run `make console` and in that do
`f.StagingTsynopsisAttachmentFactory.create_batch(size=50)` and then
`exit()`

Finally you can run the job by doing `make cmd args="data-migration
load-transform --no-load --transform --no-set-current"`