Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Pause Erasure Execution for Manual Confirmation [#522] #571

Merged
merged 11 commits into from
Jun 1, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented May 26, 2022

➡️ Follow-up to #554. Follows same pattern, but for an erasure.

Purpose

Some data cannot be automatically masked or destroyed. During execution of a privacy request with configured erasures, pause on collections marked as manual and wait for the user to confirm that data has been removed or manually masked before proceeding.

Resume a paused erasure request by passing in the masked count for the paused collection:
POST {{host}}/privacy-request/{{privacy_request_id}}/erasure_confirm

{
    "row_count": 5
}

This mirrors what we return in an automated way when masking collections.

Changes

Big picture, changes added to store which nodes we've run erasures on as we run them, and separately to store which node we're currently paused on, as well as store manually-confirmed erasure counts. Finally, an endpoint is added to resume the privacy request from an erasure stage, more details below:

  • Add an API endpoint to resume a privacy request that is paused at the erasure stage
    • Dry up the logic to share between this and an endpoint that is paused at the access stage. These are separate endpoints because a single privacy request can run both access and erasures
  • Update graph_task.erasure_request to cache the row count erased for each collection (similar to how we currently cache data retrieved for access requests). The primary purpose of this is to track which collections we've already erased so the new graph doesn't visit them when we resume.
  • Add update_erasure_mapping_from_cache which modifies the graph if we need to restart to avoid running erasures on the collections we've already visited. This is how "pausing" is effectively handled - we don't actually pause, but we force execution to stop when we hit a paused node, and then rebuild a different graph that just runs the remaining nodes on resume.
  • Like access requests, force erasure tasks to run sequentially. With Pause Access Request and Resume with Manual Input [#521] #554, A "Pause" now throws an exception and cancels all other tasks in the graph. Running one at a time makes sure we don't close the session prematurely and affect other tasks in flight, potentially preventing execution logs from getting persisted for each collection. There are other ways to do this, but we decided in Pause Access Request and Resume with Manual Input [#521] #554 that this was a good path forward for now, seeing as we're already using Dask Delayed to execute the graph - its side effect is that an exception cancels all other tasks in the graph.
  • Separately add methods to the PrivacyRequest to both cache manual erasure confirmations and retrieve them. These are used both when we resume a privacy request manually to see if we have the information needed to proceed.
  • Add ManualConnector.mask_data which looks to see if a manual erasure confirmation has been added to the cache. Otherwise, it Pauses the graph. When we rerun after the user adds the manual confirmation via the API, it finds that and proceeds with execution.

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Worth noting

  • We have a separate ticket to surface to the user how to perform this manual request, for both access and erasure. For both access and erasure requests, the user needs to know the paused collection, and how to locate the record in question (this would be whatever data is passed to the manual node from upstream collections). In addition, they might need to be given how to mask the record in question. Surface to user how to Pause/Resume Privacy Request  #570
  • The current design makes erasures with manual data a two-step process (as erasures actually run both an access and an erasure). First, the user will need to retrieve the record in question and enter in specific fields. Second, the user will need to manually mask or destroy that data and send in a confirmation. If the data in question is something like a folder in a filing cabinet that needs to be destroyed, this feels like overkill. On the other hand, the file in the filing cabinet might be needed to find data in some other collection, so the user needs to pass in its contents. Additionally, the data may have more specific masking requirements beyond just destruction, hence the two-step process. Further product refinement may be needed here.

Ticket

Fixes #522

@pattisdr pattisdr marked this pull request as ready for review May 27, 2022 00:06
… the privacy request from the erasure step. Assert both access and erasure are called when we submit the privacy request from the access step.
@pattisdr
Copy link
Contributor Author

@ethyca/docs-authors minor docs change added here to explain how to resume a privacy request that is paused in the erasure step.

docs/fidesops/docs/guides/manual_data.md Show resolved Hide resolved
src/fidesops/api/v1/endpoints/privacy_request_endpoints.py Outdated Show resolved Hide resolved
src/fidesops/api/v1/endpoints/privacy_request_endpoints.py Outdated Show resolved Hide resolved
src/fidesops/task/task_resources.py Show resolved Hide resolved
tests/api/v1/endpoints/test_privacy_request_endpoints.py Outdated Show resolved Hide resolved
tests/api/v1/endpoints/test_privacy_request_endpoints.py Outdated Show resolved Hide resolved
tests/integration_tests/test_manual_task.py Outdated Show resolved Hide resolved
tests/integration_tests/test_manual_task.py Show resolved Hide resolved
@pattisdr
Copy link
Contributor Author

I think I've responded to all your comments Paul, ready for another pass!

Copy link
Contributor

@sanders41 sanders41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

I see you have a note to the docs team so I didn't merge in case you still have something that needs looking at there.

@pattisdr
Copy link
Contributor Author

pattisdr commented Jun 1, 2022

Thank you @sanders41, i think docs team is out this week, so I'd go ahead and merge and I can make a note to him about the followup.

@sanders41 sanders41 merged commit 00427c4 into main Jun 1, 2022
@sanders41 sanders41 deleted the fidesops_522_pause_erasure branch June 1, 2022 13:34
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* First draft - add ability to resume an erasure request by manually confirming the number of rows erased.

* Have the access and erasure endpoints share code.

* Update privacy request pause failure test.

* Add tests for erasure caching.

* Update both get_manual_count and get_manual_erasure_count to be inverses of cache_manual_input and cache_manual_erasure_count.

* Fix some formatting.

* Update changelog and add a docs draft.

* Assert that the access portion of execution isn't called if we submit the privacy request from the erasure step. Assert both access and erasure are called when we submit the privacy request from the access step.

* Fix submit mock.

* Respond to CR.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for pausing an execution (Erasure Requests)
2 participants