feat(cli) Adds side-effect report of rollbacks #4482

Merged
merged 7 commits into from
Mar 31, 2022

pedro93
Collaborator

@pedro93 pedro93 commented Mar 24, 2022

This PR improves on #4405 on several fronts:

  • Reports the number of entities affected by a rollback (in other words, by deleting entities). This required introducing a new property, aspectsReverted, in the PDL definition of RollbackResponse.
  • Renames the rollback CLI command flags for soft & hard delete to --soft/--nuke respectively, in line with other flags in the command line tool.
  • Updates the rollback documentation accordingly.
  • Changes rollback behaviour to report all key aspects being rolled back regardless of the limit (100 entries), and adds other aspects up to the 100 (total) entries in the CLI output.
  • Fixes the default behaviour of the datahub get command to return all aspects of a given entity if no aspect was specified.
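
The display rule for key aspects can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the "aspectName" row field, and the assumption that key aspects are recognisable by a "Key" suffix (as in datasetKey) are all hypothetical.

```python
from typing import Dict, List

DISPLAY_LIMIT = 100  # total-entry cap mentioned in the PR description


def is_key_aspect(aspect_name: str) -> bool:
    # Assumption: key aspects can be recognised by a "Key" suffix (e.g. datasetKey).
    return aspect_name.endswith("Key")


def rows_to_display(
    rows: List[Dict[str, str]], limit: int = DISPLAY_LIMIT
) -> List[Dict[str, str]]:
    """Keep every key-aspect row, then fill with other rows up to `limit` in total."""
    key_rows = [r for r in rows if is_key_aspect(r["aspectName"])]
    other_rows = [r for r in rows if not is_key_aspect(r["aspectName"])]
    # Key aspects are always shown, even if they alone exceed the limit.
    remaining = max(0, limit - len(key_rows))
    return key_rows + other_rows[:remaining]
```

With 3 key aspects and 200 other aspects, this sketch would show all 3 key rows plus the first 97 other rows.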

Sample output of rolling back an ingestion that produced a dataset which had its description changed by another run:

$ datahub ingest show --run-id file-2022_03_29-10_09_39
this run created 1 new entities and updated 7 aspects
rolling back will delete the entities created and revert the updated aspects

showing first 7 of 7 aspects touched by this run
+---------------------------------------------------------------+----------------------+----------------------------+
| urn                                                           | aspect name          | created at                 |
+===============================================================+======================+============================+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | datasetKey           | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | browsePaths          | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | datasetProperties    | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | ownership            | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | institutionalMemory  | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | schemaMetadata       | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | dataPlatformInstance | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+

A soft rollback (the default) rolls back everything in the ingestion run except for the dataset key:

$ datahub ingest rollback --run-id file-2022_03_29-10_09_39 -s rollback.csv
Rolling back deletes the entities created by a run and reverts the updated aspects
This rollback will delete 1 entities and will roll back 6 aspects
showing first 6 of 6 aspects that will be reverted by this run
+---------------------------------------------------------------+----------------------+----------------------------+
| urn                                                           | aspect name          | created at                 |
+===============================================================+======================+============================+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | browsePaths          | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | datasetProperties    | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | ownership            | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | institutionalMemory  | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | schemaMetadata       | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | dataPlatformInstance | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
WARNING: This rollback will hide 1 aspects related to 1 entities being rolled back that are not part ingestion run id.

A hard rollback (will include key aspects):

$ datahub ingest rollback --run-id file-2022_03_29-10_09_39 --dry-run -s rollback.csv --nuke
Rolling back deletes the entities created by a run and reverts the updated aspects
This rollback will delete 1 entities and will roll back 7 aspects
showing first 7 of 7 aspects that will be reverted by this run
+---------------------------------------------------------------+----------------------+----------------------------+
| urn                                                           | aspect name          | created at                 |
+===============================================================+======================+============================+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | datasetKey           | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | browsePaths          | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | datasetProperties    | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | ownership            | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | institutionalMemory  | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | schemaMetadata       | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
| urn:li:dataset:(urn:li:dataPlatform:kafka,test-rollback,PROD) | dataPlatformInstance | 2022-03-29 10:09:40 (WEST) |
+---------------------------------------------------------------+----------------------+----------------------------+
WARNING: This rollback will hide 1 aspects related to 1 entities being rolled back that are not part ingestion run id.

The WARNING message at the end reports how many entities have aspects that will be affected by this rollback and the total number of affected aspects, if any exist.

If such aspects exist, a folder (rollback-reports by default; this can be overridden with the --report-dir flag) is created with the file structure rollback-reports/{time}/<files>, as seen below:

rollback-reports
├── 2022-03-30 12:24:08
│   ├── config.json
│   └── unsafe_entities.csv
├── 2022-03-30 12:40:51
│   ├── config.json
│   └── unsafe_entities.csv

config.json is, for now, a simple JSON file recording which run_id was processed.
unsafe_entities.csv is, for now, a CSV with a single column (urn) that lets DataHub operators know which entities have unsafe aspects. NOTE: we have set a maximum of 1 million entities to be sent from the backend, so as not to overload the network when sending these results back.
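
As a rough sketch of the report layout described above (not the implementation in this PR), the following writes one timestamped folder containing the two files. The function name write_rollback_report, the "run_id" key inside config.json, and the exact timestamp format are assumptions made for illustration.

```python
import csv
import json
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def write_rollback_report(
    run_id: str,
    unsafe_urns: Iterable[str],
    report_dir: str = "rollback-reports",
    now: Optional[datetime] = None,
) -> Path:
    """Write <report_dir>/<time>/config.json and unsafe_entities.csv."""
    now = now or datetime.now()
    # Assumption: folder names follow the "2022-03-30 12:24:08" pattern shown above.
    # Note that ":" is not a valid character in Windows directory names.
    out_dir = Path(report_dir) / now.strftime("%Y-%m-%d %H:%M:%S")
    out_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: config.json stores the processed run under a "run_id" key.
    (out_dir / "config.json").write_text(json.dumps({"run_id": run_id}))

    # Single-column CSV; the csv module quotes URNs that contain commas.
    with open(out_dir / "unsafe_entities.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["urn"])
        for urn in unsafe_urns:
            writer.writerow([urn])
    return out_dir
```

Because dataset URNs contain commas, they should be read back with a CSV parser rather than by splitting lines on ",".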

To find out which aspects are affected, the DataHub operator must run datahub get --urn <urn in unsafe_entities.csv>. The output will be the set of aspects that were not part of the rollback and are now considered unsafe.
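
For reports with many entities, the per-URN lookup could be scripted. The sketch below only builds the datahub get command lines from unsafe_entities.csv and does not execute them; the helper name get_commands is hypothetical, and it assumes the CSV has the single urn header column described above.

```python
import csv
import shlex
from typing import List


def get_commands(csv_path: str) -> List[str]:
    """Build one `datahub get --urn ...` command line per row of the report CSV."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)  # expects a header row with a "urn" column
        # shlex.quote protects the parentheses and commas in URNs from the shell.
        return [f"datahub get --urn {shlex.quote(row['urn'])}" for row in reader]
```

Each returned string can be pasted into a shell or passed to a job runner of your choice.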

The rollback.csv file (passed via -s in the commands above) stores the table presented by the command, so that DataHub users can keep a reference of what was touched by the rollback.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@github-actions

github-actions bot commented Mar 24, 2022

Unit Test Results (build & test)

  96 files  ±0    96 suites  ±0   18m 31s ⏱️ - 9m 26s
686 tests ±0  627 ✔️ ±0  59 💤 ±0  0 ±0 

Results for commit a8e5c41. ± Comparison against base commit 0be0689.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Mar 24, 2022

Unit Test Results (metadata ingestion)

       5 files  ±0         5 suites  ±0   55m 58s ⏱️ - 1m 36s
   388 tests ±0     388 ✔️ +1    0 💤 ±0  0  - 1 
1 787 runs  ±0  1 756 ✔️ +8  31 💤  - 7  0  - 1 

Results for commit a8e5c41. ± Comparison against base commit 0be0689.

♻️ This comment has been updated with latest results.

@shirshanka
Contributor

@pedro93: Could you provide some sample output of these runs here?

@pedro93
Collaborator Author

pedro93 commented Mar 29, 2022

@shirshanka please take a look at the commit message, I've added some sample output

Contributor

@shirshanka shirshanka left a comment

LGTM! Will merge after CI is green 👍

@shirshanka shirshanka merged commit 306ddff into datahub-project:master Mar 31, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
…t#4482)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>