Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Brenton interactive generate #176

Merged
merged 29 commits into from
Oct 28, 2021
Merged

Conversation

brentonmallen1
Copy link
Contributor

@brentonmallen1 brentonmallen1 commented Oct 21, 2021

Closes 129

Code Changes

  • Added interactive cli for dataset file annotation
  • Added -a, --annotate option to the generate_dataset cli command
  • Changed visualization for graphs on a single html page
  • Updated visualization to allow for both default and total taxonomy to be displayed

Steps to Confirm

  • Verify both methods 1 and 2

Method 1:

  • Generate a dataset using the generate_dataset functionality and a connection to a test database (i.e. slice) without -a
  • Call the fidesctl annotate_dataset <filename> function on the generated dataset output (or any other dataset file)
  • Verify the following:
    • The UI allows the user to quit. Upon reloading after quitting, the UI takes the user back to the same field to annotate
    • Upon completion and/or quitting, the dataset yaml file has been edited to have the given data_categories, skipped ones have been left empty
      Method 2:
  • Generate a dataset using the generate_dataset functionality and a connection to a test database (i.e. slice) with -a
  • Verify that the generate_dataset with -a takes the user into the annotation UI

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated

Description Of Changes

These changes allow the user to better navigate to the resource visualizations and interactively annotate a given dataset yaml file in a guided way without lose of progress if exited before completion.

Things to consider:

  • If a dataset.yml file has multiple datasets listed, the annotate_dataset functionality will output multiple resulting dataset files, one per dataset in the list. The resulting files will be in the same location as the unannotated file with new names corresponding to the name key in the dataset description. This is due to the current functionality of manifests.write_manifest to only take in a single dictionary instead of a list of dictionaries.

* Added initial UI
* Added viz url getter
* Amended mysql table filter
* Combined graphs into one html page and api endpoint
* Added associated testing
* TODO: currently only visualize default taxonomy, need to update to read all taxonomy from server
* Added commentary for annotate-all
* Added annotate option to generate_dataset
* Added new cli command annotate_dataset
* Added functionality to support full taxonomy for visualization
* Added viz title change accordingly
* Formatting
* Type Hinting
* Linting
brentonmallen1 and others added 9 commits October 21, 2021 16:10
* An attempt was made to satify Xenon but not sure that's going to happen
* Added annotate to xenon ignore (no space between comma separated items, smh)
* Reduced complexity of visualization
Copy link
Contributor

@ThomasLaPiana ThomasLaPiana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few Qs to hash out, overall the flow feels great though!

@ThomasLaPiana
Copy link
Contributor

@brentonmallen1 i did some cleanup here, most notably refactoring crud.py and cleaning up some design things in cli.py

brentonmallen1 and others added 3 commits October 26, 2021 17:03
* Add option to validate user input data categories for both format and taxonomy compliance
* Fixed cli docstring to meet fastapi help expectations

[ci skip]
Copy link
Contributor

@iamkelllly iamkelllly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brentonmallen1 updated the cli documentation in github pages + the CLI inline docs to match the rest of the formatting. actually used it, pretty slick. great work 👍

* Added optional data category validation for annotating a dataset
* Removed viz hover text for tidiness
* Removed default taxonomy only option for resource viz
* Format, Type Hint, Lint
@ThomasLaPiana
Copy link
Contributor

looking really good here, just a small comments or two left and updating how it writes out files

* Main functionality worked out
* TODO: Test
* Refactored Testing
* Fix output dataset yaml formatting
* Formatting and Linting
@ThomasLaPiana
Copy link
Contributor

@brentonmallen1 is this ready for review/merge? if not, can we convert it to a draft PR until its ready for final review? i'll know you're looking for a final review when i see it change to no longer a draft, otherwise if its ready now i'll go ahead and review!

@brentonmallen1
Copy link
Contributor Author

It’s good for review.

@ThomasLaPiana
Copy link
Contributor

running into this bug, going to look into it:

####
Annotating Dataset: [Demo Users Dataset]
####
Annotating Table: [users]

Field [users.food_preference] has no data categories

Enter comma separated data categories for [food_preference] [Enter: skip, q: quit] [[]]: user.provided.identifiable
[user.provided.identifiable] contains invalid categories, please re-enter!
Enter comma separated data categories for [food_preference] [Enter: skip, q: quit] [[]]: user          
[user] contains invalid categories, please re-enter!
Enter comma separated data categories for [food_preference] [Enter: skip, q: quit] [[]]: user.provided.identifiable
[user.provided.identifiable] contains invalid categories, please re-enter!
Enter comma separated data categories for [food_preference] [Enter: skip, q: quit] [[]]: 

@brentonmallen1
Copy link
Contributor Author

brentonmallen1 commented Oct 28, 2021

Hmm, looks like it might not be seeing that category in the list of valid categories. Maybe it’s only grabbing leaf nodes

@brentonmallen1
Copy link
Contributor Author

That's not it, I see all the paths in the valid categories list

@ThomasLaPiana
Copy link
Contributor

@brentonmallen1 i got the bugs i found cleaned up, can you review the tweaks i made and lmk how they look?

@brentonmallen1
Copy link
Contributor Author

brentonmallen1 commented Oct 28, 2021

did we want to make the validation false by default?

Looks good. I only see two minor potential issues:
http://github.com/ethyca/fides/blob/c7dfce05e62cbf89dcaf85bcba6799e090df4dd0/fidesctl/src/fidesctl/core/annotate_dataset.py#L59-L59 uses input as a variable name though it's a builtin function

With the change in the loop, current_dataset here: http://github.com/ethyca/fides/blob/c7dfce05e62cbf89dcaf85bcba6799e090df4dd0/fidesctl/src/fidesctl/core/annotate_dataset.py#L169-L169 could be reference before it's assigned.

Minor things that probably aren't really an issue but just wanted to point them out. If you're good with it, I say ship it.

@ThomasLaPiana
Copy link
Contributor

did we want to make the validation false by default?

Looks good. I only see two minor potential issues: http://github.com/ethyca/fides/blob/c7dfce05e62cbf89dcaf85bcba6799e090df4dd0/fidesctl/src/fidesctl/core/annotate_dataset.py#L59-L59 uses input as a variable name though it's a builtin function

With the change in the loop, current_dataset here: http://github.com/ethyca/fides/blob/c7dfce05e62cbf89dcaf85bcba6799e090df4dd0/fidesctl/src/fidesctl/core/annotate_dataset.py#L169-L169 could be reference before it's assigned.

Minor things that probably aren't really an issue but just wanted to point them out. If you're good with it, I say ship it.

good catches!

not really worried about either, we're not using input in this function (since we're relying on click) and technically it would only hit that if the dataset list was empty, which would get caught earlier i believe?

either way i'm going to go ahead and merge as-is, thank you for your work on this huge feature!

@ThomasLaPiana
Copy link
Contributor

for the default validation as false, i would've had to rename it to no-validation and then completely redo how that bool gets passed down the pipeline, so for now i just made it false and the most feature-rich command is fidesctl annotate-dataset <path> -a -v

@ThomasLaPiana ThomasLaPiana merged commit b84e206 into main Oct 28, 2021
@ThomasLaPiana ThomasLaPiana deleted the brenton-interactive-generate branch October 28, 2021 22:14
@brentonmallen1
Copy link
Contributor Author

Sounds good!

earmenda pushed a commit that referenced this pull request Oct 29, 2021
commit b84e206
Author: CarpeFridiem <brenton@ethyca.com>
Date:   Thu Oct 28 18:14:13 2021 -0400

    Brenton interactive generate (#176)

    * Basic Functionality in Place

    * Added initial UI
    * Added viz url getter
    * Amended mysql table filter

    * Update Visualize Endpoint

    * Combined graphs into one html page and api endpoint
    * Added associated testing
    * TODO: currently only visualize default taxonomy, need to update to read all taxonomy from server

    * Add Annotate CLI

    * Added commentary for annotate-all
    * Added annotate option to generate_dataset
    * Added new cli command annotate_dataset

    * Visualization for Default and Total Taxonomy

    * Added functionality to support full taxonomy for visualization
    * Added viz title change accordingly

    * FTL

    * Formatting
    * Type Hinting
    * Linting

    * Minor bug fixes

    * An attempt was made to satify Xenon but not sure that's going to happen

    * Xenon fix for visualize

    [ci-skip]

    * Xenon and Pylint Fixes

    * Added annotate to xenon ignore (no space between comma separated items, smh)
    * Reduced complexity of visualization

    * Fix Test

    * Fix Skipping Bug

    * refactor crud.py to use crud functions instead of embedding the logic directly into the endpoint function

    * clean up the visualize logic now that crud.py is cleaned up

    * make the generate_dataset function unaware of annotate_dataset function, move it to the CLI side instead

    * add fideskey validation to the data categories

    * pylint fixes

    * Remove Default Flag for Visualizer

    * Make Annotate Dataset Separate Command

    [ci skip]

    * User input validation

    * Add option to validate user input data categories for both format and taxonomy compliance
    * Fixed cli docstring to meet fastapi help expectations

    [ci skip]

    * updating cli documentation

    * Input Validation Minor Fixes and FTL

    * Added optional data category validation for annotating a dataset
    * Removed viz hover text for tidiness
    * Removed default taxonomy only option for resource viz
    * Format, Type Hint, Lint

    * Fix Tests

    * Check-in Multiple Datasets

    * Main functionality worked out
    * TODO: Test

    * Support Multiple Datasets

    * Refactored Testing
    * Fix output dataset yaml formatting
    * Formatting and Linting

    * update how validation works, known but in the writing manifest part

    * clean up the looping logic in the annotate-dataset function, fixes the bug around blank files getting written out

    * update the validation flag for `annotate-dataset`

    Co-authored-by: Thomas La Piana <tal103020@icloud.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit 318f243
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Thu Oct 28 16:43:10 2021 -0400

    Delete x

    "x" slipped through the crax of the API CSS PR.

commit 26903b6
Author: Cillian <1268052+cilliankieran@users.noreply.github.com>
Date:   Tue Oct 26 21:10:53 2021 +0100

    1026 ck doc updates (#184)

    * Updates to SVG renders in dark mode and CSS adjustments to fix code window css

    * Remove unused stylesheet.css

    * Update CLI CSS

    * Format CSS

    * Tidying pass of unecessary css

    * Separate and tidy CSS

    * Include taxonomy CSS

    Co-authored-by: Neville Samuell <neville@ethyca.com>

commit a74fc12
Author: Neville Samuell <neville@ethyca.com>
Date:   Tue Oct 26 12:27:52 2021 -0400

    Update API CSS file for docs site to be consistent with fidesops (and rescope global overrides)  (#183)

    * Rename swagger_override.css to api.css for consistency

    * Include (re-scoped) API CSS from fidesops repo

commit f7fbdea
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Tue Oct 26 11:14:10 2021 -0400

    Get rid of 'try it out' and Visualize block (#181)

commit efabfd2
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Tue Oct 26 09:39:18 2021 -0400

    restored cli css (#182)

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>

commit 255a9a9
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Mon Oct 25 20:33:36 2021 -0400

    Update fides-logo.svg

commit e7768a8
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Mon Oct 25 20:26:22 2021 -0400

    Standardizing css with fidesops

commit 7ded5af
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Mon Oct 25 18:07:37 2021 -0400

    API CSS (#178)

    * Added more extensive help doc to cli.py and options.py.
    * updating CLI docs
    * Update options.py
    * Update cli.py
    black didn't like the whitespace.
    * api and css
    * Update index.md

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit 269459e
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Fri Oct 22 01:06:29 2021 -0400

    Added more extensive help doc to cli.py and options.py. (#175)

    * Added more extensive help doc to cli.py and options.py.
    * updating CLI docs
    * Update options.py
    * Update cli.py

    black didn't like the whitespace.

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit d9dc784
Author: Adrian Galvan <adriang430@gmail.com>
Date:   Thu Oct 21 19:57:06 2021 -0700

    Fixing stylesheet so dark mode headers can be a separate color from the default light theme (#177)

    Co-authored-by: Adrian Galvan <adrian@ethyca.com>

commit 8bdbf89
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Thu Oct 21 15:35:58 2021 -0400

    Test: cli directory with styled man pages (#157)

    * man

    * cli commands

    * removing dob property

    * more cli commands

    * more cli

    * updates

    * more

    * more

    * Added missing cli commands to the pretty cli doc. The only one that's left (that I know of) is generate-dataset

    * resolving merge conflicts

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit a452378
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Tue Oct 19 09:45:15 2021 -0400

    Update fides-logo.svg
ThomasLaPiana pushed a commit that referenced this pull request Oct 29, 2021
…chies (#170)

* Update populating taxonomy to be recursive

* Consider hierarchy of type in evaluations

* Fix hydration of taxonomy to find all keys in model

* Add check for policy rule action to only currently support REJECT for evaluation

* Fix typing in function definitions

* Remove qualifier lists from data set based models

* Fix pylint errors

* small documentation update

* Add evaluation of dataset references

* Add validation for missing evaluation resources

* Add validation for missing parent_keys and make eval output consistent

* Replace instances of identified_data and pseudonymized_data qualifiers with newer version

* Update evaluation output to be more consistent

* Add hierarchical tests for compare_rule_to_declaration

* Add tests for validate_fides_keys_exist_for_evaluation and get_dataset_by_fides_key

* Add tests for validate_supported_policy_rules and get_fides_key_parent_hierarchy

* Add tests for finding nested missing keys

* Fix evaluate validation to include policy rule keys and add tests for dataset evaluation

* Add tests for recursive use of populate_referenced_keys

* Squashed commit of the following:

commit b84e206
Author: CarpeFridiem <brenton@ethyca.com>
Date:   Thu Oct 28 18:14:13 2021 -0400

    Brenton interactive generate (#176)

    * Basic Functionality in Place

    * Added initial UI
    * Added viz url getter
    * Amended mysql table filter

    * Update Visualize Endpoint

    * Combined graphs into one html page and api endpoint
    * Added associated testing
    * TODO: currently only visualize default taxonomy, need to update to read all taxonomy from server

    * Add Annotate CLI

    * Added commentary for annotate-all
    * Added annotate option to generate_dataset
    * Added new cli command annotate_dataset

    * Visualization for Default and Total Taxonomy

    * Added functionality to support full taxonomy for visualization
    * Added viz title change accordingly

    * FTL

    * Formatting
    * Type Hinting
    * Linting

    * Minor bug fixes

    * An attempt was made to satify Xenon but not sure that's going to happen

    * Xenon fix for visualize

    [ci-skip]

    * Xenon and Pylint Fixes

    * Added annotate to xenon ignore (no space between comma separated items, smh)
    * Reduced complexity of visualization

    * Fix Test

    * Fix Skipping Bug

    * refactor crud.py to use crud functions instead of embedding the logic directly into the endpoint function

    * clean up the visualize logic now that crud.py is cleaned up

    * make the generate_dataset function unaware of annotate_dataset function, move it to the CLI side instead

    * add fideskey validation to the data categories

    * pylint fixes

    * Remove Default Flag for Visualizer

    * Make Annotate Dataset Separate Command

    [ci skip]

    * User input validation

    * Add option to validate user input data categories for both format and taxonomy compliance
    * Fixed cli docstring to meet fastapi help expectations

    [ci skip]

    * updating cli documentation

    * Input Validation Minor Fixes and FTL

    * Added optional data category validation for annotating a dataset
    * Removed viz hover text for tidiness
    * Removed default taxonomy only option for resource viz
    * Format, Type Hint, Lint

    * Fix Tests

    * Check-in Multiple Datasets

    * Main functionality worked out
    * TODO: Test

    * Support Multiple Datasets

    * Refactored Testing
    * Fix output dataset yaml formatting
    * Formatting and Linting

    * update how validation works, known but in the writing manifest part

    * clean up the looping logic in the annotate-dataset function, fixes the bug around blank files getting written out

    * update the validation flag for `annotate-dataset`

    Co-authored-by: Thomas La Piana <tal103020@icloud.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit 318f243
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Thu Oct 28 16:43:10 2021 -0400

    Delete x

    "x" slipped through the crax of the API CSS PR.

commit 26903b6
Author: Cillian <1268052+cilliankieran@users.noreply.github.com>
Date:   Tue Oct 26 21:10:53 2021 +0100

    1026 ck doc updates (#184)

    * Updates to SVG renders in dark mode and CSS adjustments to fix code window css

    * Remove unused stylesheet.css

    * Update CLI CSS

    * Format CSS

    * Tidying pass of unecessary css

    * Separate and tidy CSS

    * Include taxonomy CSS

    Co-authored-by: Neville Samuell <neville@ethyca.com>

commit a74fc12
Author: Neville Samuell <neville@ethyca.com>
Date:   Tue Oct 26 12:27:52 2021 -0400

    Update API CSS file for docs site to be consistent with fidesops (and rescope global overrides)  (#183)

    * Rename swagger_override.css to api.css for consistency

    * Include (re-scoped) API CSS from fidesops repo

commit f7fbdea
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Tue Oct 26 11:14:10 2021 -0400

    Get rid of 'try it out' and Visualize block (#181)

commit efabfd2
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Tue Oct 26 09:39:18 2021 -0400

    restored cli css (#182)

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>

commit 255a9a9
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Mon Oct 25 20:33:36 2021 -0400

    Update fides-logo.svg

commit e7768a8
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Mon Oct 25 20:26:22 2021 -0400

    Standardizing css with fidesops

commit 7ded5af
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Mon Oct 25 18:07:37 2021 -0400

    API CSS (#178)

    * Added more extensive help doc to cli.py and options.py.
    * updating CLI docs
    * Update options.py
    * Update cli.py
    black didn't like the whitespace.
    * api and css
    * Update index.md

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit 269459e
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Fri Oct 22 01:06:29 2021 -0400

    Added more extensive help doc to cli.py and options.py. (#175)

    * Added more extensive help doc to cli.py and options.py.
    * updating CLI docs
    * Update options.py
    * Update cli.py

    black didn't like the whitespace.

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit d9dc784
Author: Adrian Galvan <adriang430@gmail.com>
Date:   Thu Oct 21 19:57:06 2021 -0700

    Fixing stylesheet so dark mode headers can be a separate color from the default light theme (#177)

    Co-authored-by: Adrian Galvan <adrian@ethyca.com>

commit 8bdbf89
Author: dougfulton <lbj.kgb@gmail.com>
Date:   Thu Oct 21 15:35:58 2021 -0400

    Test: cli directory with styled man pages (#157)

    * man

    * cli commands

    * removing dob property

    * more cli commands

    * more cli

    * updates

    * more

    * more

    * Added missing cli commands to the pretty cli doc. The only one that's left (that I know of) is generate-dataset

    * resolving merge conflicts

    Co-authored-by: douglas fulton <dfulton@ad1.systemadmin.com>
    Co-authored-by: Kelly Huang <kelly@ethyca.com>

commit a452378
Author: kelly <85575406+iamkelllly@users.noreply.github.com>
Date:   Tue Oct 19 09:45:15 2021 -0400

    Update fides-logo.svg

* Add tests for evaluating dataset/dataset collection/dataset field

Co-authored-by: Eduardo Armendariz <eduardo@ethyca.com>
Co-authored-by: Thomas La Piana <tal103020@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make generate-dataset interactive with a -i/--interactive flag
3 participants