Building on what we've learned from DP Creator, DP Wizard offers:
- Easy installation with
pip install dp_wizard
- Simplified single-user application design
- Streamlined workflow that doesn't assume familiarity with differential privacy
- Interactive visualization of privacy budget choices
- UI development in Python with Shiny
usage: dp-wizard [-h] [--public_csv CSV] [--private_csv CSV] [--contrib CONTRIB] [--demo]
DP Wizard makes it easier to get started with Differential Privacy.
options:
-h, --help show this help message and exit
--public_csv CSV Path to public CSV
--private_csv CSV Path to private CSV
--contrib CONTRIB How many rows can an individual contribute?
--demo Use generated fake CSV for a quick demo
Use "--public_csv" if you have a public data set, and are curious how
DP can be applied: The preview visualizations will use your public data.
Use "--private_csv" if you only have a private data set, and want to
make a release from it: The preview visualizations will only use
simulated data, and apart from the headers, the private CSV is not
read until the release.
Use "--public_csv" and "--private_csv" together if you have two CSVs
with the same structure. Perhaps the public CSV is older and no longer
sensitive. Preview visualizations will be made with the public data,
but the release will be made with private data.
This is the first project we've developed with Python Shiny, so let's remember what we learned along the way.
To get started, clone the repo and install dev dependencies in a virtual environment:
$ git clone https://github.com/opendp/dp-wizard.git
$ cd dp-wizard
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements-dev.txt
$ pre-commit install
$ playwright install
Now install the application itself and run it:
$ pip install --editable .
$ dp-wizard
Your browser should open and connect you to the application.
Tests should pass, and code coverage should be complete (except blocks we explicitly ignore):
$ ./ci.sh
We're using Playwright for end-to-end tests. You can use it to generate test code just by interacting with the app in a browser:
$ dp-wizard # The server will continue to run, so open a new terminal to continue.
$ playwright codegen http://127.0.0.1:8000/
You can also step through these tests and see what the browser sees:
$ PWDEBUG=1 pytest -k test_app
If Playwright fails in CI, we can still see what went wrong:
- Scroll to the end of the CI log, to
actions/upload-artifact
. - Download the zipped artifact locally.
- Inside the zipped artifact will be another zip:
trace.zip
. - Don't unzip it! Instead, open it with trace.playwright.dev.
Branch names should be of the form NNNN-short-description
, where NNNN
is the issue number being addressed.
Dependencies should be pinned for development, but not pinned when the package is installed.
New dev dependencies can be added to requirements-dev.in
, and then run pip-compile requirements-dev.in
to update requirements-dev.txt
A Github project board provides an overview of the issues and PRs.
graph TD
subgraph Pending
%% We only get one auto-add workflow with the free plan.
%% https://docs.github.com/en/issues/planning-and-tracking-with-projects/automating-your-project/adding-items-automatically
Issue-New
PR-New-or-Changes
end
%% subgraph In Progress
%% How should this be used?
%% Can it be automated
%% end
subgraph Ready for Review
PR-for-Review
end
subgraph In Review
PR-in-Review --> PR-Approved
end
subgraph Done
Issue-Closed
PR-Merged
PR-Closed
end
PR-New-or-Changes -->|manual| PR-for-Review
PR-for-Review -->|manual| PR-in-Review
Issue-New -->|auto| Issue-Closed
PR-New-or-Changes -->|auto| PR-Closed
PR-for-Review -->|auto| PR-Closed
PR-in-Review -->|auto| PR-Closed
PR-for-Review -->|manual| PR-New-or-Changes
PR-in-Review -->|auto| PR-New-or-Changes
PR-Approved -->|auto| PR-Merged
- For
manual
transitions, the status of the issue or PR will need to be updated by hand, either on the issue, or by dragging between columns on the board. - For
auto
transitions, some other action (for example, approving a PR) should trigger a workflow. - These are the only the states that matter. Whether PR is a draft or has assignees does not matter.
- If we need anything more than this, we should consider a paid plan, so that we have access to more workflows.