-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding differential privacy to FixedPointBoundedL2VecSum #578
Conversation
The relevant pull request is here: divviup/libprio-rs#578.
The relevant pull request is here: divviup/libprio-rs#578.
This comprises of the following changes: - Add a `noise_param` parameter to the fixedvec type(s) in the various vdaf enums. This describes the amount of noise to be added. - Wire up the new `postprocess()` function of prio to be called in both leader and helper. The noising itself happens in the implementation of the fixedvec type in prio, see the relevant pr (divviup/libprio-rs#578).
This comprises the following changes: - Add a `noise_param` parameter to the fixedvec type(s) in the various vdaf enums. This describes the amount of noise to be added. - Wire up the new `postprocess()` function of prio to be called in both leader and helper. The noising itself happens in the implementation of the fixedvec type in prio, see the relevant pr (divviup/libprio-rs#578).
This comprises the following changes: - Add a `noise_param` parameter to the fixedvec type(s) in the various vdaf enums. This describes the amount of noise to be added. - Wire up the new `postprocess()` function of prio to be called in both leader and helper. The noising itself happens in the implementation of the fixedvec type in prio, see the relevant pr (divviup/libprio-rs#578).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some high level comments to start! By the way I agree with the assessment that some sort of posprocessing step is needed, since in both DAP implementations (Janus/Daphne) aggregate()
is not used directly to aggregate things (an accumulator is used instead).
I think Tim's proposal of moving the parameter for differential privacy into an associated type on I have gone ahead and updated the code as suggested, applying some of the renaming-suggestions as well (ad81ed5). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ooovi, @MxmUrw: We met to discuss this PR today and decided that merging the discrete Gaussian stuff is the best option. (See inline comments.) The main concern right now is getting the API changes right. At a high level our requirements are:
- Decouple the primitives from the VDAF implementation. Eventually what we would like to see is a module containing a variety of primitives suitable for building different DP mechanisms (client-side, server-side, usw.).
- Make the primitives as generic as possible. Ideally the Gaussian sampler would take as input the
$\epsilon$ ,$\delta$ , sensitivity, and whatever other parameters are required and output a vector that can be added to an aggregate share. The same primitive should be useful for any VDAF for which it is compatible. - Ideally it would be possible for a user to use any DP mechanism they want (or none at all), so long as it's compatible with the given VDAF. Right now Prio3FixedPointBoundedL2VecSum can only be used with one DP mechanism (central-DP w/ discrete Gaussian).
I think the next thing to do is figure out the API changes for VDAF, in a separate PR. We sketched something in our meeting today and we plan to put up a PR soon. Would you please take a look when it's ready? Once it's merged, we can rebase this PR on top of it. Does that sound like a plan?
2e21f75
to
fa92453
Compare
1073d8f
to
4a3951d
Compare
Hi, we are wondering about what the best way for dealing with randomness seeds is: We want the There are multiple options:
|
The first option sounds good, we do effectively the same thing for sharding, except passing in seeds themselves. |
Hello again, we have rebased our PR, and are ready to discuss! There are quite a few changes, so here is a summary:
Concerning (1):
EDIT: Would reviewing be easier if we closed this, and opened a new PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How do the various DP traits relate to one another? It might be useful to put together little diagram and an explanation of it in the module documentation for
dp
. - I'm not sure that the implementation of
add_noise_to_agg_share
forFixedPointBoundedL2VecSum
is correct, but I might be misunderstanding the code. - Regarding
experimental_strategy
module: What exactly is this demonstrating? In other words, could the use case that @divergentdave mentions be addressed differently?
This was suggested by @divergentdave here: divviup#578 (comment) Also renaming `Type::add_noise_to_aggregate_share` to `add_noise_to_result`.
The general architecture comes from #607, there are three traits:
Directly, the public interface only mentions the strategy. Originally in @tholop's PR, all three were pure marker traits (didn't require anything for implementation). I am of the opinion that it would be useful to actually axiomatize what a "strategy" is (i.e. require references to the distribution and budget, as well as methods connecting them).
Sure, though I would prefer to do so when it is clear whether my proposal is going to be adopted (and with which changes), or we revert back to the previous state.
Yes of course it could! The goal of my idea is that common patterns, i.e., the idea of what a "strategy" is, to be actually encoded in the trait interface. It is possible to skip this, and implement the noising for every Prio3 type individually for every strategy which it intends to support. The cost of code duplication probably wouldn't even be too much, as there is not too much infrastructural moving around of data happening. My opinion is that all invariants and expectations of behaviour for a given set of objects (in this case implementors of a trait), which can be encoded statically in the type system, should be. In the sense of "code is the best documentation", as it can be checked by the compiler. Thus what the
Do you have specific pointers why these doubts come up? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How do the various DP traits relate to one another?
The general architecture comes from #607, there are three traits:
* `DifferentialPrivacyBudget`: Implementors should be types of budgets for DP-mechanisms. Examples: zCDP, ApproximateDP (Epsilon-Delta), PureDP * `DifferentialPrivacyDistribution`: Distribution from which noise is sampled. Examples: DiscreteGaussian, DiscreteLaplace * `DifferentialPrivacyStrategy`: This is a combination of choices for budget _and_ distribution. Examples: zCDP-DiscreteGaussian, EpsilonDelta-DiscreteGaussian
Directly, the public interface only mentions the strategy. Originally in @tholop's PR, all three were pure marker traits (didn't require anything for implementation). I am of the opinion that it would be useful to actually axiomatize what a "strategy" is (i.e. require references to the distribution and budget, as well as methods connecting them).
That works for me. I would just dump this explanation in the dp
module's docucomment.
Quick question: If DifferentialPrivacyStrategy actually needed? From your description it sounds like its functionality is already implied by a choice of DifferentialPrivacyBudget and DifferentialPrivacyDistribution. I'm sure I'm missing something obvious ...
It might be useful to put together little diagram and an explanation of it in the module documentation for
dp
.Sure, though I would prefer to do so when it is clear whether my proposal is going to be adopted (and with which changes), or we revert back to the previous state.
Ack.
- Regarding
experimental_strategy
module: What exactly is this demonstrating? In other words, could the use case that @divergentdave mentions be addressed differently?Yes of course it could! The goal of my idea is that common patterns, i.e., the idea of what a "strategy" is, to be actually encoded in the trait interface. It is possible to skip this, and implement the noising for every Prio3 type individually for every strategy which it intends to support. The cost of code duplication probably wouldn't even be too much, as there is not too much infrastructural moving around of data happening.
My opinion is that all invariants and expectations of behaviour for a given set of objects (in this case implementors of a trait), which can be encoded statically in the type system, should be. In the sense of "code is the best documentation", as it can be checked by the compiler.
I agree in principle, but from practical experience there is a degree to which adding too many trait bounds becomes cumbersome. Either way I think we're in a good place with the traits we already have (Budget, Distribution, Strategy (although see my question above).
Thus what the
experimental_strategy
shows is that, assuming we make theStrategy
trait as explicit as it currently is, that then the one thing which @divergentdave doubted would be possible, is still possible. In case you don't operate on the same assumption as stated above in italic, my addition probably does not have advantages besides making slightly more generic code possible. I could make an example which shows how generic noising for the simpler prio types could be implemented with my version of the trait.
Sounds good. Unless we intend to use it for this PR, I think we should remove the experimental_strategy.rs
module. Better to only check in code we intend to use!
- I'm not sure that the implementation of
add_noise_to_agg_share
forFixedPointBoundedL2VecSum
is correct, but I might be misunderstanding the code.Do you have specific pointers why these doubts come up?
I've already commented in-line but it seems that they've gotten lost somehow. I'll follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed distribution.rs
. My biggest worry is that the OpenDP implementation doesn't follow the CKS20 pseudocode line-by-line, so it's hard to assess its correctness. I left some comments for parts that I don't understand.
The
I agree. Once the design is accepted, I'll remove it. I'm kind of waiting for @divergentdave's opinion. |
I think there might be an issue with the caching of |
The new exemptions and |
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
dd50c94
to
29067d9
Compare
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
29067d9
to
e34885b
Compare
Thanks, I did, and I also ran And, we're green! |
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
e34885b
to
fb5363b
Compare
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
fb5363b
to
81306a0
Compare
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
81306a0
to
3d0c502
Compare
Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
3d0c502
to
e5bdff3
Compare
@divergentdave, I was a little bit unsure about the rebase onto 87d4a65, it looks like this now:
|
@cjpatton , @tgeoghegan is there something left we need to do to get this merged? |
If Chris and David are happy with the change, then I think we can take it. |
@tgeoghegan I just wanted to flag #578 (review) in case you wanted to double check that the licensing is good to go. |
We discussed the licensing internally and we're satisfied with it; merging now. |
commit 3d5f759 Author: ovi <ovi@posteo.de> Date: Fri Sep 8 17:57:46 2023 +0200 ordering ZCdpBudget commit d6a9985 Author: ovi <ovi@posteo.de> Date: Fri Sep 8 17:53:18 2023 +0200 ordering ZCdpDiscreteGaussians commit d2cdd4c Author: ovi <ovi@posteo.de> Date: Mon Sep 4 16:17:38 2023 +0200 non-pub epsilon commit cfef6a7 Author: ovi <ovi@posteo.de> Date: Mon Sep 4 15:07:01 2023 +0200 pub budget commit eae7e4e Author: ovi <ovi@posteo.de> Date: Mon Sep 4 14:32:57 2023 +0200 Display for error commit 410427e Author: ovi <ovi@posteo.de> Date: Mon Sep 4 13:53:37 2023 +0200 FromStr for Rational commit 59f5522 Author: ovi <ovi@posteo.de> Date: Sun Sep 3 14:13:20 2023 +0200 Display Rational commit 3ac128d Author: ovi <ovi@posteo.de> Date: Sun Sep 3 13:11:47 2023 +0200 trying a thing commit 42e6cad Author: ovi <ovi@posteo.de> Date: Sat Sep 2 03:07:45 2023 +0200 yet more derives commit d43961a Author: ovi <ovi@posteo.de> Date: Sat Sep 2 01:39:16 2023 +0200 yet more derives commit 76d20d9 Author: Maxim Urschumzew <u.maxim@live.de> Date: Thu Aug 31 08:59:19 2023 +0200 Add more derives. commit 7bced48 Author: Maxim Urschumzew <u.maxim@live.de> Date: Thu Aug 31 08:53:04 2023 +0200 Add derive. commit fcdc568 Author: Maxim Urschumzew <u.maxim@live.de> Date: Tue Aug 29 11:51:53 2023 +0200 Add serialize for rational. commit bd41119 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 18:25:33 2023 +0200 Add default impl for poplar. commit 2437364 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 18:21:15 2023 +0200 Add default implementations. commit 7c45460 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 17:47:41 2023 +0200 Add default implementation for `TypeWithNoise`. commit e5bdff3 Author: Maxim Urschumzew <u.maxim@live.de> Date: Mon Aug 14 13:34:03 2023 +0200 Add differential privacy to `FixedPointBoundedL2VecSum` (divviup#578). Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
commit 3d5f759 Author: ovi <ovi@posteo.de> Date: Fri Sep 8 17:57:46 2023 +0200 ordering ZCdpBudget commit d6a9985 Author: ovi <ovi@posteo.de> Date: Fri Sep 8 17:53:18 2023 +0200 ordering ZCdpDiscreteGaussians commit d2cdd4c Author: ovi <ovi@posteo.de> Date: Mon Sep 4 16:17:38 2023 +0200 non-pub epsilon commit cfef6a7 Author: ovi <ovi@posteo.de> Date: Mon Sep 4 15:07:01 2023 +0200 pub budget commit eae7e4e Author: ovi <ovi@posteo.de> Date: Mon Sep 4 14:32:57 2023 +0200 Display for error commit 410427e Author: ovi <ovi@posteo.de> Date: Mon Sep 4 13:53:37 2023 +0200 FromStr for Rational commit 59f5522 Author: ovi <ovi@posteo.de> Date: Sun Sep 3 14:13:20 2023 +0200 Display Rational commit 3ac128d Author: ovi <ovi@posteo.de> Date: Sun Sep 3 13:11:47 2023 +0200 trying a thing commit 42e6cad Author: ovi <ovi@posteo.de> Date: Sat Sep 2 03:07:45 2023 +0200 yet more derives commit d43961a Author: ovi <ovi@posteo.de> Date: Sat Sep 2 01:39:16 2023 +0200 yet more derives commit 76d20d9 Author: Maxim Urschumzew <u.maxim@live.de> Date: Thu Aug 31 08:59:19 2023 +0200 Add more derives. commit 7bced48 Author: Maxim Urschumzew <u.maxim@live.de> Date: Thu Aug 31 08:53:04 2023 +0200 Add derive. commit fcdc568 Author: Maxim Urschumzew <u.maxim@live.de> Date: Tue Aug 29 11:51:53 2023 +0200 Add serialize for rational. commit bd41119 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 18:25:33 2023 +0200 Add default impl for poplar. commit 2437364 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 18:21:15 2023 +0200 Add default implementations. commit 7c45460 Author: Maxim Urschumzew <u.maxim@live.de> Date: Sun Aug 27 17:47:41 2023 +0200 Add default implementation for `TypeWithNoise`. commit e5bdff3 Author: Maxim Urschumzew <u.maxim@live.de> Date: Mon Aug 14 13:34:03 2023 +0200 Add differential privacy to `FixedPointBoundedL2VecSum` (divviup#578). Co-Authored-By: Olivia <ovi@posteo.de> Co-Authored-By: Maxim Urschumzew <u.maxim@live.de>
As announced in #440, we added a feature for our fixed point vector type to add discrete gaussian noise to the aggregation result.
Changes to prio code that is not part of our type, currently all hidden behind the
experimental
feature flag:Type
trait got anadd_noise
function that's the identity unless overridden.Aggregator
trait got apostprocess
function that's the identity unless overridden. TheAggregator
implementation forPrio3
overrides it calling theadd_noise
functionOur type overrides the default implementation of
add_noise
, and adds noise obtained using our sampler implementation.Looking forward to your comments! :)
Co-Authored-By: Olivia ovi@posteo.de
Co-Authored-By: Maxim Urschumzew u.maxim@live.de