-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable differential privacy for FixedPointBoundedL2VecSum. #1440
Conversation
This comprises the following changes: - Add a `noise_param` parameter to the fixedvec type(s) in the various vdaf enums. This describes the amount of noise to be added. - Wire up the new `postprocess()` function of prio to be called in both leader and helper. The noising itself happens in the implementation of the fixedvec type in prio, see the relevant pr (divviup/libprio-rs#578).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can't really review this until the corresponding libprio-rs change is done, but we can have some useful discussion in the meantime.
We currently call the postprocess() function of prio in two seperate locations for the leader and for the helper, as it seems that their respective aggregation mechanism utilizes different codepaths. I can imagine that there could be a better way to hook up our noising function. Is there?
Both helper and leader paths ultimately call aggregator::aggregate_share::compute_aggregate_share
to get an <A as vdaf::Aggregator>::AggregateShare
. I think that would be the place to apply noise. Both because you'd call it once for either role, but also because in that scope, you have awareness of how many contributions went into the aggregate share, and my intuition is that some DP schemes will want that information to tune the noise they apply.
We have a new noise_param argument for our vdaf. Should we add such an argument to tools/collect? Currently, a default value (no noise) is used when instantiating our vdaf.
IIUC the noise parameter is meaningless for the collector, which is why over in divviup/libprio-rs#578, I argued for changing the interface so that only the vdaf::Aggregator
trait deals with DP parameters. But I think we'll have to revisit the question of what the DAP collector needs to do (or not) once we have agreed on the libprio-rs level change.
We added an experimental feature for janus/aggregator and janus/aggregator_core which depends on the experimental feature for prio.
While it's a bit icky semantically, I'd prefer to put this behind the existing fpvec_bounded_l2
to minimize feature combinations.
We added a $FEATURES argument to the dockerfile which allows additional janus features to be enabled, for example from our external docker-compose.yml.
Yes, this is a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a new noise_param argument for our vdaf. Should we add such an argument to tools/collect? Currently, a default value (no noise) is used when instantiating our vdaf.
IMO, feel free to extend tooling to be aware of the new VDAF parameter, gated behind an appropriate feature flag.
@@ -199,6 +199,11 @@ impl VdafOps { | |||
} | |||
} | |||
|
|||
// Postprocess the aggregated shares. This allows, e.g., for central differential privacy, | |||
// but the implementation is experimental. | |||
#[cfg(feature = "experimental")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we (eventually?) want to drop the "experimental" flag here -- I would strongly prefer that VDAF-handling be uniform, even if the VDAFs we use in Divvi Up happen to have a no-op postprocess
implementation.
Do you have thoughts on whether it will be appropriate to drop this specific feature gate in the future, perhaps once the related libprio-rs
changes land?
// Postprocess the aggregated shares. This allows, e.g., for central differential privacy, | ||
// but the implementation is experimental. | ||
#[cfg(feature = "experimental")] | ||
accumulator.postprocess(&vdaf).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about postprocess
semantics: this code runs every time that an aggregation job is stepped, which ultimately causes postprocess
to be called on every batch aggregation related to the aggregation job. If multiple aggregation jobs write into the same batch, this might lead to postprocess
being called on the same batch aggregation multiple times. This processing happens before the share is merged into the existing batch aggregation.
Is this expected from the POV of VDAF that uses this currently? More generally, do these semantics make sense for postprocess
for all VDAFs? (I would naively expect that we'd want postprocess
to be called once per batch aggregation, or perhaps once per collection, likely as part of the collection process -- the reasoning here is that the grouping of reports into aggregation jobs is implementation-specific, and I suspect many postprocess
implementations would be easier to write if they did not have to account for being called on a given batch aggregation an arbitrary number of times. But, again, that might be a naive perception.)
(If there's prior discussion/documentation of the postprocess
semantics, apologies, I couldn't find it easily on the related libprio-rs
PR or issue.)
// Postprocess the aggregated shares. This allows, e.g., for central differential privacy, | ||
// but the implementation is experimental. | ||
#[cfg(feature = "experimental")] | ||
accumulator.postprocess(&vdaf)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could postprocess
be part of accumulator.flush_to_datastore
? (this would cause it to be rerun if a technical issue causes a transaction retry in the transaction containing flush_to_datastore
, so both performance & correctness might be good reasons not to do this -- though if this is a correctness concern see my other comment about postprocess
' expected semantics)
Closing in favor of the up-to-date PR: #1892. |
This comprises the following changes:
noise_param
parameter to the fixedvec type(s) in the various vdaf enums.This describes the amount of noise to be added.
postprocess()
function of prio to be called in both leaderand helper. The noising itself happens in the implementation of the fixedvec
type in prio, see the relevant pr (adding differential privacy to FixedPointBoundedL2VecSum libprio-rs#578).
Notes:
postprocess()
function of prio in two seperate locations for the leader and for the helper, as it seems that their respective aggregation mechanism utilizes different codepaths. I can imagine that there could be a better way to hook up our noising function. Is there?noise_param
argument for our vdaf. Should we add such an argument totools/collect
? Currently, a default value (no noise) is used when instantiating our vdaf.experimental
feature forjanus/aggregator
andjanus/aggregator_core
which depends on theexperimental
feature for prio.$FEATURES
argument to the dockerfile which allows additional janus features to be enabled, for example from our externaldocker-compose.yml
.Tasks: