Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable differential privacy for FixedPointBoundedL2VecSum. #1440

Closed
wants to merge 1 commit into from

Conversation

MxmUrw
Copy link
Contributor

@MxmUrw MxmUrw commented Jun 2, 2023

This comprises the following changes:

  • Add a noise_param parameter to the fixedvec type(s) in the various vdaf enums.
    This describes the amount of noise to be added.
  • Wire up the new postprocess() function of prio to be called in both leader
    and helper. The noising itself happens in the implementation of the fixedvec
    type in prio, see the relevant pr (adding differential privacy to FixedPointBoundedL2VecSum libprio-rs#578).

Notes:

  1. We currently call the postprocess() function of prio in two seperate locations for the leader and for the helper, as it seems that their respective aggregation mechanism utilizes different codepaths. I can imagine that there could be a better way to hook up our noising function. Is there?
  2. We have a new noise_param argument for our vdaf. Should we add such an argument to tools/collect? Currently, a default value (no noise) is used when instantiating our vdaf.
  3. We added an experimental feature for janus/aggregator and janus/aggregator_core which depends on the experimental feature for prio.
  4. We added a $FEATURES argument to the dockerfile which allows additional janus features to be enabled, for example from our external docker-compose.yml.

Tasks:

  • Switch prio dependency back to crates.io, after our other pr is merged.

This comprises the following changes:
 - Add a `noise_param` parameter to the fixedvec type(s) in the various vdaf enums.
   This describes the amount of noise to be added.
 - Wire up the new `postprocess()` function of prio to be called in both leader
   and helper. The noising itself happens in the implementation of the fixedvec
   type in prio, see the relevant pr (divviup/libprio-rs#578).
@MxmUrw MxmUrw requested a review from a team as a code owner June 2, 2023 12:14
@MxmUrw MxmUrw changed the title Enable differential privacy for the fixedvec type. Enable differential privacy for FixedPointBoundedL2VecSum. Jun 2, 2023
Copy link
Contributor

@tgeoghegan tgeoghegan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't really review this until the corresponding libprio-rs change is done, but we can have some useful discussion in the meantime.

We currently call the postprocess() function of prio in two seperate locations for the leader and for the helper, as it seems that their respective aggregation mechanism utilizes different codepaths. I can imagine that there could be a better way to hook up our noising function. Is there?

Both helper and leader paths ultimately call aggregator::aggregate_share::compute_aggregate_share to get an <A as vdaf::Aggregator>::AggregateShare. I think that would be the place to apply noise. Both because you'd call it once for either role, but also because in that scope, you have awareness of how many contributions went into the aggregate share, and my intuition is that some DP schemes will want that information to tune the noise they apply.

We have a new noise_param argument for our vdaf. Should we add such an argument to tools/collect? Currently, a default value (no noise) is used when instantiating our vdaf.

IIUC the noise parameter is meaningless for the collector, which is why over in divviup/libprio-rs#578, I argued for changing the interface so that only the vdaf::Aggregator trait deals with DP parameters. But I think we'll have to revisit the question of what the DAP collector needs to do (or not) once we have agreed on the libprio-rs level change.

We added an experimental feature for janus/aggregator and janus/aggregator_core which depends on the experimental feature for prio.

While it's a bit icky semantically, I'd prefer to put this behind the existing fpvec_bounded_l2 to minimize feature combinations.

We added a $FEATURES argument to the dockerfile which allows additional janus features to be enabled, for example from our external docker-compose.yml.

Yes, this is a good idea!

@tgeoghegan tgeoghegan requested a review from a team June 9, 2023 23:08
Copy link
Contributor

@branlwyd branlwyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a new noise_param argument for our vdaf. Should we add such an argument to tools/collect? Currently, a default value (no noise) is used when instantiating our vdaf.

IMO, feel free to extend tooling to be aware of the new VDAF parameter, gated behind an appropriate feature flag.

@@ -199,6 +199,11 @@ impl VdafOps {
}
}

// Postprocess the aggregated shares. This allows, e.g., for central differential privacy,
// but the implementation is experimental.
#[cfg(feature = "experimental")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we (eventually?) want to drop the "experimental" flag here -- I would strongly prefer that VDAF-handling be uniform, even if the VDAFs we use in Divvi Up happen to have a no-op postprocess implementation.

Do you have thoughts on whether it will be appropriate to drop this specific feature gate in the future, perhaps once the related libprio-rs changes land?

// Postprocess the aggregated shares. This allows, e.g., for central differential privacy,
// but the implementation is experimental.
#[cfg(feature = "experimental")]
accumulator.postprocess(&vdaf).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about postprocess semantics: this code runs every time that an aggregation job is stepped, which ultimately causes postprocess to be called on every batch aggregation related to the aggregation job. If multiple aggregation jobs write into the same batch, this might lead to postprocess being called on the same batch aggregation multiple times. This processing happens before the share is merged into the existing batch aggregation.

Is this expected from the POV of VDAF that uses this currently? More generally, do these semantics make sense for postprocess for all VDAFs? (I would naively expect that we'd want postprocess to be called once per batch aggregation, or perhaps once per collection, likely as part of the collection process -- the reasoning here is that the grouping of reports into aggregation jobs is implementation-specific, and I suspect many postprocess implementations would be easier to write if they did not have to account for being called on a given batch aggregation an arbitrary number of times. But, again, that might be a naive perception.)

(If there's prior discussion/documentation of the postprocess semantics, apologies, I couldn't find it easily on the related libprio-rs PR or issue.)

// Postprocess the aggregated shares. This allows, e.g., for central differential privacy,
// but the implementation is experimental.
#[cfg(feature = "experimental")]
accumulator.postprocess(&vdaf)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could postprocess be part of accumulator.flush_to_datastore? (this would cause it to be rerun if a technical issue causes a transaction retry in the transaction containing flush_to_datastore, so both performance & correctness might be good reasons not to do this -- though if this is a correctness concern see my other comment about postprocess' expected semantics)

@MxmUrw
Copy link
Contributor Author

MxmUrw commented Sep 12, 2023

Closing in favor of the up-to-date PR: #1892.

@MxmUrw MxmUrw closed this Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants