Add support for DP noise during aggregation for Prio3SumVec, Prio3Histogram #1072

Merged
merged 21 commits into from
Jul 19, 2024
1,019 changes: 1,019 additions & 0 deletions documentation/Pure DP Mechanism.lyx
Contributor

I'm not familiar with the LyX system, but I think we should be cautious about adding new file formats for documentation. I don't suppose it can render to Markdown? We already have several Markdown documents across our projects, and the format is well-supported on GitHub and docs.rs, which are the principal ways we expect people to consume libprio-rs.

Collaborator Author

There are some instructions to do so using pandoc here: https://wiki.lyx.org/Tips/ConvertMarkdown. I'm going to try that out and see how GitHub handles the results.

Collaborator Author

I'm running into a number of issues with this.

  • MathJax doesn't support \nicefrac
  • GitHub Markdown has a different experimental syntax for "alerts", like > [!NOTE], etc. Pandoc is using the same triple colon syntax for definitions and proofs here that Docusaurus uses for inset notes, etc.
  • There seems to be a precedence issue in GitHub's Markdown parser, because underscores inside different $ equations are getting grouped together, interpreted as italicizing the text they contain, and thus breaking equations. I've seen suggestions to escape the underscore with a backslash, but those come along with complaints that other tools don't handle this the same way, and thus render literal underscores inside equations.
  • Decorations around the big sigmas and pis are not showing up right; this must be another MathJax incompatibility.
  • The bibliographic reference did not get converted or rendered correctly; it may need to be turned into a footnote.

Ultimately, I think we'll need to pick one ecosystem, either pandoc/LaTeX or GitHub-flavored Markdown, and target it, as there are some fundamental incompatibilities in Markdown handling.

Contributor

OK. I don't want to be too difficult here, because you put a lot of effort into good docs, and LaTeX is more natural for math at this level. Two further ideas:

  • Can we get this to where it requires a LaTeX installation but not LyX? Can a .lyx file be handled by LaTeX? If not, can we use a format that is LaTeX friendly?
  • Could we check in a PDF of the document so that there's something in the repo people can read without needing additional tools?

Collaborator Author

Well, I worked most of the way through these issues, but two remain. Some absolute value bars disappear in my browser at 100% zoom level, but reappear at 110%. In one equation, we have a sum in an exponent, and that is falling apart. It looks fine when I render it using the example JSBin link on mathjax.org, but here both bounds just show up below the sigma.

I can check in either a .tex file or a .pdf file. The PDF is only ~180kB, so I think that should be okay to check in, especially since it'll be the most straightforward to read.

Collaborator

Well written LaTeX is very readable :)


Binary file added documentation/Pure DP Mechanism.pdf
Binary file not shown.
62 changes: 62 additions & 0 deletions src/dp.rs
@@ -36,6 +36,10 @@ pub enum DpError {
    /// Tried to convert BigInt into something incompatible.
    #[error("DP error: {0}")]
    BigIntConversion(#[from] TryFromBigIntError<BigInt>),

    /// Invalid parameter value.
    #[error("invalid parameter: {0}")]
    InvalidParameter(String),
}

/// Positive arbitrary precision rational number to represent DP and noise distribution parameters in
@@ -95,13 +99,57 @@ impl ZCdpBudget {
    /// for a `rho`-ZCDP budget.
    ///
    /// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
    // TODO(#1095): This should be fallible, and it should return an error if epsilon is zero.
Contributor

Could it panic on epsilon == 0 instead? Or do we want crate prio to defend itself gracefully from misconfiguration at a higher level (i.e., Janus and its control plane)?

Collaborator Author

It currently panics when adding noise, due to a division by zero. Ideally prio as a library crate shouldn't panic at all, and just return an error upon misconfiguration.

Collaborator

Agreed, input validation should return an error. I think panics are more appropriate for conditions that shouldn't occur once input validation has been performed.
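
The fallible-constructor approach discussed in this thread can be sketched as follows. This is a minimal, standalone illustration, not the crate's code: `SimpleRational`, `SketchBudget`, and `BudgetError` are hypothetical names, and `u64` stands in for the arbitrary-precision integers that the real `Rational` wraps.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's arbitrary-precision `Rational`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SimpleRational {
    numerator: u64,
    denominator: u64,
}

/// Hypothetical error type, loosely modeled on the `DpError` variants in this diff.
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetError {
    ZeroDenominator,
    InvalidParameter(&'static str),
}

impl fmt::Display for BudgetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BudgetError::ZeroDenominator => write!(f, "denominator cannot be zero"),
            BudgetError::InvalidParameter(msg) => write!(f, "invalid parameter: {}", msg),
        }
    }
}

impl std::error::Error for BudgetError {}

impl SimpleRational {
    /// Reject a zero denominator up front so later arithmetic cannot divide by zero.
    pub fn new(numerator: u64, denominator: u64) -> Result<Self, BudgetError> {
        if denominator == 0 {
            return Err(BudgetError::ZeroDenominator);
        }
        Ok(Self { numerator, denominator })
    }
}

/// Hypothetical budget type whose constructor validates instead of panicking later.
#[derive(Debug, PartialEq, Eq)]
pub struct SketchBudget {
    epsilon: SimpleRational,
}

impl SketchBudget {
    /// Return an error for epsilon == 0 rather than deferring the failure
    /// to a division by zero when noise is added.
    pub fn new(epsilon: SimpleRational) -> Result<Self, BudgetError> {
        if epsilon.numerator == 0 {
            return Err(BudgetError::InvalidParameter("epsilon cannot be zero"));
        }
        Ok(Self { epsilon })
    }
}
```

With this shape, a caller such as a control plane can surface the misconfiguration as an ordinary error, for example by matching on `SketchBudget::new(SimpleRational::new(0, 1)?)`, instead of crashing when noise is sampled.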

    pub fn new(epsilon: Rational) -> Self {
        Self { epsilon: epsilon.0 }
    }
}

impl DifferentialPrivacyBudget for ZCdpBudget {}

/// Pure differential privacy budget. (&epsilon;-DP or (&epsilon;, 0)-DP)
#[derive(Clone, Debug, Eq, PartialEq, Serialize, Ord, PartialOrd)]
pub struct PureDpBudget {
    epsilon: Ratio<BigUint>,
}

impl PureDpBudget {
    /// Create a budget for parameter `epsilon`.
    pub fn new(epsilon: Rational) -> Result<Self, DpError> {
        if epsilon.0.numer() == &BigUint::ZERO {
            return Err(DpError::InvalidParameter("epsilon cannot be zero".into()));
        }
        Ok(Self { epsilon: epsilon.0 })
    }
}

impl DifferentialPrivacyBudget for PureDpBudget {}

/// This module encapsulates a deserialization helper struct. It is needed so we can wrap its
/// derived `Deserialize` implementation in a customized `Deserialize` implementation, which makes
/// use of the budget's constructor to enforce input validation invariants.
mod budget_serde {
    use num_bigint::BigUint;
    use num_rational::Ratio;
    use serde::{de, Deserialize};

    #[derive(Deserialize)]
    pub struct PureDpBudget {
        epsilon: Ratio<BigUint>,
    }

    impl<'de> Deserialize<'de> for super::PureDpBudget {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            let helper = PureDpBudget::deserialize(deserializer)?;
            super::PureDpBudget::new(super::Rational(helper.epsilon))
                .map_err(|_| de::Error::custom("epsilon cannot be zero"))
        }
    }
}

/// Strategy to make aggregate results differentially private, e.g. by adding noise from a specific
/// type of distribution instantiated with a given DP budget.
pub trait DifferentialPrivacyStrategy {
@@ -126,3 +174,17 @@ pub trait DifferentialPrivacyStrategy {
}

pub mod distributions;

#[cfg(test)]
mod tests {
    use serde_json::json;

    use super::PureDpBudget;

    #[test]
    fn budget_deserialization() {
        serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[1], [1]]})).unwrap();
        serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[0], [1]]})).unwrap_err();
        serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[1], [0]]})).unwrap_err();
    }
}
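
The JSON shapes in this test can be read as a `[numerator, denominator]` pair in which each integer is a list of 32-bit digits. The sketch below decodes that shape using only the standard library. It assumes the digits are little-endian base-2^32 (an assumption about the `num-bigint` serde representation, not something stated in this PR), and the function names `digits_to_u128` and `decode_epsilon` are hypothetical. The real code delegates to the `Ratio<BigUint>` deserializer and `PureDpBudget::new`, so the order and wording of the checks here are illustrative only.

```rust
/// Interpret little-endian base-2^32 digits as an integer.
/// Returns `None` if the value does not fit in a `u128`.
pub fn digits_to_u128(digits: &[u32]) -> Option<u128> {
    let mut value: u128 = 0;
    for (i, &digit) in digits.iter().enumerate() {
        if digit == 0 {
            // Zero digits contribute nothing, even in high positions.
            continue;
        }
        if i >= 4 {
            // A nonzero digit at position 4 or above overflows 128 bits.
            return None;
        }
        value |= u128::from(digit) << (32 * i);
    }
    Some(value)
}

/// Decode an epsilon given as (numerator digits, denominator digits),
/// rejecting the two invalid cases exercised by the test above.
pub fn decode_epsilon(numer: &[u32], denom: &[u32]) -> Result<(u128, u128), &'static str> {
    let n = digits_to_u128(numer).ok_or("numerator too large for this sketch")?;
    let d = digits_to_u128(denom).ok_or("denominator too large for this sketch")?;
    if d == 0 {
        return Err("denominator is zero");
    }
    if n == 0 {
        return Err("epsilon cannot be zero");
    }
    Ok((n, d))
}
```

Under these assumptions, `[[1], [1]]` decodes to epsilon = 1/1, while `[[0], [1]]` and `[[1], [0]]` are both rejected, mirroring the three cases in `budget_deserialization`.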
87 changes: 86 additions & 1 deletion src/dp/distributions.rs
@@ -60,7 +60,7 @@ use serde::{Deserialize, Serialize};

use super::{
    DifferentialPrivacyBudget, DifferentialPrivacyDistribution, DifferentialPrivacyStrategy,
-    DpError, ZCdpBudget,
+    DpError, PureDpBudget, ZCdpBudget,
};

/// Sample from the Bernoulli(gamma) distribution, where $gamma /leq 1$.
@@ -262,6 +262,8 @@ where
}

/// A DP strategy using the discrete gaussian distribution providing zero-concentrated DP.
///
/// This uses L2-sensitivity, with the substitution definition of neighboring datasets.
pub type ZCdpDiscreteGaussian = DiscreteGaussianDpStrategy<ZCdpBudget>;

impl DifferentialPrivacyStrategy for DiscreteGaussianDpStrategy<ZCdpBudget> {
@@ -287,6 +289,89 @@ impl DifferentialPrivacyStrategy for DiscreteGaussianDpStrategy<ZCdpBudget> {
}
}

/// Samples `BigInt` numbers according to the discrete Laplace distribution, with the given scale
/// parameter. The distribution is defined over the integers, represented by arbitrary-precision
/// integers. The sampling procedure follows [[CKS20]].
///
/// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
pub struct DiscreteLaplace {
    /// The scale parameter of the distribution.
Collaborator

Is this $t$ in your writeup?

Collaborator Author

Yes, though I've seen t, b, and sigma used for it. I think I just want to refer to it as the "scale" in documentation for clarity.

Collaborator

Yup, scale is a good term to use here. I just wanted to clarify. Perhaps mention in the write up that $t = \texttt{scale}$.
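
For reference, the distribution that the scale parameter refers to can be written out as below. This follows the standard definition of the discrete Laplace distribution as given in [CKS20], with $t$ denoting the scale as in the thread above; it is a restatement for convenience, not a quotation from the PR's writeup.

```latex
\Pr[X = x] \;=\; \frac{e^{1/t} - 1}{e^{1/t} + 1} \cdot e^{-|x|/t},
\qquad x \in \mathbb{Z},\; t > 0.
```

The leading factor is the normalization constant: with $r = e^{-1/t}$, the sum $\sum_{x \in \mathbb{Z}} r^{|x|}$ equals $(1 + r)/(1 - r) = (e^{1/t} + 1)/(e^{1/t} - 1)$.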

    scale: Ratio<BigUint>,
}

impl DiscreteLaplace {
    /// Create a new sampler for the discrete Laplace distribution with the given scale parameter.
    /// Returns an error if the scale parameter is zero or if it has a denominator of zero.
    pub fn new(scale: Ratio<BigUint>) -> Result<Self, DpError> {
        if scale.denom().is_zero() {
            return Err(DpError::ZeroDenominator);
        }
        if scale.numer().is_zero() {
            return Err(DpError::InvalidParameter(
                "the scale of the discrete Laplace distribution must be nonzero".into(),
            ));
        }
        Ok(Self { scale })
    }
}

impl Distribution<BigInt> for DiscreteLaplace {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> BigInt {
        sample_discrete_laplace(&self.scale, rng)
    }
}

impl DifferentialPrivacyDistribution for DiscreteLaplace {}

/// A DP strategy using the discrete Laplace distribution.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Ord, PartialOrd)]
pub struct DiscreteLaplaceDpStrategy<B>
where
    B: DifferentialPrivacyBudget,
{
    budget: B,
}

/// A DP strategy using the discrete Laplace distribution, providing pure DP.
///
/// This uses L1-sensitivity, with the substitution definition of neighboring datasets.
pub type PureDpDiscreteLaplace = DiscreteLaplaceDpStrategy<PureDpBudget>;
Contributor

When naming this stuff, we should make sure to leave room for new DP schemes that may arrive soon, based on the drafts mt and the Bens are working on.

Is this the only DP strategy we'd ever implement that is pure and samples the discrete Laplace distribution? IIUC (but I could easily be wrong), this strategy is where each aggregator independently adds sufficient noise to protect against either aggregator defecting, but the incoming schemes use MPC to meet privacy goals with less noise. So should this be PureDpDiscreteLaplaceButWithMaybeTwiceAsMuchNoiseAsYouNeedSorryAboutThat? (deliberately terrible name to force a better choice)

Collaborator Author

I think the mechanism that adds binomial noise in MPC would be different enough that it wouldn't need to interact with these traits, so I'm not too worried about namespace overlap. Plus, this strategy ultimately gets passed into an AggregatorWithNoise implementation, which makes it pretty explicit how noise is added.

Collaborator

A binomial noise sampler w/o MPC seems conceivable, but I don't think we need to plan for this.


impl DifferentialPrivacyStrategy for PureDpDiscreteLaplace {
    type Budget = PureDpBudget;
    type Distribution = DiscreteLaplace;
    type Sensitivity = Ratio<BigUint>;

    fn from_budget(budget: Self::Budget) -> Self {
        DiscreteLaplaceDpStrategy { budget }
    }

    /// Create a new sampler for the discrete Laplace distribution with a scale parameter calibrated
    /// to provide `epsilon`-differential privacy when added to the result of an integer-valued
    /// function with L1-sensitivity `sensitivity`.
    ///
    /// A mechanism is defined for 1-dimensional query results in [[GRS12]], and restated in Lemma
    /// 29 from [[CKS20]]. However, most VDAF instances will produce query results of higher
    /// dimensions. Proposition 1 of [[DMNS06]] gives a mechanism for multidimensional queries using
    /// the continuous Laplace distribution. In both cases, the scale parameter of the respective
    /// distribution is set to the sensitivity divided by epsilon, and independent samples from the
    /// distribution are added to each component of the query result. Intuitively, adding discrete
    /// Laplace noise using this scale parameter to each vector element of the query result should
    /// provide epsilon-DP, since continuous Laplace noise can be used in the multi-dimensional case,
    /// and discrete and continuous Laplace noise provide the same pure DP with the same parameters
    /// in the one-dimensional case.
    ///
    /// [GRS12]: https://theory.stanford.edu/~tim/papers/priv.pdf
    /// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf
    /// [DMNS06]: https://people.csail.mit.edu/asmith/PS/sensitivity-tcc-final.pdf
    fn create_distribution(
        &self,
        sensitivity: Self::Sensitivity,
    ) -> Result<Self::Distribution, DpError> {
        DiscreteLaplace::new(sensitivity / &self.budget.epsilon)
    }
}
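
The intuition in the doc comment above can be made concrete with the usual density-ratio argument. The following is a sketch under stated assumptions, not a reproduction of the proof in the checked-in PDF: $q$ is an integer-vector-valued query with $d$ components, $D$ and $D'$ are neighboring datasets, $\Delta$ bounds the L1-sensitivity $\lVert q(D) - q(D') \rVert_1$, the mechanism $M$ adds independent discrete Laplace noise with scale $t = \Delta / \varepsilon$ to each component, and $y \in \mathbb{Z}^d$ is any output. These symbols are introduced here for illustration.

```latex
\frac{\Pr[M(D) = y]}{\Pr[M(D') = y]}
  = \prod_{i=1}^{d} \frac{e^{-|y_i - q(D)_i| / t}}{e^{-|y_i - q(D')_i| / t}}
  = \exp\!\left( \frac{1}{t} \sum_{i=1}^{d}
      \bigl( |y_i - q(D')_i| - |y_i - q(D)_i| \bigr) \right)
  \le \exp\!\left( \frac{\lVert q(D) - q(D') \rVert_1}{t} \right)
  \le e^{\Delta / t}
  = e^{\varepsilon}.
```

The normalization constants cancel in the first step because every component uses the same scale, and the first inequality is the triangle inequality applied to each component.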

#[cfg(test)]
mod tests {

1 change: 1 addition & 0 deletions src/flp.rs
@@ -553,6 +553,7 @@ where
    S: DifferentialPrivacyStrategy,
{
    /// Add noise to the aggregate share to obtain differential privacy.
    // TODO(#1073): Rename to add_noise_to_agg_share.
    fn add_noise_to_result(
        &self,
        dp_strategy: &S,
3 changes: 3 additions & 0 deletions src/flp/types.rs
@@ -11,6 +11,9 @@ use std::fmt::{self, Debug};
use std::marker::PhantomData;
use subtle::Choice;

#[cfg(feature = "experimental")]
mod dp;

/// The counter data type. Each measurement is `0` or `1` and the aggregate result is the sum of the
/// measurements (i.e., the total number of `1s`).
#[derive(Clone, PartialEq, Eq)]