Add support for DP noise during aggregation for Prio3SumVec, Prio3Histogram #1072
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,10 @@ pub enum DpError { | |
/// Tried to convert BigInt into something incompatible. | ||
#[error("DP error: {0}")] | ||
BigIntConversion(#[from] TryFromBigIntError<BigInt>), | ||
|
||
/// Invalid parameter value. | ||
#[error("invalid parameter: {0}")] | ||
InvalidParameter(String), | ||
} | ||
|
||
/// Positive arbitrary precision rational number to represent DP and noise distribution parameters in | ||
|
@@ -95,13 +99,57 @@ impl ZCdpBudget { | |
/// for a `rho`-ZCDP budget. | ||
/// | ||
/// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf | ||
// TODO(#1095): This should be fallible, and it should return an error if epsilon is zero. | ||
Could it panic on
It currently panics when adding noise, due to a division by zero. Ideally
Agreed, input validation should err. I think panics are more appropriate for conditions that shouldn't occur once input validation has been performed. |
||
pub fn new(epsilon: Rational) -> Self { | ||
Self { epsilon: epsilon.0 } | ||
} | ||
} | ||
|
||
impl DifferentialPrivacyBudget for ZCdpBudget {} | ||
|
||
/// Pure differential privacy budget. (ε-DP or (ε, 0)-DP) | ||
cjpatton marked this conversation as resolved.
|
||
#[derive(Clone, Debug, Eq, PartialEq, Serialize, Ord, PartialOrd)] | ||
pub struct PureDpBudget { | ||
epsilon: Ratio<BigUint>, | ||
} | ||
|
||
impl PureDpBudget { | ||
/// Create a budget for parameter `epsilon`. | ||
pub fn new(epsilon: Rational) -> Result<Self, DpError> { | ||
if epsilon.0.numer() == &BigUint::ZERO { | ||
return Err(DpError::InvalidParameter("epsilon cannot be zero".into())); | ||
} | ||
Ok(Self { epsilon: epsilon.0 }) | ||
} | ||
} | ||
|
||
impl DifferentialPrivacyBudget for PureDpBudget {} | ||
|
||
/// This module encapsulates a deserialization helper struct. It is needed so we can wrap its | ||
/// derived `Deserialize` implementation in a customized `Deserialize` implementation, which makes | ||
/// use of the budget's constructor to enforce input validation invariants. | ||
mod budget_serde { | ||
use num_bigint::BigUint; | ||
use num_rational::Ratio; | ||
use serde::{de, Deserialize}; | ||
|
||
#[derive(Deserialize)] | ||
pub struct PureDpBudget { | ||
epsilon: Ratio<BigUint>, | ||
} | ||
|
||
impl<'de> Deserialize<'de> for super::PureDpBudget { | ||
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> | ||
where | ||
D: serde::Deserializer<'de>, | ||
{ | ||
let helper = PureDpBudget::deserialize(deserializer)?; | ||
super::PureDpBudget::new(super::Rational(helper.epsilon)) | ||
.map_err(|_| de::Error::custom("epsilon cannot be zero")) | ||
} | ||
} | ||
} | ||
|
||
/// Strategy to make aggregate results differentially private, e.g. by adding noise from a specific | ||
/// type of distribution instantiated with a given DP budget. | ||
pub trait DifferentialPrivacyStrategy { | ||
|
@@ -126,3 +174,17 @@ pub trait DifferentialPrivacyStrategy { | |
} | ||
|
||
pub mod distributions; | ||
|
||
#[cfg(test)] | ||
mod tests { | ||
use serde_json::json; | ||
|
||
use super::PureDpBudget; | ||
|
||
#[test] | ||
fn budget_deserialization() { | ||
serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[1], [1]]})).unwrap(); | ||
serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[0], [1]]})).unwrap_err(); | ||
serde_json::from_value::<PureDpBudget>(json!({"epsilon": [[1], [0]]})).unwrap_err(); | ||
} | ||
} |
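The diff above validates `epsilon` in `PureDpBudget::new` and routes deserialization through that constructor via a helper struct. The same pattern can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `RawBudget`, `Budget`, and `BudgetError` are hypothetical names, `u64` stands in for `BigUint`, and `TryFrom` stands in for the customized `Deserialize` implementation.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the `DpError::InvalidParameter` variant in the diff.
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetError {
    InvalidParameter(String),
}

/// Unvalidated wire form, playing the role of the helper struct in `budget_serde`.
pub struct RawBudget {
    pub numer: u64,
    pub denom: u64,
}

/// Validated budget. The fields are private, so every value of this type
/// has passed through `Budget::new`.
#[derive(Debug, PartialEq, Eq)]
pub struct Budget {
    numer: u64,
    denom: u64,
}

impl Budget {
    /// Fallible constructor: rejects a zero denominator and a zero epsilon
    /// up front, instead of panicking later on a division by zero.
    pub fn new(numer: u64, denom: u64) -> Result<Self, BudgetError> {
        if denom == 0 {
            return Err(BudgetError::InvalidParameter(
                "epsilon denominator cannot be zero".to_string(),
            ));
        }
        if numer == 0 {
            return Err(BudgetError::InvalidParameter(
                "epsilon cannot be zero".to_string(),
            ));
        }
        Ok(Self { numer, denom })
    }
}

/// Conversion from the raw form always goes through the constructor, so the
/// validation cannot be bypassed by external input.
impl TryFrom<RawBudget> for Budget {
    type Error = BudgetError;

    fn try_from(raw: RawBudget) -> Result<Self, Self::Error> {
        Budget::new(raw.numer, raw.denom)
    }
}
```

With this shape, `Budget::try_from(RawBudget { numer: 0, denom: 1 })` returns an error rather than producing a budget that would fail later, which is the behavior the `budget_deserialization` test in the diff checks for `PureDpBudget`.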
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,7 +60,7 @@ use serde::{Deserialize, Serialize}; | |
|
||
use super::{ | ||
DifferentialPrivacyBudget, DifferentialPrivacyDistribution, DifferentialPrivacyStrategy, | ||
DpError, ZCdpBudget, | ||
DpError, PureDpBudget, ZCdpBudget, | ||
}; | ||
|
||
/// Sample from the Bernoulli(gamma) distribution, where $\gamma \leq 1$. | ||
|
@@ -262,6 +262,8 @@ where | |
} | ||
|
||
/// A DP strategy using the discrete Gaussian distribution providing zero-concentrated DP. | ||
/// | ||
/// This uses L2-sensitivity, with the substitution definition of neighboring datasets. | ||
pub type ZCdpDiscreteGaussian = DiscreteGaussianDpStrategy<ZCdpBudget>; | ||
|
||
impl DifferentialPrivacyStrategy for DiscreteGaussianDpStrategy<ZCdpBudget> { | ||
|
@@ -287,6 +289,89 @@ impl DifferentialPrivacyStrategy for DiscreteGaussianDpStrategy<ZCdpBudget> { | |
} | ||
} | ||
|
||
/// Samples `BigInt` numbers according to the discrete Laplace distribution, with the given scale | ||
/// parameter. The distribution is defined over the integers, represented by arbitrary-precision | ||
/// integers. The sampling procedure follows [[CKS20]]. | ||
/// | ||
/// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf | ||
pub struct DiscreteLaplace { | ||
/// The scale parameter of the distribution. | ||
Is this
Yes, though I've seen t, b, and sigma used for it. I think I just want to refer to it as the "scale" in documentation for clarity.
Yup, scale is a good term to use here. I just wanted to clarify. Perhaps mention in the write up that |
||
scale: Ratio<BigUint>, | ||
} | ||
|
||
impl DiscreteLaplace { | ||
/// Create a new sampler for the discrete Laplace distribution with the given scale parameter. | ||
/// Returns an error if the scale parameter is zero or if it has a denominator of zero. | ||
pub fn new(scale: Ratio<BigUint>) -> Result<Self, DpError> { | ||
if scale.denom().is_zero() { | ||
return Err(DpError::ZeroDenominator); | ||
} | ||
if scale.numer().is_zero() { | ||
return Err(DpError::InvalidParameter( | ||
"the scale of the discrete Laplace distribution must be nonzero".into(), | ||
cjpatton marked this conversation as resolved.
|
||
)); | ||
divergentdave marked this conversation as resolved.
|
||
} | ||
Ok(Self { scale }) | ||
} | ||
} | ||
|
||
impl Distribution<BigInt> for DiscreteLaplace { | ||
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> BigInt { | ||
sample_discrete_laplace(&self.scale, rng) | ||
} | ||
} | ||
|
||
impl DifferentialPrivacyDistribution for DiscreteLaplace {} | ||
|
||
/// A DP strategy using the discrete Laplace distribution. | ||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Ord, PartialOrd)] | ||
pub struct DiscreteLaplaceDpStrategy<B> | ||
where | ||
B: DifferentialPrivacyBudget, | ||
{ | ||
budget: B, | ||
} | ||
|
||
/// A DP strategy using the discrete Laplace distribution, providing pure DP. | ||
/// | ||
/// This uses L1-sensitivity, with the substitution definition of neighboring datasets. | ||
pub type PureDpDiscreteLaplace = DiscreteLaplaceDpStrategy<PureDpBudget>; | ||
When naming this stuff, we should make sure to leave room for new DP schemes that may arrive soon, based on the drafts mt and the Bens are working on. Is this the only DP strategy we'd ever implement that is pure and samples the discrete Laplace distribution? IIUC (but I could easily be wrong), this strategy is where each aggregator independently adds sufficient noise to protect against either aggregator defecting, but the incoming schemes use MPC to meet privacy goals with less noise. So should this be
I think the binomial noise in MPC mechanism would be different enough that it wouldn't need to interact with these traits, so I'm not too worried about namespace overlap. Plus, this strategy ultimately gets passed into an
A binomial noise sampler w/o MPC seems conceivable, but I don't think we need to plan for this. |
||
|
||
impl DifferentialPrivacyStrategy for PureDpDiscreteLaplace { | ||
type Budget = PureDpBudget; | ||
type Distribution = DiscreteLaplace; | ||
type Sensitivity = Ratio<BigUint>; | ||
|
||
fn from_budget(budget: Self::Budget) -> Self { | ||
DiscreteLaplaceDpStrategy { budget } | ||
} | ||
|
||
/// Create a new sampler for the discrete Laplace distribution with a scale parameter calibrated | ||
/// to provide `epsilon`-differential privacy when added to the result of an integer-valued | ||
/// function with L1-sensitivity `sensitivity`. | ||
/// | ||
/// A mechanism is defined for 1-dimensional query results in [[GRS12]], and restated in Lemma | ||
/// 29 from [[CKS20]]. However, most VDAF instances will produce query results of higher | ||
/// dimensions. Proposition 1 of [[DMNS06]] gives a mechanism for multidimensional queries using | ||
/// the continuous Laplace distribution. In both cases, the scale parameter of the respective | ||
/// distribution is set to the sensitivity divided by epsilon, and independent samples from the | ||
/// distribution are added to each component of the query result. Intuitively, adding discrete | ||
/// Laplace noise using this scale parameter to each vector element of the query result should | ||
/// provide epsilon-DP, since continuous Laplace noise can be used in the multi-dimensional case, | ||
/// and discrete and continuous Laplace noise provide the same pure DP with the same parameters | ||
/// in the one-dimensional case. | ||
/// | ||
/// [GRS12]: https://theory.stanford.edu/~tim/papers/priv.pdf | ||
/// [CKS20]: https://arxiv.org/pdf/2004.00010.pdf | ||
/// [DMNS06]: https://people.csail.mit.edu/asmith/PS/sensitivity-tcc-final.pdf | ||
fn create_distribution( | ||
&self, | ||
sensitivity: Self::Sensitivity, | ||
) -> Result<Self::Distribution, DpError> { | ||
DiscreteLaplace::new(sensitivity / &self.budget.epsilon) | ||
} | ||
} | ||
|
||
#[cfg(test)] | ||
mod tests { | ||
|
||
|
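The doc comment on `create_distribution` sets the scale to the L1-sensitivity divided by epsilon and argues that adding independent discrete Laplace noise to each vector element should give epsilon-DP. The sketch below checks that bound numerically for a two-element query result. It is an illustration under stated assumptions, not the PR's implementation: it uses `f64` where the PR uses arbitrary-precision rationals, the function names are hypothetical, and a brute-force check over a small window is not a proof.

```rust
/// Unnormalized probability mass of the discrete Laplace distribution with
/// scale `scale` at integer `x`, proportional to exp(-|x| / scale).
/// The normalizing constant is omitted because it cancels in the ratios below.
pub fn discrete_laplace_weight(x: i64, scale: f64) -> f64 {
    (-(x.abs() as f64) / scale).exp()
}

/// Scale calibrated as in the doc comment: L1-sensitivity divided by epsilon.
/// Returns `None` for non-positive inputs, loosely mirroring the zero checks
/// in `PureDpBudget::new` and `DiscreteLaplace::new`.
pub fn calibrated_scale(sensitivity: f64, epsilon: f64) -> Option<f64> {
    if sensitivity > 0.0 && epsilon > 0.0 {
        Some(sensitivity / epsilon)
    } else {
        None
    }
}

/// Largest likelihood ratio between two neighboring two-element query results
/// whose difference (s0, s1) satisfies |s0| + |s1| <= sensitivity, taken over
/// all noise values (x0, x1) with each coordinate in [-window, window].
pub fn worst_case_ratio_2d(scale: f64, sensitivity: i64, window: i64) -> f64 {
    let mut worst = 0.0_f64;
    for x0 in -window..=window {
        for x1 in -window..=window {
            for s0 in -sensitivity..=sensitivity {
                // Remaining L1 budget for the second coordinate.
                let rest = sensitivity - s0.abs();
                for s1 in -rest..=rest {
                    // Independent noise per coordinate, so weights multiply.
                    let numerator = discrete_laplace_weight(x0, scale)
                        * discrete_laplace_weight(x1, scale);
                    let denominator = discrete_laplace_weight(x0 + s0, scale)
                        * discrete_laplace_weight(x1 + s1, scale);
                    worst = worst.max(numerator / denominator);
                }
            }
        }
    }
    worst
}
```

For example, with sensitivity 2 and epsilon 0.5 the calibrated scale is 4, and `worst_case_ratio_2d(4.0, 2, 10)` stays within floating-point error of `0.5_f64.exp()`. The ratio is bounded because each coordinate contributes at most `exp(|s_i| / scale)`, and the shifts sum to at most the sensitivity in L1 norm.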
I'm not familiar with the LyX system, but I think we should be cautious about adding new file formats for documentation. I don't suppose it can render to Markdown? We already have several Markdown documents across our projects, and the format is well-supported on GitHub and docs.rs, which are the principal ways we expect people to consume `libprio-rs`.

There are some instructions to do so using pandoc here: https://wiki.lyx.org/Tips/ConvertMarkdown. I'm going to try that out and see how GitHub handles the results.

I'm running into a number of issues with this.
`> [!NOTE]`, etc. Pandoc is using the same triple colon syntax for definitions and proofs here that Docusaurus uses for inset notes, etc.
`$` equations are getting grouped together, interpreted as italicizing the text they contain, and thus breaking equations. I've seen suggestions to escape the underscore with a backslash, but those come along with complaints that other tools don't handle this the same way, and thus render literal underscores inside equations.
Ultimately, I think we'll need to pick one ecosystem, either pandoc/LaTeX or GitHub-flavored Markdown, and target it, as there are some fundamental incompatibilities in Markdown handling.

OK. I don't want to be too difficult here, because you put a lot of effort into good docs, and LaTeX is more natural for math at this level. Two further ideas:
`.lyx` file be handled by LaTeX? If not, can we use a format that is LaTeX friendly?

Well, I worked most of the way through these issues, but two remain. Some absolute value bars disappear on my browser at 100% zoom level, but reappear at 110%. In one equation, we have a sum in an exponent, and that is falling apart. It looks fine when I try to render it using the example JSBin link on mathjax.org, but here both bounds just show up below the sigma.
I can check in either a .tex file or a .pdf file. The PDF is only ~180kB, so I think that should be okay to check in, especially since it'll be the most straightforward to read.

Well written LaTeX is very readable :)