What are some use cases for Veracruz?


We'll consider three potential Veracruz use-cases here. Some are of commercial interest; others are of more academic interest, intended to demonstrate why some of Veracruz's design decisions were made. Together, the chosen use-cases are intended to capture Veracruz's breadth of applicability.

Privacy-preserving machine learning

We consider "privacy-preserving machine learning" from two different angles: the protection of datasets, and the protection of machine learning algorithms.

Protection of datasets

Two small online retailers — Acorp and Bmart — have a problem: they cannot adequately compete with their larger competitors in providing useful product recommendations to viewers of their respective websites based on past product purchases. The real problem for Acorp and Bmart is that their larger competitors have much larger datasets to play with, and can therefore train more accurate machine learning models than either of them.

However, the situation becomes more interesting if Acorp and Bmart choose to pool their respective data sets and learn a joint model over the combined data. Now, the two are able to learn a more accurate model in concert than either would be able to do individually. Unfortunately, Acorp and Bmart are in fierce competition, not just with their larger competitors, but also with each other, and therefore the idea of sharing data with each other is unconscionable!

Ordinarily, this would be the end of the matter, if not for Veracruz...

Specifically, Acorp and Bmart are going to collaborate in a limited sense, using Veracruz to learn a machine learning model over their joint data set. Importantly, neither retailer will learn anything about the other retailer's data set, nor will any other third party. Moreover, the only principals that will gain access to the machine learning model are the two retailers.

Out-of-band, Acorp and Bmart first negotiate the following aspects of their collaboration (a sketch of one possible encoding follows the list):

  • The encoding, if any, that will be used when communicating the data sets over TLS.
  • The columns that the joint data set will contain, and their interpretation.
  • The machine learning algorithm that will be applied to the data sets.
  • The encoding, if any, that will be used when communicating the resulting machine learning model, once learning has finished.
  • Any error codes that will be returned by the machine learning algorithm, and their interpretation.
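
For concreteness, here is one shape the negotiated encoding might take: a minimal Rust sketch of a fixed-width, little-endian row format for the joint data set. The column names, types, and layout are illustrative assumptions, not part of Veracruz itself; the retailers are free to negotiate whatever encoding suits them.

```rust
/// A hypothetical row of the joint purchase-history data set, as
/// negotiated out-of-band by Acorp and Bmart.
struct PurchaseRecord {
    customer_id: u64,    // pseudonymised customer identifier
    product_id: u32,     // identifier in a shared product catalogue
    quantity: u16,       // units purchased
    unix_timestamp: i64, // time of purchase
}

impl PurchaseRecord {
    /// Serialise a record into the negotiated fixed-width,
    /// little-endian byte layout (8 + 4 + 2 + 8 = 22 bytes).
    fn to_bytes(&self) -> [u8; 22] {
        let mut buf = [0u8; 22];
        buf[0..8].copy_from_slice(&self.customer_id.to_le_bytes());
        buf[8..12].copy_from_slice(&self.product_id.to_le_bytes());
        buf[12..14].copy_from_slice(&self.quantity.to_le_bytes());
        buf[14..22].copy_from_slice(&self.unix_timestamp.to_le_bytes());
        buf
    }
}

fn main() {
    let record = PurchaseRecord {
        customer_id: 42,
        product_id: 7,
        quantity: 2,
        unix_timestamp: 1_600_000_000,
    };
    assert_eq!(record.to_bytes().len(), 22);
}
```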

After this negotiation, one of the two corporations — let's assume it's Acorp — prepares a Veracruz program realizing the negotiated machine learning algorithm, and gives Bmart access to it for auditing. Note here that Acorp is specifically declassifying its program as a means of inducing Bmart to enroll in the computation: a "nothing up my sleeve" step. Moreover, Acorp is now taking on two roles in the computation: that of Data Provider and Program Provider.
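
To give a flavour of what Acorp might write, here is a skeletal sketch of such a program, assuming, purely for illustration, that the runtime exposes the two provisioned data sets and the result as files at fixed paths; the paths and the `train` stub below are assumptions, not Veracruz's actual programming interface.

```rust
use std::fs;

/// Placeholder for the negotiated machine learning algorithm, e.g.
/// training a product-recommendation model over the joint data set.
fn train(joint_data: &[u8]) -> Vec<u8> {
    joint_data.to_vec()
}

fn main() -> std::io::Result<()> {
    // Read Acorp's and Bmart's data sets, provisioned over TLS
    // (hypothetical paths).
    let mut joint = fs::read("/input-0")?;
    joint.extend(fs::read("/input-1")?);

    // Train the negotiated model and write it back as the result,
    // which only the two retailers may retrieve (hypothetical path).
    fs::write("/output", train(&joint))
}
```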

Separately, the two prepare a Veracruz global policy specifying their respective roles — Data Provider, Program Provider, and Results Receiver in the case of Acorp, and Data Provider and Results Receiver in the case of Bmart — and select a host on which to execute the computation. If the host is an established Cloud host with a good reputation, the two may opt to use Veracruz's hypervisor-based containerisation, for instance. In other contexts, for example if the host is one of Acorp or Bmart themselves, they may choose to use a hardware-based containerisation technology, either TrustZone or SGX.
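
The global policy itself is a structured document identifying each principal, their roles, and the hash of the agreed program. The following Rust sketch models it in miniature; the field and role names here are illustrative assumptions, and Veracruz's real policy format contains considerably more detail.

```rust
#[derive(Debug)]
enum Role {
    ProgramProvider,
    DataProvider,
    ResultsReceiver,
}

#[derive(Debug)]
struct Principal {
    certificate: &'static str, // identifies the principal over TLS
    roles: Vec<Role>,
}

#[derive(Debug)]
struct GlobalPolicy {
    principals: Vec<Principal>,
    program_hash: &'static str, // hash of the program Bmart audited
    delegate_url: &'static str, // where the container will run
}

fn main() {
    let policy = GlobalPolicy {
        principals: vec![
            Principal {
                certificate: "acorp.pem",
                roles: vec![
                    Role::ProgramProvider,
                    Role::DataProvider,
                    Role::ResultsReceiver,
                ],
            },
            Principal {
                certificate: "bmart.pem",
                roles: vec![Role::DataProvider, Role::ResultsReceiver],
            },
        ],
        program_hash: "sha256:0123abcd", // hypothetical digest
        delegate_url: "https://delegate.example.com",
    };
    println!("{policy:?}");
}
```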

Once set up, the two provision their program and then their respective data sets, as described in What are Veracruz computations?, and obtain the machine learning model extracted from their joint data set. Neither learns anything, other than this machine learning model, from the other participant's data set, nor does anybody else, as desired.

Protection of algorithms

Suppose Alice, a machine learning researcher, develops an innovative machine learning algorithm and wishes to license this algorithm to third parties, charging a small fee for every invocation of her algorithm on a data set. Suppose also that Bob wants to license this algorithm from Alice, and apply it to a data set that he owns. Bob would also like to keep his data set private, if possible.

Out-of-band, Alice tells Bob the binary format in which he should store his data set, the binary format of the output that the algorithm will produce and how to interpret it, and any error codes that the machine learning algorithm may produce. Alice then informs Bob of the hash of her algorithm's implementation, for his record keeping.
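
A small sketch of how Bob might decode the result, under the assumption (made up for illustration) that Alice specifies a one-byte tag followed by a payload, with tags 1 and 2 reserved for error codes:

```rust
#[derive(Debug)]
enum AlgorithmOutcome {
    Model(Vec<u8>),   // tag 0: serialised model parameters
    MalformedInput,   // tag 1: Bob's data set failed validation
    InsufficientData, // tag 2: too few rows to run the algorithm
}

fn decode_result(raw: &[u8]) -> Option<AlgorithmOutcome> {
    let (&tag, rest) = raw.split_first()?;
    match tag {
        0 => Some(AlgorithmOutcome::Model(rest.to_vec())),
        1 => Some(AlgorithmOutcome::MalformedInput),
        2 => Some(AlgorithmOutcome::InsufficientData),
        _ => None, // outside the negotiated format
    }
}

fn main() {
    println!("{:?}", decode_result(&[0, 1, 2, 3]));
}
```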

Alice writes a global policy for a Veracruz computation: Alice will take the role of Program Provider, Bob the role of Data Provider and Results Receiver. The two agree to use a neutral third party as the delegate, for example a commercial Cloud host. The two follow the protocol outlined in What are Veracruz computations?, and provision their respective secrets into Veracruz, before it computes the final result, which Bob can access.

Note that, in this case, Alice does not declassify her program. However, from Bob's perspective, this is "safe", as Alice cannot learn anything about Bob's data set from the invocation of her program, since she is not a Results Receiver in the computation. Moreover, Bob will not learn anything about Alice's program — as he never sees it — other than what can be deduced from the program's output.

Delegated computation

Suppose a company deploys a fleet of computationally weak Internet of Things (IoT, henceforth) devices. The fleet of devices can be augmented with functionality that ordinarily requires a more computationally sophisticated device (e.g. natural language speech processing, or similar) by delegating the computation to such a device. Unfortunately, this delegation poses two risks:

  1. A loss of privacy. This aspect becomes especially serious when the delegated task is handed off to untrusted, multi-tenanted Edge devices, rather than to centralised data centres under the control of the owner of the IoT device fleet.
  2. Problems with integrity, wherein the delegate computes a different function, or provides a different capability, than the delegating device is expecting, whether due to bad programming, misconfiguration, or nefarious interference by e.g. a co-tenant, as in point (1) above.

Veracruz can instead be used as a means of delegating computations from one device to another, solving problems with both the privacy and the integrity of the computation.

Briefly, an untrusted server acts as the delegate, and is assumed capable of spawning a strong container capable of running the trusted Veracruz runtime. Depending on the particular model of delegation, either the device itself, or a third party managing the device, takes on the role of Program Provider, with the delegating device acting as both Data Provider and Results Receiver. The device and program provider — if this role is not taken on by the device itself — follow the protocol outlined in What are Veracruz computations? to provision their secrets into the delegate's container, after which the result is computed and thereafter made available to the delegating device.
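
Sketched from the delegating device's point of view, the flow might look as follows. The helper functions here are hypothetical stand-ins for the real Veracruz client interface, which handles attestation of the delegate's container and provisioning over TLS; none of these names are drawn from the actual library.

```rust
/// Stands in for an attested session with the delegate's container.
struct Delegate;

/// Hypothetical: establish a TLS session whose trust is rooted in a
/// successful attestation of the delegate's Veracruz runtime, checked
/// against the hash recorded in the global policy.
fn attest_and_connect(url: &str, runtime_hash: &str) -> Delegate {
    let _ = (url, runtime_hash);
    Delegate
}

impl Delegate {
    /// Hypothetical: provision this device's input, e.g. an audio sample.
    fn provision_data(&self, _input: &[u8]) {}

    /// Hypothetical: trigger execution and retrieve the result.
    fn get_result(&self) -> Vec<u8> {
        Vec::new()
    }
}

fn main() {
    let sample: Vec<u8> = Vec::new(); // captured audio, say
    let delegate = attest_and_connect("https://edge.example.com", "sha256:feed");
    delegate.provision_data(&sample);
    // Integrity follows because the attested runtime only executes the
    // program named in the global policy; privacy follows because the
    // sample is only ever decrypted inside the strong container.
    let _transcript = delegate.get_result();
}
```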

Commitments

This use-case is of more academic interest, but demonstrates that the trusted Veracruz runtime's stateful nature is sometimes useful when designing a collaborative computation to achieve a desired effect.

Alice and Bob are playing a distributed coin-tossing game, wherein Bob "calls" a coin toss, guessing either heads or tails, before Alice tosses a fair coin. If Bob guessed correctly then he wins a prize, otherwise he loses and wins nothing. Alice does not trust Bob: she worries that he will change his guess after the coin is tossed in order to obtain a prize he is not entitled to. (Alice, on the other hand, is known by everybody to be scrupulously fair, and Bob has no reason not to trust her.) Cryptographic approaches to solving this problem exist, using cryptographic commitments, but we sketch an approach based on Veracruz here.

Alice prepares a program, P, which accepts a single input: Bob's guess. She presents this program to Bob (i.e. intentionally declassifies it). The program has a very simple behaviour: it receives Bob's guess as input, and returns it unchanged as its output — that is, it's a glorified identity function. Bob audits the program and accepts it.
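
Assuming, as in the earlier sketch, that the runtime exposes Bob's input and the computation's result as files at hypothetical fixed paths, the entirety of P might be:

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Read Bob's committed guess (hypothetical input path)...
    let guess = fs::read("/input-0")?;
    // ...and return it, verbatim, as the result for Alice.
    fs::write("/output", guess)
}
```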

Alice and Bob then agree a global policy for a Veracruz computation. Alice is listed as the Program Provider and Results Receiver, and Bob is listed as the Data Provider. They arbitrarily choose some delegate to host the computation, who loads the trusted Veracruz runtime into a strong container.

Following the protocol outlined in What are Veracruz computations?, Alice provisions her program into the delegate's container, and waits for Bob to provision his guess into the container as an input. Note that Alice need not trust that Bob has guessed at this point, as she can instead query the state of the Veracruz runtime: if Bob has committed to a guess, then the Veracruz runtime should be in a "ready to execute" lifecycle state. Alice, trusting that the Veracruz runtime is correct, knows that at this point Bob cannot renege on his guess and guess again, as the runtime's state machine ensures that secrets, once provisioned, cannot be changed.
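
A sketch of the check Alice performs before tossing her coin; `LifecycleState` and `query_state` are hypothetical names for whatever state-inspection interface the runtime exposes:

```rust
#[derive(PartialEq)]
enum LifecycleState {
    AwaitingData,
    ReadyToExecute,
    Finished,
}

/// Stub standing in for a query against the Veracruz runtime.
fn query_state() -> LifecycleState {
    LifecycleState::ReadyToExecute
}

fn bob_has_committed() -> bool {
    // Once the runtime reports "ready to execute", every input named
    // in the global policy has been provisioned, and the runtime's
    // state machine forbids re-provisioning: Bob's guess is locked in.
    query_state() == LifecycleState::ReadyToExecute
}

fn main() {
    assert!(bob_has_committed());
}
```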

Once Alice knows that Bob has committed to a guess, she makes her coin toss. Now, to obtain Bob's guess, she requests the result of the computation from the Veracruz runtime, which executes the program, returning Bob's guess to Alice as its result. (See What is the Veracruz programming model? for a full explanation of this point.) Alice then compares the guess to the coin toss, granting or denying Bob his prize as appropriate.
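
Putting the pieces together, Alice's side of the game reduces to the following, with `request_result` a hypothetical stand-in for the client call that makes the runtime execute P and return its output:

```rust
/// Stub standing in for requesting the computation's result, which
/// causes the Veracruz runtime to execute P and echo Bob's guess.
fn request_result() -> Vec<u8> {
    b"heads".to_vec()
}

fn main() {
    // Bob is known to have committed, so Alice now tosses her coin.
    let toss = "tails";
    // Executing P reveals Bob's (unchangeable) guess.
    let guess = request_result();
    if guess == toss.as_bytes() {
        println!("Bob wins the prize");
    } else {
        println!("Bob wins nothing");
    }
}
```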