You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Apologies for the long issue. We are using MOTION in research that we are getting ready to publish and we don't want to misrepresent MOTION.
I wonder what your experience is with the execution time of MOTION using Boolean GMW to execute arithmetic operations? For example, I wrote the following small benchmark which computes Hamming distance. Am I doing anything incorrectly which would significantly slow things down?
void EvaluateProtocol(mo::PartyPointer& party, std::uint32_t value) {
mo::SecureUnsignedInteger zero(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 0), 0));
mo::SecureUnsignedInteger one(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 0));
std::vector<mo::SecureUnsignedInteger> inputA(value);
std::vector<mo::SecureUnsignedInteger> inputB(value);
for (uint32_t i = 0; i < value / 4; i++) {
inputA[4 * i] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 0));
inputA[4 * i + 1] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 0), 0));
inputA[4 * i + 2] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 0));
inputA[4 * i + 3] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 0));
}
for (uint32_t i = 0; i < value / 4; i++) {
inputB[4 * i] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 1));
inputB[4 * i + 1] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 0), 1));
inputB[4 * i + 2] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 1), 1));
inputB[4 * i + 3] = mo::SecureUnsignedInteger(party->In<mo::MpcProtocol::kBooleanGmw>(mo::ToInput((uint32_t) 0), 1));
}
auto distance = zero;
for (uint32_t i = 0; i < value; i++) {
distance += (inputA[i] == inputB[i]).Mux(zero.Get(), one.Get());
}
// construct an output gate
auto output = distance.Out();
// run the protocol
party->Run();
party->Finish();
// retrieve the result
auto result = output.As<std::uint32_t>();
// print the result into the terminal
std::cout << "Distance " << result << std::endl;
}
A rough measurement of this program with two parties (hardcoded), on an input size of 1000, gives an end-to-end execution time of about 8 seconds. I am running a 2019 Intel MacBook Pro w/ 16g of physical RAM. It was run locally, so the network latency is very low (RTT to localhost appears to be about 50 microseconds). I was also careful to ensure that the two MOTION processes did not cause exhaustion of physical RAM.
For a comparison, I ran the same benchmark with an input size of 10000 using EMP's sh2pc library and the elapsed time is about 1 second.
I should mention that I'm aware there are ways to improve this particular benchmark. I'm only using this as a representative example of how we are using MOTION to do arithmetic, etc. We have some benchmarks that will only work with Boolean GMW, and require arithmetic. So, using conversions to Arithmetic GMW and/or using Boolean operations are not an option.
Does this agree with your expectation? Obviously, there are a lot of differences between using MOTION w/ Boolean GMW and EMP w/ Boolean Yao. But, this is a larger difference (8x slower on an input size which is 1 order of magnitude smaller) than I expected.
My understanding of the main performance concerns when comparing these two frameworks are...
Since MOTION supports N parties (and therefore needs to use secret sharing for GMW and BMR), it needs to build up a circuit for optimal communication round complexity. This causes a significant amount of memory allocation / deallocation since each gate and wire object has (at least) 1 - 2 pointers which take up 1 - 2 words of memory. In contrast, EMP uses a streaming version of Yao's protocol in which gates are garbled and send over the wire on-demand. Therefore, EMP's memory requirements are much lower.
Since MOTION uses secret sharing, the communication round complexity is proportional to the depth of the circuit. In contrast, EMP's communication round complexity is constant. Even with streaming gates, communication is unidirectional.
Does that sound right?
The text was updated successfully, but these errors were encountered:
A short answer is: A high level of abstraction definitely comes with a cost. That's why we advise to use SIMD operations where possible, which run one operation on many elements, as one gate. For that, you would input vectors instead of single elements and do operations as if on a single element.
GCs are often faster than GMW but that's likely not the main problem here.
What also may help is the use of other memory allocators (e.g., see how MOTION_USE_TCMALLOC is used in the docker script).
Maybe trivial but better double-check that you compiled in Release.
Did you try to optimize your code @Isweet? If yes, how much faster did it get for you?
Also, I know that the code vectorization is a little annoying because the developer needs to do more than just writing the usual code, so we are working on automating the vectorization. If you're interested, keep an eye on this repo.
Hey,
Apologies for the long issue. We are using MOTION in research that we are getting ready to publish and we don't want to misrepresent MOTION.
I wonder what your experience is with the execution time of MOTION using Boolean GMW to execute arithmetic operations? For example, I wrote the following small benchmark which computes Hamming distance. Am I doing anything incorrectly which would significantly slow things down?
A rough measurement of this program with two parties (hardcoded), on an input size of
1000
, gives an end-to-end execution time of about 8 seconds. I am running a 2019 Intel MacBook Pro w/ 16g of physical RAM. It was run locally, so the network latency is very low (RTT tolocalhost
appears to be about 50 microseconds). I was also careful to ensure that the two MOTION processes did not cause exhaustion of physical RAM.For a comparison, I ran the same benchmark with an input size of
10000
using EMP'ssh2pc
library and the elapsed time is about 1 second.I should mention that I'm aware there are ways to improve this particular benchmark. I'm only using this as a representative example of how we are using MOTION to do arithmetic, etc. We have some benchmarks that will only work with Boolean GMW, and require arithmetic. So, using conversions to Arithmetic GMW and/or using Boolean operations are not an option.
Does this agree with your expectation? Obviously, there are a lot of differences between using MOTION w/ Boolean GMW and EMP w/ Boolean Yao. But, this is a larger difference (8x slower on an input size which is 1 order of magnitude smaller) than I expected.
My understanding of the main performance concerns when comparing these two frameworks are...
Does that sound right?
The text was updated successfully, but these errors were encountered: