Introduction
OpenMPCC is a project which aims to combine secure Multi Party Computation (MPC) with Trusted Execution Environments (TEE) and the cloud. This enables secure and seamless collaboration in distributed environments and ensures privacy and trust.
The combination of these different technologies allows for the efficient deployment of:
- Private Information Retrieval
- Private Machine Learning
- Private Analytics
Concrete use cases for these technologies are for example:
- Cyber Threat Detection
- Threat Correlation
- Data Retention
- Chatbots on restricted information
Cyber Threat Detection and Threat Correlation leverage the Private Analytics component. Through MPC and TEEs collaborators are able to securely exchange sensitive data and analyze it without compromising the raw information. This allows federal agencies to collaborate with agencies from other countries while protecting the privacy of innocent individuals and obfuscating their sources.
Data Retention leverages the Private Information Retrieval component and allows the secure encrypted storage of sensitive information. However, authorized personnel is still going to be able to retrieve data when legally required even in cases where nobody other than the authorized personnel is allowed to know which data was retrieved.
The Private Machine Learning setting as employed by Chatbots on restricted information allows for chatbots to securely process and provide insights from restricted information. Private Machine Learning preserves the confidentiality of the data while allowing the AI-powered analysis.
Architecture
OpenMPCC itself consists of three major components, which interact with each other in order to enable highly scalable and secure Multi Party Computation.
The foundation of OpenMPCC consists of post quantum secure building blocks which can be leveraged to build post quantum secure MPC protocols. With the MPC protocols as the second building block OpenMPCC is able to implement the privacy and security required for the collaboration of multiple parties. The third and last building block, the confidential cloud, allows for the scalable deployment of MPC protocols while offering security and privacy from a hardware root of trust bridging the gap of the MPC protocols which assume that the execution environment is trusted.
Details
TBD
Getting Started
In this section you will learn how to install MP-SPDZ in your local environment.
Installation of MP-SPDZ
This tutorial will show you how to install MP-SPDZ on your system from source and precompiled binaries. We also cover advanced configuration options that can be enabled during compilation.
Source Installation
The source installation requires that some dependencies have a quite recent version. Therefore, we assume that you use at least Ubuntu 24.04 for the sake of this tutorial. If you are using any other distribution you have to make sure that the packages that you install meet the mentioned versions.
In a first step we are going to install all the necessary packages
apt-get update && apt-get install -y --no-install-recommends \
automake \
build-essential \
ca-certificates \
clang \
cmake \
gdb \
git \
libboost-dev \
libboost-filesystem-dev \
libboost-iostreams-dev \
libboost-thread-dev \
libclang-dev \
libgmp-dev \
libntl-dev \
libsodium-dev \
libssl-dev \
libtool \
openssl \
python3 \
valgrind \
vim
with some needing to meet the following criteria in order to work properly.
- Clang 10 or later
- Cmake 3.15 or later
- Boost 1.75 or later
- GMP with C++ support
- Python 3.5 or later
After installing the build requirements we are going to build MP-SPDZ using the latest version which currently is 0.4.0.
git clone https://github.com/data61/MP-SPDZ.git
cd MP-SPDZ
git checkout v0.4.0
make -j $(nproc)
Configuration
As MP-SPDZ includes a variety of protocols a user might not need to build everything that MP-SPDZ is offering. For this purpose the compilation can be limited to a single protocol and additional configuration options can be used.
For example some protocols do not implement an offline phase or the user might want to benchmark protocols with a fake offline phase.
This can be achieved through adding MY_CFLAGS += -DINSECURE to either CONFIG or CONFIG.mine.
Other advanced configuration options are for the usage of homomorphic encryption using GF(2^40) which can be done through setting USE_NTL = 1 or the usage of KOS through USE_KOS = 1 and SECURE = -DINSECURE.
Whenever a change to the configuration files is done make clean needs to be run before compiling again with the new changes.
Single protocols like mascot can be compiled using make mascot.
This can be used in cases where one is only interested in a small subset of protocols.
Binary Installation
We first retrieve the latest version from the Github release page.
In our case this is v0.4.0 and we can use wget https://github.com/data61/MP-SPDZ/releases/download/v0.4.0/mp-spdz-0.4.0.tar.xz for this.
Afterwards the archive needs to be extracted using tar xf mp-spdz-0.4.0.tar.xz.
From the top level folder execute Scripts/tldr.sh in order to place the binaries at their correct location.
Afterwards you have a ready to use installation.
Tutorials
In this section you will learn how to create your first program with MP-SPDZ and also how you can solve more complex problems.
Optimized Secure Edit Distance in MPC
This tutorial explains an optimized secure computation of the edit distance between two parties using MP-SPDZ. It shows how conversion between Boolean and arithmetic shares can improve performance, and illustrates how edaBits in MP-SPDZ can be used for efficient comparisons. It is based on the paper Privacy-Preserving Edit Distance Computation Using Secret-Sharing Two-Party Computation, published in Latincrypt 2023. Previous papers proposed approaches based on homomorphic encryption and garbled circuits, this time MPC based on secret sharing is the tool of choice.
Algorithm in plaintext
The edit distance (also known as Levenshtein distance) is a metric for how related two strings of data are. It gives the number of operations (insertions, deletions and substitutions) needed to go from string A to string B. It is commonly used in genome sequencing and DNA analysis.
The Wagner–Fischer algorithm formalizes the problem as a recursion, from which a dynamic programming solution can be efficiently implemented.
The core idea of the algorithm is building a matrix D of size (len(A)+1) × (len(B)+1), where D[i][j] is the minimum edit distance between:
- the first
icharacters of string A - and the first
jcharacters of string B.
import numpy as np
def edit_distance(A, B):
D = np.zeros(shape=(len(A) + 1, len(B) + 1))
# Fill base cases
for i in range(len(A) + 1):
D[i][0] = i # (cost of deleting all characters from A)
for j in range(len(B) + 1):
D[0][j] = j # (cost of deleting all characters from B)
for i in range(1, len(A) + 1):
for j in range(1, len(B) + 1):
# Compare two characters of the string
if A[i - 1] == B[j - 1]:
t = 0
else:
t = 1
D[i][j] = min(
D[i - 1][j] + 1, # Deletion
D[i][j - 1] + 1, # Insertion
D[i - 1][j - 1] + t # Substitution
)
# Not return the final edit distance
return D[len(A)][len(B)]
if __name__ == "__main__":
file_chain_A = open("Inputs/arithmetic/Input-P0-0")
file_chain_B = open("Inputs/arithmetic/Input-P1-0")
A = file_chain_A.read().replace("\n", "")
B = file_chain_B.read().replace("\n", "")
file_chain_A.close()
file_chain_B.close()
print("Distance =", edit_distance(A, B))
Example
Given A = "kitten" and B = "sitting", the DP matrix looks like this:
'' s i t t i n g
-------------------------
'' | 0 1 2 3 4 5 6 7
k | 1 1 2 3 4 5 6 7
i | 2 2 1 2 3 4 5 6
t | 3 3 2 1 2 3 4 5
t | 4 4 3 2 1 2 3 4
e | 5 5 4 3 2 2 3 4
n | 6 6 5 4 3 3 2 3
The final edit distance is D[6][7] = 3.
Edit distace in MPC
Genomic data is frequently large and highly sensitive, so finding a way to efficiently compute the edit distance metric in a privacy preserving way is quite relevant.
In the case of genomic data, the characters in A and B are nucleotides (ATCG) which can be encoded using only 2 bits each.
However, note that the algorithm has two different types of operations:
- Comparisons between characters from A and B and entries of the matrix
D - Arithmetic between entries of
Dand 1 or the result of a comparison This means the MPC approach will involve mixed arithmetic, which can be optimized by using extended doubly-authenticated bits or edaBits.
What Is an edaBit?
An edaBit (efficiently generated bit-decomposed value) is a special type of preprocessed value in MP-SPDZ that provides:
- A secret-shared number
x, and - Its bit-decomposition
[x_0, x_1, ..., x_{k-1}] where x = x₀ + 2x₁ + ... + 2^{k-1}x_{k-1}Both the value and its bits are already secret-shared, generated during the preprocessing phase. This means we can switch from the two representations efficiently.
Algorithm in MPC
We split the algorithm above in two stages, as to minimize the number of conversions. First we compare all characters of A and B in bitwise representation and store the results of the comparisons in a matrix.
Next we convert the results of these comparisons to secret-shared integers, and execute the dynamic programming approach in MPC.
The resulting code computes the edit distance between two secret-shared strings A and B, meaning each party holds inputs and does not reveal them to others. The function securely computes how many insertions, deletions, or substitutions are needed to turn one string into the other.
The full implementation can be found here. Let's go through it step by step.
1. Comparisons
Enable edaBits in the backend protocol, a secret-shared bit type and functions to compare two individual nucleotides.
The comparison function amounts to computing the XOR between the pairs of bits in each of the nucleotides to compare them, and the XOR between the results of the two comparisons.
We negate the result such that it outputs 1 if the nucleotides are equal.
program.use_edabit(True)
sb = sbits.get_type(1)
def xor_nucleotids(N1, N2):
xor = Array(2, sb)
xor[0] = N1[0] ^ N2[0]
xor[1] = N1[1] ^ N2[1]
return xor
def or_nucleotid(N):
return N[0].bit_or(N[1])
def equal_nucleotids(N1, N2):
return or_nucleotid(xor_nucleotids(N1, N2)).bit_not()
2. Comparison matrix
Compute and store all the results of comparisons in a matrix.
def compute_comp_matrix(A, B):
M_comp = Matrix(len(A), len(B), sb)
@for_range_opt(len(A))
def _(i):
@for_range_opt(len(B))
def _(j):
comp = equal_nucleotids(A[i], B[j]).bit_not()
M_comp[i][j] = comp
return M_comp
3. Main loop
With the matrix T of comparisons already computed, we convert the results of comparisons to secret-shared integers and the rest of the algorithm is relatively straightforward, being quite similar to the plaintext version above.
Notice that we use @for_range_opt to execute loops depending on bounds that are provided in the command line, and thus are not known at compile-time.
def edit_distance(A, B):
T_bits = compute_comp_matrix(A, B)
T = Matrix(len(T_bits), len(T_bits[0]), sint)
@for_range_opt(len(T_bits))
def _(i):
@for_range_opt(len(T_bits[0]))
def _(j):
T[i][j] = sint(T_bits[i][j])
D = Matrix(len(A) + 1, len(B) + 1, sint)
D.assign_all(0)
@for_range_opt(len(A) + 1)
def _(i):
D[i][0] = i
@for_range_opt(len(B) + 1)
def _(j):
D[0][j] = j
@for_range(1, len(A) + 1)
def _(i):
@for_range(1, len(B) + 1)
def _(j):
min_terms = [
D[i - 1][j] + 1,
D[i][j - 1] + 1,
D[i - 1][j - 1] + T[i - 1][j - 1]
]
D[i][j] = util.min(min_terms)
return D[len(A)][len(B)]
4. Reads inputs and execute the program
The final missing piece is reading strings from the inputs given by the two parties, calling the MPC function and printing the final result.
def read_strings(A_length, B_length):
A = Matrix(A_length, 2, sb)
B = Matrix(B_length, 2, sb)
for i in range(A_length):
A[i][0] = sb.get_input_from(0)
A[i][1] = sb.get_input_from(0)
for j in range(B_length):
B[j][0] = sb.get_input_from(1)
B[j][1] = sb.get_input_from(1)
return A, B
A_length = int(program.args[1])
B_length = int(program.args[2])
A, B = read_strings(A_length, B_length)
d = edit_distance(A, B)
print_ln("Edit distance = %s", d.reveal())
Experimental results
Let's now try this idea in practice. After implementing the algorithms in MP-SPDZ, we will compile and run two sorting programs with input strings of 100 nucleotids each.
The backend of choice will be spdz2k as in the original paper, which can be accelerated with edaBits:
./compile.py edit-distance.mpc 100 100
Scripts/spdz2k.sh -v optim_edit_distance-100-100
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../spdz2k-party.x 0 -v optim_edit_distance-100-100 -pn 11237 -h localhost -N 2
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../spdz2k-party.x 1 -v optim_edit_distance-100-100 -pn 11237 -h localhost -N 2
Using SPDZ2k security parameter 64
Using statistical security parameter 40
Trying to run 64-bit computation
Edit distance = 52
The following benchmarks are including preprocessing (offline phase).
Time = 92.3301 seconds
Data sent = 6775.89 MB in ~738945 rounds (party 0 only)
Global data sent = 13551.8 MB (all parties)
One runtime optimization we can make is reduce the precision of the arithmetic secret-sharings. Because our inputs are bounded in length, we can actually use less than 64 bits to represent shares, for example 16. We can make this modification by rebuilding the MP-SPDZ backends with lower precision and running the benchmarks again:
make RING_SIZE=16 spdz2k-party.x
./compile.py -R 16 edit-distance.mpc 100 100
Scripts/spdz2k.sh -v optim_edit_distance-100-100
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../spdz2k-party.x 0 optim_edit_distance-100-100 -pn 13321 -h localhost -N 2
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../spdz2k-party.x 1 optim_edit_distance-100-100 -pn 13321 -h localhost -N 2
Using SPDZ2k security parameter 64
Using statistical security parameter 40
Trying to run 16-bit computation
Edit distance = 52
The following benchmarks are including preprocessing (offline phase).
Time = 35.8789 seconds
Data sent = 2041.55 MB in ~559161 rounds (party 0 only; use '-v' for more details)
Global data sent = 4083.09 MB (all parties)
Hello World!
Let us get our first MPC program to run.
To make sure we can compile and the run the simplest program let us write a "Hello world!" program.
In the folder Programs/Source we will make a new file called hello_world.mpc.
The file needs the .mpc extension, which tells the MP-SPDZ compiler that this is a program.
Open the file and add the following line of code
print_ln("Hello world!")
This line of code works as expected as it prints "Hello World!".
To run the program we need to first compile it using the MP-SPDZ compiler and then run it using a protocol. To compile we run
./compile.py hello_world.mpc
We do not need to provide a path to the program we have written we only provide the name of the file as long as it reside in the Programs/Source folder.
To run the program we execute the Script
Scripts/mascot.sh hello_world
The program we have written is now executed using the MASCOT protocol for multi-party computations. Different protocol are available, and can be used to evaluate the program. The list of available protocols and their security guarantees are available at Protocols. The protocol decides which mathematical property used to do the multiparty computation. In the case where we use MASCOT, so we are working in a finite field of size $2^n$. If we for example were working on binary values, we could either use binary secret sharing in the Tiny protocol, or even use garbled circuits with the BMR protocol. To keep this simple we will stick with the MASCOT.
Run the program and the output should look like this
Running MP-SPDZ/Scripts/../mascot-party.x 0 hello_world -pn 18477 -h localhost -N 2
Running MP-SPDZ/Scripts/../mascot-party.x 1 hello_world -pn 18477 -h localhost -N 2
Using statistical security parameter 40
Hello world!
The following benchmarks are including preprocessing (offline phase).
Time = 0.000687384 seconds
Data sent = 0 MB in ~0 rounds (party 0 only; use '-v' for more details)
Global data sent = 0 MB (all parties)
As it gets quite tedious to run both the compiler and the program each time a change is made, MP-SPDZ provide a script which can do this at once by using the following command
Scripts/compile-run.py mascot hello_world
First Multiplication
Now that we have written our own program and made it run, let us begin to do more interesting stuff with multi-party computation. The first program we write that actually does stuff will be a simple multiplications program, that is we multiply two input values from two parties.
To get inputs from the parties we need to load them into the .mpc program.
This is done using the function
sint.get_input_from(i)
This function will get an input from party i and return it as a sint which is the secure integer type used in MP-SPDZ.
Do not worry about this type as of now, as we will dive into it later.
Hence to start our multiplication program we need to get input from both parties.
So first create a new file called first_mul.mpc in Programs/Source and add the first two lines
a = sint.get_input_from(0)
b = sint.get_input_from(1)
We can then do simple arithmetic operations on these sint and the compiler of MP-SPDZ will perform the underlying steps needed to do the operations we want.
In our case we do multiplication, so we just use the * operations, hence add the line
mul = a * b
Now the product of the two input are stored in the mul variable.
However as this is secret integer type, we need to open/reveal the value for us to print it and see the results.
We can do this with the following lines of code
res = mul.reveal()
print_ln('Multiplication: %s', res)
Here res will be the plain-text values of the secret integer mul, hence we can now print it using print_ln
Note that the print_ln does not work in the same fashion as python print, the MP-SPDZ print function works more like printf known form the programming language C.
In our case we print the string "Multiplication: " followed by a string denoted as %s which will take the text value of res
We are now ready to run our program
./compile.py first_mul.mpc
Scripts/mascot.sh first_mul - I
In the script we add the argument -I to make the execution interactive.
This will prompt us to input the values for the parties before executing the computations.
By inputting the values 2 3 we gain the following output
Running MP-SPDZ/Scripts/../mascot-party.x 0 first_mul -I -pn 17723 -h localhost -N 2
Running MP-SPDZ/Scripts/../mascot-party.x 1 first_mul -I -pn 17723 -h localhost -N 2
Using statistical security parameter 40
Please input integers:
2
3
Multiplication: 6
The following benchmarks are including preprocessing (offline phase).
Time = 3.43371 seconds
Data sent = 20.5228 MB in ~55 rounds (party 0 only; use '-v' for more details)
Global data sent = 41.0456 MB (all parties)
As expected the output of the computation is 6.
You can run the program a few times to see that output is always equal to the product of the two input values.
If we do not want to input the input values each time we can use the input-data files from the Player-Data folder.
The input of each party is stored in a file with the name Input-Pi-0 for Party i.
The values of the integers are stored in simple text format separated by spaces.
By adding the input vales of two and three for the both P0 and P1, we can run the protocol with interactive mode, by removing the -I from the execution script, or running
Scripts/compile-run.py mascot first_mul
Giving the same output as before.
By writing this simple program we have now executed our first program using a multiplication, which is the simplest non-trivial thing to do within multi-party computation.
Secure Integer and other MP-SPDZ types
In thefirst_mul.mpc program we use the secret integer sint to denote the input of the parties.
More precisely the sint is a type defined in the high-level interface of MP-SPDZ.
This specific type denotes a secret integer in the protocol-specific domain.
In other words the value of the integer is secret shared across the parties, and the way this secret sharing is realized depends on the protocol used.
When we use the sint.get_input_from(i) a few things happen.
First, party i reads the input from his file.
Secondly, he secret-shares it with the other parties, this is the part that is protocol specific. However, when reading formal protocol this will equate to the Input command in a MPC setting.
Now that we have a secret shared value, we can perform many different operations such as +, -, *, /, and even comparisons ==, != <, > etc.
A thing to note about the high-level interface is that a basic type like the sint is stored in a so-called register in the MP-SPDZ virtual machine.
This means that this value is bound to a single thread and can only be accessed by that thread.
A container type such as Array is stored on so-called memory in the virtual machine.
Such types are not bound to a thread, hence can be access across multiple thread, both allowing for shared data or cross thread communication.
So when designing the MPC program these types needs to be considered.
However, it is important to note that a list of basic types (e.g. a list of sint) does not need to be a container type (e.g an Array) the basic type can be used as vector, where each operation is element-wise.
We can read a vector as input by using
sint.get_input_form(i, size=n)
which reads n inputs from party i into a vector.
MP-SPDZ Control Flow
Due to the compilation step of in MP-SPDZ we cannot use control flow as we would do in python.
The compiler read the python like code in the .mpc file and turn it into byte-code that the MP-SPDZ virtual machine can read.
Due to this compilation step we cannot write the MPC program like a python program, we need to tell the compiler what we are doing.
Python loops written in the .mpc program will be unrolled, hence can have impact on the performance, likewise can only depend on values known at compile time.
Looping
To do a simple for-loop, we need to use the syntax
@for_range(n)
def _(i):
...
Here n is the range we loop over as in Python, however it needs to be a public integer, so we cannot for example loop over a sint type.
Another important aspect is that results from the loop iterations needs to be passed outside the loop using a container type (e.g. Array) or by using the update function on basic types.
A simple loop adding together numbers would look like this
a = sint.get_input_from(0, size=10)
n = 10
res = sint(0)
@for_range(n)
def _(i):
res.update(res + a[i])
Here we see that the total sum is updated using the update function.
The MP-SPDZ compiler can also optimize the loop by running multiple loop-bodies in parallel.
This is done by using the @for_range_opt(n) notation instead.
Functions
The same applies to functions, if we want to use a function definition during the MPC program execution we need to use the high-level interface. The syntax is quite simple and works like you would expect in python.
@function
def mul_acc(a,b,c):
return a * b + c
The arguments to the functions can be either container types or basic types, while the return value must be a basic type. In our case we take 3 integers as input, multiply the first two and add the result to the third.
If statements
Branching is non-trivial in MPC, hence we have two types of branching, one based on secret shared values (sint) and one based on public/clear values.
We first look at the public value case.
Again due to the compilation of MP-SPDZ to have a runtime if-statement, we need to use the correct annotation and syntax. A simple if-statement on a clear/public values looks as follows
@if_(x>0)
def _():
...
Here the condition of the branch is written in the annotation. Hence, this conditional piece of code is executed if x is greater than 0.
Likewise, the if-else-statement looks as follows
@if_e(x > 0)
def _():
...
@else_
def _():
...
This works as expected, we only need to add the e to the first annotation to use the else branch.
Note that to make values live beyond the scope of the if/else branch we need to make use of memory values such an Array or a MemValue (i.e. a single value stored in memory of the virtual machine).
Next the branching based on secret values (sint) works a bit different.
Here we cannot execute code based on the value of the secret.
First of all, we might not know the exact values, and furthermore executing code based on the exact value will reveal something about the value, thus remove its secrecy.
What we can do however is use the value as a MUX i.e. as a bit selecting the output.
The syntax looks as follows
c.if_else(a, b)
The return value is a if c = 1 and b if c = 0. If c have other values the return is undefined.
At first this secret value based if seems strange, however since c is itself a secret values and defines the conditions we can easily make useful construction from this.
For example finding the max of two different secret values can be done as
(a < b).if_else(b, a)
Which in Python would be equivalent to
if a < b:
return b
else:
return a
To perform complex operations we need to take more case, since we only have a MUX on the secret value.
A simple case could be if x < 10 we want to multiply with 2 and if x >= 10 we want to multiply with 3.
To solve this issue we need to compute both branch of the if statement and use the MUX to select the correct output.
Because if both branches are computed the operations executed are independent of the value of the secret, hence the secrecy is not compromised.
x = sint(6)
t = x*2
f = x*3
out = (x < 10).if_else(t, f)
Running the program will give the output 12 as expected, and changing x=sint(12) to compute the other branch give the output 36.
Number of points inside a circle
Let us use what we have learn to write a simple program, where party 0 provides a circle and party 1 provides a series of point and both parties learn the number of points within the circle.
Given the circle center is as $(x_0, y_0)$ with radius $r$ we can check the point $(x,y)$ using the formula
\sqrt{(x-x_0)^2 +(y-y_0)^2} < r
Computing square-roots does not result in integers so by rewriting we get
(x-x_0)^2 + (y-y_0)^2 -r^2 < 0
which can be computed by integers only.
The code could look as follows
circ_center = sint.get_input_from(0, size = 2)
radius = sint.get_input_from(0)
num_points = sint.get_input_from(1)
num_points = num_points.reveal()
count = sint(0)
@for_range(num_points)
def _(i):
point = sint.get_input_from(1, size = 2)
cal = sint(0)
cal += (circ_center[0] - point[0]).square()
cal += (circ_center[1] - point[1]).square()
cal -= radius.square()
inside = (cal < 0)
count.update(count + inside)
num_inside = count.reveal()
print_ln('Total number of point in circle: %s', num_inside)
Benchmarking using MP-SPDZ
The core of MP-SPDZ is to benchmark MPC protocols in different security settings. Using our example of counting the number of points inside a circle we can benchmark our protocols. As we have been running using MASCOT so far let us see how it performs. On my machine the following are the results
| Time (Offline) | Communication | |
|---|---|---|
| Mascot | 1.05s (0.94) | 59.5MB |
We can then test the same program but using a different protocol. MASCOT was based on OT to perform the computation. We now benchmark the same program using the HighGear protocol which is based on somewhat homomorphic encryption.
| Time (Offline) | Communication per party | |
|---|---|---|
| Mascot | 1.05s (0.94s) | 59.5MB |
| HighGear | 27.9s (27.9s) | 414.3 MB |
The results are as expected since homomorphic encryption is slower than the OT based approach.
The benchmarks can be better suited the specific program by changing the batch-size of the preprocessing by using --batch-size when running the program.
In our case it changes little regarding the performance of HighGear
Both of the protocols benchmarked have the security model of Malicious, dishonest majority. A setting could be where we are unsure that the majority is honest, hence use a protocol in this setting. We benchmarked the malicious, honest majority version of Shamir secret sharing.
| Time (Offline) | Communication per party | |
|---|---|---|
| Mascot | 1.05s (0.94s) | 59.5MB |
| HighGear | 27.9s (27.9s) | 414.3 MB |
| Shamir(Malicious) | 0.091s (0.081s) | 2.44 MB |
Furthermore, we can relax the security even more by looking at the semi-honest version of Shamir. In this setting every party must follow the protocol, and can only use the knowledge gained during the execution to try to extract information. This could be in a case where every party is trusted, but someone might be spying on the communication. The result becomes
| Time (Offline) | Communication per party | |
|---|---|---|
| Mascot | 1.05s (0.94s) | 59.5MB |
| HighGear | 27.9s (27.9s) | 414.3 MB |
| Shamir(Malicious) | 0.091s (0.081s) | 2.44 MB |
| Shamir(Semi-Honest) | 0.015s (0.008s) | 0.33 MB |
From the few benchmarks conducted the power of MP-SPDZ becomes clear.
The program can be benchmarked using different protocol to see which performs the best.
Furthermore, the program can be optimized to both decrease time consumption and communication very efficiently since MP-SPDZ tracks this information.
To gain a better overview of what the time and communication is spent on the execution of the protocol can be done in verbose mode -v which report time and communication of each part of the execution.
Secure Sorting in MPC
Sorting is a fundamental operation in computer science. Whether you're processing data for analytics, searching for values, or ensuring order for algorithms like binary search, sorting is everywhere. But in the context of secure multi-party computation (MPC), even basic tasks like sorting become non-trivial.
This tutorial explains how to perform sorting securely in MPC: why traditional algorithms do not work, and how to use two classic algorithms in the MPC setting using the MP-SPDZ framework. We assume that the reader is familiarized with the very basics of MP-SPDZ, for example by reading documentation.
The challenge
In MPC, multiple parties compute jointly on private data without revealing their individual inputs. Each value is secret-shared, meaning no single party can see the actual plaintext value.
A traditional sorting algorithm may perform operations like this:
if x[i] > x[j]:
swap(x[i], x[j])
This is a problem in MPC because the condition x[i] > x[j] is evaluated in the clear, and the if statement leaks information about the data ordering, and consequently the inputs.
Hence, sorting in MPC cannot perform operations such as conditional branches based on secret values, revealing result of comparisons or having secret bounds on loop iterations.
Instead, one must use data-independent control flow and secure conditional swaps that are oblivious to the data being operated upon.
Secure sorting
Sorting in MPC is performed using oblivious sorting algorithms, where the sequence of operations (comparisons and swaps) is fixed in advance, independent of input data. Two well-known examples are:
- Bubble Sort (educational, but inefficient)
- Odd-Even Mergesort (efficient and practical for MPC)
These algorithms use secure comparisons and conditional swaps to avoid leaking information.
In frameworks like MP-SPDZ, this is typically done using cond.if_else(a, b):
cond = x > y # Secure comparison result
min_val = cond.if_else(y, x)
max_val = cond.if_else(x, y)
Bubble Sort in MPC
Bubble Sort is one of the simplest sorting algorithms. It repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. While intuitive, it has poor performance due to its O(n^2) time complexity, for datasets containing n elements.
In the MPC setting, we can implement Bubble Sort using nested loops and secure compare-and-swap logic.
def bubble_sort(secret_list):
n = len(secret_list)
for i in range(n):
for j in range(n - i - 1):
a = secret_list[j]
b = secret_list[j + 1]
cond = a > b
secret_list[j] = cond.if_else(b, a)
secret_list[j + 1] = cond.if_else(a, b)
Odd-Even Mergesort in MPC
Odd-Even Mergesort is a parallel, recursive sorting network. It's based on the idea of recursively sorting halves of the array and then merging them using a fixed compare-and-swap pattern called the odd-even merge. This makes it ideal for secure computation because its pattern is predictable and independent of data.
Odd-Even Mergesort has a much better asymptotic complexity of O(n log^2 n), and it scales well for large inputs in MPC. In fact, it is one of the sorting algorithms implemented within MP-SPDZ.
def compare_and_swap(x, i, j):
a = x[i]
b = x[j]
cond = a > b
x[i] = cond.if_else(b, a)
x[j] = cond.if_else(a, b)
def odd_even_merge(x, lo, n, r):
step = r * 2
if step < n:
odd_even_merge(x, lo, n, step)
odd_even_merge(x, lo + r, n, step)
for i in range(lo + r, lo + n - r, step):
compare_and_swap(x, i, i + r)
else:
compare_and_swap(x, lo, lo + r)
def odd_even_mergesort_rec(x, lo, n):
if n > 1:
m = n // 2
odd_even_mergesort_rec(x, lo, m)
odd_even_mergesort_rec(x, lo + m, m)
odd_even_merge(x, lo, n, 1)
def odd_even_mergesort(x):
odd_even_mergesort_rec(x, 0, len(x))
Experimental results
To demonstrate the trade-offs between these algorithms, let's try a little demo. After implementing the algorithms in MP-SPDZ, we will compile and run two sorting programs with inputs of 128 elements:
n = 128 # must be a power of 2 to keep odd-merge-sort simple
x = [sint.get_input_from(0) for _ in range(n)] # secret input from party 0
sorting_algorithm(x) # odd_even_mergesort or bubble_sort
for i in range(n):
print_ln("%s", x[i].reveal())
Notice how secret-shared values need to be opened to all parties using reveal before they can be printed to check that the output is really sorted. We will use a small piece of code in Python to populate the array for the first party:
from random import randint
with open("Player-Data/Input-P0-0", "w") as file:
file.writelines(f"{randint(1, 100)}\n" for _ in range(128))
Now let's compile and run both programs on localhost using the MASCOT backend in the active security / dishonest majority setting, and look at the performance results. Starting with bubble-sort.mpc:
./compile.py bubble-sort.mpc
Scripts/mascot.sh -v bubble-sort
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../mascot-party.x 0 bubble-sort -pn 16954 -h localhost -N 2
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../mascot-party.x 1 bubble-sort -pn 16954 -h localhost -N 2
Using statistical security parameter 40
...
The following benchmarks are including preprocessing (offline phase).
Time = 242.044 seconds
Data sent = 21954.3 MB in ~29247 rounds (party 0 only; use '-v' for more details)
Global data sent = 43888.1 MB (all parties)
And now with odd-merge-sort.mpc
./compile.py odd-even-mergesort.mpc
Scripts/mascot.sh -v odd-even-mergesort
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../mascot-party.x 0 odd-even-mergesort -pn 17594 -h localhost -N 2
Running /home/dfaranha/projects/MP-SPDZ/Scripts/../mascot-party.x 1 odd-even-mergesort -pn 17594 -h localhost -N 2
Using statistical security parameter 40
...
The following benchmarks are including preprocessing (offline phase).
Time = 69.8483 seconds
Data sent = 6557.69 MB in ~8023 rounds (party 0 only; use '-v' for more details)
Global data sent = 13094.9 MB (all parties)
One can immediately see the much lower number of multiplications and openings in the second approach, leading to huge savings in time and communication.