-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create an implementation of an SQL encrypted query on a clear database using TFHE-rs #94
Comments
Can we assume that the encrypted query is a tuple of multiple encrypted components parsed from the original SQL query? I saw a similar approach in the following paper for an equality comparison. Doing so will likely mean loosing some privacy with regard to access patterns. However, we can likely also deduce an access pattern based on using a clear database. |
@BraunPhilipp the nature of the query should remain concealed, the format after encryption is up to you |
As the result is a For example, if we run |
you are right, that's an oversight, decryption needs the secret key, the string is just whatever equivalent form of the data to easily compare with your cleartext query (my guess is it's easier to stringify the result of the clear query rather than putting the encrypted result formatted as a DB entry) |
If there is multiple rows that match, should we support returning multiple rows? Or just return one of the matching rows? |
has to match the expected SQL behavior, so you should run the clear query on the SQL system you use as reference and check they are the same, so if the clear query returns several rows, then return the several rows |
I don't think this works. You would need to hold enough space in the return value to hold the complete database. Reason: You may as well select the whole database, but the server can't know because it doesn't see the query. So it always needs to reserve enough space to potentially fit the whole DB into the result. Or am I getting something wrong? |
No I think that if you do it to be completely oblivious that's correct, that's what we are looking at for now, we don't expect to run queries on super large DBs, I mean we know some implementations are impractical though they work (like the ecdsa signature we did a while ago) |
OK, just to be clear: So you actually expect the encrypted result to be large enough to hold the whole DB? Indeed, that seems to get impractical very quickly considering FHE ciphertext sizes. And thinking about it further: If you always return something that is at least the size of the DB, couldn't you just return the full DB always and let the client do the searching? (If it can handle downloading the full DB, it can surely handle searching on it.) 🤔 |
fully agree, let's say the ideal setting is that the data is too large to download, but you need to be able to query it without the holder seeing what you query obviously in an FHE setting because of the ciphertext size it's insane, but there should be improvements on that end, the remaining issue is just that such a query performed in a fully oblivous manner looks to be infeasible while returning reasonably sized data I guess future bounties or research avenues is to make leaky queries with a protocole allowing to restrict the size of the data returned, but that's not the object of this bounty :) |
here it's a kind of "let's make a toy sized bounty of something we would ideally want to do but that might remain impractical" Also perhaps for a very smart SQL query the integer layer is not the one to use, but asking some deep crypto knowledge/implems (like dropping to the LWE/GLWE level) is not what bounties are about but more about using the lib, seeing what's doable and what's not doable |
A middle ground could be accepting a cleartext
https://www.w3schools.com/sql/sql_top.asp
Note that SQL results are not guaranteed to have any specific order (if you don't use order by). Hence, you could get different outputs on two different runs of two correct implementations. |
hash maps to the rescue for the unspecified order, I don't know about the LIMIT one but we will not feed huge DBs for the bounties, as computation will likely be impractical in terms of runtime anyways |
To clarify the expected work:
|
execution on the server side for the select distinct, otherwise a way to answer to all queries is to return the whole database under the secret key of the user and let the client process the data, you can see how it's a bit of a loop hole if we accept post processing on the client side we expect the query to be encrypted, I don't think we have anything about hiding the length of the query for this bounty |
|
yes for the sparse part you are correct, the FHE server won't be able to diminish the size of the output due to the oblivious nature of the processing, having the client parse the result and ignore rows/results marked as empty is fair game |
|
Format of the EncryptedQuery is free, as long as the query is encrypted and therefore protected from prying eyes |
|
Hmm the encrypted values should match the type of the column I think to hide the encrypted range ? |
Sure, that would obviously hide any range. But, that would require the client to know the exact type of each individual columns in the table (an any other tables in case of JOIN), right ? What about question 1) ? |
ok that's a good point, I would guess the header of the DB is known ? So that the user knows how to encrypt the data to send As for 1, I'm not sure yet, will discuss to see what other think internally as to what we want |
Is choice 4 proposed by @Juul-Mc-Goa allowed (ie: return every table encrypted (with |
I don’t think it makes sense, if the whole table is returned encrypted and be done with it, why make anything in FHE at all 🤔 |
That's why I'm asking... Ok, so, to be crystal clear: the only acceptable result is :
Am I right ? |
and you do well to ask, columns were deemed unnecessary given the client knows what the request was so can apply the column filter itself The DB cannot be just a DB returned without any processing done on it, doing the last round of selection on the client side is fine, but just returning the DB is not considered fair game. Also no PublicKey involved here |
Just to make sure I'm following: So even tho the server computes the WHERE AST + DISTINCT in order to produce the row selection bools, the full (encrypted) DB cannot be returned? (i.e. Server should return the homomorphically-selected table) |
The thing here is that there are two extremes:
In between the extremes for performance reasons we can likely have a middle ground, the middle ground where part of the processing is done in FHE some remaining trivial steps are done on the client side. I don't think returning the whole DB in the clear (or trivially encrypted as both are equivalent) or encrypted (which you cannot do without a public key which is not part of the current challenge) is an interesting way of solving the task. Recap:
Re reading earlier answers I can see how there may be something somewhat unclear (and as I am not implementing things I might have overlooked it) but from this question #94 (comment)
If the full DB was to be returned trivially encrypted then the answer to the full DB is no, if it has been processed in some way then it's ok. The masked DB is obviously ok as there is no way to shrink its size as far as I can see. But maybe that means that the DB + vector of bools in the end are not a fitting solution. Please don't hesitate to ask questions to precise everything, I don't want to mislead you guys @JoseSK999 @0xalexbel @Juul-Mc-Goa @pandasamanvaya @RasoulAM @yellow123Nike @BraunPhilipp @matthiasgeihs It's a big bounty and we certainly haven't thought of all the ways it could be built and made faster, plus I'm not checking this 24/7 for implementation as we have quite a lot of stuff to do on other parts FHE anyways 😉 As a side note: The conclusion of the whole exercise after the bounty reviews might also be that a fully oblivious SQL encrypted query without communication during processing is not feasible at scale because the full size DB is returned in an encrypted form either partially or fully processed, which means that for practical applications it is unusable as is. |
I think the comment you are refering to could actually mean:
So this would be equivalent to choice n°2 |
@Juul-Mc-Goa yes 2 is fine As long as the DB sees some processing in FHE I would say it's ok As there is no public key and no trivial encryption is accepted her I would think that the conditions are equivalent here. |
Couple of questions around implementation -
|
A friendly reminder that the Submission deadline is May 12th, 2024 at 23:59 AoE (Anywhere on Earth). Good luck! |
Yes to the first question And yes to the second as well |
Hey @zaccherinij @IceTDrinker. Just a humble request. Since there were lot of ambiguities around the bounty and most of them were clarified in the last few weeks, is it possible to extend the deadline of this bounty by a couple of days(2-3 days)? This is an quite interesting bounty and we wouldn't like to miss the chance to make a submission. This would also help some people to polish their and people like me to make a submission. |
Hi @pandasamanvaya,
To keep things fair to everyone, we will follow the initial deadline.
We encourage you to submit at the deadline, even if you’re work can be
enhanced, as we will be ranking three winning submissions.
Best of luck
Jérémy
Le sam. 11 mai 2024 à 14:49, Samanvaya Panda ***@***.***> a
écrit :
Hey @zaccherinij <https://github.com/zaccherinij> @IceTDrinker
<https://github.com/IceTDrinker>. Just a humble request. Since there were
lot of ambiguities around the bounty and most of them were clarified in the
last few weeks, is it possible to extend the deadline of this bounty by a
couple of days(2-3 days)? This is an quite interesting bounty and we
wouldn't like to miss the chance to make a submission. This would also help
some people to polish their and people like me to make a submission.
—
Reply to this email directly, view it on GitHub
<#94 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABL53W5PU3HYO74YB6RO24DZBYHULAVCNFSM6AAAAABDBQ5UGOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBVG4ZTKOJYGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
*Jérémy Zaccherini*
VP Marketing at Zama
|
Couple of questions:
|
@0xalexbel I don't think you'll get a reply from our dev team today! But I encourage you to submit your code before the due date in any case, as three submissions will be rewarded, and no mistake is eliminatory per se :) Best of luck. JZ |
1 similar comment
@0xalexbel I don't think you'll get a reply from our dev team today! But I encourage you to submit your code before the due date in any case, as three submissions will be rewarded, and no mistake is eliminatory per se :) Best of luck. JZ |
I am ready to submit but the link does not work ! |
Hi,
it is working on my end, see below:
https://www.zama.ai/bounty-and-grant-program
You have a form called: "submit to the bounty program".
Let me know if you need more help!
Cheers,
JZ
…On Sun, May 12, 2024 at 7:04 PM Alexandre Belhoste ***@***.***> wrote:
I am ready to submit but the link does not work !
https://zama.ai/bounty-program
Where is the submit link ?
—
Reply to this email directly, view it on GitHub
<#94 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABL53WYUYAI3GQEL5XTRJRLZB6OJXAVCNFSM6AAAAABDBQ5UGOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBWGMYTKOBXGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Is it possible to secure the submission and still continue to write the README.md file explaining the whole algorithm and polish the doc. Since the deadline in 15h or so ? |
@0xalexbel Yes, no worries. Our team will review submissions based on their latest PR at the submission deadline time. Meaning you can submit now, and edit your submissions until the submission deadline which is: tomorrow 2pm for Paris—FR time. |
It's probably too late to ask the question, but still... Couple of questions:
Note: I tried to interprete the best I could, hope I was right... 😅 |
|
The program exits with error code 1 if something went wrong, like syntax error, wrong db format etc. ?? The program does not know if the SQL select is correct since it performed the request blindly |
the program does one request in the clear and one encrypted compares the two, I think it was made clear in the original post ? |
I read the original instructions very carefully several times, there no mention of the comparison request. But i can add it... |
Thank you to everyone who submitted to the Zama Bounty Program Season 5. Our team will review all submissions and give some initial feedbacks in the coming days! |
Overview
TFHE-rs is a pure Rust implementation of TFHE for Boolean and integer arithmetics over encrypted data.
With this bounty, we are asking users to implement an SQL encrypted query on a clear database.
Here's a simplified overview of how FHE can be used to implement an SQL encrypted query on a clear database:
1. Query Encryption
The first step involves the user encrypting their SQL query using their FHE secret key before sending it to the server hosting the database. The encryption ensures that the server cannot see the actual query, only its encrypted form.
2. Processing Encrypted Query on Clear Database
The server then processes this encrypted query on the clear database. FHE computations mixing clear and encrypted data will return encrypted data, it means the server can perform the necessary SQL operations (like SELECT, JOIN, etc.) using the clear data as well as the encrypted query. The result of this operation is an encrypted version of the query result.
3. Decrypting the Result
Finally, the encrypted result is sent back to the user, who can decrypt it using their private key to obtain the clear text result of the SQL query.
We ask that you, the hunters, implement a database that can manipulate signed and unsigned integers (8, 16, 32, 64 bits), booleans and short strings (32 ASCII characters) as possible data types for a column.
As we are not looking at a classical database use case, we don’t expect the database to be resilient to crashes, be highly available etc. it is merely a datastore addressed in a similar way to a database via an encrypted SQL query.
How to participate?
1️⃣ Register here.
2️⃣ When ready, submit your code here.
🗓️ Submission deadline: May 12th, 2024.
Description
Expected Work
The implementation should live in its own crate (and not in tfhe-rs/tfhe/example) and depend on either tfhe-rs 0.5.x or the main branch of tfhe-rs.
We ask that your crate passes the clippy checks when treating warnings as errors.
We ask you to provide both an executable and an API.
Executable
For the clear database storage you can use existing databases or tools but we require that you manage to load a database from a directory with the following structure:
db_dir
example of a table_content.csv:
The executable takes as input a database to load with the format discussed earlier and a file containing the SQL query in plaintext to avoid dealing with escaping quotes on the command line, we require the following format for the output:
erroneous results will automatically disqualify a submission.
API
You are free to use the High Level API with CPU or GPU or the integer API with CPU or GPU.
We only expect you to implement ONE API, implementing more APIs won’t improve you submission’s classification, correctness is mandatory, incorrect implementations will be disqualified, performance will be the main differentiator for submissions.
For the encryption we ask that you use a Select query from the sqlparser crate as input. You may choose another type if you explain how it's a better choice. You must return your own EncryptedQuery representation that can be used on the device of your choice.
There are 3 possible APIs to implement
The APIs are specified in detail below, choose the one(s) you wish to implement (only implement methods of the API you chose)
CPU Integer API
GPU Integer API
High-Level API
SQL support
We ask to be able to run the following encrypted queries:
Requested:
SQL Select: https://www.w3schools.com/sql/sql_select.asp
SQL Select Distinct: https://www.w3schools.com/sql/sql_distinct.asp
SQL Where: https://www.w3schools.com/sql/sql_where.asp
SQL And: https://www.w3schools.com/sql/sql_and.asp
SQL Or: https://www.w3schools.com/sql/sql_or.asp
SQL Not: https://www.w3schools.com/sql/sql_not.asp
SQL In: https://www.w3schools.com/sql/sql_in.asp
SQL Between: https://www.w3schools.com/sql/sql_between.asp
For integer values we expect you to manage the <, <=, >, >=, = operators
For strings we expect you to manage the = operator only
Note that the != (not equal) operator is built via the NOT operator of SQL and an = operator.
Note that some operators may re-use other implementations, for example the IN operator is indicated to be equivalent to multiple OR operators.
Bonus API:
SQL Joins: https://www.w3schools.com/sql/sql_join.asp
The format of the encrypted query and result are free, the evaluation of this bounty will be on speed and correctness on the Requested APIs, bonus APIs are NOT required and would only be used as a last resort to differentiate two submissions.
Other Details
Benchmarking
Implementations will be benchmarked on the following AWS hardware:
CPU: hpc7a.96xlarge (192 vCPU, 768 GB Memory)
GPU: p3.2xlarge (note that TFHE-rs only supports single GPU execution for now) (1 Tesla V100, 16 GB GPU Memory)
Parameters
The allowed parameters are:
PARAM_MESSAGE_2_CARRY_2_KS_PBS,
PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_2_KS_PBS,
PARAM_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS
Your solution must be able to run with the multi bit and non multi bit parameter sets, here it should be straightforward as the message and carry moduli are the same.
Note that the multi bit parameters should be faster on GPU, namely the group 3 should be the fastest. It could be the case on CPU as well but it can saturate even the largest CPU machines with threads and performance suffers as a consequence.
Turtle shell queries
As Mario Kart has its turtle shells, you will be able, if you want to, to provide one example of a small DB (with the directory and csv structure discussed earlier) and a “turtle shell” query you feel is very hard/has corner cases alongside your submission, we will run the DB and the query on the submissions we gather, corner cases beware! The performance and correctness on these hunter provided use cases will be used for the evaluation of the bounty, in addition to cases we’ll choose ourselves, we ask that the query runs in under 5 minutes on a laptop/commodity hardware, though our benchmarking machine are larger we want to keep runtimes reasonable when possible.
Good luck!
Reward
🥇Best submission: up to €5,000
To be considered best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside the technical correctness, it should also be submitted with a clean code, clear explanations and a complete documentation.
🥈Second-best submission: up to €3,000
For a solution to be considered the second best submission, it should be both efficient and effective. The code should be neat and readable, while its documentation might not be as exhaustive as the best submission, it should cover the key aspects of the solution.
🥉Third-best submission: up to €2,000
The third best submission is one that presents a solution that effectively tackles the challenge at hand, even if it may have certain areas of improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.
Reward amounts are decided based on code quality and speed performance on a m6i.metal AWS server.
Related links and references
Questions?
Do you have a specific question about this bounty? Simply comment this issue and our team will get back to you!
The text was updated successfully, but these errors were encountered: