Undergraduate Thesis - Rener Oliveira
- LaTeX Template: EMAp model from Lucas Moschen;
- C++ Code: HELR library, forked from Kyoohyung Han.
The amount of data generated by individuals and enterprises has grown exponentially over the last decades. This growth empowers machine learning methods since, for statistical purposes, the more data a model can access, the more accurately it can predict or represent reality. A problem emerges when the model must deal with sensitive data such as medical records, financial history, or genomic data, where additional care must be taken to protect the privacy of data owners. Encrypting sensitive data might appear to be a good solution at first sight, but it can considerably limit the ability to do statistical analysis. This work is a survey of Fully Homomorphic Encryption (FHE), a special kind of cryptographic scheme that still permits some machine learning methods to run over encrypted data while offering strong mathematical guarantees of privacy protection.
Table of Contents
- INTRODUCTION
- ALGEBRAIC REVIEW
- 2.1 - Basic structures
- 2.2 - Homomorphisms and Quotient Rings
- 2.3 - Cyclotomic polynomials
- 2.4 - Lattices
- 2.4.1 - Lattice Problems
- 2.4.2 - Ring Learning with Errors
- FULLY HOMOMORPHIC ENCRYPTION
- 3.1 - Privacy Homomorphisms
- 3.1.1 - Requirements and Limitations
- 3.2 - Bootstrappable encryption
- 3.2.1 - Overview and Bootstrapping
- 3.2.2 - An integer scheme
- 3.2.3 - Practical considerations and further research
- 3.3 - FHE over the complex numbers
- 3.3.1 - Encoding and Decoding
- 3.3.2 - Encryption, Decryption, and Relinearization
- 3.3.3 - Approximate Bootstrapping
- PRIVATE LOGISTIC REGRESSION
- 4.1 - Statistical Review
- 4.2 - Homomorphic Training
- 4.2.1 - Ciphertext packing and data representation
- 4.2.2 - Batch Inner Product
- 4.3 - Data Applications
- CONCLUSIONS AND FURTHER WORK
- APPENDIX A - AN IDEAL LATTICE SCHEME
- A.1 Initial definitions
- A.2 Abstract construction
- A.3 Concrete construction using ideal lattices
Summarized results of encrypted logistic regression training.
- Dataset: Subset of the TissueMNIST dataset; 92672 rows and 196 columns;
- Machine: 64-bit quad-core Intel Core i5-6200U 2.3GHz CPU, 16GB of RAM;
- CKKS parameters: $N=2^{15}, q=2^{45}$ (more details here).
- FHE Performance:
| KeyGen time | Encrypt time | Training time | Public key size |
|---|---|---|---|
| 8.52 min | 19.17 min | 6.19 hours | 2.62 GB |
- Model performance:
| | Encrypted | Unencrypted |
|---|---|---|
| Accuracy | 64.6211% | 64.6363% |
| AUC | 81.7039% | 81.6996% |
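
To give a feel for the scale of the experiment above, here is a rough sizing sketch. It relies on the standard CKKS fact that one ciphertext with ring dimension $N$ holds up to $N/2$ plaintext slots, and it assumes the dataset is packed densely, one feature value per slot. That dense layout and the helper names `ckks_slot_count` and `min_ciphertexts` are illustrative assumptions, not the packing actually used in the thesis (see section 4.2.1 for that), so the result is only a lower bound on the number of ciphertexts.

```python
def ckks_slot_count(ring_dim: int) -> int:
    """Maximum number of plaintext slots in one CKKS ciphertext (N / 2)."""
    if ring_dim < 2 or ring_dim & (ring_dim - 1):
        raise ValueError("ring dimension must be a power of two >= 2")
    return ring_dim // 2


def min_ciphertexts(rows: int, cols: int, ring_dim: int) -> int:
    """Lower bound on ciphertexts needed, assuming one value per slot
    and no padding (a hypothetical dense layout)."""
    slots = ckks_slot_count(ring_dim)
    values = rows * cols
    return -(-values // slots)  # ceiling division on integers


if __name__ == "__main__":
    N = 2 ** 15              # ring dimension from the CKKS parameters above
    rows, cols = 92672, 196  # dataset shape from the results above

    print(ckks_slot_count(N))              # 16384 slots per ciphertext
    print(rows * cols)                     # 18163712 values to encrypt
    print(min_ciphertexts(rows, cols, N))  # 1109 ciphertexts at minimum
```

Under these assumptions the encrypted dataset spans at least 1109 ciphertexts, which helps put the reported encryption time in context.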