[ZEPPELIN-848] Add support for encrypted data stored in Amazon S3 #886

Closed
wants to merge 12 commits into from
27 changes: 26 additions & 1 deletion conf/zeppelin-site.xml.template
@@ -62,7 +62,8 @@
</property>


<!-- If used S3 to storage the notebooks, it is necessary the following folder structure bucketname/username/notebook/ -->
<!-- Amazon S3 notebook storage -->
<!-- Creates the following directory structure: s3://{bucket}/{username}/{notebook-id}/note.json -->
<!--
<property>
<name>zeppelin.notebook.s3.user</name>
@@ -89,6 +90,30 @@
</property>
-->

<!-- Additionally, encryption is supported for notebook data stored in S3 -->
<!-- Use the AWS KMS to encrypt data -->
<!-- If used, the EC2 role assigned to the EMR cluster must have rights to use the given key -->
<!-- See https://aws.amazon.com/kms/ and http://docs.aws.amazon.com/kms/latest/developerguide/concepts.html -->
<!--
<property>
<name>zeppelin.notebook.s3.kmsKeyID</name>
<value>AWS-KMS-Key-UUID</value>
<description>AWS KMS key ID used to encrypt notebook data in S3</description>
</property>
-->

<!-- Use a custom encryption materials provider to encrypt data -->
<!-- No configuration is passed to the provider, so it must be configured through system properties or another mechanism -->
<!-- See https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/EncryptionMaterialsProvider.html -->
<!--
<property>
<name>zeppelin.notebook.s3.encryptionMaterialsProvider</name>
<value>provider implementation class name</value>
<description>Custom encryption materials provider used to encrypt notebook data in S3</description>
</property>
-->


<!-- If using Azure for storage use the following settings -->
<!--
<property>
12 changes: 12 additions & 0 deletions docs/install/install.md
@@ -192,6 +192,18 @@ You can configure Zeppelin with both **environment variables** in `conf/zeppelin
<td>s3.amazonaws.com</td>
<td>Endpoint for the bucket</td>
</tr>
<tr>
<td>ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID</td>
<td>zeppelin.notebook.s3.kmsKeyID</td>
<td></td>
<td>AWS KMS Key ID to use for encrypting data in S3 (optional)</td>
</tr>
<tr>
<td>ZEPPELIN_NOTEBOOK_S3_EMP</td>
<td>zeppelin.notebook.s3.encryptionMaterialsProvider</td>
<td></td>
<td>Class name of a custom S3 encryption materials provider implementation to use for encrypting data in S3 (optional)</td>
</tr>
<tr>
<td>ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING</td>
<td>zeppelin.notebook.azure.connectionString</td>
80 changes: 60 additions & 20 deletions docs/storage/storage.md
@@ -20,15 +20,16 @@ limitations under the License.
### Notebook Storage

Zeppelin has a pluggable notebook storage mechanism, controlled by the `zeppelin.notebook.storage` configuration option, with multiple implementations.
There are few Notebook storages available for a use out of the box:
There are a few notebook storage systems available for use out of the box:
- (default) all notes are saved in the notebook folder in your local File System - `VFSNotebookRepo`
- there is also an option to version it using local Git repository - `GitNotebookRepo`
- another option is Amazon S3 service - `S3NotebookRepo`
- another option is Amazon's S3 service - `S3NotebookRepo`

Multiple storages can be used at the same time by providing a comma-separated list of the class-names in the configuration.
Multiple storage systems can be used at the same time by providing a comma-separated list of the class names in the configuration.
By default, only the first two of them will be automatically kept in sync by Zeppelin.
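
As a rough illustration of the comma-separated setting described above, the following Java sketch splits a `zeppelin.notebook.storage` value into class names and selects the first two as the ones kept in sync. `StorageListSketch`, `parse`, and `synced` are hypothetical names for this example, not Zeppelin APIs, and the fully qualified name used for `S3NotebookRepo` is assumed by analogy with `VFSNotebookRepo`.

```java
import java.util.ArrayList;
import java.util.List;

final class StorageListSketch {
  // The documentation states that only the first two storage systems are synced.
  static final int MAX_SYNCED = 2;

  private StorageListSketch() {}

  /** Splits a comma-separated setting into trimmed, non-empty class names. */
  static List<String> parse(String value) {
    List<String> classNames = new ArrayList<>();
    for (String part : value.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        classNames.add(trimmed);
      }
    }
    return classNames;
  }

  /** Returns the leading entries that would be kept in sync. */
  static List<String> synced(List<String> classNames) {
    int end = Math.min(MAX_SYNCED, classNames.size());
    return new ArrayList<>(classNames.subList(0, end));
  }

  public static void main(String[] args) {
    // Assumed package for S3NotebookRepo; only the VFS name appears in full in the docs.
    String setting = "org.apache.zeppelin.notebook.repo.S3NotebookRepo, "
        + "org.apache.zeppelin.notebook.repo.VFSNotebookRepo";
    System.out.println(synced(parse(setting)));
  }
}
```

The order of the list matters in this sketch: any class name after the second position is parsed but not returned by `synced`.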

<br/>

#### Notebook Storage in local Git repository <a name="Git"></a>

To enable versioning for all your local notebooks through a standard Git repository, uncomment the next property in `zeppelin-site.xml` to use the `GitNotebookRepo` class:
@@ -42,44 +43,46 @@ To enable versioning for all your local notebooks though a standard Git reposito
```

<br/>

#### Notebook Storage in S3 <a name="S3"></a>

For notebook storage in S3 you need the AWS credentials, for this there are three options, the environment variable ```AWS_ACCESS_KEY_ID``` and ```AWS_ACCESS_SECRET_KEY```, credentials file in the folder .aws in you home and IAM role for your instance. For complete the need steps is necessary:
Notebooks may be stored in S3, and optionally encrypted. Credentials are obtained through the [``DefaultAWSCredentialsProviderChain``](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html), which checks the following sources in order:

- The ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment variables
- The ``aws.accessKeyId`` and ``aws.secretKey`` Java System properties
- Credential profiles file at the default location (``~/.aws/credentials``) used by the AWS CLI
- Instance profile credentials delivered through the Amazon EC2 metadata service

<br/>
you need the following folder structure on S3
The following folder structure will be created in S3:

```
bucket_name/
username/
notebook/

s3://bucket_name/username/notebook-id/
```

set the environment variable in the file **zeppelin-env.sh**:
Configure by setting environment variables in the file **zeppelin-env.sh**:

```
export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucket_name
export ZEPPELIN_NOTEBOOK_S3_USER=username
```

in the file **zeppelin-site.xml** uncomment and complete the next property:
Or, in the file **zeppelin-site.xml**, uncomment and complete the S3 settings:

```
<!--If used S3 to storage, it is necessary the following folder structure bucket_name/username/notebook/-->
<property>
<name>zeppelin.notebook.s3.user</name>
<value>username</value>
<description>user name for s3 folder structure</description>
</property>
<property>
<name>zeppelin.notebook.s3.bucket</name>
<value>bucket_name</value>
<description>bucket name for notebook storage</description>
</property>
<property>
<name>zeppelin.notebook.s3.user</name>
<value>username</value>
<description>user name for s3 folder structure</description>
</property>
```

uncomment the next property for use S3NotebookRepo class:
Uncomment the next property to use the `S3NotebookRepo` class:

```
<property>
@@ -89,12 +92,49 @@ uncomment the next property for use S3NotebookRepo class:
</property>
```

comment the next property:
Comment out the next property to disable local notebook storage (the default):

```
<property>
<name>zeppelin.notebook.storage</name>
<value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value>
<description>notebook persistence layer implementation</description>
</property>
```
```

#### Data Encryption in S3

##### AWS KMS encryption keys

To use an [AWS KMS](https://aws.amazon.com/kms/) encryption key to encrypt notebooks, set the following environment variable in the file **zeppelin-env.sh**:

```
export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID=kms-key-id
```

Or use the following setting in **zeppelin-site.xml**:
```
<property>
<name>zeppelin.notebook.s3.kmsKeyID</name>
<value>AWS-KMS-Key-UUID</value>
<description>AWS KMS key ID used to encrypt notebook data in S3</description>
</property>
```

##### Custom Encryption Materials Provider class

You may use a custom [``EncryptionMaterialsProvider``](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/EncryptionMaterialsProvider.html) class as long as it is available in the classpath and able to initialize itself from system properties or another mechanism. To use this, set the following environment variable in the file **zeppelin-env.sh**:


```
export ZEPPELIN_NOTEBOOK_S3_EMP=class-name
```

Or use the following setting in **zeppelin-site.xml**:
```
<property>
<name>zeppelin.notebook.s3.encryptionMaterialsProvider</name>
<value>provider implementation class name</value>
<description>Custom encryption materials provider used to encrypt notebook data in S3</description>
</property>
```
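
The requirement above (a class that is on the classpath and can initialize itself without being handed any configuration) can be sketched in plain Java. This is only an illustration under stated assumptions: `MaterialsProviderStandIn` is a stand-in for the AWS SDK's `EncryptionMaterialsProvider` interface so that the example runs without the SDK, and `SystemPropertyProvider`, `load`, and the `example.notebook.keyAlias` property are hypothetical names that do not appear in this pull request.

```java
import java.util.Objects;

final class ProviderLoadingSketch {
  private ProviderLoadingSketch() {}

  /** Stand-in for the AWS SDK's EncryptionMaterialsProvider; NOT the real interface. */
  interface MaterialsProviderStandIn {
    String describeKeySource();
  }

  /** Hypothetical provider that configures itself from a system property. */
  static final class SystemPropertyProvider implements MaterialsProviderStandIn {
    // Hypothetical property name, chosen for this example only.
    static final String KEY_ALIAS_PROPERTY = "example.notebook.keyAlias";

    private final String keyAlias;

    // A no-argument constructor is needed because the loader passes no configuration.
    public SystemPropertyProvider() {
      this.keyAlias = Objects.requireNonNull(
          System.getProperty(KEY_ALIAS_PROPERTY),
          KEY_ALIAS_PROPERTY + " must be set");
    }

    @Override
    public String describeKeySource() {
      return "alias:" + keyAlias;
    }
  }

  /** Instantiates the configured class by name through its no-argument constructor. */
  static MaterialsProviderStandIn load(String className) {
    try {
      Class<? extends MaterialsProviderStandIn> type =
          Class.forName(className).asSubclass(MaterialsProviderStandIn.class);
      return type.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create provider " + className, e);
    }
  }

  public static void main(String[] args) {
    System.setProperty(SystemPropertyProvider.KEY_ALIAS_PROPERTY, "notebooks");
    MaterialsProviderStandIn provider = load(SystemPropertyProvider.class.getName());
    System.out.println(provider.describeKeySource());
  }
}
```

Because nothing is passed to the constructor, a missing system property surfaces as a failure at load time in this sketch rather than later, when the first notebook is written.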
@@ -361,7 +361,7 @@ public List<Map<String, String>> generateNotebooksInfo(boolean needsReload) {
try {
notebook.reloadAllNotes();
} catch (IOException e) {
LOG.error("Fail to reload notes from repository");
LOG.error("Fail to reload notes from repository", e);
}
}

@@ -338,6 +338,14 @@ public String getEndpoint() {
return getString(ConfVars.ZEPPELIN_NOTEBOOK_S3_ENDPOINT);
}

public String getS3KMSKeyID() {
return getString(ConfVars.ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID);
}

public String getS3EncryptionMaterialsProviderClass() {
return getString(ConfVars.ZEPPELIN_NOTEBOOK_S3_EMP);
}

public String getInterpreterDir() {
return getRelativeDir(ConfVars.ZEPPELIN_INTERPRETER_DIR);
}
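
The two new getters return `null` when nothing is configured, so a caller has to decide which encryption mode applies. The part of the change that consumes them is not visible in this diff, so the following is only a sketch of one plausible selection rule. The precedence (KMS key first, then custom provider) is an assumption, and `EncryptionModeSketch`, `Mode`, and `choose` are hypothetical names.

```java
final class EncryptionModeSketch {
  private EncryptionModeSketch() {}

  enum Mode { KMS, CUSTOM_PROVIDER, NONE }

  /**
   * Picks an encryption mode from the two optional settings.
   * Assumption: a KMS key ID takes precedence when both are set.
   */
  static Mode choose(String kmsKeyId, String providerClassName) {
    if (isSet(kmsKeyId)) {
      return Mode.KMS;
    }
    if (isSet(providerClassName)) {
      return Mode.CUSTOM_PROVIDER;
    }
    return Mode.NONE;
  }

  // Treats null and blank values as "not configured".
  private static boolean isSet(String value) {
    return value != null && !value.trim().isEmpty();
  }

  public static void main(String[] args) {
    // Values would come from getS3KMSKeyID() and getS3EncryptionMaterialsProviderClass().
    System.out.println(choose(null, "com.example.MyProvider"));
  }
}
```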
@@ -497,6 +505,8 @@ public static enum ConfVars {
ZEPPELIN_NOTEBOOK_S3_BUCKET("zeppelin.notebook.s3.bucket", "zeppelin"),
ZEPPELIN_NOTEBOOK_S3_ENDPOINT("zeppelin.notebook.s3.endpoint", "s3.amazonaws.com"),
ZEPPELIN_NOTEBOOK_S3_USER("zeppelin.notebook.s3.user", "user"),
ZEPPELIN_NOTEBOOK_S3_EMP("zeppelin.notebook.s3.encryptionMaterialsProvider", null),
ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID("zeppelin.notebook.s3.kmsKeyID", null),
ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING("zeppelin.notebook.azure.connectionString", null),
ZEPPELIN_NOTEBOOK_AZURE_SHARE("zeppelin.notebook.azure.share", "zeppelin"),
ZEPPELIN_NOTEBOOK_AZURE_USER("zeppelin.notebook.azure.user", "user"),
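
Each `ConfVars` entry above pairs a property name with a default, and the install documentation lists an environment variable with the same name as the enum constant. A minimal sketch of how such an entry might be resolved follows. The lookup order (environment variable, then Java system property, then `zeppelin-site.xml`, then the default) is an assumption about `getString` that this diff does not show, and `ConfLookupSketch`, `Var`, and `resolve` are hypothetical names. The lookups are passed in as functions so the order can be exercised without touching the real environment.

```java
import java.util.function.Function;

final class ConfLookupSketch {
  private ConfLookupSketch() {}

  /** Subset of the ConfVars entries shown in the diff: property name plus default. */
  enum Var {
    ZEPPELIN_NOTEBOOK_S3_ENDPOINT("zeppelin.notebook.s3.endpoint", "s3.amazonaws.com"),
    ZEPPELIN_NOTEBOOK_S3_EMP("zeppelin.notebook.s3.encryptionMaterialsProvider", null),
    ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID("zeppelin.notebook.s3.kmsKeyID", null);

    final String propertyName;
    final String defaultValue;

    Var(String propertyName, String defaultValue) {
      this.propertyName = propertyName;
      this.defaultValue = defaultValue;
    }
  }

  /**
   * Returns the first non-null value found, falling back to the default.
   * Assumed order: environment, system properties, site file.
   */
  static String resolve(
      Var var,
      Function<String, String> env,
      Function<String, String> systemProperties,
      Function<String, String> siteFile) {
    String[] candidates = {
      env.apply(var.name()),
      systemProperties.apply(var.propertyName),
      siteFile.apply(var.propertyName)
    };
    for (String candidate : candidates) {
      if (candidate != null) {
        return candidate;
      }
    }
    return var.defaultValue;
  }

  public static void main(String[] args) {
    // No site file is parsed here; the last lookup always misses.
    String endpoint = resolve(
        Var.ZEPPELIN_NOTEBOOK_S3_ENDPOINT, System::getenv, System::getProperty, name -> null);
    System.out.println(endpoint);
  }
}
```

Since both new entries default to `null`, encryption stays off in this sketch unless one of the three sources supplies a value.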