
Index Level Encryption plugin #12902

Open · wants to merge 12 commits into `main`
Conversation

@asonje commented Mar 25, 2024

Description

This pull request adds index-level encryption to OpenSearch, as proposed in issue #3469. Each OpenSearch index is individually encrypted with user-provided encryption keys. A new `cryptofs` value for the `index.store.type` setting is introduced, which instantiates a `CryptoDirectory` that encrypts and decrypts files as they are written and read, respectively.
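The core idea can be sketched in plain JCE terms: a per-index data key plus a per-file IV, with AES in CTR mode so that files written encrypted can later be read back (CTR also permits random access, which Lucene needs). This is an illustrative sketch, not the PR's code; all names here are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class CtrRoundTrip {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey dataKey = kg.generateKey();      // stands in for the per-index data key

        byte[] iv = new byte[16];                  // per-file IV, stored alongside the file
        new SecureRandom().nextBytes(iv);

        byte[] plain = "lucene segment bytes".getBytes(StandardCharsets.UTF_8);

        // write path: encrypt before the bytes hit disk
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dataKey, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(plain);

        // read path: decrypt with the same key and IV
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dataKey, new IvParameterSpec(iv));
        byte[] back = dec.doFinal(ciphertext);

        System.out.println(Arrays.equals(plain, back)); // true
    }
}
```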

Related Issues

Resolves #3469

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

asonje added 4 commits March 25, 2024 09:23
Signed-off-by: Olasoji Denloye <olasoji.denloye@intel.com>
github-actions bot (Contributor) commented Dec 5, 2024

❌ Gradle check result for ded0f58: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@asonje (Author) commented Dec 5, 2024

{"run-benchmark-test": "id_3"}

github-actions bot (Contributor) commented Dec 5, 2024

The Jenkins job URL is https://build.ci.opensearch.org/job/benchmark-pull-request/1810/. Final results will be published once the job is completed.

@opensearch-ci-bot (Collaborator)

Benchmark Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-pull-request/1810/

| Metric | Task | Value | Unit |
|---|---|---|---|
| Cumulative indexing time of primary shards | | 0 | min |
| Min cumulative indexing time across primary shards | | 0 | min |
| Median cumulative indexing time across primary shards | | 0 | min |
| Max cumulative indexing time across primary shards | | 0 | min |
| Cumulative indexing throttle time of primary shards | | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | min |
| Cumulative merge time of primary shards | | 0 | min |
| Cumulative merge count of primary shards | | 0 | |
| Min cumulative merge time across primary shards | | 0 | min |
| Median cumulative merge time across primary shards | | 0 | min |
| Max cumulative merge time across primary shards | | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | min |
| Min cumulative merge throttle time across primary shards | | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0 | min |
| Max cumulative merge throttle time across primary shards | | 0 | min |
| Cumulative refresh time of primary shards | | 0 | min |
| Cumulative refresh count of primary shards | | 2 | |
| Min cumulative refresh time across primary shards | | 0 | min |
| Median cumulative refresh time across primary shards | | 0 | min |
| Max cumulative refresh time across primary shards | | 0 | min |
| Cumulative flush time of primary shards | | 0 | min |
| Cumulative flush count of primary shards | | 1 | |
| Min cumulative flush time across primary shards | | 0 | min |
| Median cumulative flush time across primary shards | | 0 | min |
| Max cumulative flush time across primary shards | | 0 | min |
| Total Young Gen GC time | | 0.333 | s |
| Total Young Gen GC count | | 13 | |
| Total Old Gen GC time | | 0 | s |
| Total Old Gen GC count | | 0 | |
| Store size | | 23.6241 | GB |
| Translog size | | 5.12227e-08 | GB |
| Heap used for segments | | 0 | MB |
| Heap used for doc values | | 0 | MB |
| Heap used for terms | | 0 | MB |
| Heap used for norms | | 0 | MB |
| Heap used for points | | 0 | MB |
| Heap used for stored fields | | 0 | MB |
| Segment count | | 32 | |
| Min Throughput | wait-for-snapshot-recovery | 4.18848e+07 | byte/s |
| Mean Throughput | wait-for-snapshot-recovery | 4.18848e+07 | byte/s |
| Median Throughput | wait-for-snapshot-recovery | 4.18848e+07 | byte/s |
| Max Throughput | wait-for-snapshot-recovery | 4.18848e+07 | byte/s |
| 100th percentile latency | wait-for-snapshot-recovery | 600574 | ms |
| 100th percentile service time | wait-for-snapshot-recovery | 600574 | ms |
| error rate | wait-for-snapshot-recovery | 0 | % |
| Min Throughput | default | 3.01 | ops/s |
| Mean Throughput | default | 3.02 | ops/s |
| Median Throughput | default | 3.02 | ops/s |
| Max Throughput | default | 3.04 | ops/s |
| 50th percentile latency | default | 7.2419 | ms |
| 90th percentile latency | default | 8.27506 | ms |
| 99th percentile latency | default | 8.72156 | ms |
| 100th percentile latency | default | 8.79778 | ms |
| 50th percentile service time | default | 6.12717 | ms |
| 90th percentile service time | default | 6.93989 | ms |
| 99th percentile service time | default | 7.81181 | ms |
| 100th percentile service time | default | 7.84552 | ms |
| error rate | default | 0 | % |
| Min Throughput | range | 0.7 | ops/s |
| Mean Throughput | range | 0.7 | ops/s |
| Median Throughput | range | 0.7 | ops/s |
| Max Throughput | range | 0.71 | ops/s |
| 50th percentile latency | range | 159.906 | ms |
| 90th percentile latency | range | 163.337 | ms |
| 99th percentile latency | range | 174.24 | ms |
| 100th percentile latency | range | 182.367 | ms |
| 50th percentile service time | range | 156.87 | ms |
| 90th percentile service time | range | 160.573 | ms |
| 99th percentile service time | range | 170.992 | ms |
| 100th percentile service time | range | 178.964 | ms |
| error rate | range | 0 | % |
| Min Throughput | distance_amount_agg | 0.07 | ops/s |
| Mean Throughput | distance_amount_agg | 0.07 | ops/s |
| Median Throughput | distance_amount_agg | 0.07 | ops/s |
| Max Throughput | distance_amount_agg | 0.07 | ops/s |
| 50th percentile latency | distance_amount_agg | 1.32056e+06 | ms |
| 90th percentile latency | distance_amount_agg | 1.84544e+06 | ms |
| 99th percentile latency | distance_amount_agg | 1.96348e+06 | ms |
| 100th percentile latency | distance_amount_agg | 1.97006e+06 | ms |
| 50th percentile service time | distance_amount_agg | 13592.9 | ms |
| 90th percentile service time | distance_amount_agg | 13707.9 | ms |
| 99th percentile service time | distance_amount_agg | 13806.9 | ms |
| 100th percentile service time | distance_amount_agg | 13817.7 | ms |
| error rate | distance_amount_agg | 0 | % |
| Min Throughput | autohisto_agg | 1.51 | ops/s |
| Mean Throughput | autohisto_agg | 1.51 | ops/s |
| Median Throughput | autohisto_agg | 1.51 | ops/s |
| Max Throughput | autohisto_agg | 1.52 | ops/s |
| 50th percentile latency | autohisto_agg | 13.9108 | ms |
| 90th percentile latency | autohisto_agg | 14.7679 | ms |
| 99th percentile latency | autohisto_agg | 21.0951 | ms |
| 100th percentile latency | autohisto_agg | 23.0766 | ms |
| 50th percentile service time | autohisto_agg | 12.3491 | ms |
| 90th percentile service time | autohisto_agg | 13.0162 | ms |
| 99th percentile service time | autohisto_agg | 19.8559 | ms |
| 100th percentile service time | autohisto_agg | 21.7752 | ms |
| error rate | autohisto_agg | 0 | % |
| Min Throughput | date_histogram_agg | 1.51 | ops/s |
| Mean Throughput | date_histogram_agg | 1.52 | ops/s |
| Median Throughput | date_histogram_agg | 1.51 | ops/s |
| Max Throughput | date_histogram_agg | 1.53 | ops/s |
| 50th percentile latency | date_histogram_agg | 13.3376 | ms |
| 90th percentile latency | date_histogram_agg | 13.855 | ms |
| 99th percentile latency | date_histogram_agg | 28.4331 | ms |
| 100th percentile latency | date_histogram_agg | 37.9943 | ms |
| 50th percentile service time | date_histogram_agg | 11.9196 | ms |
| 90th percentile service time | date_histogram_agg | 12.2359 | ms |
| 99th percentile service time | date_histogram_agg | 27.0351 | ms |
| 100th percentile service time | date_histogram_agg | 36.3977 | ms |
| error rate | date_histogram_agg | 0 | % |
| Min Throughput | desc_sort_tip_amount | 0.5 | ops/s |
| Mean Throughput | desc_sort_tip_amount | 0.5 | ops/s |
| Median Throughput | desc_sort_tip_amount | 0.5 | ops/s |
| Max Throughput | desc_sort_tip_amount | 0.51 | ops/s |
| 50th percentile latency | desc_sort_tip_amount | 35.4273 | ms |
| 90th percentile latency | desc_sort_tip_amount | 36.0668 | ms |
| 99th percentile latency | desc_sort_tip_amount | 41.6973 | ms |
| 100th percentile latency | desc_sort_tip_amount | 42.6475 | ms |
| 50th percentile service time | desc_sort_tip_amount | 32.7031 | ms |
| 90th percentile service time | desc_sort_tip_amount | 33.1555 | ms |
| 99th percentile service time | desc_sort_tip_amount | 38.9371 | ms |
| 100th percentile service time | desc_sort_tip_amount | 39.7858 | ms |
| error rate | desc_sort_tip_amount | 0 | % |
| Min Throughput | asc_sort_tip_amount | 0.5 | ops/s |
| Mean Throughput | asc_sort_tip_amount | 0.51 | ops/s |
| Median Throughput | asc_sort_tip_amount | 0.5 | ops/s |
| Max Throughput | asc_sort_tip_amount | 0.51 | ops/s |
| 50th percentile latency | asc_sort_tip_amount | 8.78618 | ms |
| 90th percentile latency | asc_sort_tip_amount | 9.31272 | ms |
| 99th percentile latency | asc_sort_tip_amount | 10.2617 | ms |
| 100th percentile latency | asc_sort_tip_amount | 10.2739 | ms |
| 50th percentile service time | asc_sort_tip_amount | 5.91895 | ms |
| 90th percentile service time | asc_sort_tip_amount | 6.25749 | ms |
| 99th percentile service time | asc_sort_tip_amount | 7.16934 | ms |
| 100th percentile service time | asc_sort_tip_amount | 7.17124 | ms |
| error rate | asc_sort_tip_amount | 0 | % |

@opensearch-ci-bot (Collaborator)

Benchmark Baseline Comparison Results

Benchmark Results for Job: https://build.ci.opensearch.org/job/benchmark-compare/25/

| Metric | Task | Baseline | Contender | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 0 | 0 | 0 | min |
| Cumulative merge count of primary shards | | 0 | 0 | 0 | |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative refresh time of primary shards | | 0 | 0 | 0 | min |
| Cumulative refresh count of primary shards | | 2 | 2 | 0 | |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Cumulative flush time of primary shards | | 0 | 0 | 0 | min |
| Cumulative flush count of primary shards | | 1 | 1 | 0 | |
| Min cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Total Young Gen GC time | | 0.325 | 0.333 | 0.008 | s |
| Total Young Gen GC count | | 13 | 13 | 0 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 23.6241 | 23.6241 | 0 | GB |
| Translog size | | 5.12227e-08 | 5.12227e-08 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 32 | 32 | 0 | |
| Min Throughput | wait-for-snapshot-recovery | 4.18863e+07 | 4.18848e+07 | -1520 | byte/s |
| Mean Throughput | wait-for-snapshot-recovery | 4.18863e+07 | 4.18848e+07 | -1520 | byte/s |
| Median Throughput | wait-for-snapshot-recovery | 4.18863e+07 | 4.18848e+07 | -1520 | byte/s |
| Max Throughput | wait-for-snapshot-recovery | 4.18863e+07 | 4.18848e+07 | -1520 | byte/s |
| 100th percentile latency | wait-for-snapshot-recovery | 600592 | 600574 | -18.3125 | ms |
| 100th percentile service time | wait-for-snapshot-recovery | 600592 | 600574 | -18.3125 | ms |
| error rate | wait-for-snapshot-recovery | 0 | 0 | 0 | % |
| Min Throughput | default | 3.01355 | 3.01317 | -0.00038 | ops/s |
| Mean Throughput | default | 3.02202 | 3.02142 | -0.0006 | ops/s |
| Median Throughput | default | 3.02008 | 3.0195 | -0.00058 | ops/s |
| Max Throughput | default | 3.03885 | 3.03768 | -0.00117 | ops/s |
| 50th percentile latency | default | 6.30176 | 7.2419 | 0.94013 | ms |
| 90th percentile latency | default | 7.23365 | 8.27506 | 1.04142 | ms |
| 99th percentile latency | default | 8.1162 | 8.72156 | 0.60537 | ms |
| 100th percentile latency | default | 8.53889 | 8.79778 | 0.25889 | ms |
| 50th percentile service time | default | 5.29412 | 6.12717 | 0.83305 | ms |
| 90th percentile service time | default | 6.00782 | 6.93989 | 0.93207 | ms |
| 99th percentile service time | default | 6.91444 | 7.81181 | 0.89737 | ms |
| 100th percentile service time | default | 7.18129 | 7.84552 | 0.66423 | ms |
| error rate | default | 0 | 0 | 0 | % |
| Min Throughput | range | 0.702121 | 0.702159 | 4e-05 | ops/s |
| Mean Throughput | range | 0.703487 | 0.703551 | 6e-05 | ops/s |
| Median Throughput | range | 0.703176 | 0.703235 | 6e-05 | ops/s |
| Max Throughput | range | 0.706283 | 0.706399 | 0.00012 | ops/s |
| 50th percentile latency | range | 159.567 | 159.906 | 0.33825 | ms |
| 90th percentile latency | range | 163.17 | 163.337 | 0.16733 | ms |
| 99th percentile latency | range | 168.481 | 174.24 | 5.75862 | ms |
| 100th percentile latency | range | 169.503 | 182.367 | 12.8635 | ms |
| 50th percentile service time | range | 157.297 | 156.87 | -0.42642 | ms |
| 90th percentile service time | range | 160.425 | 160.573 | 0.14827 | ms |
| 99th percentile service time | range | 165.635 | 170.992 | 5.3576 | ms |
| 100th percentile service time | range | 166.422 | 178.964 | 12.5422 | ms |
| error rate | range | 0 | 0 | 0 | % |
| Min Throughput | distance_amount_agg | 0.0745231 | 0.0733222 | -0.0012 | ops/s |
| Mean Throughput | distance_amount_agg | 0.0750837 | 0.0733966 | -0.00169 | ops/s |
| Median Throughput | distance_amount_agg | 0.0749722 | 0.0734084 | -0.00156 | ops/s |
| Max Throughput | distance_amount_agg | 0.0757835 | 0.0734192 | -0.00236 | ops/s |
| 50th percentile latency | distance_amount_agg | 1.29251e+06 | 1.32056e+06 | 28054.8 | ms |
| 90th percentile latency | distance_amount_agg | 1.81601e+06 | 1.84544e+06 | 29427.4 | ms |
| 99th percentile latency | distance_amount_agg | 1.93337e+06 | 1.96348e+06 | 30107.9 | ms |
| 100th percentile latency | distance_amount_agg | 1.9399e+06 | 1.97006e+06 | 30158.4 | ms |
| 50th percentile service time | distance_amount_agg | 13599.7 | 13592.9 | -6.83984 | ms |
| 90th percentile service time | distance_amount_agg | 13728.3 | 13707.9 | -20.3667 | ms |
| 99th percentile service time | distance_amount_agg | 13891.7 | 13806.9 | -84.8272 | ms |
| 100th percentile service time | distance_amount_agg | 13921.2 | 13817.7 | -103.477 | ms |
| error rate | distance_amount_agg | 0 | 0 | 0 | % |
| Min Throughput | autohisto_agg | 1.50711 | 1.50695 | -0.00016 | ops/s |
| Mean Throughput | autohisto_agg | 1.51174 | 1.51147 | -0.00027 | ops/s |
| Median Throughput | autohisto_agg | 1.5107 | 1.51044 | -0.00026 | ops/s |
| Max Throughput | autohisto_agg | 1.52112 | 1.52062 | -0.00049 | ops/s |
| 50th percentile latency | autohisto_agg | 13.8174 | 13.9108 | 0.09342 | ms |
| 90th percentile latency | autohisto_agg | 14.4257 | 14.7679 | 0.34225 | ms |
| 99th percentile latency | autohisto_agg | 20.6309 | 21.0951 | 0.4642 | ms |
| 100th percentile latency | autohisto_agg | 20.8432 | 23.0766 | 2.2334 | ms |
| 50th percentile service time | autohisto_agg | 12.2654 | 12.3491 | 0.08371 | ms |
| 90th percentile service time | autohisto_agg | 12.9628 | 13.0162 | 0.05343 | ms |
| 99th percentile service time | autohisto_agg | 19.0487 | 19.8559 | 0.8072 | ms |
| 100th percentile service time | autohisto_agg | 19.0793 | 21.7752 | 2.6959 | ms |
| error rate | autohisto_agg | 0 | 0 | 0 | % |
| Min Throughput | date_histogram_agg | 1.5097 | 1.50961 | -8e-05 | ops/s |
| Mean Throughput | date_histogram_agg | 1.51602 | 1.51589 | -0.00013 | ops/s |
| Median Throughput | date_histogram_agg | 1.51459 | 1.51446 | -0.00013 | ops/s |
| Max Throughput | date_histogram_agg | 1.52884 | 1.52864 | -0.0002 | ops/s |
| 50th percentile latency | date_histogram_agg | 13.7754 | 13.3376 | -0.43773 | ms |
| 90th percentile latency | date_histogram_agg | 14.3006 | 13.855 | -0.44553 | ms |
| 99th percentile latency | date_histogram_agg | 15.9162 | 28.4331 | 12.5168 | ms |
| 100th percentile latency | date_histogram_agg | 16.4133 | 37.9943 | 21.5809 | ms |
| 50th percentile service time | date_histogram_agg | 12.3155 | 11.9196 | -0.39588 | ms |
| 90th percentile service time | date_histogram_agg | 12.6337 | 12.2359 | -0.39774 | ms |
| 99th percentile service time | date_histogram_agg | 14.3141 | 27.0351 | 12.721 | ms |
| 100th percentile service time | date_histogram_agg | 15.1024 | 36.3977 | 21.2953 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | desc_sort_tip_amount | 0.502272 | 0.502143 | -0.00013 | ops/s |
| Mean Throughput | desc_sort_tip_amount | 0.503736 | 0.503524 | -0.00021 | ops/s |
| Median Throughput | desc_sort_tip_amount | 0.503399 | 0.503206 | -0.00019 | ops/s |
| Max Throughput | desc_sort_tip_amount | 0.506743 | 0.50636 | -0.00038 | ops/s |
| 50th percentile latency | desc_sort_tip_amount | 35.634 | 35.4273 | -0.20672 | ms |
| 90th percentile latency | desc_sort_tip_amount | 36.4391 | 36.0668 | -0.37232 | ms |
| 99th percentile latency | desc_sort_tip_amount | 42.9616 | 41.6973 | -1.26432 | ms |
| 100th percentile latency | desc_sort_tip_amount | 45.5025 | 42.6475 | -2.85496 | ms |
| 50th percentile service time | desc_sort_tip_amount | 32.9943 | 32.7031 | -0.29126 | ms |
| 90th percentile service time | desc_sort_tip_amount | 33.5623 | 33.1555 | -0.40684 | ms |
| 99th percentile service time | desc_sort_tip_amount | 40.3316 | 38.9371 | -1.39457 | ms |
| 100th percentile service time | desc_sort_tip_amount | 42.2373 | 39.7858 | -2.45145 | ms |
| error rate | desc_sort_tip_amount | 0 | 0 | 0 | % |
| Min Throughput | asc_sort_tip_amount | 0.503111 | 0.503087 | -2e-05 | ops/s |
| Mean Throughput | asc_sort_tip_amount | 0.505123 | 0.50508 | -4e-05 | ops/s |
| Median Throughput | asc_sort_tip_amount | 0.504659 | 0.50462 | -4e-05 | ops/s |
| Max Throughput | asc_sort_tip_amount | 0.509265 | 0.509189 | -8e-05 | ops/s |
| 50th percentile latency | asc_sort_tip_amount | 8.64147 | 8.78618 | 0.14471 | ms |
| 90th percentile latency | asc_sort_tip_amount | 9.20375 | 9.31272 | 0.10897 | ms |
| 99th percentile latency | asc_sort_tip_amount | 21.0795 | 10.2617 | -10.8178 | ms |
| 100th percentile latency | asc_sort_tip_amount | 32.0886 | 10.2739 | -21.8147 | ms |
| 50th percentile service time | asc_sort_tip_amount | 5.80252 | 5.91895 | 0.11642 | ms |
| 90th percentile service time | asc_sort_tip_amount | 6.20851 | 6.25749 | 0.04897 | ms |
| 99th percentile service time | asc_sort_tip_amount | 18.1958 | 7.16934 | -11.0265 | ms |
| 100th percentile service time | asc_sort_tip_amount | 29.2886 | 7.17124 | -22.1174 | ms |
| error rate | asc_sort_tip_amount | 0 | 0 | 0 | % |

@kumargu (Contributor) commented Dec 16, 2024

Notes from my meeting with @asonje:

Tenets

  1. Performance: There should be no more than single-digit-millisecond impact to search queries (let's keep that as a goal and iterate backwards from there). Similarly, there should be no more than 5% impact to indexing throughput.
  2. There should be minimal impact to memory, disk, and CPU usage.
  3. When it comes to security, data at rest in all known storage locations should be encrypted.
  4. All existing features must continue to work with this feature.
  5. Key (master key) rotation may not be a requirement right now, but design decisions must be made with the view that a key rotation feature can be supported, ideally without the need to reindex data (take motivation from the Solr implementation).

High level items pending in this PR

  1. Bring in optimizations to reduce search latencies (which are currently above the expected baselines).
  2. Design ideas on how master key rotation via a customer action would not require reindexing.
  3. Check whether the crypto FS can be deleted if the customer has lost their master key (this is an important security requirement). However, there should be safeguards in place to verify that the caller may delete data for the lost key.
  4. Snapshots in remote stores (S3, Azure Blob, etc.) of an encrypted index must also be encrypted with the same CMK with which the index was encrypted at rest in OpenSearch. This is a must-have requirement; without it we lose encryption at rest at the remote store, since any user could read all the data from the snapshot even when they are not supposed to.
  5. Ability to recover from snapshots (should work by default).
  6. When using remote-backed storage, the indices to be encrypted must also be encrypted at rest in the remote storage.
  7. Support for FIPS (@asonje confirmed it is already supported).
  8. Backoff when KMS is down.

Todo

(@kumargu)
Write a doc describing how fscrypt could be useful here, also listing its limitations. At a high level, I think fscrypt can meet the performance requirement and give us the ability to delete the crypto FS when a key is lost.

@kumargu (Contributor) commented Jan 6, 2025

@asonje, checking if you have bandwidth to resume working on this. Thanks.

@asonje (Author) commented Jan 7, 2025

@kumargu Yes, I do. I am currently working on some optimizations and will update you once they are ready.

```java
    }
}

private void storeWrappedKey(byte[] wrappedKey) {
```
Contributor:

how is this thread-safe?

Author:

This is only called from within the constructor; the directory is not visible to other threads

```java
 * @param dest the new file name
 */
@Override
public void rename(String source, String dest) throws IOException {
```
Contributor:

how is this atomic? what happens when a node crashes while a rename is in progress?

Author:

It is not atomic. If a crash happens during a rename operation, the ivMap will become inconsistent and would need to be fixed up on the next open. I will look into this.
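For the file move itself, `java.nio` can at least make the on-disk rename atomic; the remaining gap is then only the IV bookkeeping around it. A minimal sketch (illustrative names, not the PR's code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicRenameSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crypto");
        Path src = dir.resolve("_0.tmp");
        Files.write(src, "data".getBytes(StandardCharsets.UTF_8));

        Path dst = dir.resolve("_0.cfs");
        // ATOMIC_MOVE is atomic within one filesystem and throws
        // AtomicMoveNotSupportedException otherwise, so a crash leaves
        // either the old name or the new name, never a torn file.
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);

        System.out.println(Files.exists(dst) && !Files.exists(src));
    }
}
```

One recovery strategy is to keep the ivMap entry under the old name until the entry under the new name has been durably written, so a crash between the two steps is detectable on the next open.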

```java
try {
    Cipher cipher = CipherFactory.getCipher(provider);
    String ivEntry = ivMap.get(getDirectory() + "/" + name);
    if (ivEntry == null) throw new IOException("failed to open file. " + name);
```
@kumargu (Contributor), Jan 13, 2025:

This should not be an IOException. It could happen if the ivMap has become corrupted or there's a bug in populating the ivMap; I would wrap it in a RuntimeException.

```java
success = true;
return indexInput;
} finally {
    if (success == false) {
```
Contributor:

Can we avoid the usage of the success flag by using try-with-resources (as below)? I am scared that a concurrent thread may change the state of the success flag.

```java
@Override
public IndexInput openInput(String name, IOContext context) throws IOException {
    if (name.contains("segments_") || name.endsWith(".si")) {
        return super.openInput(name, context);
    }

    ensureOpen();
    ensureCanRead(name);
    Path path = getDirectory().resolve(name);

    try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
        Cipher cipher = CipherFactory.getCipher(provider);
        String ivEntry = ivMap.get(getDirectory() + "/" + name);
        if (ivEntry == null) {
            throw new IOException("failed to open file. " + name);
        }
        byte[] iv = Base64.getDecoder().decode(ivEntry);
        CipherFactory.initCipher(cipher, this, Optional.of(iv), Cipher.DECRYPT_MODE, 0);

        return new CryptoBufferedIndexInput("CryptoBufferedIndexInput(path=\"" + path + "\")", fc, context, cipher, this);
    }
}
```

Author:

`success` is a local variable and is not shared among threads. The try-with-resources pattern, however, should be OK.
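The `0` passed as the last argument to the PR's `CipherFactory.initCipher` appears to be a file offset, which hints at why CTR mode fits `IndexInput`: the counter can be advanced to any block without decrypting what came before. A self-contained sketch of that seek arithmetic (all names here are illustrative, not from the PR):

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.util.Arrays;

public class CtrSeek {
    // Compute the CTR counter value for a given 16-byte block index.
    static byte[] counterAt(byte[] iv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex));
        byte[] raw = ctr.toByteArray();
        byte[] out = new byte[16];
        int copy = Math.min(raw.length, 16);
        // right-align into 16 bytes (drops a possible leading zero byte)
        System.arraycopy(raw, raw.length - copy, out, 16 - copy, copy);
        return out;
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // demo key only
        byte[] iv = new byte[16];
        iv[15] = 1;

        byte[] plain = new byte[64];
        for (int i = 0; i < plain.length; i++) plain[i] = (byte) i;

        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ct = enc.doFinal(plain);

        // Decrypt only the third 16-byte block by seeking the counter.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(counterAt(iv, 2)));
        byte[] block = dec.doFinal(Arrays.copyOfRange(ct, 32, 48));

        System.out.println(Arrays.equals(block, Arrays.copyOfRange(plain, 32, 48)));
    }
}
```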

```java
} catch (java.nio.file.NoSuchFileException fnfe) {

}
IndexOutput out = super.createOutput("ivMap", new IOContext());
```
Contributor:

It seems odd to me that we are doing a createOutput within close().

```java
try {
    deleteFile("ivMap");
} catch (java.nio.file.NoSuchFileException fnfe) {
```
Contributor:

we need to handle the error here.

```java
out.writeMapOfStrings(ivMap);
out.close();
isOpen = false;
deletePendingFiles();
```
@kumargu (Contributor), Jan 13, 2025:

deletePendingFiles is also synchronized. Can we get into a deadlock?

We need to use a ReentrantLock or the lock from the lockFactory.

```java
random.nextBytes(iv);
if (dataKey == null) throw new RuntimeException("dataKey is null!");
CipherFactory.initCipher(cipher, this, Optional.of(iv), Cipher.ENCRYPT_MODE, 0);
ivMap.put(getDirectory() + "/" + name, Base64.getEncoder().encodeToString(iv));
```
@kumargu (Contributor), Jan 13, 2025:

We need to make this operation thread-safe while reducing the scope of the lock.

Contributor:

Should we immediately persist the ivMap to minimize the risk of losing it?

```java
ivMap.put(getDirectory() + "/" + name, Base64.getEncoder().encodeToString(iv));

IndexOutput out = super.createOutput("ivMap", new IOContext());
out.writeMapOfStrings(ivMap);
out.close();
```

```java
public final class CryptoDirectory extends NIOFSDirectory {
    private Path location;
    private Key dataKey;
    private ConcurrentSkipListMap<String, String> ivMap;
```
Contributor:

can we make this final?

Comment on lines +99 to +114
```java
try {
    in = super.openInput("ivMap", new IOContext());
} catch (java.nio.file.NoSuchFileException nsfe) {
    in = null;
}
if (in != null) {
    Map<String, String> tmp = in.readMapOfStrings();
    ivMap.putAll(tmp);
    in.close();
    dataKey = new SecretKeySpec(keyProvider.decryptKey(getWrappedKey()), "AES");
} else {
    DataKeyPair dataKeyPair = keyProvider.generateDataPair();
    dataKey = new SecretKeySpec(dataKeyPair.getRawKey(), "AES");
    storeWrappedKey(dataKeyPair.getEncryptedKey());
}
}
```
Contributor:

can we move this logic to an init for clearer code?

```java
public static CryptoDirectory create(LockFactory lockFactory, Path location, Provider provider, MasterKeyProvider keyProvider) throws IOException {
    CryptoDirectory directory = new CryptoDirectory(lockFactory, location, provider);
    directory.init(keyProvider);
    return directory;
}

private CryptoDirectory(LockFactory lockFactory, Path location, Provider provider) throws IOException {
    super(location, lockFactory);
    this.location = location;
    this.provider = provider;
    this.ivMap = new ConcurrentSkipListMap<>();
}
```

```java
/**
 * Creates a new CryptoDirectoryFactory
 */
public CryptoDirectoryFactory() {
```
Contributor:

where do we invoke the factory?

@kumargu (Contributor) commented Jan 17, 2025

@asonje we'll need to figure out how we can run the plugin against the performance benchmarking tool.

Similarly, do you know if there's a way to run the OpenSearch core integration tests with this plugin enabled? That would give us a lot of coverage on correctness.

@cwperks to share any insights on above.

```java
}

private byte[] getWrappedKey() {
    try (IndexInput in = super.openInput("keyfile", new IOContext())) {
```
Contributor:

IndexInput is not thread-safe; could we have contention with a write?

```java
 */
@Override
public IndexInput openInput(String name, IOContext context) throws IOException {
    if (name.contains("segments_") || name.endsWith(".si")) return super.openInput(name, context);
```
Contributor:

what's this special handling about? could we please add a comment on the intention?

Author:

It means files matching these names are not encrypted; the server reads them on startup.

```java
    in = null;
}
if (in != null) {
    Map<String, String> tmp = in.readMapOfStrings();
```
Contributor:

Can we please rename tmp to something like loadedIVMap?

```java
 * @opensearch.internal
 */
public final class CryptoDirectory extends NIOFSDirectory {
    private Path location;
```
Contributor:

I am surprised, this has zero usage.

```java
boolean success = false;
try {
    Cipher cipher = CipherFactory.getCipher(provider);
    String ivEntry = ivMap.get(getDirectory() + "/" + name);
```
Contributor:

Wondering whether ivMap.get(location) would give the same result?

@asonje (Author) commented Jan 17, 2025

> @asonje we'll need to figure out how we can run the plugin against the performance benchmarking tool.
>
> Similarly, do you know if there's a way to run the OpenSearch core integration tests with this plugin enabled? That would give us a lot of coverage on correctness.
>
> @cwperks to share any insights on above.

#12472 is still a blocker for this feature. Feel free to suggest more integration tests.

@cwperks (Member) commented Jan 17, 2025

> @asonje we'll need to figure out how we can run the plugin against the performance benchmarking tool.
>
> Similarly, do you know if there's a way to run the OpenSearch core integration tests with this plugin enabled? That would give us a lot of coverage on correctness.
>
> @cwperks to share any insights on above.

Do you just want the plugin to be installed? I don't think that would tell much if the plugin is installed but not configured.

Typically, plugin repos write their own integration tests to cover the core functionality of the plugin.

For example with security, there are integ tests that test different types of security configurations, whether it be different SSL configurations or more general security settings like testing with different role mappings that contain permutations of FLS/DLS/FieldMasking.

I can route you to some examples of how to set up GitHub Actions to run integ tests w/ a plugin installed.

@cwperks (Member) commented Jan 17, 2025

This is a more involved example (which includes setup w/ security), but this is how ISM creates a testcluster to run integ tests against: https://github.com/opensearch-project/index-management/blob/main/build.gradle#L347-L413

When the integTest task runs, it will run tests against that testcluster.

In addition to the setup in build.gradle, there would also be a GitHub workflow that directly (or indirectly) triggers the ./gradlew integTest task. For example, it would be called in this step in ISM: https://github.com/opensearch-project/index-management/blob/main/.github/workflows/test-and-build-workflow.yml#L56

@cwperks (Member) commented Jan 17, 2025

One thing to consider when introducing a new plugin is whether to build it directly in the core repo (native plugin) vs creating a separate repository.

If a plugin is introduced in a separate repository it can still be included in the default distribution of OpenSearch.

One advantage of a separate repo is having a separate set of maintainers who specialize in that plugin. The core is a big repo, and each maintainer may specialize in a different area of it.

Another advantage of a separate repo is being able to setup GH actions specific to that plugin's tests.

That being said, a drawback of that approach is that there could be a bit of scaffolding to get set up. Any plugin included in the default distribution would need to have an active set of maintainers.

@kumargu
Contributor

kumargu commented Jan 17, 2025

One thing to consider when introducing a new plugin is whether to build it directly in the core repo (native plugin) vs creating a separate repository.

I think we will separate it into a different repo (it will be easier to maintain and develop). I am fine with shipping it in this state for now and opening a separate issue to move it to a different repo.

Comment on lines +229 to +231
try {
deleteFile("ivMap");
} catch (java.nio.file.NoSuchFileException fnfe) {
Contributor

sorry, I don't understand the intention here.

Author

On close, the ivMap is written to disk, overwriting the existing file with the live version. Perhaps the ivMap could be synced more frequently.

Contributor

@kumargu kumargu Jan 18, 2025

perhaps the ivMap can be synced more frequently.

Yeah, we cannot afford losing the ivMap. We can probably write the ivMap to disk, read it back, cache it, and use the cached copy for encryption/decryption. That way we ensure durability.

Contributor

Idea: we could use mmap to write the ivMap contents directly to disk for better durability.
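A JDK-only sketch of the write-to-disk, read-back-and-cache idea, assuming a fsync-then-atomic-rename scheme; the `IvMapStore` name and the serialization format are illustrative assumptions, not part of the PR:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class IvMapStore {
    // Serialize the map and force it to disk before trusting the in-memory copy.
    static void persist(Map<String, byte[]> ivMap, Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(new HashMap<>(ivMap));
            oos.flush();
            fos.getChannel().force(true); // fsync: data reaches the device, not just the page cache
        }
        // Atomic rename so a crash never leaves a half-written ivMap behind.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    @SuppressWarnings("unchecked")
    static Map<String, byte[]> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(file))) {
            return new ConcurrentSkipListMap<>((Map<String, byte[]>) ois.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempDirectory("ivmap").resolve("ivMap");
        Map<String, byte[]> ivMap = new ConcurrentSkipListMap<>();
        ivMap.put("_0.cfs", new byte[] {1, 2, 3});
        persist(ivMap, file);
        Map<String, byte[]> reloaded = load(file); // read back to verify durability
        System.out.println(Arrays.equals(reloaded.get("_0.cfs"), new byte[] {1, 2, 3}));
    }
}
```

The read-back step is what gives confidence that a restart will see the same IVs the writer cached.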

*
* @opensearch.internal
*/
final class CryptoBufferedIndexInput extends BufferedIndexInput {
Contributor

Can we please move this into its own class file?

res = cipher.update(b, offset, chunk);
if (res != null) out.write(res);
} catch (IllegalStateException e) {
throw new IllegalStateException("count is " + count + " " + e.getMessage());
Contributor

I don't think the count variable adds much value.
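For context, the chunked `cipher.update` loop from the diff can be exercised standalone. This is a minimal JDK-only sketch; AES/CTR and the all-zero key/IV are assumptions for the demo, since the PR's actual transformation and key material aren't shown in this excerpt:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ChunkedCipherDemo {
    static final int CHUNK_SIZE = 8192;

    // Push data through the cipher one chunk at a time, mirroring the
    // while-loop in the CryptoBufferedIndexOutput excerpt.
    static byte[] process(Cipher cipher, byte[] data) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0, length = data.length;
        while (length > 0) {
            int chunk = Math.min(length, CHUNK_SIZE);
            byte[] res = cipher.update(data, offset, chunk);
            if (res != null) out.write(res);
            offset += chunk;
            length -= chunk;
        }
        byte[] fin = cipher.doFinal();
        if (fin != null) out.write(fin);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec(new byte[16], "AES"); // demo key only
        byte[] plain = new byte[20000]; // spans multiple chunks
        Arrays.fill(plain, (byte) 42);

        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(new byte[16]));
        byte[] ct = process(enc, plain);

        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(new byte[16]));
        System.out.println(Arrays.equals(process(dec, ct), plain)); // round-trip check
    }
}
```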

byte[] res;
while (length > 0) {
count++;
final int chunk = Math.min(length, CHUNK_SIZE);
Contributor

Buffering may be ineffective for small writes. Maybe we can maintain an enum of chunk sizes and buffer smaller writes accordingly.

private enum ChunkSize {
    LARGE(8192),
    MEDIUM(4096),
    SMALL(1024);

    private final int size;

    ChunkSize(int size) {
        this.size = size;
    }

    public int getSize() {
        return size;
    }

    public static ChunkSize bestPossibleBufferLength(int length) {
        if (length >= LARGE.size) return LARGE;
        if (length >= MEDIUM.size) return MEDIUM;
        return SMALL;
    }
}

ChunkSize chunkSize = ChunkSize.bestPossibleBufferLength(length);
int bytesToWrite = Math.min(chunkSize.getSize(), length);

final class CryptoBufferedIndexInput extends BufferedIndexInput {
/** The maximum chunk size for reads of 16384 bytes. */
private static final int CHUNK_SIZE = 16384;
ByteBuffer tmpBuffer = ByteBuffer.allocate(CHUNK_SIZE);
Contributor

I think we can use direct buffers here, which would result in sequential I/O.
https://stackoverflow.com/questions/5670862/bytebuffer-allocate-vs-bytebuffer-allocatedirect

this.directBuffer = ByteBuffer.allocateDirect(CHUNK_SIZE);

(note to self: should we make the CHUNK_SIZE configurable?)

}

@SuppressForbidden(reason = "FileChannel#read is a faster alternative to synchronized block")
private int read(ByteBuffer dst, long position) throws IOException {
Contributor

With direct buffers, read would look slightly different.


    private int read(ByteBuffer dst, long position) throws IOException {
        directBuffer.clear().limit(dst.remaining());
        int bytesRead = channel.read(directBuffer, position);
        if (bytesRead > 0) {
            directBuffer.flip();
            try {
                int decrypted = cipher.update(directBuffer, dst);
                // handle logic when we have reached the end-of-file
                if (position + bytesRead == end) {
                    ...
                }
                return decrypted;
            } catch (...) {
                ...
            }
        }
        return bytesRead;
    }
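The I/O half of that sketch can be demonstrated self-contained; crypto is left out here, and the plain `dst.put` marks where `cipher.update` would go. The class and method names are assumptions for the demo:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectReadDemo {
    static final int CHUNK_SIZE = 16384;
    // One reusable off-heap buffer avoids the extra copy that heap
    // buffers incur inside FileChannel.read.
    static final ByteBuffer directBuffer = ByteBuffer.allocateDirect(CHUNK_SIZE);

    static int read(FileChannel channel, ByteBuffer dst, long position) throws Exception {
        directBuffer.clear().limit(Math.min(dst.remaining(), CHUNK_SIZE));
        int bytesRead = channel.read(directBuffer, position); // positional read: no channel lock
        if (bytesRead > 0) {
            directBuffer.flip();
            dst.put(directBuffer); // in the real code, cipher.update(directBuffer, dst) goes here
        }
        return bytesRead;
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("demo", ".bin");
        Files.write(f, "hello world".getBytes());
        try (FileChannel ch = FileChannel.open(f, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate(5);
            read(ch, dst, 6L); // read 5 bytes starting at offset 6
            System.out.println(new String(dst.array(), 0, dst.position()));
        }
    }
}
```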

super.rename(source, dest);
if (!(source.contains("segments_") || source.endsWith(".si"))) ivMap.put(
getDirectory() + "/" + dest,
ivMap.remove(getDirectory() + "/" + source)
Contributor

We should add a null check on the result of ivMap.remove().
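A minimal sketch of the guarded move; the `renameIv` helper is hypothetical, extracted from the rename logic for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class IvRename {
    // Move the IV entry only if the source actually had one. Without the
    // check, ivMap.put(dest, null) would throw (ConcurrentSkipListMap
    // rejects null values) and hide the real problem: a missing IV.
    static void renameIv(Map<String, byte[]> ivMap, String src, String dst) {
        byte[] iv = ivMap.remove(src);
        if (iv == null) {
            throw new IllegalStateException("no IV recorded for " + src);
        }
        ivMap.put(dst, iv);
    }

    public static void main(String[] args) {
        Map<String, byte[]> ivMap = new ConcurrentSkipListMap<>();
        ivMap.put("dir/_0.tmp", new byte[] {7});
        renameIv(ivMap, "dir/_0.tmp", "dir/_0.cfs");
        System.out.println(ivMap.containsKey("dir/_0.cfs") && !ivMap.containsKey("dir/_0.tmp"));
    }
}
```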

* @param name the name of the file to be deleted
*/
@Override
public void deleteFile(String name) throws IOException {
Contributor

Will this also cover files deleted during segment merges?

*/
@Override
public IndexOutput createOutput(String name, IOContext context) throws IOException {
if (name.contains("segments_") || name.endsWith(".si")) return super.createOutput(name, context);

Contributor

I think we are also encrypting the ivMap? I think we should skip encrypting the ivMap.
We will need an integration test to verify that the ivMap is loaded correctly after a restart.
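A tiny sketch of the resulting skip predicate, assuming the suggestion is adopted; the `isPlaintext` helper and the literal `ivMap` filename are assumptions for illustration, extending the `segments_`/`.si` check already in the diff:

```java
public class PlainFilePredicate {
    // Files that must stay unencrypted so the directory can bootstrap:
    // segment metadata, plus (per the review suggestion) the ivMap itself.
    static boolean isPlaintext(String name) {
        return name.contains("segments_") || name.endsWith(".si") || name.equals("ivMap");
    }

    public static void main(String[] args) {
        System.out.println(isPlaintext("ivMap")
            && isPlaintext("segments_2")
            && !isPlaintext("_0.cfs")); // regular segment files stay encrypted
    }
}
```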

* @param path the shard file path
*/
@Override
public Directory newDirectory(IndexSettings indexSettings, ShardPath path) throws IOException {
Contributor

I guess no special handling is required on index deletion?

@cwperks
Member

cwperks commented Jan 21, 2025

One thing to consider when introducing a new plugin is whether to build it directly in the core repo (native plugin) vs creating a separate repository.

I think we will separate it into a different repo (it will be easier to maintain and develop). I am fine with shipping it in this state for now and opening a separate issue to move it to a different repo.

Another thing to consider is adding this feature to the security plugin: https://github.com/opensearch-project/security

You would get all of the test infrastructure that already exists in the security repo, and I think it would be a natural feature of the plugin.

That being said, you may want this to be a separate plugin if index-level encryption should be available to clusters regardless of whether they use the full features of the security plugin.

Just because the security plugin is installed doesn't mean a cluster needs full FGAC. You can disable security and have the plugin installed, or you can run in SSL-only mode (no FGAC), so we could build this into the security plugin and provide support regardless of whether a cluster uses all features of the security plugin.

The security plugin also has a companion dashboards-plugin for configuration through OpenSearch-Dashboards. If there are plans to support configuring index-level encryption through OpenSearch Dashboards, then that could be built into the companion dashboards plugin.

@kumargu
Contributor

kumargu commented Jan 21, 2025

should be available to clusters regardless of whether they use the full features of the security plugin

Yeah, that's my mental model. You can have index-level encryption enabled without the security plugin (features) and vice versa.


I am leaning towards bundling this as a separate plugin; how hard is it to set up the test infra?

@kumargu
Contributor

kumargu commented Jan 21, 2025

Another thing to consider is adding this feature to the security plugin: https://github.com/opensearch-project/security

@cwperks let's discuss this with more folks during security-triage-meeting.

@cwperks
Member

cwperks commented Jan 21, 2025

should be available to clusters regardless of whether they use the full features of the security plugin

Yeah, that's my mental model. You can have index-level encryption enabled without the security plugin (features) and vice versa.

I am leaning towards bundling this as a separate plugin; how hard is it to set up the test infra?

I think you're right. This makes more sense as a standalone plugin. There is a template repo to help get bootstrapped. I will allocate some time after 2.19 code freeze to demonstrate creating a plugin repo and writing a few integ tests w/ different configurations.

CryptoDirectory(LockFactory lockFactory, Path location, Provider provider, MasterKeyProvider keyProvider) throws IOException {
super(location, lockFactory);
this.location = location;
ivMap = new ConcurrentSkipListMap<>();
Contributor

Will the ivMap auto-shrink on segment merges? Asking because I was initially concerned about the size of the ivMap, with a risk of collisions and lock contention.

Contributor

Discussed offline: @asonje will run some instrumentation to verify that segment merges are happening as expected and that ivMap entries for deleted files are removed on segment merges.

@kumargu
Contributor

kumargu commented Jan 21, 2025

@asonje: I started taking a look at how snapshots would work for us. Could you please help with a few clarifications?

  1. Will the keyfile which stores the data key also be moved to snapshots? We will need the data key to decrypt data when an OpenSearch index is recovered from a snapshot.
  2. Similarly, will the ivMap be carried over to snapshots?

(Edit: discussed offline: both the keyfile and the ivMap will be carried over to snapshots.)

Labels
discuss (Issues intended to help drive brainstorming and decision making), feature (New feature or request), RFC (Issues requesting major changes), security (Anything security related)
Successfully merging this pull request may close these issues.

On The Fly Encryption Feature Proposal
9 participants