
Commit

Merge branch 'main' into remote-recovery-it-fix
Signed-off-by: Varun Bansal <bansvaru@amazon.com>
linuxpi authored Sep 1, 2023
2 parents ba41aa9 + ff4b23b commit 0ed237f
Showing 237 changed files with 8,527 additions and 1,330 deletions.
16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add events correlation engine plugin ([#6854](https://github.com/opensearch-project/OpenSearch/issues/6854))
- Introduce new dynamic cluster setting to control slice computation for concurrent segment search ([#9107](https://github.com/opensearch-project/OpenSearch/pull/9107))
- Implement on behalf of token passing for extensions ([#8679](https://github.com/opensearch-project/OpenSearch/pull/8679))
- Added encryption-sdk lib to provide encryption and decryption capabilities ([#8466](https://github.com/opensearch-project/OpenSearch/pull/8466))

### Dependencies
- Bump `log4j-core` from 2.18.0 to 2.19.0
@@ -40,6 +41,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Bump `org.bouncycastle:bcprov-jdk15on` to `org.bouncycastle:bcprov-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Bump `org.bouncycastle:bcmail-jdk15on` to `org.bouncycastle:bcmail-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Bump `org.bouncycastle:bcpkix-jdk15on` to `org.bouncycastle:bcpkix-jdk15to18` version 1.75 ([#8247](https://github.com/opensearch-project/OpenSearch/pull/8247))
- Add Encryption SDK dependencies ([#8466](https://github.com/opensearch-project/OpenSearch/pull/8466))

### Changed
- [CCR] Add getHistoryOperationsFromTranslog method to fetch the history snapshot from translogs ([#3948](https://github.com/opensearch-project/OpenSearch/pull/3948))
@@ -89,7 +91,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Segment Replication] Support realtime reads for GET requests ([#9212](https://github.com/opensearch-project/OpenSearch/pull/9212))
- [Feature] Expose term frequency in Painless script score context ([#9081](https://github.com/opensearch-project/OpenSearch/pull/9081))
- Add support for reading partial files to HDFS repository ([#9513](https://github.com/opensearch-project/OpenSearch/issues/9513))
- Add support for extensions to search responses using SearchExtBuilder ([#9379](https://github.com/opensearch-project/OpenSearch/pull/9379))
- [Remote State] Create service to publish cluster state to remote store ([#9160](https://github.com/opensearch-project/OpenSearch/pull/9160))
- [BWC and API enforcement] Decorate the existing APIs with proper annotations (part 1) ([#9520](https://github.com/opensearch-project/OpenSearch/pull/9520))
- Add concurrent segment search related metrics to node and index stats ([#9622](https://github.com/opensearch-project/OpenSearch/issues/9622))
- Decouple replication lag from logic to fail stale replicas ([#9507](https://github.com/opensearch-project/OpenSearch/pull/9507))
- Expose DelimitedTermFrequencyTokenFilter to allow providing term frequencies along with terms ([#9479](https://github.com/opensearch-project/OpenSearch/pull/9479))
- APIs for performing async blob reads and async downloads from the repository using multiple streams ([#9592](https://github.com/opensearch-project/OpenSearch/issues/9592))
- Introduce cluster default remote translog buffer interval setting ([#9584](https://github.com/opensearch-project/OpenSearch/pull/9584))

### Dependencies
- Bump `org.apache.logging.log4j:log4j-core` from 2.17.1 to 2.20.0 ([#8307](https://github.com/opensearch-project/OpenSearch/pull/8307))
@@ -162,7 +171,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Separate request-based and settings-based concurrent segment search controls and introduce AggregatorFactory method to determine concurrent search support ([#9469](https://github.com/opensearch-project/OpenSearch/pull/9469))
- [Remote Store] Rate limiter integration for remote store uploads and downloads ([#9448](https://github.com/opensearch-project/OpenSearch/pull/9448/))
- [Remote Store] Implicitly use replication type SEGMENT for remote store clusters ([#9264](https://github.com/opensearch-project/OpenSearch/pull/9264))
- Redefine telemetry context restoration and propagation ([#9617](https://github.com/opensearch-project/OpenSearch/pull/9617))
- Use non-concurrent path for sort request on timeseries index and field ([#9562](https://github.com/opensearch-project/OpenSearch/pull/9562))
- Added sampler based on `Blanket Probabilistic Sampling rate` and `Override for on demand` ([#9621](https://github.com/opensearch-project/OpenSearch/issues/9621))

### Deprecated

@@ -177,8 +188,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Fix condition to remove index create block ([#9437](https://github.com/opensearch-project/OpenSearch/pull/9437))
- Add support to clear archived index setting ([#9019](https://github.com/opensearch-project/OpenSearch/pull/9019))
- [Segment Replication] Fixed bug where replica shard temporarily serves stale data during an engine reset ([#9495](https://github.com/opensearch-project/OpenSearch/pull/9495))
- [Segment Replication] Fixed bug where bytes behind metric is not accurate ([#9686](https://github.com/opensearch-project/OpenSearch/pull/9686))

### Security

[Unreleased 3.0]: https://github.com/opensearch-project/OpenSearch/compare/2.x...HEAD
[Unreleased 2.x]: https://github.com/opensearch-project/OpenSearch/compare/2.10...2.x
@@ -0,0 +1,81 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.benchmark.index.mapper;

import org.apache.lucene.util.BytesRef;
import org.opensearch.index.mapper.BinaryFieldMapper;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 1)
@Measurement(iterations = 1)
@Fork(1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@SuppressWarnings("unused") // invoked by benchmarking framework
public class CustomBinaryDocValuesFieldBenchmark {

    static final String FIELD_NAME = "dummy";
    static final String SEED_VALUE = "seed";

    @Benchmark
    public void add(CustomBinaryDocValuesFieldBenchmark.BenchmarkParameters parameters, Blackhole blackhole) {
        // Don't use the parameter binary doc values object.
        // Start with a fresh object every call and add maximum number of entries
        BinaryFieldMapper.CustomBinaryDocValuesField customBinaryDocValuesField = new BinaryFieldMapper.CustomBinaryDocValuesField(
            FIELD_NAME,
            new BytesRef(SEED_VALUE).bytes
        );
        for (int i = 0; i < parameters.maximumNumberOfEntries; ++i) {
            ThreadLocalRandom.current().nextBytes(parameters.bytes);
            customBinaryDocValuesField.add(parameters.bytes);
        }
    }

    @Benchmark
    public void binaryValue(CustomBinaryDocValuesFieldBenchmark.BenchmarkParameters parameters, Blackhole blackhole) {
        blackhole.consume(parameters.customBinaryDocValuesField.binaryValue());
    }

    @State(Scope.Benchmark)
    public static class BenchmarkParameters {
        @Param({ "8", "32", "128", "512" })
        int maximumNumberOfEntries;

        @Param({ "8", "32", "128", "512" })
        int entrySize;

        BinaryFieldMapper.CustomBinaryDocValuesField customBinaryDocValuesField;
        byte[] bytes;

        @Setup
        public void setup() {
            customBinaryDocValuesField = new BinaryFieldMapper.CustomBinaryDocValuesField(FIELD_NAME, new BytesRef(SEED_VALUE).bytes);
            bytes = new byte[entrySize];
            for (int i = 0; i < maximumNumberOfEntries; ++i) {
                ThreadLocalRandom.current().nextBytes(bytes);
                customBinaryDocValuesField.add(bytes);
            }
        }
    }
}
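For context (not part of this commit), a minimal sketch of launching this benchmark through the standard JMH Runner API; in the OpenSearch repository the benchmarks module normally drives JMH through Gradle, so this standalone runner is illustrative only.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class CustomBinaryDocValuesFieldBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        // Runs add() and binaryValue() across every @Param combination
        // (4 entry counts x 4 entry sizes = 16 configurations per benchmark).
        Options options = new OptionsBuilder().include(CustomBinaryDocValuesFieldBenchmark.class.getSimpleName()).build();
        new Runner(options).run();
    }
}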
2 changes: 1 addition & 1 deletion buildSrc/version.properties
@@ -39,7 +39,7 @@ httpcore = 4.4.16
httpasyncclient = 4.1.5
commonslogging = 1.2
commonscodec = 1.15

commonslang = 3.13.0
# plugin dependencies
aws = 2.20.55
reactivestreams = 1.0.4
@@ -32,13 +32,16 @@

package org.opensearch.common;

import org.opensearch.common.annotation.PublicApi;

import java.util.function.Consumer;

/**
* A {@link Consumer}-like interface which allows throwing checked exceptions.
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
@FunctionalInterface
public interface CheckedConsumer<T, E extends Exception> {
    void accept(T t) throws E;
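For illustration (not part of this diff), a minimal sketch of why this interface exists: Files::delete declares a checked IOException, so it can be bound to a CheckedConsumer but not to a plain java.util.function.Consumer.

import org.opensearch.common.CheckedConsumer;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CheckedConsumerExample {
    static void deleteAll(Iterable<Path> paths) throws IOException {
        // Consumer<Path> cannot reference Files::delete, because its accept()
        // declares no checked exceptions; CheckedConsumer propagates them.
        CheckedConsumer<Path, IOException> deleter = Files::delete;
        for (Path path : paths) {
            deleter.accept(path);
        }
    }
}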
@@ -32,6 +32,7 @@

package org.opensearch.common.action;

import org.opensearch.common.annotation.PublicApi;
import org.opensearch.common.unit.TimeValue;

import java.util.concurrent.Future;
@@ -42,6 +43,7 @@
*
* @opensearch.api
*/
@PublicApi(since = "1.0.0")
public interface ActionFuture<T> extends Future<T> {

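A hedged caller-side sketch (not from this diff) of the blocking style this interface enables; client.index(...) returning an ActionFuture and the actionGet(TimeValue) overload are assumptions based on the wider OpenSearch client API.

import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.index.IndexResponse;
import org.opensearch.client.Client;
import org.opensearch.common.action.ActionFuture;
import org.opensearch.common.unit.TimeValue;

class ActionFutureExample {
    // Blocks on the future instead of registering an ActionListener.
    static IndexResponse indexBlocking(Client client, IndexRequest request) {
        ActionFuture<IndexResponse> future = client.index(request); // assumed client call
        return future.actionGet(TimeValue.timeValueSeconds(30));    // assumed overload; waits up to 30s
    }
}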
@@ -0,0 +1,114 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.common.crypto;

import org.opensearch.common.io.InputStreamContainer;

import java.io.IOException;
import java.io.InputStream;

/**
* Crypto provider abstractions for encryption and decryption of data. Allows registering multiple providers
* for defining different ways of encrypting or decrypting data.
*
* T - Encryption Metadata / CryptoContext
* U - Parsed Encryption Metadata / CryptoContext
*/
public interface CryptoHandler<T, U> {

    /**
     * Initialises or creates new crypto metadata to be used in encryption. This is needed to set the context before
     * beginning encryption.
     *
     * @return crypto metadata instance
     */
    T initEncryptionMetadata();

    /**
     * Loads the crypto metadata used in encryption from the content header.
     * Note that the underlying information in the loaded metadata object is the same as in the object created during
     * encryption, but the object type may differ.
     *
     * @return crypto metadata instance used in decryption.
     */
    U loadEncryptionMetadata(EncryptedHeaderContentSupplier encryptedHeaderContentSupplier) throws IOException;

    /**
     * Some encryption algorithms impose conditions on the unit of content to be encrypted. This requires the
     * content size to be re-adjusted in order to fulfil these conditions for partial writes. If write requests for
     * encryption of a part of content do not fulfil these conditions, encryption fails or can result in corrupted
     * content, depending on the algorithm used. This method exposes a means to re-adjust the sizes of such writes.
     *
     * @param cryptoContext crypto metadata instance
     * @param contentSize Size of the raw content
     * @return Adjusted size of the content.
     */
    long adjustContentSizeForPartialEncryption(T cryptoContext, long contentSize);

    /**
     * Estimates the length of the encrypted content. It should only be used to determine the length of the entire
     * content after encryption.
     *
     * @param cryptoContext crypto metadata instance consisting of encryption metadata used in encryption.
     * @param contentLength Size of the raw content
     * @return Calculated size of the encrypted content.
     */
    long estimateEncryptedLengthOfEntireContent(T cryptoContext, long contentLength);

    /**
     * For a given encrypted content length, estimates the length of the decrypted content.
     * @param cryptoContext crypto metadata instance consisting of encryption metadata used in encryption.
     * @param contentLength Size of the encrypted content
     * @return Calculated size of the decrypted content.
     */
    long estimateDecryptedLength(U cryptoContext, long contentLength);

    /**
     * Wraps a raw InputStream with an encrypting stream.
     *
     * @param encryptionMetadata created earlier to set the crypto metadata.
     * @param stream Raw InputStream to encrypt
     * @return encrypting stream wrapped around the raw InputStream.
     */
    InputStreamContainer createEncryptingStream(T encryptionMetadata, InputStreamContainer stream);

    /**
     * Provides an encrypted stream for a raw stream emitted for a part of the content.
     *
     * @param cryptoContext crypto metadata instance.
     * @param stream raw stream for which an encrypted stream has to be created.
     * @param totalStreams Number of streams being used for the entire content.
     * @param streamIdx Index of the current stream.
     * @return Encrypted stream for the provided raw stream.
     */
    InputStreamContainer createEncryptingStreamOfPart(T cryptoContext, InputStreamContainer stream, int totalStreams, int streamIdx);

    /**
     * Accepts an encrypted stream and provides a decrypting wrapper.
     * @param encryptingStream to be decrypted.
     * @return Decrypting wrapper stream
     */
    InputStream createDecryptingStream(InputStream encryptingStream);

    /**
     * Creates a {@link DecryptedRangedStreamProvider} which provides a wrapped stream to decrypt the
     * underlying stream. It also provides an adjusted range, in place of the requested range, which should be used
     * for fetching and supplying the encrypted content for decryption. Extra content outside the range is trimmed
     * down and returned by the decrypted stream.
     * For partial reads of encrypted content, some algorithms require the range of content to be adjusted for
     * successful decryption. The adjusted range may or may not be the same as the provided range. If the range is
     * adjusted, the starting offset of the resultant range can be less than the starting offset of the provided
     * range, and the end offset can be greater than the ending offset of the provided range.
     *
     * @param cryptoContext crypto metadata instance.
     * @param startPosOfRawContent starting position in the raw/decrypted content
     * @param endPosOfRawContent ending position in the raw/decrypted content
     */
    DecryptedRangedStreamProvider createDecryptingStreamOfRange(U cryptoContext, long startPosOfRawContent, long endPosOfRawContent);
}
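A hedged consumption sketch (not part of this commit) of the whole-content path through this interface; the InputStreamContainer.getInputStream() accessor is an assumption.

import org.opensearch.common.io.InputStreamContainer;

import java.io.InputStream;

class CryptoHandlerExample {
    // Round-trips raw content: initialise crypto metadata, wrap the raw
    // stream for encryption, then wrap the encrypted bytes for decryption.
    static <T, U> InputStream encryptThenDecrypt(CryptoHandler<T, U> handler, InputStreamContainer raw) {
        T metadata = handler.initEncryptionMetadata();
        InputStreamContainer encrypted = handler.createEncryptingStream(metadata, raw);
        return handler.createDecryptingStream(encrypted.getInputStream()); // assumed accessor
    }
}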
@@ -0,0 +1,42 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/
package org.opensearch.common.crypto;

/**
* Key pair generated by {@link MasterKeyProvider}
*/
public class DataKeyPair {
    private final byte[] rawKey;
    private final byte[] encryptedKey;

    /**
     * Constructor to initialize key-pair values
     * @param rawKey Unencrypted data key used for encryption and decryption
     * @param encryptedKey Encrypted version of rawKey
     */
    public DataKeyPair(byte[] rawKey, byte[] encryptedKey) {
        this.rawKey = rawKey;
        this.encryptedKey = encryptedKey;
    }

    /**
     * Returns raw key
     * @return raw/decrypted key
     */
    public byte[] getRawKey() {
        return rawKey;
    }

    /**
     * Returns encrypted key
     * @return encrypted key
     */
    public byte[] getEncryptedKey() {
        return encryptedKey;
    }
}
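For illustration, a hedged sketch of the intended split between the two keys; HypotheticalKeyService and its methods are placeholders, not a real OpenSearch API.

class DataKeyPairExample {
    interface HypotheticalKeyService {
        byte[] generateDataKey();   // plaintext data key, e.g. a 256-bit AES key
        byte[] wrap(byte[] rawKey); // rawKey encrypted under the master key
    }

    static DataKeyPair newPair(HypotheticalKeyService keys) {
        byte[] rawKey = keys.generateDataKey();
        byte[] encryptedKey = keys.wrap(rawKey);
        // The raw key stays in memory for encryption/decryption;
        // only the encrypted key should be persisted with the content.
        return new DataKeyPair(rawKey, encryptedKey);
    }
}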
@@ -0,0 +1,49 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.common.crypto;

import java.io.InputStream;
import java.util.function.UnaryOperator;

/**
* Contains adjusted range of partial encrypted content which needs to be used for decryption.
*/
public class DecryptedRangedStreamProvider {

    private final long[] adjustedRange;
    private final UnaryOperator<InputStream> decryptedStreamProvider;

    /**
     * Constructs a provider for the adjusted encrypted range.
     * @param adjustedRange range of partial encrypted content which needs to be used for decryption.
     * @param decryptedStreamProvider stream provider for decryption and range re-adjustment.
     */
    public DecryptedRangedStreamProvider(long[] adjustedRange, UnaryOperator<InputStream> decryptedStreamProvider) {
        this.adjustedRange = adjustedRange;
        this.decryptedStreamProvider = decryptedStreamProvider;
    }

    /**
     * Adjusted range of partial encrypted content which needs to be used for decryption.
     * @return adjusted range
     */
    public long[] getAdjustedRange() {
        return adjustedRange;
    }

    /**
     * A utility stream provider which supplies the stream responsible for decrypting the content and reading the
     * desired range of decrypted content by skipping extra content which got decrypted as a result of range adjustment.
     * @return stream provider for decryption and supplying the desired range of content.
     */
    public UnaryOperator<InputStream> getDecryptedStreamProvider() {
        return decryptedStreamProvider;
    }
}
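A hedged sketch (not part of this commit) of how a ranged read might consume this provider; readEncryptedRange is a hypothetical storage fetch.

import java.io.InputStream;

class RangedDecryptExample {
    // Fetch the adjusted (possibly widened) encrypted range, then let the
    // provider's wrapper trim the decrypted bytes back to the requested range.
    static InputStream readDecryptedRange(DecryptedRangedStreamProvider provider) {
        long[] adjusted = provider.getAdjustedRange();
        InputStream encryptedPart = readEncryptedRange(adjusted[0], adjusted[1]); // hypothetical fetch
        return provider.getDecryptedStreamProvider().apply(encryptedPart);
    }

    static InputStream readEncryptedRange(long startInclusive, long endInclusive) {
        throw new UnsupportedOperationException("storage-specific fetch; placeholder");
    }
}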
