Change remote state setting conditions (#16)
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)

* Optimize global ordinal includes/excludes for prefix matching

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.
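
To make the idea concrete, here is a minimal sketch, not the code from this change. It assumes the terms are available as a sorted list of Java strings whose index stands in for the global ordinal (Lucene actually orders terms by their UTF-8 bytes), and that the regular expression has already been expanded into a finite list of prefixes. The names `PrefixOrdinalFilter`, `lowerBound`, and `acceptedOrdinals` are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration: list indexes play the role of global ordinals.
final class PrefixOrdinalFilter {

    // Returns the first index whose term is >= key (classic lower-bound binary search).
    static int lowerBound(List<String> sortedTerms, String key) {
        int lo = 0;
        int hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Marks every ordinal whose term starts with one of the given prefixes.
    // Terms sharing a prefix are contiguous in sorted order, so each prefix
    // contributes a single range [start, end).
    static BitSet acceptedOrdinals(List<String> sortedTerms, List<String> prefixes) {
        BitSet accepted = new BitSet(sortedTerms.size());
        for (String prefix : prefixes) {
            int start = lowerBound(sortedTerms, prefix);
            int end = start;
            while (end < sortedTerms.size() && sortedTerms.get(end).startsWith(prefix)) {
                end++;
            }
            accepted.set(start, end); // no-op when start == end
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana", "bar", "baz", "cat");
        // A pattern such as "ba(r|z).*" expands to the prefixes "bar" and "baz".
        BitSet included = acceptedOrdinals(terms, Arrays.asList("bar", "baz"));
        System.out.println(included); // prints {2, 3}
    }
}
```

In this sketch an exclude would be handled by flipping the resulting bit set over the range `[0, terms.size())`, so that only ordinals outside the prefix ranges remain.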

Signed-off-by: Michael Froh <froh@amazon.com>

* Add unit test

Signed-off-by: Michael Froh <froh@amazon.com>

* Add changelog entry

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Signed-off-by: Michael Froh <froh@amazon.com>

* Fix bug in exclude-only case with no doc values in segment

Signed-off-by: Michael Froh <froh@amazon.com>

* Address comments from @mch2

Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>

* Adding access to noSubMatches and noOverlappingMatches in Hyphenation… (opensearch-project#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: linting

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

---------

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>

* Add Settings related to Workload Management feature (opensearch-project#15028)

* add QueryGroup Service tests
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add PR to changelog
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* change the test directory
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* modify comments to be more specific
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add test coverage
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* remove QUERY_GROUP_RUN_INTERVAL_SETTING as we'll define it in QueryGroupService
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* Update affiliation for @nknize. (opensearch-project#15322)

Signed-off-by: dblock <dblock@amazon.com>

* Add log when download completes with file size (opensearch-project#15224)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* Support Filtering on Large List encoded by Bitmap (version update) (opensearch-project#15352)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Add support for index level slice count setting (opensearch-project#15336)

Signed-off-by: Ganesh Ramadurai <gramadur@amazon.com>

* Adding allowlist setting for ingest-useragent and ingest-geoip processors (opensearch-project#15325)

* Adding allowlist setting for user-agent, geo-ip and updated tests for ingest-common.

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Remove duplicate test in ingest-common

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Adding changelog

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

---------

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Add Delete QueryGroup API Logic (opensearch-project#14735)

* Add Delete QueryGroup API Logic
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* modify changelog
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* include comments from create pr
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* remove delete all
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* rebase and address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* rebase
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add UT coverage
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* [Star Tree] Lucene Abstractions for Star Tree File Formats  (opensearch-project#15278)

---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

* [Star tree] Changes to handle derived metrics such as avg as part of star tree mapping (opensearch-project#15152)

---------
Signed-off-by: Bharathwaj G <bharath78910@gmail.com>

* relaxing the join validation for nodes which have only store disabled but only publication enabled

* relaxing the join validation for nodes which have only store disabled but only publication enabled

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Ganesh Ramadurai <gramadur@amazon.com>
Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Co-authored-by: Michael Froh <froh@amazon.com>
Co-authored-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
Co-authored-by: Ruirui Zhang <mariazrr@amazon.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Co-authored-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Ganesh Krishna Ramadurai <gramadur@icloud.com>
Co-authored-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>
Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: Bharathwaj G <bharath78910@gmail.com>
Co-authored-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
13 people committed Sep 2, 2024
1 parent 08e2b50 commit 214f929
Showing 73 changed files with 4,237 additions and 121 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Fix for hasInitiatedFetching to fix allocation explain and manual reroute APIs (([#14972](https://github.com/opensearch-project/OpenSearch/pull/14972))
- [Workload Management] Add queryGroupId to Task ([14708](https://github.com/opensearch-project/OpenSearch/pull/14708))
- Add setting to ignore throttling nodes for allocation of unassigned primaries in remote restore ([#14991](https://github.com/opensearch-project/OpenSearch/pull/14991))
- [Workload Management] Add Delete QueryGroup API Logic ([#14735](https://github.com/opensearch-project/OpenSearch/pull/14735))
- [Streaming Indexing] Enhance RestClient with a new streaming API support ([#14437](https://github.com/opensearch-project/OpenSearch/pull/14437))
- Add basic aggregation support for derived fields ([#14618](https://github.com/opensearch-project/OpenSearch/pull/14618))
- [Workload Management] Add Create QueryGroup API Logic ([#14680](https://github.com/opensearch-project/OpenSearch/pull/14680))
@@ -18,9 +19,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add `rangeQuery` and `regexpQuery` for `constant_keyword` field type ([#14711](https://github.com/opensearch-project/OpenSearch/pull/14711))
- Add took time to request nodes stats ([#15054](https://github.com/opensearch-project/OpenSearch/pull/15054))
- [Workload Management] Add Get QueryGroup API Logic ([14709](https://github.com/opensearch-project/OpenSearch/pull/14709))
- [Workload Management] Add Settings for Workload Management feature ([#15028](https://github.com/opensearch-project/OpenSearch/pull/15028))
- [Workload Management] QueryGroup resource tracking framework changes ([#13897](https://github.com/opensearch-project/OpenSearch/pull/13897))
- Support filtering on a large list encoded by bitmap ([#14774](https://github.com/opensearch-project/OpenSearch/pull/14774))
- Add slice execution listeners to SearchOperationListener interface ([#15153](https://github.com/opensearch-project/OpenSearch/pull/15153))
- Add allowlist setting for ingest-geoip and ingest-useragent ([#15325](https://github.com/opensearch-project/OpenSearch/pull/15325))
- Adding access to noSubMatches and noOverlappingMatches in Hyphenation ([#13895](https://github.com/opensearch-project/OpenSearch/pull/13895))
- Add support for index level max slice count setting for concurrent segment search ([#15336](https://github.com/opensearch-project/OpenSearch/pull/15336))

### Dependencies
- Bump `netty` from 4.1.111.Final to 4.1.112.Final ([#15081](https://github.com/opensearch-project/OpenSearch/pull/15081))
@@ -44,6 +49,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

### Changed
- Add lower limit for primary and replica batch allocators timeout ([#14979](https://github.com/opensearch-project/OpenSearch/pull/14979))
- Optimize regexp-based include/exclude on aggregations when pattern matches prefixes ([#14371](https://github.com/opensearch-project/OpenSearch/pull/14371))
- Replace and block usages of org.apache.logging.log4j.util.Strings ([#15238](https://github.com/opensearch-project/OpenSearch/pull/15238))

### Deprecated
2 changes: 1 addition & 1 deletion MAINTAINERS.md
@@ -22,7 +22,7 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
| Varun Bansal | [linuxpi](https://github.com/linuxpi) | Amazon |
| Marc Handalian | [mch2](https://github.com/mch2) | Amazon |
| Michael Froh | [msfroh](https://github.com/msfroh) | Amazon |
- | Nick Knize | [nknize](https://github.com/nknize) | Amazon |
+ | Nick Knize | [nknize](https://github.com/nknize) | Lucenia |
| Owais Kazi | [owaiskazi19](https://github.com/owaiskazi19) | Amazon |
| Peter Nied | [peternied](https://github.com/peternied) | Amazon |
| Rishikesh Pasham | [Rishikesh1159](https://github.com/Rishikesh1159) | Amazon |
@@ -54,11 +54,16 @@
*/
public class HyphenationCompoundWordTokenFilterFactory extends AbstractCompoundWordTokenFilterFactory {

private final boolean noSubMatches;
private final boolean noOverlappingMatches;
private final HyphenationTree hyphenationTree;

HyphenationCompoundWordTokenFilterFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, env, name, settings);

noSubMatches = settings.getAsBoolean("no_sub_matches", false);
noOverlappingMatches = settings.getAsBoolean("no_overlapping_matches", false);

String hyphenationPatternsPath = settings.get("hyphenation_patterns_path", null);
if (hyphenationPatternsPath == null) {
throw new IllegalArgumentException("hyphenation_patterns_path is a required setting.");
@@ -85,7 +90,9 @@ public TokenStream create(TokenStream tokenStream) {
minWordSize,
minSubwordSize,
maxSubwordSize,
- onlyLongestMatch
+ onlyLongestMatch,
+ noSubMatches,
+ noOverlappingMatches
);
}
}
@@ -50,8 +50,12 @@
import org.opensearch.test.IndexSettingsModule;
import org.opensearch.test.OpenSearchTestCase;
import org.hamcrest.MatcherAssert;
import org.junit.Before;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
@@ -63,17 +67,27 @@
import static org.hamcrest.Matchers.instanceOf;

public class CompoundAnalysisTests extends OpenSearchTestCase {

Settings[] settingsArr;

@Before
public void initialize() throws IOException {
final Path home = createTempDir();
copyHyphenationPatternsFile(home);
this.settingsArr = new Settings[] { getJsonSettings(home), getYamlSettings(home) };
}

public void testDefaultsCompoundAnalysis() throws Exception {
- Settings settings = getJsonSettings();
- IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
- AnalysisModule analysisModule = createAnalysisModule(settings);
- TokenFilterFactory filterFactory = analysisModule.getAnalysisRegistry().buildTokenFilterFactories(idxSettings).get("dict_dec");
- MatcherAssert.assertThat(filterFactory, instanceOf(DictionaryCompoundWordTokenFilterFactory.class));
+ for (Settings settings : this.settingsArr) {
+ IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
+ AnalysisModule analysisModule = createAnalysisModule(settings);
+ TokenFilterFactory filterFactory = analysisModule.getAnalysisRegistry().buildTokenFilterFactories(idxSettings).get("dict_dec");
+ MatcherAssert.assertThat(filterFactory, instanceOf(DictionaryCompoundWordTokenFilterFactory.class));
+ }
}

public void testDictionaryDecompounder() throws Exception {
- Settings[] settingsArr = new Settings[] { getJsonSettings(), getYamlSettings() };
- for (Settings settings : settingsArr) {
+ for (Settings settings : this.settingsArr) {
List<String> terms = analyze(settings, "decompoundingAnalyzer", "donaudampfschiff spargelcremesuppe");
MatcherAssert.assertThat(terms.size(), equalTo(8));
MatcherAssert.assertThat(
@@ -83,6 +97,26 @@ public void testDictionaryDecompounder() throws Exception {
}
}

// Hyphenation Decompounder tests mimic the behavior of lucene tests
// lucene/analysis/common/src/test/org/apache/lucene/analysis/compound/TestHyphenationCompoundWordTokenFilterFactory.java
public void testHyphenationDecompounder() throws Exception {
for (Settings settings : this.settingsArr) {
List<String> terms = analyze(settings, "hyphenationAnalyzer", "min veninde som er lidt af en læsehest");
MatcherAssert.assertThat(terms.size(), equalTo(10));
MatcherAssert.assertThat(terms, hasItems("min", "veninde", "som", "er", "lidt", "af", "en", "læsehest", "læse", "hest"));
}
}

// Hyphenation Decompounder tests mimic the behavior of lucene tests
// lucene/analysis/common/src/test/org/apache/lucene/analysis/compound/TestHyphenationCompoundWordTokenFilterFactory.java
public void testHyphenationDecompounderNoSubMatches() throws Exception {
for (Settings settings : this.settingsArr) {
List<String> terms = analyze(settings, "hyphenationAnalyzerNoSubMatches", "basketballkurv");
MatcherAssert.assertThat(terms.size(), equalTo(3));
MatcherAssert.assertThat(terms, hasItems("basketballkurv", "basketball", "kurv"));
}
}

private List<String> analyze(Settings settings, String analyzerName, String text) throws IOException {
IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("test", settings);
AnalysisModule analysisModule = createAnalysisModule(settings);
@@ -111,21 +145,28 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
}));
}

- private Settings getJsonSettings() throws IOException {
+ private void copyHyphenationPatternsFile(Path home) throws IOException {
+ InputStream hyphenation_patterns_path = getClass().getResourceAsStream("da_UTF8.xml");
+ Path config = home.resolve("config");
+ Files.createDirectory(config);
+ Files.copy(hyphenation_patterns_path, config.resolve("da_UTF8.xml"));
+ }
+
+ private Settings getJsonSettings(Path home) throws IOException {
String json = "/org/opensearch/analysis/common/test1.json";
return Settings.builder()
.loadFromStream(json, getClass().getResourceAsStream(json), false)
.put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
- .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
+ .put(Environment.PATH_HOME_SETTING.getKey(), home.toString())
.build();
}

- private Settings getYamlSettings() throws IOException {
+ private Settings getYamlSettings(Path home) throws IOException {
String yaml = "/org/opensearch/analysis/common/test1.yml";
return Settings.builder()
.loadFromStream(yaml, getClass().getResourceAsStream(yaml), false)
.put(IndexMetadata.SETTING_VERSION_CREATED, Version.CURRENT)
- .put(Environment.PATH_HOME_SETTING.getKey(), createTempDir().toString())
+ .put(Environment.PATH_HOME_SETTING.getKey(), home.toString())
.build();
}
}