Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run core's integration tests with runtime fields #60931

Merged
merged 6 commits into from
Aug 11, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -724,7 +724,7 @@ setup:
body: { "size" : 0, "aggs" : { "no_field_terms" : { "terms" : { "size": 1 } } } }

---
"string profiler":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this one so I figured I could fix it while I was here. Runtime fields don't have global ordinals so rather than skip the entire test for them I split it into two tests, one that runtime fields can run (below) and this one which runtime fields will skip.

"string profiler via global ordinals":
- skip:
version: " - 7.8.99"
reason: debug information added in 7.9.0
Expand Down Expand Up @@ -776,36 +776,6 @@ setup:
- match: { profile.shards.0.aggregations.0.children.0.type: MaxAggregator }
- match: { profile.shards.0.aggregations.0.children.0.description: max_number }

- do:
search:
index: test_1
body:
profile: true
size: 0
aggs:
str_terms:
terms:
field: str
collect_mode: breadth_first
execution_hint: map
aggs:
max_number:
max:
field: number
- match: { aggregations.str_terms.buckets.0.key: sheep }
- match: { aggregations.str_terms.buckets.0.max_number.value: 3 }
- match: { aggregations.str_terms.buckets.1.key: cow }
- match: { aggregations.str_terms.buckets.1.max_number.value: 1 }
- match: { aggregations.str_terms.buckets.2.key: pig }
- match: { aggregations.str_terms.buckets.2.max_number.value: 1 }
- match: { profile.shards.0.aggregations.0.type: MapStringTermsAggregator }
- match: { profile.shards.0.aggregations.0.description: str_terms }
- match: { profile.shards.0.aggregations.0.breakdown.collect_count: 4 }
- match: { profile.shards.0.aggregations.0.debug.deferred_aggregators: [ max_number ] }
- match: { profile.shards.0.aggregations.0.debug.result_strategy: terms }
- match: { profile.shards.0.aggregations.0.children.0.type: MaxAggregator }
- match: { profile.shards.0.aggregations.0.children.0.description: max_number }

- do:
indices.create:
index: test_3
Expand Down Expand Up @@ -857,6 +827,55 @@ setup:
- match: { profile.shards.0.aggregations.0.children.0.type: MaxAggregator }
- match: { profile.shards.0.aggregations.0.children.0.description: max_number }

---
"string profiler via map":
- skip:
version: " - 7.8.99"
reason: debug information added in 7.9.0
- do:
bulk:
index: test_1
refresh: true
body: |
{ "index": {} }
{ "str": "sheep", "number": 1 }
{ "index": {} }
{ "str": "sheep", "number": 3 }
{ "index": {} }
{ "str": "cow", "number": 1 }
{ "index": {} }
{ "str": "pig", "number": 1 }

- do:
search:
index: test_1
body:
profile: true
size: 0
aggs:
str_terms:
terms:
field: str
collect_mode: breadth_first
execution_hint: map
aggs:
max_number:
max:
field: number
- match: { aggregations.str_terms.buckets.0.key: sheep }
- match: { aggregations.str_terms.buckets.0.max_number.value: 3 }
- match: { aggregations.str_terms.buckets.1.key: cow }
- match: { aggregations.str_terms.buckets.1.max_number.value: 1 }
- match: { aggregations.str_terms.buckets.2.key: pig }
- match: { aggregations.str_terms.buckets.2.max_number.value: 1 }
- match: { profile.shards.0.aggregations.0.type: MapStringTermsAggregator }
- match: { profile.shards.0.aggregations.0.description: str_terms }
- match: { profile.shards.0.aggregations.0.breakdown.collect_count: 4 }
- match: { profile.shards.0.aggregations.0.debug.deferred_aggregators: [ max_number ] }
- match: { profile.shards.0.aggregations.0.debug.result_strategy: terms }
- match: { profile.shards.0.aggregations.0.children.0.type: MaxAggregator }
- match: { profile.shards.0.aggregations.0.children.0.description: max_number }

---
"numeric profiler":
- skip:
Expand Down
1 change: 1 addition & 0 deletions x-pack/plugin/runtime-fields/qa/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
// Empty project so we can pick up its subproject
42 changes: 42 additions & 0 deletions x-pack/plugin/runtime-fields/qa/rest/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
apply plugin: 'elasticsearch.yaml-rest-test'

restResources {
restTests {
includeCore '*'
}
}

testClusters.yamlRestTest {
testDistribution = 'DEFAULT'
}

yamlRestTest {
systemProperty 'tests.rest.suite',
[
'search',
'search.aggregation',
'search.highlight',
'search.inner_hits',
'search_shards',
'suggest',
'msearch',
'field_caps',
].join(',')
systemProperty 'tests.rest.blacklist',
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite the list of exclusions. Seems a bit brittle. One issue I anticipate is future newly added tests also being incompatible with this approach for one reason or another.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is to fix the ones between here and line 36 fairly soon. At least theoretically, tests that fetch the mapping are never going to work but other sorts of tests should work. I could scan the test configuration for the command to fetch the mapping and skip those. It wouldn't be too bad and would shrink these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to add skip style property so these tests can opt themselves out of running against runtime field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a skip property -- that way the 'skip reason' is discoverable and right next to the test itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to bring this notion of runtime fields testing up into the test framework itself though, which seems a bit awkward given how this is currently structured as a specific qa project.

/////// TO FIX ///////
'search/330_fetch_fields/*', // The whole API is not yet supported
'search.aggregation/20_terms/Global ordinals are not loaded with the map execution hint', // Broken. Gotta fix.
'search.highlight/40_keyword_ignore/Plain Highligher should skip highlighting ignored keyword values', // Broken. Gotta fix.
'search/115_multiple_field_collapsing/two levels fields collapsing', // Broken. Gotta fix.
'search/140_pre_filter_search_shards/pre_filter_shard_size with shards that have no hit', // Broken. Gotta fix.
'field_caps/30_filter/Field caps with index filter', // We don't support filtering field caps on runtime fields. What should we do?
'search.aggregation/10_histogram/*', // runtime_script doesn't support sub-fields. Maybe it should?
/////// TO FIX ///////

/////// NOT SUPPORTED ///////
'search.aggregation/280_rare_terms/*', // Requires an index and we won't have it
'search.aggregation/20_terms/string profiler via global ordinals', // Runtime fields don't have global ords
/////// NOT SUPPORTED ///////
].join(',')
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

package org.elasticsearch.xpack.runtimefields.rest;

import com.carrotsearch.randomizedtesting.annotations.Name;
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;

import org.elasticsearch.index.mapper.IpFieldMapper;
import org.elasticsearch.index.mapper.KeywordFieldMapper;
import org.elasticsearch.index.mapper.NumberFieldMapper.NumberType;
import org.elasticsearch.test.rest.yaml.ClientYamlTestCandidate;
import org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase;
import org.elasticsearch.test.rest.yaml.section.DoSection;
import org.elasticsearch.test.rest.yaml.section.ExecutableSection;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class CoreTestsWithRuntimeFieldsIT extends ESClientYamlSuiteTestCase {
public CoreTestsWithRuntimeFieldsIT(@Name("yaml") ClientYamlTestCandidate testCandidate) {
super(testCandidate);
}

/**
* Builds test parameters similarly to {@link ESClientYamlSuiteTestCase#createParameters()},
* replacing the body of index creation commands so that fields are {@code runtime_script}s
* that load from {@code source} instead of their original type. Test configurations that
* do are not modified to contain runtime fields are not returned as they are tested
* elsewhere.
*/
@ParametersFactory
public static Iterable<Object[]> parameters() throws Exception {
/*
* Map of "setup"s that we've seen - from path to whether or
* not we the setup was modified to include a runtime_script
*/
Map<String, Boolean> seenSetups = new HashMap<>();
List<Object[]> result = new ArrayList<>();
for (Object[] orig : ESClientYamlSuiteTestCase.createParameters()) {
assert orig.length == 1;
ClientYamlTestCandidate candidate = (ClientYamlTestCandidate) orig[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a time when the rest of the Object array is relevant (beyond the first element)? Maybe we could replace orig[0] and still return orig, or simply assert orig.length == 1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array only ever has a single entry, but I'll add an assert just to make sure it stays that way. The funny shape of the return here comes from the method being used primarily for junit's parameterized tests.

boolean modifiedSetup = seenSetups.computeIfAbsent(
candidate.getName(),
k -> modifySection(candidate.getSuitePath() + "/setup", candidate.getSetupSection().getExecutableSections())
);
boolean modifiedTest = modifySection(candidate.getTestPath(), candidate.getTestSection().getExecutableSections());
if (modifiedSetup || modifiedTest) {
result.add(new Object[] { candidate });
}
}
return result;
}

/**
* Replace property configuration in {@code indices.create} with scripts
* that load from the source.
* @return {@code true} if any fields were rewritten into runtime_scripts, {@code false} otherwise.
*/
private static boolean modifySection(String sectionName, List<ExecutableSection> executables) {
boolean include = false;
for (ExecutableSection section : executables) {
if (false == (section instanceof DoSection)) {
continue;
}
DoSection doSection = (DoSection) section;
if (false == doSection.getApiCallSection().getApi().equals("indices.create")) {
continue;
}
for (Map<?, ?> body : doSection.getApiCallSection().getBodies()) {
Object settings = body.get("settings");
if (settings instanceof Map && ((Map<?, ?>) settings).containsKey("sort.field")) {
/*
* You can't sort the index on a runtime_keyword and it is
* hard to figure out if the sort was a runtime_keyword so
* let's just skip this test.
*/
continue;
}
Object mappings = body.get("mappings");
if (false == (mappings instanceof Map)) {
continue;
}
Object properties = ((Map<?, ?>) mappings).get("properties");
if (false == (properties instanceof Map)) {
continue;
}
for (Map.Entry<?, ?> property : ((Map<?, ?>) properties).entrySet()) {
if (false == property.getValue() instanceof Map) {
continue;
}
@SuppressWarnings("unchecked")
Map<String, Object> propertyMap = (Map<String, Object>) property.getValue();
String name = property.getKey().toString();
String type = Objects.toString(propertyMap.get("type"));
if ("false".equals(Objects.toString(propertyMap.get("doc_values")))) {
// If doc_values is false we can't emulate with scripts. `null` and `true` are fine.
continue;
}
if ("false".equals(Objects.toString(propertyMap.get("index")))) {
// If index is false we can't emulate with scripts
continue;
}
if ("true".equals(Objects.toString(propertyMap.get("store")))) {
// If store is true we can't emulate with scripts
continue;
}
if (propertyMap.containsKey("ignore_above")) {
// Scripts don't support ignore_above so we skip those fields
continue;
}
if (propertyMap.containsKey("ignore_malformed")) {
// Our source reading script doesn't emulate ignore_malformed
continue;
}
String toLoad = painlessToLoadFromSource(name, type);
if (toLoad == null) {
continue;
}
propertyMap.put("type", "runtime_script");
propertyMap.put("runtime_type", type);
propertyMap.put("script", toLoad);
propertyMap.remove("store");
propertyMap.remove("index");
propertyMap.remove("doc_values");
include = true;
}
}
}
return include;
}

private static String painlessToLoadFromSource(String name, String type) {
String emit = PAINLESS_TO_EMIT.get(type);
if (emit == null) {
return null;
}
StringBuilder b = new StringBuilder();
b.append("def v = source['").append(name).append("'];\n");
b.append("if (v instanceof Iterable) {\n");
b.append(" for (def vv : ((Iterable) v)) {\n");
b.append(" if (vv != null) {\n");
b.append(" def value = vv;\n");
b.append(" ").append(emit).append("\n");
b.append(" }\n");
b.append(" }\n");
b.append("} else {\n");
b.append(" if (v != null) {\n");
b.append(" def value = v;\n");
b.append(" ").append(emit).append("\n");
b.append(" }\n");
b.append("}\n");
return b.toString();
}

private static final Map<String, String> PAINLESS_TO_EMIT = Map.ofEntries(
// TODO implement dates against the parser
Map.entry(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the best approach for now but feels a bit fragile -- we're essentially duplicating the source parsing logic, which can be pretty nuanced. For example, for boolean fields, the value "" is interpreted as false. We also don't cover cases where the value is malformed or missing from _source.

To make this more robust, I guess we could switch to 'source-only fields' when those are available? That way we could reuse the logic from fields retrieval for parsing values from source. Or we could consider adding a way to access nicely-formatted source fields from painless?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really is pretty fragile. I was thinking of something similar to what you said - steal the source parsing and expose it to the script. That wouldn't be a "painless feature" so much as a feature of the script context of the runtime field. The advantage of this is that I know exactly which runtime field I'm running against so, for example, dates could use the the field's date parsing logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I think this is the right way to go for this feature branch, but it'd be good to follow-up on source parsing improvements at a later point.

NumberType.DOUBLE.typeName(),
"value(value instanceof Number ? ((Number) value).doubleValue() : Double.parseDouble(value.toString()));"
),
Map.entry(KeywordFieldMapper.CONTENT_TYPE, "value(value.toString());"),
Map.entry(IpFieldMapper.CONTENT_TYPE, "stringValue(value.toString());"),
Map.entry(
NumberType.LONG.typeName(),
"value(value instanceof Number ? ((Number) value).longValue() : Long.parseLong(value.toString()));"
)
);

}