-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow mixing set-based and regexp-based include and exclude #63325
Allow mixing set-based and regexp-based include and exclude #63325
Conversation
There is one thing I'm a bit unsure of, around the stream serialization, and which version to set. I set the next major version (8.0.0) as default but I guess if this is merged in a 7.x branch this would need changing. Also, let's say we wanted to backport this feature to our own build of ES v6, do you think it could be done? |
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
@elasticmachine test this please |
@elasticmachine, ok to test |
*/ | ||
@Override | ||
public boolean accept(BytesRef value) { | ||
return ((valids == null) || (valids.contains(value))) && ((invalids == null) || (!invalids.contains(value))); | ||
if (valids != null && !valids.contains(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (valids != null && !valids.contains(value)) { | |
if (valids != null && (valids.contains(value) == false)) { |
Elastic coding standard prefers this form for readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return false; | ||
} | ||
|
||
if (runAutomaton != null && !runAutomaton.run(value.bytes, value.offset, value.length)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (runAutomaton != null && !runAutomaton.run(value.bytes, value.offset, value.length)) { | |
if (runAutomaton != null && (runAutomaton.run(value.bytes, value.offset, value.length) == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
public IncludeExclude(RegExp include, RegExp exclude, SortedSet<BytesRef> includeValues, SortedSet<BytesRef> excludeValues) { | ||
if (include == null && exclude == null && includeValues == null && excludeValues == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intention here is that at most one of (include
, includeValues
) and at most one of (exclude
, excludeValues
) will be non-null. In other words, while you can mix set-based includes and regex excludes (or vice versa), you can't have both set-based and regex-based includes. That seems like a requirement of the precedence rules, among other things.
I think we should enforce that rule here. I know the parser doesn't currently allow for specifying both a regex and a set at the same time, but it's ultimately this class's contract that both not be set, and this class should check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in IncludeExclude
any combination of the 4 (include
, includeValues
, exclude
, excludeValues
) can work correctly. Meaning that we can have both kinds of includes and/or both kind of excludes and it would "do the right/logical thing", i.e. it would accept terms that are in any of the include(s) but not in any of the exclude(s). I don't think there is precedence between both kinds of includes, nor between both kinds of excludes, just between include(s) and exclude(s).
Also, I don't think restricting to a single include and a single exclude would be more efficient (e.g. allow optimizations in the accept functions).
That's why I didn't forbid having both types of includes or both types of excludes. But I can definitely add a check to forbid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case I am worried about, if you have something like include = "foo.*"
and includeValues = ["bar", "quux"]
, it will incorrectly reject the term "foo" by returning false on line 211, I think. I don't think we need to support that case, but we do need to explicitly reject it, by not allowing both include
and includeValues
to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I was wrong in my previous comment. I think the accept
from the ordinals filter can work with both kinds of includes (or excludes), but indeed not the string filter, where I assumed there was only one kind of each.
I've added a check to forbid this
aggregationBuilder = new TermsAggregationBuilder("_name").userValueTypeHint(ValueType.STRING) | ||
.executionHint(executionHint) | ||
.includeExclude(new IncludeExclude("val00.+", null, null, | ||
new String[]{"val001", "val002", "val003", "val004", "val005", "val006", "val007", "val008"})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate that you added tests to make sure that the precedence is preserved with the new possible configurations, but I think we should test this on IncludeExcludeTests
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored/added tests for all possible combinations of include/exclude in IncludeExcludeTests
.
server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/IncludeExclude.java
Show resolved
Hide resolved
I think the BWC failure is related to us cutting a branch around the time this test ran, I didn't see anything that looked related to this change. When you make the requested changes, we'll run the whole suite again and if BWC passes then, that's fine. |
All right. Do you want me to merge master into my branch? Or rebase onto it? |
This looks great, thanks. Yeah, if you could merge master into the branch, that should fix the BWC tests. I'm OOO tomorrow, but hopefully we can get this merged Monday or Tuesday. |
I merged master. However, BWC is still failing on my machine, but actually it's even failing for the master branch. It looks like other recent PRs are also failing the same tests though. We'll see how it goes with the CI but I'm not very hopeful. Update: OK so BWC passed in the CI but another task failed. It seems to me that it's not related to my changes. I tried reproducing the two failing tests on my machine with the command lines given in the log outputs, and they both pass successfully. I'm not sure what to do now |
I agree that test failure doesn't look related. I'll look into it. |
@elasticmachine run elasticsearch-ci/1 |
One of those is a known issue: #63719. I'm rerunning the suite to see if a different random seed fixes it. I'll dig in a bit more on the second failure and open an issue if necessary. |
This should be all set. I'll get it merged and backported this week. Thanks for coding it up! |
Awesome! Thank you for the review! |
…63325) * Allow mixing set-based and regexp-based include and exclude * Coding style * Disallow having both set and regexp include (resp. exclude) * Test correctness of every combination of include/exclude
This PR adds the feature discussed in #62246
It allows mixing set-based and regexp-based "include" and "exclude" parameters in Terms (and SignificantTerms, RareTerms) aggregations. Both ways are supported: a set include with a regexp exclude, or a regexp include with a set exclude.