-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct dynamic updates directly via object builders #81449
Construct dynamic updates directly via object builders #81449
Conversation
…te-without-merging
…te-without-merging
…te-without-merging
Pinging @elastic/es-search (Team:Search) |
@elasticmachine update branch |
That sounds lovely. |
…te-without-merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass of this and left a few notes mostly for my own understanding. I didn't really go deep into the old logic in DocumentParser#createDynamicUpdate since it seems to be replaced by a completely different bottom-up-approach, but instead focused on that and the changes in the builders, which make sense to me so far. I still would like to give this a second pass and take another look at the test changes and would also appreciate a second pair of eyes here since this changes a lot of complex logic where I hope we cover most of the edge cases in tests already.
protected volatile Dynamic dynamic; | ||
|
||
protected volatile CopyOnWriteHashMap<String, Mapper> mappers; | ||
protected Map<String, Mapper> mappers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious to understand why this changed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, probably because its not mutable any more now? Could the map be it be final then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not, it gets updated in doMerge()
. Which is a whole 'nother can of worms.
this.runtimeFields.put(runtimeField.name(), runtimeField); | ||
return this; | ||
} | ||
|
||
public RootObjectMapper.Builder setRuntime(Map<String, RuntimeField> runtimeFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if this method now adds all input fields, could we rename it to "addRuntime" to make it clearer this doesn't overwrite the existing map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'll update.
String fullChildName = prefix == null ? childName : prefix + "." + childName; | ||
ObjectMapper.Builder childBuilder = findChild(childName, fullChildName, context); | ||
childBuilder.addDynamic(name.substring(firstDotIndex + 1), fullChildName, mapper, context); | ||
mappersBuilders.add(childBuilder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably confused by the recursion here, but would it make sense to add the new childBuilder before the recursive step? Or maybe it doesn't matter, I now see that "findChild" only looks up previous dynamic updates in 'context.getObjectMapper(fullChildName)' and that wouldn't be affected by the mappersBuilders
here? Just a bit confused, maybe you can explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, findChild
doesn't actually consult the existing mappersBuilders
at all. In fact, I think the final branch of findChild
, where we just return a new builder, won't ever be called now, because of #79922 - a dynamically built mapper will always have a parent that is either in the existing mappings, or has also been dynamically added. Maybe worth keeping it as a safety fall-back though.
In terms of adding the childBuilder to mappersBuilders before/after the recursion, it doesn't actually matter. The recursive step knows nothing about its parent (apart from using the passed-in prefix to build a full child name for lookup purposes), and the childBuilder itself is mutable so we can add a new mapper to it either before or after it has been added to mappersBuilders
, it's the same object with the same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the thing the feels weird to me is that we're merging stuff from the tree of updates in the context and the tree under the object itself. It's, like, how you have to do it, I think. I left a note about the method itself below. Maybe I've found something? If I have maybe it'd give us something to pry on to simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite under the object itself, it's under a new, empty, objectbuilder that has been generated by the main object. I've tried to explain the recursion more fully in a comment below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, if I remove the third branch and throw an exception, it causes failures in tests that are checking how fields starting or ending with a dot behave. I'll see if I can pull field name validation out into a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a second pass with a few more questions, other than that LGTM. Still worth checking from another point of view I think.
@@ -87,6 +86,64 @@ public Builder add(Mapper.Builder builder) { | |||
return this; | |||
} | |||
|
|||
public Builder addMappers(Map<String, Mapper> mappers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this can be package private, seems only be used by ObjectMapperMergeTests
b.endObject(); | ||
b.field("field", 10); | ||
b.endObject(); | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your opinion, does changing these tests from directly testing DocumentParser.createDynamicUpdate() to parsing full json objects also change their coverage? As far as I understand, we previously sorted all mapper updates from a parse alphabetically (thus the sorting in the test setups). Now we don't need to do that anymore due to the new recursive strategy. However, I wonder if my re-ordering things in the json we might create a different update order and maybe different results/edge cases? I'm not sure this can happen so just wondering if that might be a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difference I think is if we have two incompatible dynamic mappers for the same field - let's say you have something that looks like this:
{ "foo" : { "bar" : 1 }, "foo.bar" : "baz" }
We'll generate two mappings for 'foo.bar' here, one long
and one text
+keyword
. Previously they would have been merged together and thrown an exception; now one will overwrite the other, but we'll still get an exception when the document is re-parsed if the long
update is seen second and therefore 'wins'. I think this is OK though?
This also opens up possibilities for automatic widening of dynamic types - we could in the future have some kind of merge detection that says if you have a long
mapper and a text
mapper, you should always accept the text
mapper when you combine the two in a dynamic context.
…te-without-merging
…te-without-merging
Another update, taking advantage of the changes in #82359. This does now in fact cause a change in behaviour, due to #28948. Previously if you send in a document that looked like Given all the weird corner cases that #28948 causes I wonder if we should address it as part of this? It's an easy change to make in the document parsing code, although it may be a bit trickier as part of mapper parsing. |
@elasticmachine run elasticsearch-ci/bwc |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I still don't fully understand it, but I'm closer to understand it than before. And a lot closer to understanding it than the old code. And we have tests.
@elasticmachine update branch |
* upstream/master: (762 commits) [DOCS] Add note to that log4j customization is outside the support scope (elastic#82668) Batch Index Settings Update Requests (elastic#82896) [DOCS] Delete pipeline containing stored script (elastic#83102) Try again to fix changelog areas after reorg (elastic#83100) Bind to non-localhost for transport in some cases (elastic#82973) [DOCS] Reuse multi-level `join` warning (elastic#82976) Remove unnecessary CopyOnWriteHashMap class (elastic#83040) Adjust changelog categories after reorg (elastic#83087) [DOCS] Fix typo in `action.destructive_requires_name` breaking change (elastic#83085) Stack Monitoring: Add Enterprise Search monitoring index templates (elastic#82743) [DOCS] Fix stored script example snippet (elastic#83056) [DOCS] Re-add network traffic para to `term` query (elastic#83047) [DOCS] Rename example stored script (elastic#83054) [ML][DOCS] Add Trained model APIs to the REST APIs index (elastic#82791) [ML] Update running process when global calendar changes (elastic#83044) [Transform] Fix condition on which the transform stops processing buckets (elastic#82852) [DOCS] Fixes field names in ML sum functions. (elastic#83048) [ML] fix NLP tokenization never_split handling around punctuation (elastic#82982) Construct dynamic updates directly via object builders (elastic#81449) Emit trace.id into audit logs (elastic#82849) ... # Conflicts: # client/rest-high-level/src/test/java/org/elasticsearch/client/IndicesClientIT.java # client/rest-high-level/src/test/java/org/elasticsearch/client/documentation/ILMDocumentationIT.java # server/src/main/java/org/elasticsearch/action/admin/indices/rollover/Condition.java # server/src/test/java/org/elasticsearch/action/admin/indices/rollover/ConditionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/RolloverActionTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStepTests.java
When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field. Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed. There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with elastic#81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error. This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException. Closes elastic#87513
…87622) When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field. Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed. There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with #81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error. This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException. Closes #87513
…lastic#87622) When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field. Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed. There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with elastic#81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error. This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException. Closes elastic#87513
…87622) When we apply dynamic mappings, we iterate over all the dynamic mappers retrieved from the DocumentParserContext, which are registered while parsing the document, and for each field look up their parent (going back multiple levels if necessary), and add it to the dynamic mapping update with the added/modified sub-field. Dynamic mappers that are iterated through consist of a flat set of mappers, which can be both object mappers or field mappers. Most times, the object mappers from such set have no sub-fields as they are mapped as a result of parsing a document where the object appears for the first time, which also has sub-fields that are going to be added to the set of dynamic mappers themselves once they are parsed. There are scenarios though where a dynamic template matches an object, and defines its structure including its subobjects as well as sub-fields. In that case the dynamically mapped object may hold sub-fields as well as define non-default values for dynamic, enabled or subobjects. The described situation was not well covered in tests so far, and is currently affected by a bug introduced with #81449. When an object mapper is dynamically mapped, it is added to the map of dynamic object mappers, which makes it discoverable for sub-fields that will need to be added to it later, as well as to the set of dynamic mappers so that it becomes part of the mappings in case the document being parsed defines no sub-fields for it. What is missing is to recursively add its sub-fields to the dynamic object mappers. As a result of missing this step, intermediate objects that were dynamically mapped are not made discoverable which causes a cryptic "Missing intermediate object" error. This commit fixed the problem by recursively registering inner sub-objects to the dynamic mappers whenever an object mapper is added to the dynamic mappers. It also changes the "missing intermediate object" error to make it more evident that it's an internal error and not a user error: it is now an IllegalStateException instead of an IllegalArgumentException. Closes #87513
Currently, dynamic updates are built in the DocumentParser using a
stack of possibly-dynamic object mappers. This logic, spread across
a number of static methods, frequently assumes that the parents of
a mapper can be found by splitting its name on dots, an assumption
that will fail to hold once we allow objects to hold fields that have dots
in their names.
As a pre-requisite for the dots in field names effort, this commit refactors
the construction of dynamic updates into object mapper builders. Now,
to build an update, we start with a new dynamic root builder, and then
call
addUpdate
on it with each dynamically built mapper in turn. Thebuilder will examine the mapper and see if it can just add it to its own
set of mappers directly; and if not, it will retrieve or build an appropriate
intermediate object mapper and recursively call
addUpdate
on it withthe original mapper.
As a side-effect of this change, ObjectMapper itself no longer updates
its map of child mappers except during construction via merging, and so
we can safely replace CopyOnWriteHashMap here.