Improve the algorithm for extracting multifields #48756
Labels
:ml
Machine learning
Comments
Pinging @elastic/ml-core (:ml)
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Oct 31, 2019
Aggregatable multi-fields are currently mapped, incorrectly, as normal doc_value fields and are therefore treated as supporting fetching from source. However, they do not exist in the source, so extracting such fields fails. This commit fixes the bug. While a fix could be worked out on top of the existing code, it is evident that the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates elastic#48756
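To illustrate the bug this commit message describes, here is a minimal sketch in Python. It is not the actual Elasticsearch Java code: the `Field` class and the `supports_from_source` and `extraction_method` functions are hypothetical names invented for this example, and the fallback for non-aggregatable multi-fields is an assumption. The point it shows is that a multi-field variant such as `title.keyword` is derived from its parent at index time and never appears in `_source`, so it must not be treated like an ordinary doc_value field that can also be read from source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor, for illustration only."""
    name: str
    aggregatable: bool              # True if the field has doc values
    parent: Optional[str] = None    # set only for multi-field variants

    @property
    def is_multi_field(self) -> bool:
        return self.parent is not None


def supports_from_source(field: Field) -> bool:
    # A multi-field variant is not present in _source,
    # so it can never be fetched from there.
    return not field.is_multi_field


def extraction_method(field: Field) -> str:
    if field.aggregatable:
        # Doc values exist for both normal fields and multi-field variants.
        return "doc_value"
    if field.is_multi_field:
        # Assumption of this sketch: read the parent's value from _source.
        return "source_of_parent"
    return "source"


keyword = Field("title.keyword", aggregatable=True, parent="title")
text = Field("title", aggregatable=False)
```

With these definitions, `extraction_method(keyword)` selects doc values while `supports_from_source(keyword)` is false, which is the distinction the original mapping missed.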
dimitris-athanasiou added a commit that referenced this issue on Nov 1, 2019
Aggregatable multi-fields are currently mapped, incorrectly, as normal doc_value fields and are therefore treated as supporting fetching from source. However, they do not exist in the source, so extracting such fields fails. This commit fixes the bug. While a fix could be worked out on top of the existing code, it is evident that the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates #48756
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Nov 1, 2019
Aggregatable multi-fields are currently mapped, incorrectly, as normal doc_value fields and are therefore treated as supporting fetching from source. However, they do not exist in the source, so extracting such fields fails. This commit fixes the bug. While a fix could be worked out on top of the existing code, it is evident that the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates elastic#48756 Backport of elastic#48770
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Nov 1, 2019
When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes elastic#48756
dimitris-athanasiou added a commit that referenced this issue on Nov 1, 2019
When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes #48756
dimitris-athanasiou added a commit that referenced this issue on Nov 1, 2019
Aggregatable multi-fields are currently mapped, incorrectly, as normal doc_value fields and are therefore treated as supporting fetching from source. However, they do not exist in the source, so extracting such fields fails. This commit fixes the bug. While a fix could be worked out on top of the existing code, it is evident that the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates #48756 Backport of #48770
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Nov 1, 2019
…48799) When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes elastic#48756 Backport of elastic#48799
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Nov 1, 2019
…48799) When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes elastic#48756 Backport of elastic#48799
dimitris-athanasiou added a commit that referenced this issue on Nov 1, 2019
…48806) When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes #48756 Backport of #48799
dimitris-athanasiou added a commit that referenced this issue on Nov 1, 2019
…48807) When multi-fields exist in the source index, we pick all of their variants in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence arises when the dependent variable (for regression or classification) is also duplicated, which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit adds logic to select only one variant of each multi-field. Closes #48756 Backport of #48799
This was referenced Feb 3, 2020
When extracting a multi-field, only one of its variants (i.e. the first aggregatable one) should be extracted.
This should help fix issues such as the following, which the data team raised:
https://github.com/elastic/ml-team/issues/235
https://github.com/elastic/ml-team/issues/239
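The selection rule requested above could look roughly like the following Python sketch. This is an illustration under stated assumptions, not the merged Java implementation: `select_one_variant` and `dedupe` are hypothetical names, the input shape (variants grouped under their parent, parent listed first) is assumed, and falling back to the parent when no variant is aggregatable is also an assumption, since the issue does not specify that case.

```python
from typing import Dict, List


def select_one_variant(variants: List[str], aggregatable: Dict[str, bool]) -> str:
    """Return the first aggregatable variant of a multi-field.

    `variants` is assumed to list the parent first, then its sub-fields.
    """
    for name in variants:
        if aggregatable.get(name, False):
            return name
    # Assumption: if nothing is aggregatable, keep the parent field.
    return variants[0]


def dedupe(groups: Dict[str, List[str]], aggregatable: Dict[str, bool]) -> List[str]:
    """Keep exactly one field per multi-field group."""
    return [select_one_variant(v, aggregatable) for v in groups.values()]


# Hypothetical index with one text/keyword multi-field and one numeric field.
groups = {
    "species": ["species", "species.keyword"],
    "petal_length": ["petal_length"],
}
aggregatable = {
    "species": False,          # text field, no doc values
    "species.keyword": True,   # keyword sub-field
    "petal_length": True,
}

selected = dedupe(groups, aggregatable)
```

Here `selected` contains `species.keyword` and `petal_length` only, so if `species` were the dependent variable it would no longer also appear among the features under a second name.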