Skip to content

Commit

Permalink
Deprecate sorting in reindex
Browse files Browse the repository at this point in the history
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.

It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.

Related to elastic#47567
  • Loading branch information
henningandersen committed Nov 21, 2019
1 parent eca6003 commit 688f2d0
Show file tree
Hide file tree
Showing 6 changed files with 82 additions and 45 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,6 @@
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.tasks.TaskId;

import java.util.Collections;
Expand Down Expand Up @@ -833,10 +832,6 @@ public void testReindex() throws Exception {
// tag::reindex-request-pipeline
request.setDestPipeline("my_pipeline"); // <1>
// end::reindex-request-pipeline
// tag::reindex-request-sort
request.addSortField("field1", SortOrder.DESC); // <1>
request.addSortField("field2", SortOrder.ASC); // <2>
// end::reindex-request-sort
// tag::reindex-request-script
request.setScript(
new Script(
Expand Down
10 changes: 0 additions & 10 deletions docs/java-rest/high-level/document/reindex.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -89,16 +89,6 @@ include-tagged::{doc-tests-file}[{api}-request-pipeline]
--------------------------------------------------
<1> set pipeline to `my_pipeline`

If you want a particular set of documents from the source index you’ll need to use sort. If possible, prefer a more
selective query to maxDocs and sort.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-sort]
--------------------------------------------------
<1> add descending sort to`field1`
<2> add ascending sort to `field2`

+{request}+ also supports a `script` that modifies the document. It allows you to
also change the document's metadata. The following example illustrates that.

Expand Down
40 changes: 10 additions & 30 deletions docs/reference/docs/reindex.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -478,6 +478,9 @@ which defaults to a maximum size of 100 MB.
`sort`:::
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
Use in conjunction with `max_docs` to control what documents are reindexed.
deprecated[7.6, sort in reindex is deprecated] Sorting in reindex was never guaranteed to process
docs in order and prevents future evolution of reindex such as resilience and performance
improvements. If used in combination with `max_docs`, consider using a query filter instead.

`_source`:::
(Optional, string) If `true` reindexes all source fields.
Expand Down Expand Up @@ -602,8 +605,8 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-select-sort]]
===== Reindex select documents with sort
[[docs-reindex-select-max-docs]]
===== Reindex select documents with `max_docs`

You can limit the number of processed documents by setting `max_docs`.
For example, this request copies a single document from `twitter` to
Expand All @@ -624,28 +627,6 @@ POST _reindex
--------------------------------------------------
// TEST[setup:twitter]

You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex.
Sorting makes the scroll less efficient but in some contexts it's worth it.
If possible, it's better to use a more selective query instead of `max_docs` and `sort`.

For example, following request copies 10000 documents from `twitter` into `new_twitter`:

[source,console]
--------------------------------------------------
POST _reindex
{
"max_docs": 10000,
"source": {
"index": "twitter",
"sort": { "date": "desc" }
},
"dest": {
"index": "new_twitter"
}
}
--------------------------------------------------
// TEST[setup:twitter]

[[docs-reindex-multiple-indices]]
===== Reindex from multiple indices

Expand Down Expand Up @@ -824,11 +805,10 @@ POST _reindex
"index": "twitter",
"query": {
"function_score" : {
"query" : { "match_all": {} },
"random_score" : {}
"random_score" : {},
"min_score" : 0.9 <1>
}
},
"sort": "_score" <1>
}
},
"dest": {
"index": "random_twitter"
Expand All @@ -837,8 +817,8 @@ POST _reindex
----------------------------------------------------------------
// TEST[setup:big_twitter]

<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any
effect unless you override the sort to `_score`.
<1> you may need to adjust the `min_score` depending on the relative amount of
data extracted from source.

[[reindex-scripts]]
===== Modify documents during reindexing
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.logging.DeprecationLogger;
import org.elasticsearch.common.lucene.uid.Versions;
import org.elasticsearch.common.xcontent.DeprecationHandler;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
Expand All @@ -51,6 +52,7 @@
import org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptService;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
Expand All @@ -71,6 +73,9 @@
public class Reindexer {

private static final Logger logger = LogManager.getLogger(Reindexer.class);
private static final DeprecationLogger deprecationLogger = new DeprecationLogger(logger);
static final String SORT_DEPRECATED_MESSAGE = "The sort option in reindex is deprecated. " +
"Instead consider using query filtering to find the desired subset of data.";

private final ClusterService clusterService;
private final Client client;
Expand All @@ -88,6 +93,10 @@ public class Reindexer {
}

public void initTask(BulkByScrollTask task, ReindexRequest request, ActionListener<Void> listener) {
SearchSourceBuilder searchSource = request.getSearchRequest().source();
if (searchSource != null && searchSource.sorts() != null && searchSource.sorts().isEmpty() == false) {
deprecationLogger.deprecated(SORT_DEPRECATED_MESSAGE);
}
BulkByScrollParallelizationHelper.initTaskState(task, request, client, listener);
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.reindex;

import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.test.ESSingleNodeTestCase;

import java.util.Arrays;
import java.util.Collection;

import static org.elasticsearch.index.reindex.ReindexTestCase.matcher;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount;

public class ReindexSingleNodeTests extends ESSingleNodeTestCase {
@Override
protected Collection<Class<? extends Plugin>> getPlugins() {
return Arrays.asList(ReindexPlugin.class);
}

public void testDeprecatedSort() {
int max = between(2, 20);
for (int i = 0; i < max; i++) {
client().prepareIndex("source").setId(Integer.toString(i)).setSource("foo", i).get();
}

client().admin().indices().prepareRefresh("source").get();
assertHitCount(client().prepareSearch("source").setSize(0).get(), max);

// Copy a subset of the docs sorted
int subsetSize = randomIntBetween(1, max - 1);
ReindexRequestBuilder copy = new ReindexRequestBuilder(client(), ReindexAction.INSTANCE)
.source("source").destination("dest").refresh(true);
copy.maxDocs(subsetSize);
copy.request().addSortField("foo", SortOrder.DESC);
assertThat(copy.get(), matcher().created(subsetSize));

assertHitCount(client().prepareSearch("dest").setSize(0).get(), subsetSize);
assertHitCount(client().prepareSearch("dest").setQuery(new RangeQueryBuilder("foo").gte(0).lt(max-subsetSize)).get(), 0);
assertWarnings(Reindexer.SORT_DEPRECATED_MESSAGE);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,10 @@ public ReindexRequest setSourceQuery(QueryBuilder queryBuilder) {
*
* @param name The name of the field to sort by
* @param order The order in which to sort
* @deprecated Specifying a sort field for reindex is deprecated. If using this in combination with maxDocs, consider using a
* query filter instead.
*/
@Deprecated
public ReindexRequest addSortField(String name, SortOrder order) {
this.getSearchRequest().source().sort(name, order);
return this;
Expand Down

0 comments on commit 688f2d0

Please sign in to comment.