Re-Indexing fails on big data #4258

Open
henning-gerhardt opened this issue Mar 6, 2021 · 6 comments · Fixed by #4546
Assignees
Labels
bug search search, filter

Comments

@henning-gerhardt
Collaborator

henning-gerhardt commented Mar 6, 2021

If you have to re-index a large dataset (e.g. 430,000 processes) and do so with a single click on "Start indexing for all" (e.g. for processes), an error page appears and the following error is logged:

org.elasticsearch.client.ResponseException: POST http://internal_elastic_search:9200/kitodo/process/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=query_then_fetch&batched_reduce_size=512: HTTP/1.1 500 Internal Server Error
{"error":{"root_cause":[{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"kitodo","node":"TpUslxONQ-OkyGO4s_8nnw","reason":{"type":"query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}]},"status":500}

at org.elasticsearch.client.RestClient$1.completed(RestClient.java:354)
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:343)
at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)

It looks like all the data is being accessed at once instead of in batches of the size configured in kitodo_config.properties.
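The numbers in the error message can be reproduced with a little arithmetic. The sketch below only models the window check described in the log (`from + size` must not exceed `index.max_result_window`, 10,000 here); it does not talk to Elasticsearch. The batch size of 5,000 is an assumption, chosen because it is one value consistent with the reported `from + size` of 15,000. The value actually configured in kitodo_config.properties is not shown in this report. `ResultWindowCheck` and its methods are hypothetical names.

```java
import java.util.OptionalInt;

final class ResultWindowCheck {

    /** Limit quoted in the error message (index.max_result_window). */
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Mirrors the rule from the error text: from + size must be <= the window. */
    static boolean fitsInWindow(int from, int size) {
        return from + size <= MAX_RESULT_WINDOW;
    }

    /**
     * Walks through from/size pages and returns the first offset that would be
     * rejected, or an empty result if every page fits into the window.
     */
    static OptionalInt firstRejectedOffset(int totalDocuments, int batchSize) {
        for (int from = 0; from < totalDocuments; from += batchSize) {
            if (!fitsInWindow(from, batchSize)) {
                return OptionalInt.of(from);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // 430,000 processes as in the report; the batch size of 5,000 is assumed.
        OptionalInt rejected = firstRejectedOffset(430_000, 5_000);
        System.out.println("first rejected offset: " + rejected);
    }
}
```

Under these assumptions the first two requests (offsets 0 and 5,000) fit into the window, and the third request (offset 10,000, size 5,000) is the one that asks for 15,000 and is rejected.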

@matthias-ronge
Collaborator

I still see this problem on a customer server running a version built from a recent (March 2022) state of master.

Stacktrace:

[ERROR] 2022-04-04 09:41:55.196 [IndexAllThread] IndexingService - Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
        at org.kitodo.data.elasticsearch.search.SearchRestClient.getDocument(SearchRestClient.java:186) ~[kitodo-data-management-3.4.1-SNAPSHOT.jar:?]
        at org.kitodo.data.elasticsearch.search.Searcher.findDocuments(Searcher.java:204) ~[kitodo-data-management-3.4.1-SNAPSHOT.jar:?]
        at org.kitodo.production.services.data.base.SearchService.findAllDocuments(SearchService.java:436) ~[classes/:3.4.1-SNAPSHOT]
        at org.kitodo.production.services.data.base.SearchService.findAllIDs(SearchService.java:175) ~[classes/:3.4.1-SNAPSHOT]
        at org.kitodo.production.services.index.IndexingService.startIndexing(IndexingService.java:262) ~[classes/:3.4.1-SNAPSHOT]
        at org.kitodo.production.services.index.IndexAllThread.run(IndexAllThread.java:37) [classes/:3.4.1-SNAPSHOT]
        Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/kitodo_process/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"kitodo_process","node":"ObYy_f-yQgKawX4MsISb4w","reason":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}},"status":400}
                at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:326) ~[elasticsearch-rest-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:296) ~[elasticsearch-rest-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270) ~[elasticsearch-rest-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1632) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
                at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088) ~[elasticsearch-rest-high-level-client-7.10.2.jar:7.10.2]
                at org.kitodo.data.elasticsearch.search.SearchRestClient.getDocument(SearchRestClient.java:186) ~[kitodo-data-management-3.4.1-SNAPSHOT.jar:?]
                at org.kitodo.data.elasticsearch.search.Searcher.findDocuments(Searcher.java:204) ~[kitodo-data-management-3.4.1-SNAPSHOT.jar:?]
                at org.kitodo.production.services.data.base.SearchService.findAllDocuments(SearchService.java:436) ~[classes/:3.4.1-SNAPSHOT]
                at org.kitodo.production.services.data.base.SearchService.findAllIDs(SearchService.java:175) ~[classes/:3.4.1-SNAPSHOT]
                at org.kitodo.production.services.index.IndexingService.startIndexing(IndexingService.java:262) ~[classes/:3.4.1-SNAPSHOT]
                at org.kitodo.production.services.index.IndexAllThread.run(IndexAllThread.java:37) [classes/:3.4.1-SNAPSHOT]
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179) ~[elasticsearch-7.10.2.jar:7.10.2]
        ... 12 more
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:179) ~[elasticsearch-7.10.2.jar:7.10.2]
        ... 12 more
[ERROR] 2022-04-04 09:41:55.198 [IndexAllThread] Helper - ElasticsearchStatusException / ElasticsearchException / ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [15000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]

@matthias-ronge matthias-ronge reopened this Apr 4, 2022
@matthias-ronge
Collaborator

I think the issue occurs simply because there are more than 10,000 processes on the system, which is an internal limit:

Image

@henning-gerhardt
Collaborator Author

I can confirm this, but I did not reopen the issue because the hibernate-search integration should fix this and many other indexing issues (not all of which are documented as issues).

@matthias-ronge
Collaborator

I'm leaving it open because it's a bug, so at least we know there's an unsolved problem.

@Kathrin-Huber
Contributor

This should be fixed.
@henning-gerhardt can you confirm this?

@henning-gerhardt
Collaborator Author

@Kathrin-Huber: I have not tested this in the last few months (the last time was March/April 2022), and at that point it was still an issue. As far as I understand, the main problem is that the wrong search API is used. The search API in use is limited to 10,000 or 15,000 results; if you want more results, you must use a different search API. I don't even know why a search against Elasticsearch is executed at this point, as it is, in my opinion and to my knowledge, not needed.
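To illustrate what a different paging style could look like, here is a minimal in-memory simulation. It is not Kitodo's code and does not use the Elasticsearch client. The error message points to the scroll API; Elasticsearch also offers `search_after`, where each request continues after the sort value of the last hit instead of using a growing `from` offset. The sketch imitates that idea with a sorted set of IDs. `CursorPagingSketch`, `pageAfter` and `findAllIds` are hypothetical names, and the batch size of 5,000 and the 25,000 IDs are assumed values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class CursorPagingSketch {

    /**
     * Hypothetical stand-in for one cursor-style request: returns up to
     * batchSize IDs strictly greater than lastSeenId, in ascending order.
     */
    static List<Integer> pageAfter(NavigableSet<Integer> index, int lastSeenId, int batchSize) {
        List<Integer> page = new ArrayList<>(batchSize);
        for (Integer id : index.tailSet(lastSeenId, false)) {
            if (page.size() == batchSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Collects every ID by repeatedly asking for the batch after the last ID seen. */
    static List<Integer> findAllIds(NavigableSet<Integer> index, int batchSize) {
        List<Integer> all = new ArrayList<>();
        // Cursor start; assumes all real IDs are greater than Integer.MIN_VALUE.
        int lastSeenId = Integer.MIN_VALUE;
        while (true) {
            List<Integer> page = pageAfter(index, lastSeenId, batchSize);
            if (page.isEmpty()) {
                return all;
            }
            all.addAll(page);
            lastSeenId = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Simulated index with more IDs than the 10,000 result window.
        NavigableSet<Integer> index = new TreeSet<>();
        for (int id = 1; id <= 25_000; id++) {
            index.add(id);
        }
        System.out.println("collected " + findAllIds(index, 5_000).size() + " IDs");
    }
}
```

Because every request is expressed relative to the last ID seen, no request carries an offset, so a `from + size` limit would not come into play regardless of how many IDs exist. Whether this or the scroll API suits the indexing code better is a separate design question.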


4 participants