indexing - high rate of index access while indexing #5179
Comments
Is this the same behavior you described in #4726? If not, how does this issue differ?
Yes, it sounds familiar. When I opened this and the other issues, I copied them from an internal list. I did not check whether an older issue already describes the same or a similar problem.
Changing the polling interval is a single-line change, see indexing.xhtml. By the way, I didn't observe any performance difference with my test data: the 1s, 5s and 60s refresh intervals all appear to have the same CPU usage and indexing time on my system. What would be a better refresh interval?
Refreshing the indexing page, which is more or less what each automatic refresh does while indexing, creates 1.345 lines in the application proxy log:
Asking for the state of every search index is fine, but this is wrong. If you are interested in fixing this, consider setting up an application proxy in front of Elasticsearch to get similar entries. I don't know how complicated it would be to add a select list with refresh options like "none", "1 second", "5 seconds", "10 seconds" and "60 seconds". By default, a refresh every 10 seconds should be okay, and anyone who needs a faster update interval can choose one.
I see the problem with the log. The counts (how many objects are already indexed) should be cached instead of being recalculated multiple times when rendering and refreshing the System/Indexing page. Unfortunately, this is not a single-line change. Yes, the polling interval could be changed live (during indexing) via JavaScript with a drop-down select menu or something similar.
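To illustrate the caching idea from the comment above, here is a minimal sketch in Java. `CachedCount` is a hypothetical helper, not an existing Kitodo class, and the loader stands in for whatever request actually fetches the number of indexed objects. It only shows the general technique (a value that is reloaded at most once per time window), under the assumption that slightly stale counts are acceptable on the indexing page.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Kitodo): caches an expensive count
 * and reloads it at most once per time-to-live window.
 */
class CachedCount {
    private final LongSupplier loader;  // the expensive lookup, e.g. a count request to the search index
    private final LongSupplier clock;   // current time in milliseconds, injectable for tests
    private final long ttlMillis;       // how long a loaded value stays valid

    private long value;
    private long loadedAt;
    private boolean loaded;

    CachedCount(LongSupplier loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Convenience constructor that uses the system clock. */
    CachedCount(LongSupplier loader, long ttlMillis) {
        this(loader, ttlMillis, System::currentTimeMillis);
    }

    /** Returns the cached value, calling the loader only if the value is missing or expired. */
    synchronized long get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlMillis) {
            value = loader.getAsLong();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

With something like `new CachedCount(() -> fetchCount(), 5_000)`, where `fetchCount` is a placeholder for the real lookup, repeated page renders within five seconds would reuse one result instead of each triggering a new request.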
I already expected that this issue would not be easy to fix, which is why I opened it.
That would be fine; the user could then choose the refresh interval they need.
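The selectable refresh options discussed in this thread could be modelled roughly as follows. This is only a sketch: `RefreshInterval` is a hypothetical type, not existing Kitodo code, and how it would be wired into indexing.xhtml is left open. The values and the 10-second default are taken from the suggestion above.

```java
/**
 * Hypothetical model of the refresh options proposed in this thread
 * (not existing Kitodo code).
 */
enum RefreshInterval {
    NONE(0, "none"),
    ONE_SECOND(1, "1 second"),
    FIVE_SECONDS(5, "5 seconds"),
    TEN_SECONDS(10, "10 seconds"),
    SIXTY_SECONDS(60, "60 seconds");

    /** Default suggested in the thread: refresh every 10 seconds. */
    static final RefreshInterval DEFAULT = TEN_SECONDS;

    private final int seconds;
    private final String label;

    RefreshInterval(int seconds, String label) {
        this.seconds = seconds;
        this.label = label;
    }

    int getSeconds() {
        return seconds;
    }

    long getMillis() {
        return seconds * 1000L;
    }

    /** Text to show in a select list. */
    String getLabel() {
        return label;
    }

    /** False for NONE, meaning automatic refreshing is switched off. */
    boolean isEnabled() {
        return seconds > 0;
    }

    /** Maps a submitted value to an option, falling back to the default for unknown values. */
    static RefreshInterval fromSeconds(int seconds) {
        for (RefreshInterval interval : values()) {
            if (interval.seconds == seconds) {
                return interval;
            }
        }
        return DEFAULT;
    }
}
```

Falling back to the default for unknown values keeps a malformed request from disabling or speeding up polling unintentionally.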
preface:
The following issue may only occur if you are indexing tens of thousands to millions of entries and/or the Elasticsearch settings in kitodo_config.properties are incorrect or do not fit your amount of data.
description:
While indexing is running, the current state of the index is requested at a very high rate, about 10 times per second, to refresh the indexing progress display. If your Elasticsearch service sits behind an application proxy (to restrict access and/or to transfer the data securely), this refreshing alone generates a lot of log data.
To stop this, you have to reload the indexing browser tab. This stops the high access rate, but it also stops the automatic refresh of the indexing progress. Refreshing manually does no harm, though, if you have to wait a few hours anyway while a lot of data is indexed.
goal:
Lower the refresh rate of the indexing progress display to a useful and reasonable rate.