Rollover improvement #26092

Closed
consulthys opened this issue Aug 8, 2017 · 7 comments
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >feature

Comments

@consulthys
Contributor

The addition of the Rollover API has greatly simplified the management of time-based indices. Yet the _rollover endpoint still needs to be called on a recurring basis, either manually or via a cron-based tool such as Curator.
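For context, this is roughly what such a recurring job sends today. The sketch below only builds the request with the standard library; the host `http://localhost:9200` is an assumption, and `build_rollover_request` / `run_rollover` are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_rollover_request(host: str, alias: str, max_age: str, max_docs: int) -> urllib.request.Request:
    """Build (but do not send) a POST /<alias>/_rollover request with two conditions."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return urllib.request.Request(
        url=f"{host}/{alias}/_rollover",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_rollover(host: str = "http://localhost:9200") -> dict:
    # What a cron entry (or Curator) would execute on a schedule, e.g. hourly.
    req = build_rollover_request(host, "logs-write", "1d", 100000)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The proposal below aims to make this scheduled call unnecessary.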

However, it would be much simpler if the rollover process were implicit and managed internally. The idea would be to specify the rollover conditions in the alias at index creation time (or equivalently in an index template), basically like this:

# Today is 2017/08/08, so the created index is named logs-2017.08.08-1
PUT /<logs-{now/d}-1>
{
    "mappings": {...},
    "aliases" : {
        "logs-search" : {},
        "logs-write" : {
            "rollover" : {
                "conditions": {
                    "max_age":   "1d",
                    "max_docs":  100000
                }
            }
        }
    }
}
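One detail of the request above: when sent over HTTP, a date-math index name such as `<logs-{now/d}-1>` has to be URI-encoded in the path, and `{now/d}` resolves to the current UTC day in the default `yyyy.MM.dd` format. The sketch below illustrates both steps; `resolve_now_d` is a hypothetical, deliberately simplified helper that only understands the literal `{now/d}` token.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def encode_index_name(name: str) -> str:
    # Percent-encode every reserved character, including '<', '>', '{', '}' and '/'.
    return quote(name, safe="")


def resolve_now_d(name: str, now: datetime) -> str:
    # Simplified: strips the <...> wrapper and replaces only the '{now/d}' token.
    if name.startswith("<") and name.endswith(">"):
        name = name[1:-1]
    return name.replace("{now/d}", now.strftime("%Y.%m.%d"))


if __name__ == "__main__":
    print(encode_index_name("<logs-{now/d}-1>"))  # path segment for PUT /...
    print(resolve_now_d("<logs-{now/d}-1>", datetime(2017, 8, 8, tzinfo=timezone.utc)))
```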

Then, when new documents are indexed through the logs-write alias (provided that alias points to a single index only, of course), Elasticsearch would determine from the max_age and max_docs conditions whether a new index needs to be created. There would be no need to call _rollover explicitly.
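The per-write check described above could be as small as the following sketch. It assumes the same semantics as the existing _rollover API (max_age measured from index creation time, a single satisfied condition is enough); `parse_age`, `evaluate_conditions` and `should_roll_over` are hypothetical names, and `parse_age` handles only simple values like `1d` or `12h`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Maps the single-letter suffix of a simple time value to a timedelta keyword.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_age(value: str) -> timedelta:
    """Parse simple time values such as "30m", "12h" or "1d" (no "ms", no fractions)."""
    number, unit = int(value[:-1]), value[-1]
    return timedelta(**{_UNITS[unit]: number})


@dataclass
class RolloverConditions:
    max_age: str   # e.g. "1d"
    max_docs: int  # e.g. 100000


def evaluate_conditions(cond: RolloverConditions, created_at: datetime,
                        doc_count: int, now: datetime) -> dict:
    """Return each condition's result, keyed like the _rollover response."""
    return {
        f"[max_age: {cond.max_age}]": now - created_at >= parse_age(cond.max_age),
        f"[max_docs: {cond.max_docs}]": doc_count >= cond.max_docs,
    }


def should_roll_over(results: dict) -> bool:
    # A single satisfied condition is enough to trigger a rollover.
    return any(results.values())
```

The check itself is cheap (a clock comparison and a doc count); the costly part is the rollover it triggers.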

# the same day no index would be created
PUT logs-write/log/253
{ "log": "Some log" }
=> indexed in logs-2017.08.08-1

# the next day, a new index would be created automatically without having to call the rollover API
PUT logs-write/log/6452
{ "log": "Some new log" }
=> indexed in logs-2017.08.09-000002
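The name `logs-2017.08.09-000002` follows the naming rule the Rollover API already applies: a trailing `-<number>` is incremented and zero-padded to six digits, and date math in the originally provided name is re-resolved against the current time. A minimal sketch of that rule, assuming only the `{now/d}` token (`next_index_name` is a hypothetical helper):

```python
import re
from datetime import datetime


def next_index_name(provided_name: str, now: datetime) -> str:
    """Derive the rollover target from the name given at creation time."""
    match = re.fullmatch(r"(.*-)(\d+)(>?)", provided_name)
    if match is None:
        raise ValueError(f"index name must end with '-' and a number: {provided_name}")
    prefix, number, closing = match.groups()
    # Increment the counter and zero-pad it to six digits.
    candidate = f"{prefix}{int(number) + 1:06d}{closing}"
    # Re-resolve the (simplified) date-math token against the current time.
    if candidate.startswith("<") and candidate.endswith(">"):
        candidate = candidate[1:-1].replace("{now/d}", now.strftime("%Y.%m.%d"))
    return candidate
```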

Would there be any downside to the proposed improvement, other than the overhead of determining on each write whether a new index needs to be created?

@cbuescher
Member

Hi @consulthys, thanks for the feedback. I'm not too familiar with the cost of rolling over an index, but it could potentially affect other index/search operations on the cluster, so my guess is it's a better option not to make this automatic. I will mark this feature suggestion as a discussion topic so others can chime in.

@consulthys
Contributor Author

Thanks for your feedback @cbuescher. If there is some overhead, it would of course be a "performance vs simplicity" tradeoff that the user can decide to accept or not. If no rollover setting is present in the alias at index creation time, then the behavior would be the same as now, of course.

@s1monw
Contributor

s1monw commented Aug 8, 2017

@consulthys the main issue here is not performance but rather the missing back-channel. What do you do if there is an error? How do you communicate it to the user? We can continuously log to disk, but quite a lot is involved when we roll over an index. A request/response model is quite appealing for testing and error reporting.

I spent quite some time thinking about this and I wonder if we can improve it down the road with support for response headers that tell you that you need to call the _rollover API. This way we still leave it to the user to call the right API, but remove the need for cron-job-like systems. WDYT?
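On the client side, the header-based idea could look like the sketch below. The header name `x-rollover-recommended` is purely hypothetical (Elasticsearch sends no such header); the point is that the server only hints, and the client still decides to call _rollover itself, keeping the request/response error reporting.

```python
from typing import Callable, Mapping

# HYPOTHETICAL header name: Elasticsearch does not send such a header.
ROLLOVER_HINT_HEADER = "x-rollover-recommended"


def handle_index_response(headers: Mapping[str, str], alias: str,
                          call_rollover: Callable[[str], None]) -> bool:
    """Call _rollover (via the supplied callback) only when the server hints it is due."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get(ROLLOVER_HINT_HEADER, "").strip().lower() == "true":
        call_rollover(alias)
        return True
    return False
```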

@consulthys
Contributor Author

consulthys commented Aug 8, 2017

Thanks for your insights @s1monw, much appreciated! I understand the challenge now.

So the problem basically boils down to a rollover not happening (even though it should, based on the rollover conditions) because of some underlying issue that needs to be communicated to the client. Response headers, as you suggest, could work, but how about adding a rollover section to the index call response, with pretty much the same info you get when calling _rollover, plus an error section with details about the cause?

The presence of the rollover section would indicate to the user that a rollover was attempted. If the rollover.error section is also present, it would signal that something went wrong and needs to be handled. The index call would still succeed, though, and the document would still be indexed in old_index.

# this should trigger a rollover, but doesn't because of some underlying issue
PUT logs-write/log/6452
{ "log": "Some new log" }
=> still indexed in logs-2017.08.08-1 instead of logs-2017.08.09-000002 as expected

# the index call response could look like this
{
  "_index": "logs-2017.08.08-1",
  "_type": "log",
  "_id": "6452",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true,
  "rollover": {
    "old_index": "logs-2017.08.08-1",
    "new_index": "logs-2017.08.09-000002",
    "rolled_over": false, 
    "dry_run": false, 
    "conditions": { 
      "[max_age: 1d]": true,
      "[max_docs: 100000]": false
    },
    "error" : {
      "root_cause" : [
        {
          "type" : "rollover_exception",
          "reason" : "Failed to rollover index [logs-2017.08.08-1] to [logs-2017.08.09-000002]"
        }
      ],
      "type" : "rollover_exception",
      "reason" : "Failed to rollover index [logs-2017.08.08-1] to [logs-2017.08.09-000002]",
      "caused_by" : {
        "type" : "some_exception",
        "reason" : "Error description"
      }
    }
  }
}
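A client consuming such a response would need to distinguish three cases. The sketch below assumes the field names from the proposed response above (`rollover`, `rolled_over`, `error.reason`), which are not an existing Elasticsearch format; `rollover_outcome` is a hypothetical helper.

```python
def rollover_outcome(index_response: dict) -> str:
    """Classify the proposed 'rollover' section of an index response."""
    rollover = index_response.get("rollover")
    if rollover is None:
        # No section: no rollover condition was met for this write.
        return "not_attempted"
    if "error" in rollover:
        # The document was still indexed in old_index; surface the cause.
        reason = rollover["error"].get("reason", "unknown error")
        return f"failed: {reason}"
    return "rolled_over" if rollover.get("rolled_over") else "not_rolled_over"
```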

@s1monw
Contributor

s1monw commented Aug 8, 2017

Yeah, so from an API perspective I think it would be nice not to mix the two together. But I have no good answer for this. I think there is room for improvement and I wonder what @clintongormley thinks about this. I will mull on it a bit longer.

@s1monw s1monw removed the discuss label Aug 18, 2017
@lhoss

lhoss commented Dec 22, 2017

+1 for a configurable periodic rollover built into ES, so one does not have to rely on the logging solution (like Graylog) for this feature.
Graylog's index retention strategies could actually serve as a guide.

@gwbrown
Contributor

gwbrown commented Jun 26, 2019

We have effectively implemented this functionality in the form of Index Lifecycle Management (ILM) so I'm going to close this issue - please comment here if there's some functionality described in this issue that I've missed that isn't available in ILM.
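For reference, the original proposal maps onto ILM roughly as follows: the rollover conditions move from the alias into a policy's hot phase, and a template attaches the policy and the write alias to new `logs-*` indices. The bodies below are a sketch for 7.x-style legacy templates; the policy name `logs-policy` is hypothetical. Note that ILM checks conditions periodically (see `indices.lifecycle.poll_interval`) rather than on every write.

```python
import json

POLICY_NAME = "logs-policy"  # hypothetical policy name


def ilm_policy(max_age: str, max_docs: int) -> dict:
    """Body for PUT _ilm/policy/<name>: roll over the hot index on either condition."""
    return {"policy": {"phases": {"hot": {"actions": {
        "rollover": {"max_age": max_age, "max_docs": max_docs},
    }}}}}


def index_template(policy_name: str, write_alias: str) -> dict:
    """Body for a legacy PUT _template/<name> request covering logs-* indices."""
    return {
        "index_patterns": ["logs-*"],
        "settings": {
            "index.lifecycle.name": policy_name,
            "index.lifecycle.rollover_alias": write_alias,
        },
        "aliases": {"logs-search": {}},  # read alias added to every new index
    }


def bootstrap_index(write_alias: str) -> dict:
    """Body for creating the first index, e.g. PUT /<logs-{now/d}-000001> (URL-encoded)."""
    return {"aliases": {write_alias: {"is_write_index": True}}}


if __name__ == "__main__":
    print(json.dumps(ilm_policy("1d", 100000), indent=2))
```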

@gwbrown gwbrown closed this as completed Jun 26, 2019
6 participants