Rollover improvement #26092

consulthys · 2017-08-08T07:46:22Z

The addition of the Rollover API has greatly simplified the management of time-based indices. Yet, the _rollover endpoint still needs to be called in a recurring manner, either manually or via a cron-based tool such as Curator.

However, it would be much simpler if the rollover process was implicit and managed internally. The idea would be to specify the rollover conditions in the alias at index creation time (or equivalently in an index template), basically like this:

# Today is 2017/08/08, so the created index is named logs-2017.08.08-1
PUT /<logs-{now/d}-1>
{
    "mappings": {...},
    "aliases" : {
        "logs-search" : {},
        "logs-write" : {
            "rollover" : {
                "conditions": {
                    "max_age":   "1d",
                    "max_docs":  100000
                }
            }
        }
    }
}

Then, when indexing new documents through the logs-write alias (provided that alias points to a single index only, of course), it would know whether a new index needs to be created based on the max_age and max_docs conditions. There wouldn't be any need to call _rollover explicitly.

# the same day no index would be created
PUT logs-write/log/253
{ "log": "Some log" }
=> indexed in logs-2017.08.08-1

# the next day, a new index would be created automatically without having to call the rollover API
PUT logs-write/log/6452
{ "log": "Some new log" }
=> indexed in logs-2017.08.09-000002
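
To make the proposed check concrete, here is a minimal Python sketch of how the conditions might be evaluated on each write through the alias. It assumes the same semantics as the existing _rollover endpoint, where a rollover happens as soon as any one condition is met. The function `should_rollover` and its parameters are hypothetical and are not part of Elasticsearch.

```python
from datetime import datetime, timedelta


def should_rollover(created_at, doc_count, now, max_age=None, max_docs=None):
    """Hypothetical helper: True if any configured rollover condition is met."""
    age_met = max_age is not None and (now - created_at) >= max_age
    docs_met = max_docs is not None and doc_count >= max_docs
    return age_met or docs_met


# Conditions from the proposal: max_age of 1d, max_docs of 100000.
conditions = {"max_age": timedelta(days=1), "max_docs": 100000}
created = datetime(2017, 8, 8)  # assumed creation time of logs-2017.08.08-1

# Same day, few documents: neither condition is met.
same_day = should_rollover(created, 253, datetime(2017, 8, 8, 12), **conditions)

# Next day: max_age is met, so a rollover would be triggered.
next_day = should_rollover(created, 6452, datetime(2017, 8, 9, 0, 0, 1), **conditions)
```

Such a check is cheap on its own; any real overhead would come from creating the new index and switching the alias.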

Would there be any downside to the proposed improvement, apart from the potential overhead of determining at indexing time whether a new index needs to be created?


cbuescher · 2017-08-08T10:58:17Z

Hi @consulthys, thanks for the feedback. I'm not too familiar with the cost of rolling over an index, but it could potentially affect other index/search operations on the cluster, so my guess is that it's better not to make this automatic. I will mark this feature suggestion as a discussion topic so others can chime in.

consulthys · 2017-08-08T11:27:58Z

Thanks for your feedback @cbuescher. If there is some overhead, it would of course be a "performance vs simplicity" tradeoff that the user can decide to accept or not. If no rollover setting is present in the alias at index creation time, then the behavior would be the same as now, of course.

s1monw · 2017-08-08T12:16:45Z

@consulthys the main issue here is not performance but rather the missing back-channel. What do you do if there is an error? How do you communicate this to the user? We can continuously log stuff to disk but there is quite some stuff involved when we are rolling over an index. Having a request response model is quite appealing for testing and error reporting.

I spent quite some time thinking about this and I wonder if we can improve that down the road with support for response headers that tell you that you need to call the _rollover API? This way we still leave it to the user to call the right API but avoid the need for cron-job-like systems. WDYT?
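
As a rough illustration of the response-header idea, a client could inspect each indexing response and call _rollover only when told to. This is a sketch under assumptions: the header name `X-Rollover-Needed` and the helper `follow_up_request` are invented for this example, and no such header exists in Elasticsearch.

```python
# Hypothetical header name, used only for illustration.
ROLLOVER_HEADER = "X-Rollover-Needed"


def follow_up_request(alias, response_headers):
    """Return (method, path) for a _rollover call if the header asks for one, else None."""
    if response_headers.get(ROLLOVER_HEADER, "").lower() == "true":
        return ("POST", f"/{alias}/_rollover")
    return None


# Headers of an indexing response that signals a pending rollover (assumed shape).
pending = follow_up_request("logs-write", {ROLLOVER_HEADER: "true"})

# Headers of an ordinary indexing response: nothing to do.
nothing = follow_up_request("logs-write", {})
```

The client still issues the explicit _rollover request, so errors come back through the normal request/response channel.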

consulthys · 2017-08-08T12:39:01Z

Thanks for your insights @s1monw, much appreciated! I understand the challenge now.

So the problem would basically boil down to a rollover not happening (even though it should, based on the rollover conditions) because of some underlying issue that needs to be communicated to the client. Response headers, as you suggest, could be an idea, but how about adding another rollover section inside the index call response, with pretty much the same info you get when calling _rollover, plus an error section with details about the cause?

The presence of the rollover section would indicate to the user that a rollover was triggered. If the rollover.error section is also present, it would tell the user that something went wrong and needs to be handled somehow. The index call would still succeed, though, and the document would still be indexed in old_index.

# this should trigger a rollover, but doesn't because of some underlying issue
PUT logs-write/log/6452
{ "log": "Some new log" }
=> still indexed in logs-2017.08.08-1 instead of logs-2017.08.09-000002 as expected

# the index call response could look like this
{
  "_index": "logs-2017.08.08-1",
  "_type": "log",
  "_id": "6452",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true,
  "rollover": {
    "old_index": "logs-2017.08.08-1",
    "new_index": "logs-2017.08.09-000002",
    "rolled_over": false, 
    "dry_run": false, 
    "conditions": { 
      "[max_age: 1d]": true,
      "[max_docs: 100000]": false
    },
    "error" : {
      "root_cause" : [
        {
          "type" : "rollover_exception",
          "reason" : "Failed to rollover index [logs-2017.08.08-1] to [logs-2017.08.09-000002]"
        }
      ],
      "type" : "rollover_exception",
      "reason" : "Failed to rollover index [logs-2017.08.08-1] to [logs-2017.08.09-000002]",
      "caused_by" : {
        "type" : "some_exception",
        "reason" : "Error description"
      }
    }
  }
}
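
A client could interpret such a response along these lines. This is only a sketch of the proposed response shape; the `rollover` section does not exist in real index responses, and the helper `rollover_status` and its return values are hypothetical.

```python
def rollover_status(index_response):
    """Classify the hypothetical "rollover" section of an index call response."""
    rollover = index_response.get("rollover")
    if rollover is None:
        return "not_triggered"  # no rollover conditions were met
    if "error" in rollover:
        return "failed"  # rollover was triggered but did not complete
    return "rolled_over" if rollover.get("rolled_over") else "not_rolled_over"


# Trimmed-down version of the proposed response shown above.
response = {
    "_index": "logs-2017.08.08-1",
    "created": True,
    "rollover": {
        "old_index": "logs-2017.08.08-1",
        "new_index": "logs-2017.08.09-000002",
        "rolled_over": False,
        "error": {"type": "rollover_exception"},
    },
}

status = rollover_status(response)
```

Here `status` is "failed", while the document itself was still indexed in the old index.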

s1monw · 2017-08-08T13:09:58Z

Yeah, so from an API perspective I think it would be nice not to mix the two together. But I have no good answer for this. I think there is room for improvement and I wonder what @clintongormley thinks about this. I will mull it over a bit longer.

lhoss · 2017-12-22T10:53:24Z

+1 for having a configurable, regular rollover built into ES, so one does not have to rely on the logging solution (like Graylog) for such a feature.
Actually, Graylog's index retention strategies could serve as a guide.

gwbrown · 2019-06-26T20:11:04Z

We have effectively implemented this functionality in the form of Index Lifecycle Management (ILM) so I'm going to close this issue - please comment here if there's some functionality described in this issue that I've missed that isn't available in ILM.

cbuescher added :Rollover discuss >feature labels Aug 8, 2017

s1monw removed the discuss label Aug 18, 2017

clintongormley added the :Data Management/Indices APIs label and removed the :Rollover label Feb 13, 2018

pavolloffay mentioned this issue Nov 26, 2018

Support archiving traces with ES storage jaegertracing/jaeger#818

Closed

gwbrown closed this as completed Jun 26, 2019
