Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new require_alias option to indexing requests #58917

Merged
merged 18 commits into from
Jul 17, 2020

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Jul 2, 2020

This commit adds the require_alias flag to requests that create new documents.

This flag, when true prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or false, the behavior defaults to the action.auto_create_index settings.

This is useful when an alias is required instead of a concrete index.

closes #55267

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Features)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jul 2, 2020
@benwtrent
Copy link
Member Author

//CC
@elastic/ml-core

assertThat(autoCreateIndex.shouldAutoCreate(randomAlphaOfLengthBetween(1, 10), false, buildClusterState()), equalTo(true));
}

public void testAutoCreationEnabledWithNoAutoCreateFlagSet() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the defaulting of request-level setting to index-level setting happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your question.

What is the default of the setting? Or how are the settings adjusted for this test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is more: where is the code that checks the value of the index-level setting should the request-level setting be false.

@dakrone dakrone self-requested a review July 2, 2020 14:48
@dakrone
Copy link
Member

dakrone commented Jul 2, 2020

@benwtrent I think we should avoid putting booleans in names and ending up with double-negatives, so for example no_auto_create=false is a double negative that a user has to resolve ("no" & "false"). The original issue (#55267) had index_auto_create which is easier to resolve than the double negative.

I think I would much prefer either auto_create_index=true|false or index_auto_create=true|false. What do you think?

@benwtrent
Copy link
Member Author

@dakrone

Having a positive auto create flag might cause a security problem.

What if the auto create settings for an index pattern prevent autocreation? Then a user provides a 'auto_create_index=true'?

Is there a better way to provide only a restriction in autocreation policy through a flag?

@dakrone
Copy link
Member

dakrone commented Jul 2, 2020

Having a positive auto create flag might cause a security problem.

I'm not sure what security problem you're talking about, if we flipped it the default would be true and a user (like ML) could set it to false?

What if the auto create settings for an index pattern prevent autocreation? Then a user provides a 'auto_create_index=true'?
Is there a better way to provide only a restriction in autocreation policy through a flag?

What about flipping it so that it's not a double negative then, and a bit clearer: allow_index_auto_create=true|false?

I think regardless, even having a parameter is overriding the index.auto_create setting, I would just prefer that the parameter not use double negatives.

@droberts195
Copy link
Contributor

I would just prefer that the parameter not use double negatives

I agree.

I'm not sure what security problem you're talking about

I think the reasoning here was that if the parameter was positive then if would give the impression that you could override a cluster-wide action.auto_create_index setting of false with a per-request setting of true. In other words, the administrator has set action.auto_create_index: false, then one user decides to set auto_create_index=true on their requests and disregards the administrator's decision. Of course, that's not how the PR is implemented - you can only override one way - so it's not actually a security problem, but more of a problem that the name suggests you can do something that you can't.

One way to avoid the double negative without creating this confusion might be to call the parameter require_index_exists. The underlying implementation of the PR could stay the same apart from flipping the boolean test everywhere, but this name wouldn't suggest that you could override action.auto_create_index: false to true per request.

Another option would be to call the parameter alias_required or require_alias. It seems to me that the primary reason you wouldn't want to auto-create an index is because you expect the name you're accessing it through to be an alias and it's a massive pain to clean up if that alias doesn't exist and you create a concrete index in its place. Having an alias_required option was part of the alias templates proposal that got canned in favour of data streams. Obviously switching to requiring an alias would change the implementation of this PR, although not that much, because the check could still be done in AutoCreateIndex.shouldAutoCreate. But I think it would solve the underlying problem in the use cases I can think of for require_index_exists. It would have the extra benefit that we fail fast in cases where a misguided user has manually created a concrete index that will cause problems later because other code assumes a particular name is an alias.

Since we wouldn't want both these options and it's hard to remove them once they're in I think it's worth more people reviewing which one to add, and the exact name of the argument.

@martijnvg
Copy link
Member

martijnvg commented Jul 3, 2020

Another option would be to call the parameter alias_required or require_alias.

I'm in favour of naming the parameter along these lines, so that it clear that use case for this new parameter is to prevent accidental creation a concrete index, because an alias should exist with that same name. Also this should avoid users needing think about whether they need to use this parameter if they are using data streams.

@benwtrent
Copy link
Member Author

If it were alias_required, would we fail if a concrete index exists of the same name already?

@droberts195
Copy link
Contributor

If it were alias_required, would we fail if a concrete index exists of the same name already?

Yes.

Before you make that change, let's wait and see what the Kibana team say about whether it's ever valid in recent versions to have a concrete .kibana index rather than .kibana being an alias. If having a concrete .kibana index is possible in 7.9 then switching to alias_required would make this feature useless for them, so what to do would need further discussion. (I think for ML failing if a concrete index exists of the same name already would be good.)

/cc @kobelb since you were on the email thread

@dakrone
Copy link
Member

dakrone commented Jul 6, 2020

+1 for require_alias

@kobelb
Copy link
Contributor

kobelb commented Jul 6, 2020

Before you make that change, let's wait and see what the Kibana team say about whether it's ever valid in recent versions to have a concrete .kibana index rather than .kibana being an alias

As far as I'm aware, Kibana doesn't use a concrete .kibana index in 7.9. Kibana does support upgrading from an index named .kibana to a .kibana alias pointing to a .kibana_1 index in 7.9; however, Kibana won't try to write any documents to the .kibana index in this scenario. @rudolf do you mind keeping me honest on this?

However, I do see merit in allowing auto_create_index, or whatever we decide to name this setting, to also work against indices and not just aliases. Prior to Kibana switching to using aliases, we occasionally hit a race condition where an ES call to write a document would occur prior to the call to create the index with the explicit mappings. This occasionally led to a document dynamically creating mappings that were incompatible with the mappings which were explicitly specified. The ability to specify auto_create_index: false on every document write would have been helpful in preventing this.

@benwtrent
Copy link
Member Author

It could be require_created, where it is either an alias OR an index, but that the destination is already created.

This fulfills @kobelb use case as well.

But it does seem very rare for a concrete index to be created with specific mappings but NOT through templates. Templates already satisfy this type of scenario. Unless there is a strange race condition between the template being instantiated and the index write occurring.

I am leaning towards require_alias to cover the most cases with the least confusion + overlap with other features.

Please let me know @droberts195 @dakrone @kobelb @martijnvg if this is good :). I will get cracking on it.

@kobelb
Copy link
Contributor

kobelb commented Jul 6, 2020

But it does seem very rare for a concrete index to be created with specific mappings but NOT through templates. Templates already satisfy this type of scenario. Unless there is a strange race condition between the template being instantiated and the index write occurring.

Kibana isn't using index templates because the mappings can change between versions of Kibana and therefore between different .kibana_n indices. It's possible that Kibana's usage of Elasticsearch indices is an exception compared to the norm, and Kibana still benefits from the proposal that only applies to aliases. I ultimately defer to the wisdom of those more experienced with the common usages of Elasticsearch.

@rudolf
Copy link
Contributor

rudolf commented Jul 7, 2020

As far as I'm aware, Kibana doesn't use a concrete .kibana index in 7.9. Kibana does support upgrading from an index named .kibana to a .kibana alias pointing to a .kibana_1 index in 7.9; however, Kibana won't try to write any documents to the .kibana index in this scenario. @rudolf do you mind keeping me honest on this?

Correct. require_alias would have helped prevent some hard to recover from Kibana bugs and would be useful as a short term mitigation. In the future when .kibana will be moved to a system index we will probably set auto create index to false.

I have limited experience in using Elasticsearch outside of Kibana, but the auto create index always struck me as unhelpful magic. It removes one step when you're getting started and playing with ES but can cause a lot of pain later on. Just out of curiosity, do we know of valid use cases for this feature?

@benwtrent
Copy link
Member Author

Just out of curiosity, do we know of valid use cases for this feature?

Assuming you mean index auto creation.

Removing a single step and (pretty complex) concept early on is an extremely good use case to begin with.

But, another use case is streaming data and auto creating new indices from index templates (see ILM).

Also, it allows administrators to allow their users to create their own indices, but without access to any of the management APIs.

@droberts195
Copy link
Contributor

I am leaning towards require_alias to cover the most cases with the least confusion + overlap with other features.

I am happy with require_alias too. It sounds like that also adds some value for Kibana.

So please switch this PR over to implement require_alias @benwtrent.

@droberts195 droberts195 changed the title Adding new no_auto_create to indexing requests Adding new require_alias option to indexing requests Jul 7, 2020
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @benwtrent!

@benwtrent benwtrent added v7.10.0 and removed v7.9.0 labels Jul 16, 2020
@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent benwtrent merged commit f72b893 into elastic:master Jul 17, 2020
@benwtrent benwtrent deleted the feature/add-no_autocreate-flag branch July 17, 2020 12:45
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jul 17, 2020
This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes elastic#55267
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jul 17, 2020
benwtrent added a commit that referenced this pull request Jul 17, 2020
benwtrent added a commit that referenced this pull request Jul 17, 2020
…#59769)

* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes #55267
@jaymode jaymode mentioned this pull request Aug 25, 2020
23 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add index_auto_create flag per index request
9 participants