Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize snapshot indices parsing so that combinations of included and excluded indices are treated the same regardless of the order they are listed in #5626

Merged
merged 14 commits into from
Jan 3, 2023

Conversation

stephen-crawford
Copy link
Contributor

@stephen-crawford stephen-crawford commented Dec 23, 2022

Description

Adds a custom comparator-based sort call before parsing indices using the SnapshotUtils class. Previously, the lack of sort meant that mixing negated and normal indices in the "indices" field of the REST calls would create different behavior based on the ordering of the indices. The added sorting logic corrects this by always parsing negated indices after all normal indices (during these situations, the indices parsing had been acting as expected).

In addition, the sorting logic includes a removal of empty strings from the index list since these strings would break the comparator's sort.

Three new tests were added to the SnapshotUtilsTests class in order to make sure that the ordering of the negative and normal indices did not change the behavior (and that it was the expected behavior).

The new behavior is as follows:

  1. (Querying a single negated snapshot): If the body of the request reads indices: "-bar", the query will return all indices which are not "bar".
  2. (Querying a single negated snapshot wildcard): If the body of the request reads indices: "-bar*", the query will return all indices which are not a subset of "bar*".
  3. (Querying multiple negated snapshots): If the body of the request reads indices: "-bar, -baz", the query will return all indices which are not "bar" or `"baz".
  4. (Querying multiple negated snapshot wildcards): If the body of the request reads indices: "-bar*, -baz*", the query will return all indices which are not a subset of "bar*" or a subset of `"baz*".
  5. (Querying a single snapshot): If the body of the request reads indices: "bar", the query will return "bar".
  6. (Querying multiple snapshot wildcards): If the body of the request reads indices: "bar*, baz*", the query will return all indices which are a subset of "bar*" or a subset of `"baz*".
  7. (Querying multiple mixed snapshot wildcards): If the body of the request reads indices: "-bar*, ba*", the query will return all indices which are a subset of "ba*" but not a subset of "bar*". This was not the case previously.
  8. (Querying multiple mixed snapshot wildcards): If the body of the request reads indices: "ba*, -bar*", the query will return all indices which are a subset of "ba*" but not a subset of "bar*".

Now behavior is persevered irregardless of the order of positive and negated wildcards queries.

Issues Resolved

Resolves issue
opensearch-project/security#1652

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

stephen-crawford and others added 5 commits December 22, 2022 17:10
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@stephen-crawford
Copy link
Contributor Author

Hi @andrross, thank you for reviewing. To the best of my knowledge the only discussion of the query syntax is found here. It says "You can use , to create a list of indices, * to specify an index pattern, and - to exclude certain indices. Don’t put spaces between items. Default is all indices."

I have not read any other documentation that mentions the use of the special characters. I cannot speak for everyone using OpenSearch but given this issue, it seems like at least a fair number of users expect the behavior to be as this issue would make it. I also think that in general, unless we added documentation and an explanation for why this behavior exists, a change is necessary for user experience etc.

@andrross
Copy link
Member

@scrawfor99 We're going to have to be super clear about the behavior change here, and then make a judgement call as to whether this is a bug fix versus a breaking change that must be deferred to the next major version. Can you clearly enumerate how this changes the behavior of the API in the commit message and PR description? Something similar to this comment is what I'm thinking.

@stephen-crawford
Copy link
Contributor Author

Hi @andrross, thank you for your suggestion. I went ahead and added documentation into the body like you requested. Let me know if you need anything else.

Happy Holidays!

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scrawfor99! So the upshot here is that any call that started with one or more exclusion patterns and also contained explicit inclusion patterns would in effect ignore those inclusions and instead include everything. Is that right? Assuming that is the only behavior change then I do think that is a bug and something worth fixing in a minor version.

@@ -69,6 +69,20 @@ public static List<String> filterIndices(List<String> availableIndices, String[]
if (IndexNameExpressionResolver.isAllIndices(Arrays.asList(selectedIndices))) {
return availableIndices;
}

selectedIndices = Arrays.stream(selectedIndices).filter(s -> !s.isEmpty()).toArray(a -> new String[a]); // Remove all empty strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want to rewrite the logic in this method because it is very difficult to follow, but I'm going to resist that impulse for now. I am a bit concerned about possible implications of removing empty strings with the change here. I'm inclined to not sort the list, and instead just move the exclusions to the end. What do you think? That can be done with something like:

// Move the exclusions to end of list to ensure they are processed
// after explicitly selected indices are chosen.
final List<String> excludesAtEndSelectedIndices = Stream.concat(
    Arrays.stream(selectedIndices)
        .filter(s -> s.isEmpty() || s.charAt(0) != '-'),
    Arrays.stream(selectedIndices)
        .filter(s -> !s.isEmpty() && s.charAt(0) == '-'))
    .collect(Collectors.toUnmodifiableList());

Then just tweak the logic below to use this list instead of the selectedIndices array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think that this is a better option that seems fine. It should save a bit of time on larger indices lists (though the indices list probably won't be large enough that this is not negligible). I can swap this over :)

As far as the code logic, I only just started looking at this class and its methods but if you feel there is a way it can be improved that you would like to be done just let me know and I will rewrite it. I have just been trying to make the targeted changes with as little modification as possible so as to be less likely to break anything but I agree that the code could be a bit easier to follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just been trying to make the targeted changes with as little modification as possible

This is definitely the right instinct and why we probably shouldn't completely rewrite this :)

CHANGELOG.md Outdated Show resolved Hide resolved
@stephen-crawford stephen-crawford changed the title Fix index parsing in SnapshotUtils to allow correct reading of Snapshot Restore calls Standardize snapshot indices parsing so that combination of included and excluded indices are treated the same irregardless of the order they are listed in Dec 30, 2022
@stephen-crawford stephen-crawford changed the title Standardize snapshot indices parsing so that combination of included and excluded indices are treated the same irregardless of the order they are listed in Standardize snapshot indices parsing so that combinations of included and excluded indices are treated the same regardless of the order they are listed in Dec 30, 2022
stephen-crawford and others added 2 commits December 30, 2022 09:46
… and excluded indices are treated the same regardless of the order they are listed in

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:


// Move the exclusions to end of list to ensure they are processed
// after explicitly selected indices are chosen.
final List<String> excludesAtEndSelectedIndices = Stream.concat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to replace every usage of selectedIndices below with excludesAtEndSelectedIndices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops--clearly I had vacation brain hah.

CHANGELOG.md Outdated
@@ -79,6 +78,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Remove LegacyESVersion.V_7_6_ and V_7_7_ Constants ([#4837](https://github.com/opensearch-project/OpenSearch/pull/4837))
- Remove LegacyESVersion.V_7_10_ Constants ([#5018](https://github.com/opensearch-project/OpenSearch/pull/5018))
- Remove Version.V_1_ Constants ([#5021](https://github.com/opensearch-project/OpenSearch/pull/5021))
- Remove --enable-preview feature flag since Apache Lucene now patches class files ([#5642](https://github.com/opensearch-project/OpenSearch/pull/5642))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, must have grabbed it from an update pull.

CHANGELOG.md Outdated
@@ -61,7 +60,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Changed http code on create index API with bad input raising NotXContentException from 500 to 400 ([#4773](https://github.com/opensearch-project/OpenSearch/pull/4773))
- Change http code for DecommissioningFailedException from 500 to 400 ([#5283](https://github.com/opensearch-project/OpenSearch/pull/5283))
- Pre conditions check before updating weighted routing metadata ([#4955](https://github.com/opensearch-project/OpenSearch/pull/4955))
- Remove --enable-preview feature flag since Apache Lucene now patches class files ([#5642](https://github.com/opensearch-project/OpenSearch/pull/5642))
- Standardize snapshot indices parsing so that combinations of included and excluded indices are treated the same regardless of the order they are listed in ([#5626](https://github.com/opensearch-project/OpenSearch/pull/5626))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that a user can scan this list and quickly determine whether a change might impact them. It should also be as concise as possible, which can obviously be a challenge. At a minimum, it should list what changed from a user's perspective, which in this case I think is the snapshot restore and clone APIs. What do you think of:

"Fix index exclusion behavior in snapshot restore and clone APIs"

Users that don't use those APIs or don't use exclusions can quickly identify this change as not impacting them, otherwise they'll need to dig into the specifics of the linked issue to get more info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe we should backport this change to 2.x, so the changelog entry should go in the "Unreleased 2.x" section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I had thought that you had requested that the CHANGELOG entry be swapped to the new description alongside the PR title. My misunderstanding. I agree that the new option is much easier to quickly read over.

I will move it to the Unreleased section as well.

@@ -69,6 +69,20 @@ public static List<String> filterIndices(List<String> availableIndices, String[]
if (IndexNameExpressionResolver.isAllIndices(Arrays.asList(selectedIndices))) {
return availableIndices;
}

selectedIndices = Arrays.stream(selectedIndices).filter(s -> !s.isEmpty()).toArray(a -> new String[a]); // Remove all empty strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just been trying to make the targeted changes with as little modification as possible

This is definitely the right instinct and why we probably shouldn't completely rewrite this :)

@andrross
Copy link
Member

@reta @dblock

Just want to raise this issue as I believe we should backport this as a bug fix, though it does change the behavior of the snapshot restore (and clone) APIs.

The upshot here is that, given indexes "foo", "bar", "baz" in a snapshot, and the user provides the pattern "-bar*, ba*" in the restore API, the current behavior results in restoring "foo" and "baz". This is a bug. The correct behavior is to restore only "baz" (select indexes that match "ba*" and exclude those that match "bar*").

There is definitely a chance that some users are (intentionally or not) relying on this buggy behavior and the fix will break them. I think we should backport this as it is clearly wrong. What do you think?

@dblock
Copy link
Member

dblock commented Jan 2, 2023

Looks like a bug fix to me, so yes to backport. When users accidentally or purposely use behavior that is a side effect of a bug, it's still a bug, and not a breaking change. I also think we should release note it well.

Signed-off-by: Stephen Crawford <steecraw@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

Gradle Check (Jenkins) Run Completed with:

@stephen-crawford
Copy link
Contributor Author

@andrross should be all set!

@andrross andrross merged commit 57d4485 into opensearch-project:main Jan 3, 2023
@andrross andrross added the backport 2.x Backport to 2.x branch label Jan 3, 2023
@andrross
Copy link
Member

andrross commented Jan 3, 2023

Thanks @scrawfor99, nice work!

@cwperks
Copy link
Member

cwperks commented Jan 5, 2023

Great work @scrawfor99 and thank you for following this through! 🎸

@stephen-crawford stephen-crawford deleted the restore-snapshot branch March 29, 2023 15:16
@stephen-crawford stephen-crawford restored the restore-snapshot branch March 29, 2023 15:16
@stephen-crawford stephen-crawford deleted the restore-snapshot branch March 30, 2023 16:00
@stephen-crawford stephen-crawford restored the restore-snapshot branch March 30, 2023 16:00
@stephen-crawford stephen-crawford deleted the restore-snapshot branch June 9, 2023 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants