
[Proposal] Upgrades Project Workstreams Update #36

Closed · chelma opened this issue Dec 8, 2022 · 9 comments
chelma commented Dec 8, 2022

Objective

The purpose of this doc is to spark conversation around the major work initiatives in the Upgrades Project based upon new design work, developments, and data. The author proposes a set of workstreams that mostly overlaps with preceding proposals but deviates in minor ways in scope and sequencing, and hopes to drive alignment around this updated understanding of the project.

At a high level, it is proposed that the Validation Tool is not a "freebie" that naturally results from the development of the Upgrade Testing Framework, but is instead dependent on the Assessment Tool and therefore should be sequenced after it. Additionally, the focus of the Upgrade Testing Framework has shifted, and it is no longer expected to perform validations of “real” clusters.

Upgrade Project Workstreams

Develop Upgrades "Knowledge Base"

This workstream is to develop a centralized understanding of what is "expected" to "go right" and "go wrong" when an Elasticsearch/OpenSearch cluster is upgraded. Within the bounds of this knowledge base are: data, metadata, configuration, the core Elasticsearch/OpenSearch engine, and plugins. At the center of this workstream is developing a library of "expectations" that each express a thing that is expected (e.g. a string field should be converted to geopoint), when it applies (e.g. going from version X to X+2), and data/tests to confirm the expectation matches reality (e.g. the ability to run an actual upgrade and check whether the expectation holds). The intention is to capture the community's full understanding of what actually happens during a given upgrade in order to provide better guidance/documentation and provide solutions to pain points. A further intention is that it should be easy for community members to contribute new expectations to the knowledge base.
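
To make this concrete, an expectation could be captured as a small, declarative record plus a verification hook. The sketch below is purely illustrative and assumes a record shape that has not actually been decided; all names (including the cluster handle) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Expectation:
    """One hypothetical knowledge-base entry: what is expected, when it applies, how to verify."""
    description: str             # the thing that is expected to happen
    applies_from: str            # source version family, e.g. "ES 5.x"
    applies_to: str              # target version family, e.g. "ES 7.x"
    verify: Callable[..., bool]  # test that checks the expectation against a real upgraded cluster
    tags: List[str] = field(default_factory=list)

# Example entry: a mapping change expected when jumping two major versions.
string_to_geopoint = Expectation(
    description="A string field holding coordinates should be remapped to geo_point",
    applies_from="ES 5.x",
    applies_to="ES 7.x",
    verify=lambda cluster: cluster.field_type("location") == "geo_point",  # 'cluster' API is hypothetical
    tags=["mappings"],
)
```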

Related work:

Develop Tooling to Test Complex Upgrades

This workstream is focused on creating the required tooling to confirm that the expectations presented in the Knowledge Base are true in a repeatable, CI/CD manner. Currently, this workstream is called the Upgrade Testing Framework. Having such a set of tooling has the following benefits. First, it ensures that the expectations captured in the knowledge base are accurate and catches when those expectations change. Second, it facilitates development of fixes to pain points by providing a way to test those fixes. Third, it supports development of major new initiatives like single-hop, multi-version upgrades that existing test tooling does not focus on. Fourth, it provides the community with a higher-fidelity understanding of backwards compatibility than the existing Backwards Compatibility (BWC) tests are designed to provide and is intended to replace those existing tests.

While this tooling is not itself intended to be run directly against users' real clusters, the core code library written for the tooling is intended to be easily adapted to support new workstreams, such as providing an assistant that helps walk a user through the process of performing a migration or upgrade.
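
As a rough illustration of what a CI/CD-runnable check could look like, here is a pytest-style sketch; the fixture and cluster handle are stand-ins, not the actual Upgrade Testing Framework API.

```python
import pytest

class FakeUpgradedCluster:
    """Stand-in for the post-upgrade cluster handle a real framework would provide."""
    def __init__(self):
        # In a real run this state would come from seeding a source cluster and upgrading it.
        self._doc_counts = {"test-index": 1000}

    def count(self, index: str) -> int:
        return self._doc_counts[index]

@pytest.fixture
def upgraded_cluster():
    # A real fixture would start a source cluster, load test data, perform the upgrade,
    # and hand the resulting cluster to the test; here we only fake the end state.
    return FakeUpgradedCluster()

def test_document_count_preserved(upgraded_cluster):
    # Expectation: an upgrade should not lose documents.
    assert upgraded_cluster.count("test-index") == 1000
```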

Related work:

Develop Tooling to Perform Predictive Assessments

This workstream relies on the existence of an accurate, tested knowledge base to provide users a way to predict what issues they may run into when they upgrade an existing Elasticsearch/OpenSearch cluster to a new, proposed version. The current thinking is that the tool will interrogate the user's existing cluster to determine its configuration, use that understanding to project the expectations in the knowledge base into the subset that applies, and provide a report outlining issues that they may encounter.
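
A minimal sketch of that projection step, assuming expectations carry version-applicability metadata; the record shape and helper names are assumptions, not a settled design.

```python
def version_family(version_string: str) -> str:
    """Reduce a full version like 'ES 7.10.2' to its family, e.g. 'ES 7.x' (illustrative only)."""
    product, number = version_string.split(" ", 1)
    return f"{product} {number.split('.')[0]}.x"

def assess_upgrade(source_version: str, target_version: str, knowledge_base: list) -> list:
    """Return the subset of expectations that apply to this particular upgrade path."""
    return [
        e for e in knowledge_base
        if e["applies_from"] == version_family(source_version)
        and e["applies_to"] == version_family(target_version)
    ]

def format_report(applicable: list) -> str:
    """One line per potential issue the user may encounter during the upgrade."""
    return "\n".join(f"- {e['description']}" for e in applicable)

# Usage sketch with a single hypothetical knowledge-base entry:
kb = [{
    "applies_from": "ES 5.x",
    "applies_to": "ES 7.x",
    "description": "String fields holding coordinates must be remapped to geo_point",
}]
print(format_report(assess_upgrade("ES 5.6.16", "ES 7.10.2", kb)))
```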

Direct effort on this workstream has not yet begun.

Develop Tooling to Validate An Upgraded Cluster As Production-Ready

This workstream is focused on providing tooling to help a user decide whether an upgraded cluster is ready to be promoted to production. While further investigation is required to determine which specific criteria to validate with such tooling, the current thinking is as follows.

First, it is proposed that "is the system behaving as we expect" and "did the upgrade/migration break anything" are different questions requiring different, but probably related, work to answer. The first question is something that the Knowledge Base and Upgrade Testing Framework focus on by actually testing the edge cases in upgrades (e.g. did a field change from string to geopoint). The second is what the user cares about for this workflow when we talk about a Validation Tool (e.g. are the nodes happy and able to read the indices). Some expectations seem to overlap, such as the expectation that the pre- and post-upgrade clusters should have the same number of documents.

Second, attempting to directly confirm the full set of expectations contained in the knowledge base against an already-upgraded cluster does not seem either useful or tractable. A more reasonable approach would be to interrogate the post-upgrade cluster and, similar to the assessment tool, provide a report of issues that may be present according to the knowledge base, alongside whatever validation is performed. Many expectations likely require a pre-upgrade setup phase (e.g. data upload) to confirm, which is not possible on a standalone, post-upgrade cluster and which customers won't want to perform against their production cluster. In the event that both the pre-upgrade and post-upgrade clusters were available, a more sophisticated approach could be taken, but it still seems unlikely that the solution would be to run the full suite of expectation tests present in the Upgrade Testing Framework.

Third, while the validation tool would likely not be implemented by just running the Upgrade Testing Framework against the real, post-upgrade cluster, it does seem reasonable to test the validation tool with the Upgrade Testing Framework as one of its steps.

Fourth, investigation needs to be performed to determine which datapoints are most useful for validating that a post-upgrade cluster is ready for production. A proper understanding of those datapoints is key for designing the tool, as the specific datapoints desired necessarily shape the input/setup requirements for the tool. For example, if it is overwhelmingly important that the number of documents is (roughly) the same pre- and post-upgrade, then the tool must have access to both the pre- and post-upgrade clusters in some way and cannot be run on a standalone, post-upgrade cluster.
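
For instance, if pre/post document counts prove to be such a datapoint, the validation step needs connectivity to both clusters, roughly along these lines (using opensearch-py here is an assumption about tooling, not a decision):

```python
from opensearchpy import OpenSearch  # assumption: opensearch-py as the client library

def compare_doc_counts(pre_url: str, post_url: str, indices: list) -> dict:
    """Compare per-index document counts between the pre- and post-upgrade clusters.

    Note that this check requires access to BOTH clusters, which is exactly the
    input/setup constraint the datapoint imposes; it cannot be run against a
    standalone, post-upgrade cluster.
    """
    pre = OpenSearch(hosts=[pre_url])
    post = OpenSearch(hosts=[post_url])
    mismatches = {}
    for index in indices:
        pre_count = pre.count(index=index)["count"]
        post_count = post.count(index=index)["count"]
        if pre_count != post_count:
            mismatches[index] = {"pre": pre_count, "post": post_count}
    return mismatches  # an empty dict means the counts line up
```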

Direct effort on this workstream has not yet begun.

Proposed Workstream Sequencing

  • It is proposed that the Knowledge Base and Upgrade Testing Framework be developed first, and in parallel, per the current plan, as they are the base of all future workstreams and are inextricably intertwined.
  • It is proposed that the Assessment Tool be developed after that, instead of the Validation Tool. Additional experience in the area has revealed that the tests executed by the Upgrade Testing Framework are unlikely to be directly applicable to the Validation Tool, meaning the Validation Tool is not the "freebie" originally envisioned as falling out of the Upgrade Testing Framework merely existing. Instead, it appears the Validation Tool will leverage the Assessment Tool and also require additional research/development independent of the Knowledge Base, Upgrade Testing Framework, and Assessment Tool workstreams.
chelma self-assigned this Dec 8, 2022
@gregschohn

Does this imply that the knowledge base will be maintained separate from the tests and their execution outcomes?

First, it ensures that the expectations captured in the knowledge base are accurate and catches when those expectations change.

I'm not sure why predictive assessments don't come for free if you have support for complex testing? You'd need to be able to replicate the customer's environment that you're running complex tests on, but even if that is approximated (version to start w/), it seems like you should be able to drive an assessment immediately. Have I misunderstood some part of the proposal?

Lastly, "Production-Ready" is a very loaded and fuzzy judgment. It will be different for every customer. I'd like to see a different way to describe conformance. Maybe we could say something like "conformant to v1.636 with the following exceptions: ...". In that case, conformant would mean that all known issues in our knowledge base at the point in time that 1.636 existed for the source/target versions as they were handed to the tool behaved completely as expected, unless otherwise noted. For both source and target, we should also dump out the configurations that we understand at the point in time that the tool is being run. It will be up to the customer to determine if the level of conformance that the tool is checking for is enough to determine if it is "production ready" or not.
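
(For illustration only, a conformance summary along those lines might be structured roughly like this; every field and value here is hypothetical, not a proposed schema.)

```python
conformance_report = {
    "knowledge_base_version": "v1.636",
    "source_version": "ES 7.10.2",
    "target_version": "OS 2.4.0",
    "conformant_with_exceptions": [
        "Expectation 'date range query behavior change' could not be evaluated: no date fields present",
    ],
    # Dump of the configurations understood at the point in time the tool is run.
    "observed_configuration": {
        "source": {"plugins": ["analysis-icu"], "node_count": 3},
        "target": {"plugins": ["analysis-icu"], "node_count": 3},
    },
}
```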


chelma commented Dec 8, 2022

@gregschohn

Does this imply that the knowledge base will be maintained separate from the tests and their execution outcomes?

I don't know what you're asking here. In my mind, the expectation must be separable from its implementing test/data, but it might be fine for them to both be represented by files/data in the same repo. Does that answer your question?

I'm not sure why predictive assessments don't come for free if you have support for complex testing?

Predictive, pre-upgrade assessments come free; validation of an actual, post-upgrade cluster's readiness doesn't (I think).

Lastly, "Production-Ready" is a very loaded and fuzzy judgment. It will be different for every customer.

I think you're making a lot of assumptions here. For example - having access to both the pre- and post-upgrade clusters in some way. Is that a valid assumption we want to drive our design around? Unclear to me at this point.

Also, it would be helpful if you could be clearer about what datapoints you imagine being included in a determination of "conformance" and how they would be gathered. My cogitation indicates this is a situation where the devil really is in the details.

Finally, I don't think we're going to give them a report with a big checkbox that says "congratulations, your cluster is production ready". We're going to give them a report that enables them to make that assessment themselves. But the goal of the report should be to facilitate that determination and so I personally have no problem with the current branding, which I think is the source of the disagreement. Open to further thoughts on the topic though.


mikaylathompson commented Dec 8, 2022

To answer/rephrase some of this as I understand it (and see if that lines up with others' understandings):

I think one of the things we're starting to realize is that there are two types of validation tests:

  1. Is the system behaving as we expect? (or "as a x.y.z system should?") (example: we run a test that uploads data and checks that the date range query bug can be reproduced in 7.10.2, but can't be in 2.4 -- and then we can warn users about the behavior change).
  2. Did the upgrade/migration break anything? (example: do we have the same number of documents as pre-transition? are the nodes happy and able to read the indices? -- there's a lot more work to do to figure out what else belongs in this bucket)

The knowledge base and the assessment tool operate in conjunction with the first category. These should be tests that run against the testing framework as part of CI/CD and prove what we're claiming in the knowledge base.

The second category is probably what customers care about in a real-life context. They want to know if their migration worked, and the first category doesn't really tell them that. Also they probably don't want us uploading our test data to their real clusters.

Maybe we could say something like "conformant to v1.636 with the following exceptions: ...". In that case, conformant would mean that all known issues in our knowledge base at the point in time that 1.636 existed for the source/target versions as they were handed to the tool behaved completely as expected, unless otherwise noted.

If I'm understanding this correctly, this is related to the first category, but the (very fuzzy!) "is this cluster production ready" is trying to get at the second category.


chelma commented Dec 8, 2022

Thanks for clarifying @mikaylathompson - I agree 100%. I would suggest that the Knowledge Base could inform some of the things customers should check for post-upgrade as part of our solution to (2). We might not be able to conclusively determine, in all areas, whether the upgrade broke something and instead need to leave it as an exercise for the reader based upon a prompt/warning.


chelma commented Dec 8, 2022

@mikaylathompson Incorporated some of your post back into the doc


chelma commented Dec 8, 2022

Discussed in-depth w/ @dblock.

  • The Knowledge Base looks good, but should focus on viable upgrade paths instead of trying to exhaustively cover every case
  • The UTF also looks good, but some additional phrasing needs to be added to properly describe it (able to reproduce issues encountered in real cluster upgrades, test in the manner of a real upgrade, enables the lifecycle of discover/fix)
  • The Assessment Tool looks good, but need to clarify UX. Ideally offers user an idea of whether they should attempt the upgrade, what upgrade paths are available, and the pros/cons of each path.
  • Validation story needs to be fleshed out further. It was unclear to us whether there will ever be value in a standalone script that is just pointed at the post-upgrade cluster or whether validation is intrinsically tied to the full upgrade process, which implies a Validation Tool is really an Upgrade Assistant w/ validation being substeps of it.

Also discussed in-depth w/ @mikaylathompson, @sumobrian, @gregschohn, @kartg, @lewijacn, and @okhasawn.

  • Agreed the breakdown of workstreams and their boundaries make sense.
  • Need to invest more, sooner, in understanding how to integrate the UTF into existing CI/CD workflows and make it really easy to add new tests
  • Lots of uncertainty around what the Validation Tool is, but alignment that it isn't the UTF. Mostly aligned that it is tied to an upgrade process, and so is likely an Upgrade Assistant.
  • Need to update wording around what guarantees the Assessment Tool and Validation Tool make to the user so it's clear they're best effort, that users can contribute to them, and that they don't guarantee success
  • Need to update each workstream w/ a guesstimate of the MVP user experience


setiah commented Dec 9, 2022

Great proposal. Thanks for putting effort into drafting this. A few comments/questions:

Develop Upgrades "Knowledge Base"

Could you clarify what shape the output of the knowledge base workstream would take: documentation on a website, a README in a repo, and/or a collection of rules documented in some form? How would users contribute to it?

At the center of this workstream is developing a library of "expectations" that each express a thing that is expected (a string field should be converted to geopoint), when it applies (going from version X to X+2), and data/tests to confirm the expectation matches reality (e.g. the ability to run an actual upgrade and check to see if the expectation is true). 

Whose responsibility will it be to guarantee these "expectations" do not break in OpenSearch software? Today, the backward compatibility framework is a source of validation for upgrades. Would the responsibility shift to the UTF now? Would this be integrated with OpenSearch CI/CD pipelines to ensure violating PRs are detected before merging and the "expectations" are honored?

Also, the primary user of the UTF appears to be OpenSearch developers, who would use it to test the sanity of the upgrade process between versions. Do you also see it being used by anyone else?

I didn't see this mentioned (or I might have missed it), but I also see Plugin & Extension developers as users of this framework. The UTF could be used to validate Plugin/Extension upgrade compatibility between versions.

Develop Tooling to Perform Predictive Assessments

I am assuming this would work on real production clusters to provide assessments? Would this cover - identifying breaking changes between the source and the target version, identifying data type incompatibilities, identifying deprecations, and providing recommendations that would help the user to discover and mitigate potential issues prior to starting the upgrade process?


chelma commented Dec 21, 2022

Had a good convo w/ @setiah on 2022-12-09; I forgot to update this thread with the details at the time.

  • The Knowledge Base (KB) and UTF are intertwined and intended to replace the existing Backwards Compatibility (BWC) tests for core engine and plugin developers. The combination of the KB and the UTF will also enable us to build the pre-upgrade Assessment Tool. Cluster Admins aren't expected to interact w/ the UTF directly but it's expected/intended that they be able to contribute to the Knowledge Base by adding new tests to it based on the issues they encountered and that this should be possible in an easy, code-free way. We were aligned on this plan.
  • Talked about the Assessment Tool, and agreed it was reliant on having the KB and UTF in place. Aligned on the plan that it should be able to interrogate a "real" cluster to provide its report, and that the quality of the report will depend on how many items the KB covers. Aligned on the importance of capturing useful information in the KB.
  • Also chatted about the Validation Tool and agreed that what it is exactly is still fuzzy, and that that workstream should be pushed later so we can firm it up before committing to a path.


chelma commented Oct 15, 2024

Stale issue; resolving.

chelma closed this as completed Oct 15, 2024