UTF able to snapshot from ES 7.10.2 and restore onto OS 1.3.6 #47

chelma · 2022-12-15T23:08:38Z

Description

Addressed PR comments remaining from Updated UTF to spin up a cluster, interrogate it, then spin it down #40
Updated test_config wrangling code substantially to handle more complex configuration
Update Node configuration management to be a bit better abstracted
Added steps to perform snapshot/restore to the UTF main workflow
Added code to handle state-machine edge cases for the Cluster and Node
Moved unit tests to a new directory tree
Added the test_config used to perform testing to a new directory in the repo
Updated application state to be stored correctly into a file

Issues Resolved

Create Framework Step that executes Snapshot/Restore upgrade #7

Testing

Added unit tests for the new code

(.venv) chelma@3c22fba4e266 upgrades % python -m pytest unit_tests/
=========================================== test session starts ===========================================
platform darwin -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /Users/chelma/workspace/Migration-Fork/opensearch-migrations/upgrades
collected 78 items                                                                                        

unit_tests/cluster_management/test_cluster.py ...........                                           [ 14%]
unit_tests/cluster_management/test_container_configuration.py .                                     [ 15%]
unit_tests/cluster_management/test_docker_framework_client.py ..............                        [ 33%]
unit_tests/cluster_management/test_node.py ...........                                              [ 47%]
unit_tests/cluster_management/test_node_configuration.py ..                                         [ 50%]
unit_tests/core/test_exception_base.py ...                                                          [ 53%]
unit_tests/core/test_framework_runner.py ......                                                     [ 61%]
unit_tests/core/test_framework_state.py ...                                                         [ 65%]
unit_tests/core/test_framework_step.py ....                                                         [ 70%]
unit_tests/core/test_logging_wrangler.py .                                                          [ 71%]
unit_tests/core/test_shell_interactions.py .......                                                  [ 80%]
unit_tests/core/test_test_config_wrangling.py .........                                             [ 92%]
unit_tests/core/test_versions_engine.py .....                                                       [ 98%]
unit_tests/core/test_workspace_wrangler.py .                                                        [100%]

=========================================== 78 passed in 1.88s ============================================

Ran the snapshot/restore workflow on my laptop:

(.venv) chelma@3c22fba4e266 upgrades % cat ./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json
{
    "clusters_def": {
        "source": {
            "engine_version": "ES_7_10_2",
            "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
            "node_count": 2,
            "additional_node_config": {
                "ES_JAVA_OPTS": "-Xms512m -Xmx512m"
            }
        },
        "target": {
            "engine_version": "OS_1_3_6",
            "image": "opensearchproject/opensearch:1.3.6",
            "node_count": 2,
            "additional_node_config": {
                "plugins.security.disabled": "true",
                "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m"
            }
        }
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    }
}

(.venv) chelma@3c22fba4e266 upgrades % ./run_utf.py --test_config ./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json
[FrameworkRunner] Running Step: LoadTestConfig
[LoadTestConfig] Loading test config file...
[LoadTestConfig] Loaded test config file successfully
[FrameworkRunner] Step Succeeded: LoadTestConfig
[FrameworkRunner] Running Step: BootstrapDocker
[BootstrapDocker] Checking if Docker is installed and available...
[BootstrapDocker] Docker appears to be installed and available
[BootstrapDocker] Ensuring the Docker image docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 is available either locally or remotely...
[BootstrapDocker] Docker image docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 is available
[BootstrapDocker] Ensuring the Docker image opensearchproject/opensearch:1.3.6 is available either locally or remotely...
[BootstrapDocker] Docker image opensearchproject/opensearch:1.3.6 is available
[FrameworkRunner] Step Succeeded: BootstrapDocker
[FrameworkRunner] Running Step: SnapshotRestoreSetup
[SnapshotRestoreSetup] Creating shared Docker volume to share snapshots...
[SnapshotRestoreSetup] Created shared Docker volume cluster-snapshots-volume
[FrameworkRunner] Step Succeeded: SnapshotRestoreSetup
[FrameworkRunner] Running Step: StartSourceCluster
[StartSourceCluster] Creating source cluster...
[StartSourceCluster] Waiting up to 30 sec for cluster to be active...
Node source-cluster-node-1 is now active
Node source-cluster-node-2 is now active
[StartSourceCluster] Cluster source-cluster is active
[FrameworkRunner] Step Succeeded: StartSourceCluster
[FrameworkRunner] Running Step: TestSourceCluster
[TestSourceCluster] Querying cluster status...
[TestSourceCluster] ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.24.0.2           10          53  60    0.69    0.15     0.14 dimr      -      source-cluster-node-1
172.24.0.3           10          53  54    0.69    0.15     0.14 dimr      *      source-cluster-node-2
[TestSourceCluster] Uploading sample document...
[TestSourceCluster] Retrieving uploaded document...
[TestSourceCluster] Retrieved uploaded doc sucessfully
[TestSourceCluster] {
    "_id": "1",
    "_index": "noldor",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "name": "Finwe"
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}
[FrameworkRunner] Step Succeeded: TestSourceCluster
[FrameworkRunner] Running Step: CreateSourceSnapshot
[CreateSourceSnapshot] Creating snapshot of source cluster...
[CreateSourceSnapshot] Confirming snapshot of source cluster exists...
[CreateSourceSnapshot] Source snapshot created successfully
[CreateSourceSnapshot] {
    "snapshots": [
        {
            "data_streams": [],
            "duration_in_millis": 0,
            "end_time": "1970-01-01T00:00:00.000Z",
            "end_time_in_millis": 0,
            "failures": [],
            "include_global_state": true,
            "indices": [
                "noldor"
            ],
            "shards": {
                "failed": 0,
                "successful": 0,
                "total": 0
            },
            "snapshot": "1",
            "start_time": "2022-12-16T18:13:14.388Z",
            "start_time_in_millis": 1671214394388,
            "state": "IN_PROGRESS",
            "uuid": "xcRVH_LpSiypBYpF_fM_cw",
            "version": "7.10.2",
            "version_id": 7100299
        }
    ]
}
[FrameworkRunner] Step Succeeded: CreateSourceSnapshot
[FrameworkRunner] Running Step: StartTargetCluster
[StartTargetCluster] Creating target cluster...
[StartTargetCluster] Waiting up to 30 sec for cluster to be active...
Node target-cluster-node-1 is now active
Node target-cluster-node-2 is now active
[StartTargetCluster] Cluster target-cluster is active
[FrameworkRunner] Step Succeeded: StartTargetCluster
[FrameworkRunner] Running Step: TestTargetCluster
[TestTargetCluster] ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.25.0.2           15          93  71    2.94    0.72     0.33 dimr      -      target-cluster-node-1
172.25.0.3           33          93  68    2.94    0.72     0.33 dimr      *      target-cluster-node-2
[FrameworkRunner] Step Succeeded: TestTargetCluster
[FrameworkRunner] Running Step: RestoreSourceSnapshot
[RestoreSourceSnapshot] Checking if source snapshots are visible on target...
[RestoreSourceSnapshot] Source snapshot visible to target cluster
[RestoreSourceSnapshot] Restoring source snapshot onto target...
[RestoreSourceSnapshot] Waiting a few seconds for the snapshot to be restored...
[RestoreSourceSnapshot] Attempting to retrieve source doc from target cluster...
[RestoreSourceSnapshot] Document retrieved, snapshot restored successfully
[RestoreSourceSnapshot] {
    "_id": "1",
    "_index": "noldor",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "name": "Finwe"
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}
[FrameworkRunner] Step Succeeded: RestoreSourceSnapshot
[FrameworkRunner] Running Step: StopSourceCluster
[StopSourceCluster] Stopping cluster source-cluster...
[StopSourceCluster] Cleaning up underlying resources for cluster source-cluster...
[FrameworkRunner] Step Succeeded: StopSourceCluster
[FrameworkRunner] Running Step: StopTargetCluster
[StopTargetCluster] Stopping cluster target-cluster...
[StopTargetCluster] Cleaning up underlying resources for cluster target-cluster...
[FrameworkRunner] Step Succeeded: StopTargetCluster
[FrameworkRunner] Running Step: SnapshotRestoreTeardown
[SnapshotRestoreTeardown] Removing shared Docker volume cluster-snapshots-volume...
[SnapshotRestoreTeardown] Removed shared Docker volume cluster-snapshots-volume
[FrameworkRunner] Step Succeeded: SnapshotRestoreTeardown
[FrameworkRunner] Ran through all steps successfully
[FrameworkRunner] Saving application state to file...
[FrameworkRunner] Application state saved
[FrameworkRunner] Application state saved to: /tmp/utf/state-file
[FrameworkRunner] Full run details logged to: /tmp/utf/logs/run.log.2022-12-16_12_12_59

State-file:

(.venv) chelma@3c22fba4e266 upgrades % cat /tmp/utf/state-file
{
    "app_state": {
        "current_step": "SnapshotRestoreTeardown",
        "exit_type": "Succeeded",
        "log_file": "/tmp/utf/logs/run.log.2022-12-16_12_29_45",
        "state_file": "/tmp/utf/state-file",
        "test_config_path": "./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json"
    },
    "shared_volume": {
        "mount_point": "/snapshots",
        "volume": {
            "CreatedAt": "2022-12-16T18:29:45Z",
            "Driver": "local",
            "Labels": null,
            "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
            "Name": "cluster-snapshots-volume",
            "Options": null,
            "Scope": "local"
        }
    },
    "source_cluster": {
        "cluster_config": {
            "additional_node_config": {
                "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                "path.repo": "/snapshots"
            },
            "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
            "node_count": 2
        },
        "cluster_state": "CLEANED_UP",
        "name": "source-cluster",
        "nodes": {
            "source-cluster-node-1": {
                "container": null,
                "container_config": {
                    "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:29:45.705121168Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.26.0.1",
                                    "Subnet": "172.26.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "bc0693d9a6923d641554afdeae99fa0071aec906f8b3ae664bfec7079ad4ce3e",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "source-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9200
                        }
                    ],
                    "rest_port": 9200,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/elasticsearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/source-cluster-volume-1/_data",
                                "Name": "source-cluster-volume-1",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "source-cluster-node-1",
                "node_config": {
                    "config": {
                        "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "source-cluster-node-1,source-cluster-node-2",
                        "cluster.name": "source-cluster",
                        "discovery.seed_hosts": "source-cluster-node-1,source-cluster-node-2",
                        "node.name": "source-cluster-node-1",
                        "path.repo": "/snapshots"
                    },
                    "data_dir": "/usr/share/elasticsearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            },
            "source-cluster-node-2": {
                "container": null,
                "container_config": {
                    "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:29:45.705121168Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.26.0.1",
                                    "Subnet": "172.26.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "bc0693d9a6923d641554afdeae99fa0071aec906f8b3ae664bfec7079ad4ce3e",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "source-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9201
                        }
                    ],
                    "rest_port": 9201,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/elasticsearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:46Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/source-cluster-volume-2/_data",
                                "Name": "source-cluster-volume-2",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "source-cluster-node-2",
                "node_config": {
                    "config": {
                        "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "source-cluster-node-1,source-cluster-node-2",
                        "cluster.name": "source-cluster",
                        "discovery.seed_hosts": "source-cluster-node-1,source-cluster-node-2",
                        "node.name": "source-cluster-node-2",
                        "path.repo": "/snapshots"
                    },
                    "data_dir": "/usr/share/elasticsearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            }
        }
    },
    "target_cluster": {
        "cluster_config": {
            "additional_node_config": {
                "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                "path.repo": "/snapshots",
                "plugins.security.disabled": "true"
            },
            "image": "opensearchproject/opensearch:1.3.6",
            "node_count": 2
        },
        "cluster_state": "CLEANED_UP",
        "name": "target-cluster",
        "nodes": {
            "target-cluster-node-1": {
                "container": null,
                "container_config": {
                    "image": "opensearchproject/opensearch:1.3.6",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:30:02.598090943Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.27.0.1",
                                    "Subnet": "172.27.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "86d725b1d7fd9da75b5dbfec3be460e19339c27752314e4944ee243d97fa8ded",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "target-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9202
                        }
                    ],
                    "rest_port": 9202,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/opensearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:30:02Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/target-cluster-volume-1/_data",
                                "Name": "target-cluster-volume-1",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "target-cluster-node-1",
                "node_config": {
                    "config": {
                        "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "target-cluster-node-1,target-cluster-node-2",
                        "cluster.name": "target-cluster",
                        "discovery.seed_hosts": "target-cluster-node-1,target-cluster-node-2",
                        "node.name": "target-cluster-node-1",
                        "path.repo": "/snapshots",
                        "plugins.security.disabled": "true"
                    },
                    "data_dir": "/usr/share/opensearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            },
            "target-cluster-node-2": {
                "container": null,
                "container_config": {
                    "image": "opensearchproject/opensearch:1.3.6",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:30:02.598090943Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.27.0.1",
                                    "Subnet": "172.27.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "86d725b1d7fd9da75b5dbfec3be460e19339c27752314e4944ee243d97fa8ded",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "target-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9203
                        }
                    ],
                    "rest_port": 9203,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/opensearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:30:03Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/target-cluster-volume-2/_data",
                                "Name": "target-cluster-volume-2",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "target-cluster-node-2",
                "node_config": {
                    "config": {
                        "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "target-cluster-node-1,target-cluster-node-2",
                        "cluster.name": "target-cluster",
                        "discovery.seed_hosts": "target-cluster-node-1,target-cluster-node-2",
                        "node.name": "target-cluster-node-2",
                        "path.repo": "/snapshots",
                        "plugins.security.disabled": "true"
                    },
                    "data_dir": "/usr/share/opensearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            }
        }
    },
    "test_config": {
        "clusters_def": {
            "source": {
                "additional_node_config": {
                    "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                    "path.repo": "/snapshots"
                },
                "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                "node_count": 2
            },
            "target": {
                "additional_node_config": {
                    "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                    "path.repo": "/snapshots",
                    "plugins.security.disabled": "true"
                },
                "image": "opensearchproject/opensearch:1.3.6",
                "node_count": 2
            }
        },
        "upgrade_def": {
            "style": "snapshot-restore"
        }
    }
}

Check List

New functionality includes testing
- All tests pass, including unit test, integration test and doctest
New functionality has been documented
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Moved the unit tests to a directory tree that mirrors the lib * Added basic handling for Node state edge cases * Added basic handling for Cluster state edge cases Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Fixed Git hook for unit test invocation * Fixed some bugs in Cluster/Node state management * Added code so that the app state is saved in full Signed-off-by: Chris Helma <chelma+github@amazon.com>

* Kept the _nodes around after cleanup so their state is stored. Previously, the field was cleared and the final state-file had no information about the node configuration in it. Signed-off-by: Chris Helma <chelma+github@amazon.com>

lewijacn

Nice addition for snapshot restore, and setting up outline for external tests to be plugged in the test steps. Only minor comments to look at

lewijacn · 2022-12-16T20:58:32Z

upgrades/upgrade_testing_framework/core/versions_engine.py

+    elif "OS" == raw_engine:
+        engine = ENGINE_OPENSEARCH
+    else:
+        raise CouldNotParseEngineVersionException(version_string)


Seems like we could provide a little more details for each case we raise an exception in this file for example, "Provided engine should match ES or OS format". Another alternative may be to simply provide an example in the exception class, "i.e. ES_7_10_2 or OS_1_3_6" , for all cases

Sure, makes sense, will do

Improved exception message in next revision.

lewijacn · 2022-12-16T21:10:19Z

upgrades/upgrade_testing_framework/core/framework_state.py

@@ -13,17 +16,34 @@
 """


Seems like we have improved upon this last paragraph some and may need to update this

Yep - will update. Thanks!

Updated in next rev

lewijacn · 2022-12-16T22:01:10Z

upgrades/upgrade_testing_framework/steps/step_test_source_cluster.py

 from upgrade_testing_framework.core.framework_step import FrameworkStep
 import upgrade_testing_framework.core.shell_interactions as shell
-from upgrade_testing_framework.cluster_management.node import Node


 class TestSourceCluster(FrameworkStep):


I'm assuming most of this is placeholder for the actual tests we want to run

Yeah, it's placeholder

lewijacn · 2022-12-16T22:07:54Z

upgrades/upgrade_testing_framework/steps/step_snapshot_restore_setup.py

+            self.fail(f"Unsupported upgrade style - {upgrade_style}")
+
+        self.logger.info("Creating shared Docker volume to share snapshots...")
+        snapshot_volume = docker_client.create_volume("cluster-snapshots-volume")


Maybe a point of preference, seems like we should we be checking for an existing volume or are we relying on the exception of the docker sdk?

IIRC, Docker will create as many volumes with the same name as you ask it to and instead rely on their ID meta-data field; it's networks and containers that collide on a name. The Docker SDK uses the ID behind the scenes instead of the name, too, I think, so you wouldn't have collision.

Regardless, this just falls into the same bucket of issues that @mikaylathompson raised in a previous PR about adding a more formal test setup/teardown mechanism to ensure a "clean" testbed. I'll make an issue for that momentarily.

Created an issue to cover this: #51

lewijacn · 2022-12-16T22:11:25Z

upgrades/upgrade_testing_framework/steps/step_create_source_snapshot.py

+        self.logger.info("Creating snapshot of source cluster...")
+        port = source_cluster.rest_ports[0]
+
+        register_cmd = f"""curl -XPUT 'localhost:{port}/_snapshot/noldor' -H 'Content-Type: application/json' -d '{{


Seems like we should be using a constant here for snapshot repo name and snapshot name

Sure - assuming this is our "real" code. It isn't, however, and will be ripped out/replaced in short order once we write our actual test code. Think of this as prototype/placeholder/demo code that is meant solely to illustrate how the UTF is designed to work.

lewijacn · 2022-12-16T22:12:19Z

upgrades/upgrade_testing_framework/steps/step_create_source_snapshot.py

+                    "location": "{shared_volume.mount_point}"
+            }}
+        }}'
+        """


Would like to see at least a debug log for the name of the snapshot repo created and snapshot name later

The call_shell_command() function logs all that. That said - this curl code is going to be ripped out in the next few days/weeks anyways and replaced with "real" code, so...

lewijacn · 2022-12-16T22:20:01Z

upgrades/test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json

+        }
+    },
+    "upgrade_def": {
+        "style": "snapshot-restore"


Nothing to worry about now, but it would be interesting in the future if a customer could specify an existing snapshot repo they had and restore that on a target cluster we spin up

Potentially. Given the UTF is now primarily focused on OS developers rather than including cluster admins, I think the need for something like that maybe lessened, but I'm absolutely open to it based on user feedback.

Agreed, would want to hear some user desire for this before pursuing

mikaylathompson · 2022-12-19T22:24:30Z

upgrades/README.md

@@ -32,7 +32,7 @@ Learn more about venv [here](https://docs.python.org/3/library/venv.html).
 The unit tests are executed by invoking Pytest:

 ```
-python -m pytest tests/
+python -m pytest unit_tests/


ah, is this to distinguish unit tests from tests run against the cluster? makes sense

mikaylathompson · 2022-12-19T22:25:43Z

upgrades/test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json

+        },
+        "target": {
+            "engine_version": "OS_1_3_6",
+            "image": "opensearchproject/opensearch:1.3.6",


Naively (i.e. 4 files into the review), engine_version and image seem redundant--why do we need both?

The thinking is that a user should be able to roll their own image and use it in the UTF, but the UTF still needs to know the engine version before it spins up any nodes. Consider the use case of a plugin or core developer wanting to run the UTF against a modified version of the codebase by sticking it into a locally build Docker image and using that as the basis of a UTF run.

mikaylathompson · 2022-12-19T22:41:18Z

upgrades/upgrade_testing_framework/cluster_management/cluster.py

            ]
-            container_config = ContainerConfiguration(self._starting_image, network, port_mappings, self._volumes)
+
+            # Assemble the container configuration based on the 


comment looks unfinished

Fixed in next rev

mikaylathompson · 2022-12-19T22:50:53Z

upgrades/upgrade_testing_framework/cluster_management/docker_framework_client.py

@@ -116,7 +132,7 @@ def create_container(self, image: str, container_name: str, network: Network, po
            detach=True,
            environment=env_variables
        )
-        self.logger.debug("Created container")
+        self.logger.debug(f"Created container {container_name}")


very minor question: you changed the logger.debug messages on lines 84 & 87 to info level messages -- why not this one?

Info level logs are visible in the terminal history; debug level logs are only visible in the log file. The fact that we had to pull an image from an external repo seems like something that's more important to surface to that higher level than the fact we created a container, which is a given.

Actually, I think the fact that we're pushing INFO level logs here indicates that this particular method should be at a higher level of abstraction. Will refactor.

Addressed in next rev

mikaylathompson · 2022-12-19T22:51:28Z

upgrades/upgrade_testing_framework/cluster_management/docker_framework_client.py

@@ -131,10 +147,18 @@ def remove_container(self, container: Container):
        self.logger.debug(f"Removed container {container.name}")

    def run(self, container: Container, command: str) -> Tuple[int, str]:
-        # TODO - Need to handle when container isn't stopped
+        # TODO - Need to handle when container isn't running


minor: I think the original is correct?

ah, wait, no -- my comment here is wrong. Leaving this because maybe it's confusing that run doesn't actually refer to "running" the container, but "running" a command in the container, but I don't know if making the method name more verbose is the solution. 🤷‍♀️

Easy fix; will rename to be clearer.

Renamed to run_command

mikaylathompson · 2022-12-19T22:55:55Z

upgrades/upgrade_testing_framework/cluster_management/node.py

    def __init__(self, name: str, container_config: ContainerConfiguration, node_config: NodeConfiguration, 
            docker_client: DockerFrameworkClient, container: Container = None):
        self.logger = logging.getLogger(__name__)
        self.name = name
+        self.rest_port = container_config.rest_port


rest port = port on which the rest api is available?

mikaylathompson · 2022-12-19T23:20:11Z

upgrades/upgrade_testing_framework/cluster_management/node_configuration.py

+def _get_engine_user_for_version(version: ev.EngineVersion) -> str:
+    if ev.ENGINE_ELASTICSEARCH == version.engine:
+        return "elasticsearch"
+    return "elasticsearch"


this should be return "opensearch", yeah?

Surprisingly not; for Elastic 1.x my testing indicated the run-as user in the container was "elasticsearch". Snapshot/restore certainly seems to work with the OpenSearch container's snapshots volume having its linux ownership perms set to the "elasticsearch" user.

well that's weird (but as a side note, this explains why the original snapshot walkthrough only needed the owner of the repo to be set once).

why have the if at all then?

Because I'm betting OpenSearch 2.x+ has a different user, and this is looking ahead to that eventuality.

mikaylathompson · 2022-12-19T23:25:09Z

upgrades/upgrade_testing_framework/core/framework_state.py

+        self.docker_client: DockerFrameworkClient = None
+        self.source_cluster: Cluster = None
+        self.target_cluster: Cluster = None
+        self.test_config: TestConfig = None


I know we discussed the Type | None vs Optional[Type] thing last time--if they're being set to None, should they all be Optionals?

I dunno; does it matter either way?

Ah, now I under your question. We currently expect these to be defined over the course of a UTF run, so they're not technically "optional" in a grammatical sense. I therefore didn't use Optional, and instead declared/initialized them as None.

I don't think this is a big deal, so not worth changing in this review, but for future related code: my take would be that if there's a chance they could be None (i.e. should callers check whether it's not-None before using it?), they should be Optional. Is that fair/line up with your understanding?

mikaylathompson · 2022-12-19T23:31:30Z

upgrades/upgrade_testing_framework/steps/step_create_source_snapshot.py

+    def _run(self):
+        """
+        The contents of this are quick/dirty.  Instead of raw curl commands we should probably be using the Python SDK
+        for Elasticsearch/OpenSearch.  At the very least, we should wrap the curl commands in our own pseudo-client.


minor: want to make/link to a gh issue for this task? I agree that at least a pseudo-client sounds like a necessary approach and I'd like to have it on our radar.

Already made and in progress - #50

mikaylathompson · 2022-12-19T23:38:05Z

upgrades/upgrade_testing_framework/steps/step_start_target_cluster.py

+from upgrade_testing_framework.core.framework_step import FrameworkStep
+from upgrade_testing_framework.cluster_management.cluster import Cluster, ClusterNotStartedInTimeException
+
+class StartTargetCluster(FrameworkStep):


Discussion question:
Ideally do we want StartSouceCluster and StartTargetCluster steps to be totally different, or should we just have a StartCluster step that takes in a config?
Changing this would be out of scope for this review. My first reaction is to object to the code duplication, but when I think about it more, source and target are pretty fundamental concepts for a migration/upgrade, and there's not (at this point) any reason to have more than the two, so maybe it's not a problem.

Yeah, I had the same thought about making a base step for this, having them share library code, make a meta class, etc. I ended up deciding it wasn't worth messing with now as there's only two of them. Once we hit three, or it becomes a hassle to maintain the separate "copies", then we can revisit it. There are bigger fish to fry at the moment.

* opensearch-project#47 Signed-off-by: Chris Helma <chelma+github@amazon.com>

lewijacn · 2022-12-20T16:43:43Z

.github/workflows/python-tests.yml

@@ -24,7 +24,7 @@ jobs:
          python -m pip install -r requirements.txt coverage pytest-cov
      - name: Run Tests with Coverage
        run: |
-          python -m pytest tests/ --cov=upgrade_testing_framework --cov-report=xml --cov-branch


chelma added 4 commits December 13, 2022 15:49

UTF now able to spin up source and target clusters

cb0f1f5

Signed-off-by: Chris Helma <chelma+github@amazon.com>

UTF now able to take a snapshot of source cluster

06ac72e

Signed-off-by: Chris Helma <chelma+github@amazon.com>

UTF now restores ES 7.10.2 source snapshot on OS 1.3.6 target

79ecf6e

Signed-off-by: Chris Helma <chelma+github@amazon.com>

Minor improvements and cleanup

e0d4e77

* Moved the unit tests to a directory tree that mirrors the lib * Added basic handling for Node state edge cases * Added basic handling for Cluster state edge cases Signed-off-by: Chris Helma <chelma+github@amazon.com>

chelma requested a review from a team as a code owner December 15, 2022 23:08

chelma mentioned this pull request Dec 15, 2022

Create Framework Step that executes Snapshot/Restore upgrade #7

Closed

chelma added 2 commits December 16, 2022 12:19

More minor improvements and cleanup

e85f268

* Fixed Git hook for unit test invocation * Fixed some bugs in Cluster/Node state management * Added code so that the app state is saved in full Signed-off-by: Chris Helma <chelma+github@amazon.com>

Fixed minor bug in Cluster cleanup

db0c419

* Kept the _nodes around after cleanup so their state is stored. Previously, the field was cleared and the final state-file had no information about the node configuration in it. Signed-off-by: Chris Helma <chelma+github@amazon.com>

lewijacn reviewed Dec 16, 2022

View reviewed changes

This was referenced Dec 19, 2022

Create Python REST Library for Elasticsearch/OpenSearch #50

Closed

Handle UTF Setup/Teardown Edge Cases #51

Closed

mikaylathompson reviewed Dec 19, 2022

View reviewed changes

Responding to PR comments

bf6bb85

* opensearch-project#47 Signed-off-by: Chris Helma <chelma+github@amazon.com>

mikaylathompson approved these changes Dec 20, 2022

View reviewed changes

lewijacn approved these changes Dec 20, 2022

View reviewed changes

chelma merged commit 76c0120 into opensearch-project:main Dec 20, 2022

chelma deleted the snapshot-restore branch December 20, 2022 17:56

UTF able to snapshot from ES 7.10.2 and restore onto OS 1.3.6 #47

UTF able to snapshot from ES 7.10.2 and restore onto OS 1.3.6 #47

Conversation

chelma commented Dec 15, 2022 • edited Loading

Description

Issues Resolved

Testing

Check List

lewijacn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chelma commented Dec 15, 2022 •

edited

Loading