UTF able to snapshot from ES 7.10.2 and restore onto OS 1.3.6 #47

Merged
merged 7 commits into main from snapshot-restore on Dec 20, 2022

Conversation

chelma (Member) commented Dec 15, 2022

Description

  • Addressed the remaining PR comments from Updated UTF to spin up a cluster, interrogate it, then spin it down (#40)
  • Substantially updated the test_config wrangling code to handle more complex configurations
  • Updated Node configuration management to be better abstracted
  • Added steps to perform snapshot/restore to the UTF main workflow
  • Added code to handle state-machine edge cases for the Cluster and Node
  • Moved unit tests to a new directory tree
  • Added the test_config used to perform testing to a new directory in the repo
  • Updated application state to be stored correctly in a file

Issues Resolved

Testing

Added unit tests for the new code

(.venv) chelma@3c22fba4e266 upgrades % python -m pytest unit_tests/
=========================================== test session starts ===========================================
platform darwin -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /Users/chelma/workspace/Migration-Fork/opensearch-migrations/upgrades
collected 78 items                                                                                        

unit_tests/cluster_management/test_cluster.py ...........                                           [ 14%]
unit_tests/cluster_management/test_container_configuration.py .                                     [ 15%]
unit_tests/cluster_management/test_docker_framework_client.py ..............                        [ 33%]
unit_tests/cluster_management/test_node.py ...........                                              [ 47%]
unit_tests/cluster_management/test_node_configuration.py ..                                         [ 50%]
unit_tests/core/test_exception_base.py ...                                                          [ 53%]
unit_tests/core/test_framework_runner.py ......                                                     [ 61%]
unit_tests/core/test_framework_state.py ...                                                         [ 65%]
unit_tests/core/test_framework_step.py ....                                                         [ 70%]
unit_tests/core/test_logging_wrangler.py .                                                          [ 71%]
unit_tests/core/test_shell_interactions.py .......                                                  [ 80%]
unit_tests/core/test_test_config_wrangling.py .........                                             [ 92%]
unit_tests/core/test_versions_engine.py .....                                                       [ 98%]
unit_tests/core/test_workspace_wrangler.py .                                                        [100%]

=========================================== 78 passed in 1.88s ============================================

Ran the snapshot/restore workflow on my laptop:

(.venv) chelma@3c22fba4e266 upgrades % cat ./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json
{
    "clusters_def": {
        "source": {
            "engine_version": "ES_7_10_2",
            "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
            "node_count": 2,
            "additional_node_config": {
                "ES_JAVA_OPTS": "-Xms512m -Xmx512m"
            }
        },
        "target": {
            "engine_version": "OS_1_3_6",
            "image": "opensearchproject/opensearch:1.3.6",
            "node_count": 2,
            "additional_node_config": {
                "plugins.security.disabled": "true",
                "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m"
            }
        }
    },
    "upgrade_def": {
        "style": "snapshot-restore"
    }
}
(.venv) chelma@3c22fba4e266 upgrades % ./run_utf.py --test_config ./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json
[FrameworkRunner] Running Step: LoadTestConfig
[LoadTestConfig] Loading test config file...
[LoadTestConfig] Loaded test config file successfully
[FrameworkRunner] Step Succeeded: LoadTestConfig
[FrameworkRunner] Running Step: BootstrapDocker
[BootstrapDocker] Checking if Docker is installed and available...
[BootstrapDocker] Docker appears to be installed and available
[BootstrapDocker] Ensuring the Docker image docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 is available either locally or remotely...
[BootstrapDocker] Docker image docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 is available
[BootstrapDocker] Ensuring the Docker image opensearchproject/opensearch:1.3.6 is available either locally or remotely...
[BootstrapDocker] Docker image opensearchproject/opensearch:1.3.6 is available
[FrameworkRunner] Step Succeeded: BootstrapDocker
[FrameworkRunner] Running Step: SnapshotRestoreSetup
[SnapshotRestoreSetup] Creating shared Docker volume to share snapshots...
[SnapshotRestoreSetup] Created shared Docker volume cluster-snapshots-volume
[FrameworkRunner] Step Succeeded: SnapshotRestoreSetup
[FrameworkRunner] Running Step: StartSourceCluster
[StartSourceCluster] Creating source cluster...
[StartSourceCluster] Waiting up to 30 sec for cluster to be active...
Node source-cluster-node-1 is now active
Node source-cluster-node-2 is now active
[StartSourceCluster] Cluster source-cluster is active
[FrameworkRunner] Step Succeeded: StartSourceCluster
[FrameworkRunner] Running Step: TestSourceCluster
[TestSourceCluster] Querying cluster status...
[TestSourceCluster] ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.24.0.2           10          53  60    0.69    0.15     0.14 dimr      -      source-cluster-node-1
172.24.0.3           10          53  54    0.69    0.15     0.14 dimr      *      source-cluster-node-2
[TestSourceCluster] Uploading sample document...
[TestSourceCluster] Retrieving uploaded document...
[TestSourceCluster] Retrieved uploaded doc successfully
[TestSourceCluster] {
    "_id": "1",
    "_index": "noldor",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "name": "Finwe"
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}
[FrameworkRunner] Step Succeeded: TestSourceCluster
[FrameworkRunner] Running Step: CreateSourceSnapshot
[CreateSourceSnapshot] Creating snapshot of source cluster...
[CreateSourceSnapshot] Confirming snapshot of source cluster exists...
[CreateSourceSnapshot] Source snapshot created successfully
[CreateSourceSnapshot] {
    "snapshots": [
        {
            "data_streams": [],
            "duration_in_millis": 0,
            "end_time": "1970-01-01T00:00:00.000Z",
            "end_time_in_millis": 0,
            "failures": [],
            "include_global_state": true,
            "indices": [
                "noldor"
            ],
            "shards": {
                "failed": 0,
                "successful": 0,
                "total": 0
            },
            "snapshot": "1",
            "start_time": "2022-12-16T18:13:14.388Z",
            "start_time_in_millis": 1671214394388,
            "state": "IN_PROGRESS",
            "uuid": "xcRVH_LpSiypBYpF_fM_cw",
            "version": "7.10.2",
            "version_id": 7100299
        }
    ]
}
[FrameworkRunner] Step Succeeded: CreateSourceSnapshot
[FrameworkRunner] Running Step: StartTargetCluster
[StartTargetCluster] Creating target cluster...
[StartTargetCluster] Waiting up to 30 sec for cluster to be active...
Node target-cluster-node-1 is now active
Node target-cluster-node-2 is now active
[StartTargetCluster] Cluster target-cluster is active
[FrameworkRunner] Step Succeeded: StartTargetCluster
[FrameworkRunner] Running Step: TestTargetCluster
[TestTargetCluster] ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.25.0.2           15          93  71    2.94    0.72     0.33 dimr      -      target-cluster-node-1
172.25.0.3           33          93  68    2.94    0.72     0.33 dimr      *      target-cluster-node-2
[FrameworkRunner] Step Succeeded: TestTargetCluster
[FrameworkRunner] Running Step: RestoreSourceSnapshot
[RestoreSourceSnapshot] Checking if source snapshots are visible on target...
[RestoreSourceSnapshot] Source snapshot visible to target cluster
[RestoreSourceSnapshot] Restoring source snapshot onto target...
[RestoreSourceSnapshot] Waiting a few seconds for the snapshot to be restored...
[RestoreSourceSnapshot] Attempting to retrieve source doc from target cluster...
[RestoreSourceSnapshot] Document retrieved, snapshot restored successfully
[RestoreSourceSnapshot] {
    "_id": "1",
    "_index": "noldor",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "name": "Finwe"
    },
    "_type": "_doc",
    "_version": 1,
    "found": true
}
[FrameworkRunner] Step Succeeded: RestoreSourceSnapshot
[FrameworkRunner] Running Step: StopSourceCluster
[StopSourceCluster] Stopping cluster source-cluster...
[StopSourceCluster] Cleaning up underlying resources for cluster source-cluster...
[FrameworkRunner] Step Succeeded: StopSourceCluster
[FrameworkRunner] Running Step: StopTargetCluster
[StopTargetCluster] Stopping cluster target-cluster...
[StopTargetCluster] Cleaning up underlying resources for cluster target-cluster...
[FrameworkRunner] Step Succeeded: StopTargetCluster
[FrameworkRunner] Running Step: SnapshotRestoreTeardown
[SnapshotRestoreTeardown] Removing shared Docker volume cluster-snapshots-volume...
[SnapshotRestoreTeardown] Removed shared Docker volume cluster-snapshots-volume
[FrameworkRunner] Step Succeeded: SnapshotRestoreTeardown
[FrameworkRunner] Ran through all steps successfully
[FrameworkRunner] Saving application state to file...
[FrameworkRunner] Application state saved
[FrameworkRunner] Application state saved to: /tmp/utf/state-file
[FrameworkRunner] Full run details logged to: /tmp/utf/logs/run.log.2022-12-16_12_12_59

State-file:

(.venv) chelma@3c22fba4e266 upgrades % cat /tmp/utf/state-file
{
    "app_state": {
        "current_step": "SnapshotRestoreTeardown",
        "exit_type": "Succeeded",
        "log_file": "/tmp/utf/logs/run.log.2022-12-16_12_29_45",
        "state_file": "/tmp/utf/state-file",
        "test_config_path": "./test_configs/snapshot_restore_es_7_10_2_to_os_1_3_6.json"
    },
    "shared_volume": {
        "mount_point": "/snapshots",
        "volume": {
            "CreatedAt": "2022-12-16T18:29:45Z",
            "Driver": "local",
            "Labels": null,
            "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
            "Name": "cluster-snapshots-volume",
            "Options": null,
            "Scope": "local"
        }
    },
    "source_cluster": {
        "cluster_config": {
            "additional_node_config": {
                "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                "path.repo": "/snapshots"
            },
            "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
            "node_count": 2
        },
        "cluster_state": "CLEANED_UP",
        "name": "source-cluster",
        "nodes": {
            "source-cluster-node-1": {
                "container": null,
                "container_config": {
                    "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:29:45.705121168Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.26.0.1",
                                    "Subnet": "172.26.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "bc0693d9a6923d641554afdeae99fa0071aec906f8b3ae664bfec7079ad4ce3e",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "source-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9200
                        }
                    ],
                    "rest_port": 9200,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/elasticsearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/source-cluster-volume-1/_data",
                                "Name": "source-cluster-volume-1",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "source-cluster-node-1",
                "node_config": {
                    "config": {
                        "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "source-cluster-node-1,source-cluster-node-2",
                        "cluster.name": "source-cluster",
                        "discovery.seed_hosts": "source-cluster-node-1,source-cluster-node-2",
                        "node.name": "source-cluster-node-1",
                        "path.repo": "/snapshots"
                    },
                    "data_dir": "/usr/share/elasticsearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            },
            "source-cluster-node-2": {
                "container": null,
                "container_config": {
                    "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:29:45.705121168Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.26.0.1",
                                    "Subnet": "172.26.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "bc0693d9a6923d641554afdeae99fa0071aec906f8b3ae664bfec7079ad4ce3e",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "source-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9201
                        }
                    ],
                    "rest_port": 9201,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/elasticsearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:46Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/source-cluster-volume-2/_data",
                                "Name": "source-cluster-volume-2",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "source-cluster-node-2",
                "node_config": {
                    "config": {
                        "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "source-cluster-node-1,source-cluster-node-2",
                        "cluster.name": "source-cluster",
                        "discovery.seed_hosts": "source-cluster-node-1,source-cluster-node-2",
                        "node.name": "source-cluster-node-2",
                        "path.repo": "/snapshots"
                    },
                    "data_dir": "/usr/share/elasticsearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            }
        }
    },
    "target_cluster": {
        "cluster_config": {
            "additional_node_config": {
                "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                "path.repo": "/snapshots",
                "plugins.security.disabled": "true"
            },
            "image": "opensearchproject/opensearch:1.3.6",
            "node_count": 2
        },
        "cluster_state": "CLEANED_UP",
        "name": "target-cluster",
        "nodes": {
            "target-cluster-node-1": {
                "container": null,
                "container_config": {
                    "image": "opensearchproject/opensearch:1.3.6",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:30:02.598090943Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.27.0.1",
                                    "Subnet": "172.27.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "86d725b1d7fd9da75b5dbfec3be460e19339c27752314e4944ee243d97fa8ded",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "target-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9202
                        }
                    ],
                    "rest_port": 9202,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/opensearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:30:02Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/target-cluster-volume-1/_data",
                                "Name": "target-cluster-volume-1",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "target-cluster-node-1",
                "node_config": {
                    "config": {
                        "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "target-cluster-node-1,target-cluster-node-2",
                        "cluster.name": "target-cluster",
                        "discovery.seed_hosts": "target-cluster-node-1,target-cluster-node-2",
                        "node.name": "target-cluster-node-1",
                        "path.repo": "/snapshots",
                        "plugins.security.disabled": "true"
                    },
                    "data_dir": "/usr/share/opensearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            },
            "target-cluster-node-2": {
                "container": null,
                "container_config": {
                    "image": "opensearchproject/opensearch:1.3.6",
                    "network": {
                        "Attachable": false,
                        "ConfigFrom": {
                            "Network": ""
                        },
                        "ConfigOnly": false,
                        "Containers": {},
                        "Created": "2022-12-16T18:30:02.598090943Z",
                        "Driver": "bridge",
                        "EnableIPv6": false,
                        "IPAM": {
                            "Config": [
                                {
                                    "Gateway": "172.27.0.1",
                                    "Subnet": "172.27.0.0/16"
                                }
                            ],
                            "Driver": "default",
                            "Options": null
                        },
                        "Id": "86d725b1d7fd9da75b5dbfec3be460e19339c27752314e4944ee243d97fa8ded",
                        "Ingress": false,
                        "Internal": false,
                        "Labels": {},
                        "Name": "target-cluster-network",
                        "Options": {},
                        "Scope": "local"
                    },
                    "port_mappings": [
                        {
                            "container_port": 9200,
                            "host_port": 9203
                        }
                    ],
                    "rest_port": 9203,
                    "ulimits": [
                        {
                            "hard": -1,
                            "name": "memlock",
                            "soft": -1
                        }
                    ],
                    "volumes": [
                        {
                            "mount_point": "/usr/share/opensearch/data",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:30:03Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/target-cluster-volume-2/_data",
                                "Name": "target-cluster-volume-2",
                                "Options": null,
                                "Scope": "local"
                            }
                        },
                        {
                            "mount_point": "/snapshots",
                            "volume": {
                                "CreatedAt": "2022-12-16T18:29:45Z",
                                "Driver": "local",
                                "Labels": null,
                                "Mountpoint": "/var/lib/docker/volumes/cluster-snapshots-volume/_data",
                                "Name": "cluster-snapshots-volume",
                                "Options": null,
                                "Scope": "local"
                            }
                        }
                    ]
                },
                "name": "target-cluster-node-2",
                "node_config": {
                    "config": {
                        "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                        "bootstrap.memory_lock": "true",
                        "cluster.initial_master_nodes": "target-cluster-node-1,target-cluster-node-2",
                        "cluster.name": "target-cluster",
                        "discovery.seed_hosts": "target-cluster-node-1,target-cluster-node-2",
                        "node.name": "target-cluster-node-2",
                        "path.repo": "/snapshots",
                        "plugins.security.disabled": "true"
                    },
                    "data_dir": "/usr/share/opensearch/data",
                    "user": "elasticsearch"
                },
                "node_state": "CLEANED_UP"
            }
        }
    },
    "test_config": {
        "clusters_def": {
            "source": {
                "additional_node_config": {
                    "ES_JAVA_OPTS": "-Xms512m -Xmx512m",
                    "path.repo": "/snapshots"
                },
                "image": "docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2",
                "node_count": 2
            },
            "target": {
                "additional_node_config": {
                    "OPENSEARCH_JAVA_OPTS": "-Xms512m -Xmx512m",
                    "path.repo": "/snapshots",
                    "plugins.security.disabled": "true"
                },
                "image": "opensearchproject/opensearch:1.3.6",
                "node_count": 2
            }
        },
        "upgrade_def": {
            "style": "snapshot-restore"
        }
    }
}

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chris Helma <chelma+github@amazon.com>
Signed-off-by: Chris Helma <chelma+github@amazon.com>
Signed-off-by: Chris Helma <chelma+github@amazon.com>
* Moved the unit tests to a directory tree that mirrors the lib
* Added basic handling for Node state edge cases
* Added basic handling for Cluster state edge cases

Signed-off-by: Chris Helma <chelma+github@amazon.com>
* Fixed Git hook for unit test invocation
* Fixed some bugs in Cluster/Node state management
* Added code so that the app state is saved in full

Signed-off-by: Chris Helma <chelma+github@amazon.com>
* Kept the _nodes around after cleanup so their state is stored.
  Previously, the field was cleared and the final state-file had no
  information about the node configuration in it.

Signed-off-by: Chris Helma <chelma+github@amazon.com>
lewijacn (Collaborator) left a comment

Nice addition for snapshot restore, and for setting up the outline for external tests to be plugged into the test steps. Only minor comments to look at.

elif "OS" == raw_engine:
engine = ENGINE_OPENSEARCH
else:
raise CouldNotParseEngineVersionException(version_string)
Collaborator:

Seems like we could provide a little more detail for each case where we raise an exception in this file, for example "Provided engine should match ES or OS format". Another alternative may be to simply provide an example in the exception class, e.g. "ES_7_10_2 or OS_1_3_6", for all cases.

Member Author (chelma):

Sure, makes sense, will do

Member Author (chelma):

Improved exception message in next revision.
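
For illustration, a minimal sketch of what a more descriptive exception could look like; the class name matches the snippet above, but the message wording here is an assumption rather than the exact text that landed:

```python
class CouldNotParseEngineVersionException(Exception):
    def __init__(self, version_string: str):
        # Hypothetical message wording; the real revision may differ
        super().__init__(
            f"Could not parse engine version '{version_string}'. "
            "Expected a value like 'ES_7_10_2' or 'OS_1_3_6'."
        )
```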

@@ -13,17 +16,34 @@
"""
Collaborator:

Seems like we have improved upon this last paragraph some and may need to update this

Member Author (chelma):

Yep - will update. Thanks!

Member Author (chelma):

Updated in next rev

from upgrade_testing_framework.core.framework_step import FrameworkStep
import upgrade_testing_framework.core.shell_interactions as shell
from upgrade_testing_framework.cluster_management.node import Node


class TestSourceCluster(FrameworkStep):
Collaborator:

I'm assuming most of this is placeholder for the actual tests we want to run

Member Author (chelma):

Yeah, it's placeholder

self.fail(f"Unsupported upgrade style - {upgrade_style}")

self.logger.info("Creating shared Docker volume to share snapshots...")
snapshot_volume = docker_client.create_volume("cluster-snapshots-volume")
Collaborator:

Maybe a point of preference, but should we be checking for an existing volume, or are we relying on the exception from the Docker SDK?

Member Author (chelma):

IIRC, Docker will create as many volumes with the same name as you ask it to and instead rely on their ID meta-data field; it's networks and containers that collide on a name. The Docker SDK uses the ID behind the scenes instead of the name, too, I think, so you wouldn't have collision.

Regardless, this just falls into the same bucket of issues that @mikaylathompson raised in a previous PR about adding a more formal test setup/teardown mechanism to ensure a "clean" testbed. I'll make an issue for that momentarily.

Member Author (chelma):

Created an issue to cover this: #51
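
For reference, a sketch of what an explicit existence check could look like with the Docker SDK for Python; this is illustrative only (it assumes volumes are looked up by name) and is not what the PR does today:

```python
import docker
from docker.errors import NotFound

def get_or_create_volume(client: docker.DockerClient, name: str):
    # Reuse a volume with this name if it already exists; otherwise create a fresh one
    try:
        return client.volumes.get(name)
    except NotFound:
        return client.volumes.create(name=name)
```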

self.logger.info("Creating snapshot of source cluster...")
port = source_cluster.rest_ports[0]

register_cmd = f"""curl -XPUT 'localhost:{port}/_snapshot/noldor' -H 'Content-Type: application/json' -d '{{
Collaborator:

Seems like we should be using a constant here for snapshot repo name and snapshot name

Member Author (chelma):

Sure - assuming this is our "real" code. It isn't, however, and will be ripped out/replaced in short order once we write our actual test code. Think of this as prototype/placeholder/demo code that is meant solely to illustrate how the UTF is designed to work.

"location": "{shared_volume.mount_point}"
}}
}}'
"""
Collaborator:

Would like to see at least a debug log for the name of the snapshot repo created and snapshot name later

Member Author (chelma):

The call_shell_command() function logs all that. That said - this curl code is going to be ripped out in the next few days/weeks anyways and replaced with "real" code, so...
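
Taken together, the two suggestions above (constants for the repo/snapshot names and a debug log) could look roughly like this; the constant names and helper are hypothetical, since the placeholder curl code is slated to be replaced anyway:

```python
import logging

logger = logging.getLogger(__name__)

SNAPSHOT_REPO_NAME = "noldor"  # hypothetical constant for the snapshot repo name
SNAPSHOT_NAME = "1"            # hypothetical constant for the snapshot name

def build_register_repo_command(port: int, mount_point: str) -> str:
    # Builds the curl command that registers a filesystem snapshot repo on the cluster
    logger.debug(f"Registering snapshot repo '{SNAPSHOT_REPO_NAME}' at '{mount_point}'")
    return f"""curl -XPUT 'localhost:{port}/_snapshot/{SNAPSHOT_REPO_NAME}' -H 'Content-Type: application/json' -d '{{
    "type": "fs",
    "settings": {{
        "location": "{mount_point}"
    }}
}}'"""
```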

}
},
"upgrade_def": {
"style": "snapshot-restore"
Collaborator:

Nothing to worry about now, but it would be interesting in the future if a customer could specify an existing snapshot repo they had and restore that on a target cluster we spin up

Member Author (chelma):

Potentially. Given the UTF is now primarily focused on OS developers rather than including cluster admins, I think the need for something like that may be lessened, but I'm absolutely open to it based on user feedback.

Collaborator:

Agreed, would want to hear some user desire for this before pursuing

@@ -32,7 +32,7 @@ Learn more about venv [here](https://docs.python.org/3/library/venv.html).
The unit tests are executed by invoking Pytest:

```
python -m pytest tests/
python -m pytest unit_tests/
Collaborator:

ah, is this to distinguish unit tests from tests run against the cluster? makes sense

},
"target": {
"engine_version": "OS_1_3_6",
"image": "opensearchproject/opensearch:1.3.6",
Collaborator:

Naively (i.e. 4 files into the review), engine_version and image seem redundant--why do we need both?

Member Author (chelma):

The thinking is that a user should be able to roll their own image and use it in the UTF, but the UTF still needs to know the engine version before it spins up any nodes. Consider the use case of a plugin or core developer wanting to run the UTF against a modified version of the codebase by sticking it into a locally built Docker image and using that as the basis of a UTF run.
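
For illustration, a cluster definition along these lines (the image name is hypothetical) would let a developer point the UTF at a locally built image while still declaring which engine version it contains:

```python
# Hypothetical clusters_def entry; "my-local-opensearch:dev" stands in for a locally built image
cluster_def = {
    "engine_version": "OS_1_3_6",
    "image": "my-local-opensearch:dev",
    "node_count": 2,
}
```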

]
container_config = ContainerConfiguration(self._starting_image, network, port_mappings, self._volumes)

# Assemble the container configuration based on the
Collaborator:

comment looks unfinished

Member Author (chelma):

Will fix

Member Author (chelma):

Fixed in next rev

@@ -116,7 +132,7 @@ def create_container(self, image: str, container_name: str, network: Network, po
detach=True,
environment=env_variables
)
self.logger.debug("Created container")
self.logger.debug(f"Created container {container_name}")
Collaborator:

very minor question: you changed the logger.debug messages on lines 84 & 87 to info level messages -- why not this one?

Member Author (chelma):

Info level logs are visible in the terminal history; debug level logs are only visible in the log file. The fact that we had to pull an image from an external repo seems like something that's more important to surface to that higher level than the fact we created a container, which is a given.

Member Author (chelma):

Actually, I think the fact that we're pushing INFO level logs here indicates that this particular method should be at a higher level of abstraction. Will refactor.

Member Author (chelma):

Addressed in next rev

@@ -131,10 +147,18 @@ def remove_container(self, container: Container):
self.logger.debug(f"Removed container {container.name}")

def run(self, container: Container, command: str) -> Tuple[int, str]:
# TODO - Need to handle when container isn't stopped
# TODO - Need to handle when container isn't running
Collaborator:

minor: I think the original is correct?

Collaborator:

ah, wait, no -- my comment here is wrong. Leaving this because maybe it's confusing that run doesn't actually refer to "running" the container, but "running" a command in the container, but I don't know if making the method name more verbose is the solution. 🤷‍♀️

Member Author (chelma):

Easy fix; will rename to be clearer.

Member Author (chelma):

Renamed to run_command
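
For context, a minimal sketch of what the renamed method could boil down to (written here as a standalone function over the Docker SDK, not the PR's actual method body):

```python
from typing import Tuple

from docker.models.containers import Container

def run_command(container: Container, command: str) -> Tuple[int, str]:
    # Run a command inside an already-running container and return (exit code, output)
    exit_code, output = container.exec_run(command)
    return exit_code, output.decode("utf-8")
```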

def __init__(self, name: str, container_config: ContainerConfiguration, node_config: NodeConfiguration,
docker_client: DockerFrameworkClient, container: Container = None):
self.logger = logging.getLogger(__name__)
self.name = name
self.rest_port = container_config.rest_port
Collaborator:

rest port = port on which the rest api is available?

Member Author (chelma):

Yep!

def _get_engine_user_for_version(version: ev.EngineVersion) -> str:
if ev.ENGINE_ELASTICSEARCH == version.engine:
return "elasticsearch"
return "elasticsearch"
Collaborator:

this should be return "opensearch", yeah?

Member Author (chelma):

Surprisingly not; for Elastic 1.x my testing indicated the run-as user in the container was "elasticsearch". Snapshot/restore certainly seems to work with the OpenSearch container's snapshots volume having its linux ownership perms set to the "elasticsearch" user.

Collaborator:

well that's weird (but as a side note, this explains why the original snapshot walkthrough only needed the owner of the repo to be set once).

why have the if at all then?

Member Author (chelma):

Because I'm betting OpenSearch 2.x+ has a different user, and this is looking ahead to that eventuality.
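
A sketch of where that eventuality could land (the names and version handling here are assumptions; the PR's helper returns "elasticsearch" for both engines today):

```python
def get_engine_user(engine: str, major_version: int) -> str:
    # Hypothetical: OpenSearch 2.x+ images may run as a different user than 1.x
    if engine == "OpenSearch" and major_version >= 2:
        return "opensearch"
    return "elasticsearch"
```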

self.docker_client: DockerFrameworkClient = None
self.source_cluster: Cluster = None
self.target_cluster: Cluster = None
self.test_config: TestConfig = None
Collaborator:

I know we discussed the Type | None vs Optional[Type] thing last time--if they're being set to None, should they all be Optionals?

Member Author (chelma):

I dunno; does it matter either way?

Member Author (chelma):

Ah, now I understand your question. We currently expect these to be defined over the course of a UTF run, so they're not technically "optional" in a grammatical sense. I therefore didn't use Optional, and instead declared/initialized them as None.

Collaborator:

I don't think this is a big deal, so not worth changing in this review, but for future related code: my take would be that if there's a chance they could be None (i.e. should callers check whether it's not-None before using it?), they should be Optional. Is that fair/line up with your understanding?
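
For illustration, the reviewer's suggestion would look roughly like this (the class name is hypothetical; the string annotations stand in for the real framework types):

```python
from typing import Optional

class RunState:
    def __init__(self) -> None:
        # Optional[...] signals that callers should check for None before using these
        self.docker_client: Optional["DockerFrameworkClient"] = None
        self.source_cluster: Optional["Cluster"] = None
        self.target_cluster: Optional["Cluster"] = None
        self.test_config: Optional["TestConfig"] = None
```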

def _run(self):
"""
The contents of this are quick/dirty. Instead of raw curl commands we should probably be using the Python SDK
for Elasticsearch/OpenSearch. At the very least, we should wrap the curl commands in our own pseudo-client.
Collaborator:

minor: want to make/link to a gh issue for this task? I agree that at least a pseudo-client sounds like a necessary approach and I'd like to have it on our radar.

Member Author (chelma):

Already made and in progress - #50
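
As a rough sketch of the pseudo-client idea tracked in #50 (class and method names are hypothetical; the current step issues raw curl commands instead):

```python
import requests

class RestClient:
    def __init__(self, host: str = "localhost", port: int = 9200):
        self.base_url = f"http://{host}:{port}"

    def get_doc(self, index: str, doc_id: str) -> dict:
        # Equivalent of: curl -XGET 'localhost:9200/<index>/_doc/<doc_id>'
        response = requests.get(f"{self.base_url}/{index}/_doc/{doc_id}")
        response.raise_for_status()
        return response.json()
```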

from upgrade_testing_framework.core.framework_step import FrameworkStep
from upgrade_testing_framework.cluster_management.cluster import Cluster, ClusterNotStartedInTimeException

class StartTargetCluster(FrameworkStep):
Collaborator:

Discussion question:
Ideally, do we want StartSourceCluster and StartTargetCluster steps to be totally different, or should we just have a StartCluster step that takes in a config?
Changing this would be out of scope for this review. My first reaction is to object to the code duplication, but when I think about it more, source and target are pretty fundamental concepts for a migration/upgrade, and there's not (at this point) any reason to have more than the two, so maybe it's not a problem.

Member Author (chelma):

Yeah, I had the same thought about making a base step for this, having them share library code, make a meta class, etc. I ended up deciding it wasn't worth messing with now as there's only two of them. Once we hit three, or it becomes a hassle to maintain the separate "copies", then we can revisit it. There are bigger fish to fry at the moment.
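
If the duplication ever does become a hassle, a shared base along these lines is one option (a hypothetical sketch, not code from this PR):

```python
# Hypothetical shared step; subclasses only declare which cluster definition they start
class StartClusterStep:
    cluster_role: str = ""

    def __init__(self, clusters_def: dict):
        self.cluster_def = clusters_def[self.cluster_role]

    def run(self) -> None:
        print(f"Starting {self.cluster_role} cluster from image {self.cluster_def['image']}")

class StartSourceCluster(StartClusterStep):
    cluster_role = "source"

class StartTargetCluster(StartClusterStep):
    cluster_role = "target"
```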

* opensearch-project#47

Signed-off-by: Chris Helma <chelma+github@amazon.com>
@@ -24,7 +24,7 @@ jobs:
python -m pip install -r requirements.txt coverage pytest-cov
- name: Run Tests with Coverage
run: |
python -m pytest tests/ --cov=upgrade_testing_framework --cov-report=xml --cov-branch
Collaborator:

Good catch

@chelma chelma merged commit 76c0120 into opensearch-project:main Dec 20, 2022
@chelma chelma deleted the snapshot-restore branch December 20, 2022 17:56