From 6c4d140f1e39572e788a6981e7b8403eb8ef3a83 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Opala?=
Date: Mon, 29 Jun 2020 14:29:09 +0200
Subject: [PATCH] Feature backup / restore (#1359)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* backup/recovery: uncommenting and refactoring legacy code (#1241)
* backup/recovery: adding ansible code for syncing files between hosts (#1257)
* backup/recovery: adding ansible code for syncing files between hosts
* backup/recovery: adding rsync to package requirements
* backup/recovery: fixing rsync code + adding random name for private key
* backup/recovery: adding ssh code (transfer via pipeline)
* backup/recovery/prometheus: enabling admin API + initial implementation
* backup/recovery: adding initial code for restoring monitoring snapshots (#1259)
* backup/recovery: moving sync code to roles + cleanups
* backup/recovery: adding initial code for restoring monitoring snapshots
* backup/recovery: adding kubernetes backup sync (copy-only)
* backup/recovery: checking if snapshots are available first
* backup/recovery: adding on-the-fly sha1 calculation (#1262)
* backup/recovery: adding on-the-fly sha1 calculation in the download_via_ssh copy method
* backup/recovery: fixing on-the-fly sha1 calculation
  - monitoring: reverting invalid tar argument order
  - download_via_ssh: making the inline_script fail properly
* Postgres Backup Initial
* postgres backup
* Add all config files to backup
* backup/recovery: monitoring refactor (#1270)
  - switching from ssh to rsync
  - adding grafana backup/restore
  - adding prometheus etc backup/restore
* backup/recovery: adding loadbalancer (#1279)
* backup/recovery: logging (WIP) (#1277)
  - work in progress, does not work with elasticsearch clusters yet
* backup/recovery: adding rabbitmq (without messages) (#1291)
* backup/recovery: adding rabbitmq (without messages)
* backup/recovery: adding rabbitmq (archive fix)
* Epicli backup/restore improvements and refactor (#1299)
* epicli: moving component parser to separate module
* backup/recovery: moving backup storage to repository hosts
* backup/recovery: PatchEngine refactor (adding yaml based config)
* backup/recovery: adding backup common tasks
* backup/recovery: adding recovery common tasks
* backup/recovery: kubernetes refactor
* backup/recovery: load_balancer refactor
* backup/recovery: logging refactor
* backup/recovery: monitoring refactor
* backup/recovery: rabbitmq refactor
* backup/recovery: postgresql refactor (unfinished)
* backup/recovery: adding optional -b build_directory parameter (#1301)
* merge, modify postgres to use new setup (#1308)
* Postgres recovery fix (#1339)
* merge, modify postgres to use new setup
* recovery fix
* backup/recovery: fixing user vs default config merge logic (or lack of it) (#1345)
* Modify backup / restore roles according to test results (#1356)
* merge, modify postgres to use new setup
* recovery fix
* centos testing changes
* fix after redhat tests
* merge
* Add provider to backup / restore
* Notes about kubernetes
* backup/recovery: adding missing provider value to role vars (fix) (#1358)
* backup/recovery: epicli invocation refactor + adding schema validations
* Change in Postgresql recovery task
* backup/recovery: moving schema validation to earlier stage
* backup/recovery: splitting PatchEngine into BackupEngine and RecoveryEngine
* SchemaValidator: adding ability to check individual documents (without the base schema)
* backup/recovery: reusing modified SchemaValidator
* backup/recovery: removing unneeded/broken kubernetes recovery code
* backup/recovery: removing unused code and trailing whitespace
* backup/recovery: removing abandoned component parser
* doc_list_helpers: adding missing unit tests for the "select_single" helper
* Backup / recovery documentation (#1377)
* Feature/restore and backup validation (#1379)
* Add recovery components validation
* Reformating
* Update core/src/epicli/data/common/validation/configuration/recovery.yml
Co-authored-by: Michał Opala
* Update core/src/epicli/data/common/validation/configuration/recovery.yml
Co-authored-by: Michał Opala
* Update core/src/epicli/data/common/validation/configuration/recovery.yml
Co-authored-by: Michał Opala
* Update core/src/epicli/data/common/validation/configuration/recovery.yml
Co-authored-by: Michał Opala
* Update core/src/epicli/data/common/validation/configuration/recovery.yml
Co-authored-by: Michał Opala
* Add regex for deeper validation
* Add validations config for backup manifest
* Fix true/false in additional values
Co-authored-by: Michał Opala
* Fix postgres check for RedHat (#1389)
* Hotfix for the elasticsearch recovery procedure (#1381)
* backup/recovery: logging/elasticsearch snapshot restore fix
  - ensuring kibana is always down
  - deleting all indices prior to restore
* backup/recovery: logging/elasticsearch snapshot restore fix
  - ensuring all filebeat instances are always down
* backup/recovery: logging/elasticsearch snapshot restore fix
  - ensuring kibana and filebeat instances will not be started (via reboot) during restore
* backup/recovery: fixing load_balancer in RedHat (#1390)
* Fix database check (#1391)

Co-authored-by: Irek Głownia <48471627+plirglo@users.noreply.github.com>
Co-authored-by: Marcin Pyrka
---
 core/src/epicli/cli/engine/BackupEngine.py    |  28 ++
 .../cli/engine/BackupRecoveryEngineBase.py    | 120 +++++++++
 core/src/epicli/cli/engine/PatchEngine.py     |  46 ----
 core/src/epicli/cli/engine/RecoveryEngine.py  |  28 ++
 .../cli/engine/schema/SchemaValidator.py      |  28 +-
 core/src/epicli/cli/epicli.py                 |  39 ++-
 core/src/epicli/cli/helpers/Step.py           |   2 +-
 .../epicli/cli/helpers/doc_list_helpers.py    |   8 +-
 .../data/common/ansible/playbooks/backup.yml  |   9 -
 .../ansible/playbooks/backup_kubernetes.yml   |  13 +
 .../playbooks/backup_load_balancer.yml        |  16 ++
 .../ansible/playbooks/backup_logging.yml      |  37 +++
 .../ansible/playbooks/backup_monitoring.yml   |  37 +++
 .../ansible/playbooks/backup_postgresql.yml   |  17 ++
 .../ansible/playbooks/backup_rabbitmq.yml     |  20 ++
 .../common/ansible/playbooks/recovery.yml     |   9 -
 .../playbooks/recovery_load_balancer.yml      |  16 ++
 .../ansible/playbooks/recovery_logging.yml    |  34 +++
 .../ansible/playbooks/recovery_monitoring.yml |  35 +++
 .../ansible/playbooks/recovery_postgresql.yml |  17 ++
 .../ansible/playbooks/recovery_rabbitmq.yml   |  30 +++
 .../playbooks/roles/backup/defaults/main.yml  |   6 +-
 .../tasks/common/create_snapshot_archive.yml  |  66 +++++
 .../tasks/common/create_snapshot_checksum.yml |  32 +++
 .../tasks/common/download_via_rsync.yml       |  91 +++++++
 .../roles/backup/tasks/kubernetes.yml         |  99 ++++++++
 .../tasks/load_balancer_haproxy_etc.yml       |  33 +++
 .../tasks/logging_elasticsearch_etc.yml       |  27 ++
 .../tasks/logging_elasticsearch_snapshot.yml  |  84 ++++++
 .../roles/backup/tasks/logging_kibana_etc.yml |  27 ++
 .../playbooks/roles/backup/tasks/main.yml     |  69 -----
 .../backup/tasks/monitoring_grafana_data.yml  |  27 ++
 .../tasks/monitoring_prometheus_etc.yml       |  27 ++
 .../tasks/monitoring_prometheus_snapshot.yml  |  52 ++++
 .../roles/backup/tasks/postgresql.yml         |  73 ++++++
 .../tasks/rabbitmq_rabbitmq_definitions.yml   |  50 ++++
 .../backup/tasks/rabbitmq_rabbitmq_etc.yml    |  27 ++
 .../playbooks/roles/common/tasks/Debian.yml   |   1 +
 .../playbooks/roles/common/tasks/RedHat.yml   |   1 +
 .../tasks/configure-es.yml                    |   9 +-
 .../templates/elasticsearch.yml.j2            |   8 +-
 .../roles/recovery/defaults/main.yml          |   6 +-
 .../playbooks/roles/recovery/meta/main.yml    |   3 +
 .../tasks/common/clear_directories.yml        |  45 ++++
 .../tasks/common/find_snapshot_archive.yml    |  56 ++++
 .../tasks/common/upload_via_rsync.yml         |  81 ++++++
 .../tasks/common/verify_snapshot_checksum.yml |  31 +++
 .../tasks/load_balancer_haproxy_etc.yml       |  47 ++++
 .../tasks/logging_elasticsearch_etc.yml       |  38 +++
 .../tasks/logging_elasticsearch_snapshot.yml  | 117 +++++++++
 .../recovery/tasks/logging_kibana_etc.yml     |  38 +++
 .../playbooks/roles/recovery/tasks/main.yml   | 132 ----------
 .../tasks/monitoring_grafana_data.yml         |  38 +++
 .../tasks/monitoring_prometheus_etc.yml       |  38 +++
 .../tasks/monitoring_prometheus_snapshot.yml  |  38 +++
 .../roles/recovery/tasks/postgresql.yml       | 240 ++++++++++++++++++
 .../tasks/rabbitmq_rabbitmq_definitions.yml   |  55 ++++
 .../recovery/tasks/rabbitmq_rabbitmq_etc.yml  |  38 +++
 .../centos-7/requirements.txt                 |   4 +
 .../redhat-7/requirements.txt                 |   4 +
 .../ubuntu-18.04/requirements.txt             |   3 +
 .../common/defaults/configuration/backup.yml  |  19 ++
 .../configuration/feature-mapping.yml         |   1 -
 .../common/defaults/configuration/logging.yml |   1 +
 .../defaults/configuration/prometheus.yml     |   1 +
 .../defaults/configuration/recovery.yml       |  20 ++
 .../validation/configuration/backup.yml       |  82 ++++++
 .../validation/configuration/recovery.yml     |  91 +++++++
 .../tests/helpers/test_doc_list_helpers.py    |  34 ++-
 docs/home/HOWTO.md                            |   7 +
 docs/home/howto/BACKUP.md                     | 155 +++++++++++
 71 files changed, 2567 insertions(+), 294 deletions(-)
 create mode 100644 core/src/epicli/cli/engine/BackupEngine.py
 create mode 100644 core/src/epicli/cli/engine/BackupRecoveryEngineBase.py
 delete mode 100644 core/src/epicli/cli/engine/PatchEngine.py
 create mode 100644 core/src/epicli/cli/engine/RecoveryEngine.py
 delete mode 100644 core/src/epicli/data/common/ansible/playbooks/backup.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_kubernetes.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_load_balancer.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_logging.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_monitoring.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_postgresql.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/backup_rabbitmq.yml
 delete mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery_load_balancer.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery_logging.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery_monitoring.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery_postgresql.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/recovery_rabbitmq.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_archive.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_checksum.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/download_via_rsync.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/kubernetes.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/load_balancer_haproxy_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_snapshot.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_kibana_etc.yml
 delete mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/main.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_grafana_data.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_snapshot.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/postgresql.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_definitions.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/meta/main.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/clear_directories.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/find_snapshot_archive.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/upload_via_rsync.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/verify_snapshot_checksum.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/load_balancer_haproxy_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_snapshot.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_kibana_etc.yml
 delete mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/main.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_grafana_data.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_etc.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_snapshot.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/postgresql.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_definitions.yml
 create mode 100644 core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_etc.yml
 create mode 100644 core/src/epicli/data/common/defaults/configuration/backup.yml
 create mode 100644 core/src/epicli/data/common/defaults/configuration/recovery.yml
 create mode 100644 core/src/epicli/data/common/validation/configuration/backup.yml
 create mode 100644 core/src/epicli/data/common/validation/configuration/recovery.yml
 create mode 100644 docs/home/howto/BACKUP.md
diff --git a/core/src/epicli/cli/engine/BackupEngine.py b/core/src/epicli/cli/engine/BackupEngine.py
new file mode 100644
index 0000000000..78b153b8a8
--- /dev/null
+++ b/core/src/epicli/cli/engine/BackupEngine.py
@@ -0,0 +1,28 @@
+from cli.helpers.doc_list_helpers import select_single
+from cli.engine.BackupRecoveryEngineBase import BackupRecoveryEngineBase
+
+
+class BackupEngine(BackupRecoveryEngineBase):
+    """Perform backup operations."""
+
+    def __init__(self, input_data):
+        super(BackupRecoveryEngineBase, self).__init__(__name__)  # late call of the Step.__init__(__name__)
+        super(BackupEngine, self).__init__(input_data)
+
+    def backup(self):
+        """Backup all enabled components."""
+
+        self._process_input_docs()
+        self._process_configuration_docs()
+
+        # Get backup config document
+        backup_doc = select_single(self.configuration_docs, lambda x: x.kind == 'configuration/backup')
+
+        self._update_role_files_and_vars('backup', backup_doc)
+
+        # Execute all enabled component playbooks sequentially
+        for component_name, component_config in sorted(backup_doc.specification.components.items()):
+            if component_config.enabled:
+                self._update_playbook_files_and_run('backup', component_name)
+
+        return 0
diff --git a/core/src/epicli/cli/engine/BackupRecoveryEngineBase.py b/core/src/epicli/cli/engine/BackupRecoveryEngineBase.py
new file mode 100644
index 0000000000..d3cef1e35e
--- /dev/null
+++ b/core/src/epicli/cli/engine/BackupRecoveryEngineBase.py
@@ -0,0 +1,120 @@
+import os
+import copy
+
+from cli.version import VERSION
+from cli.helpers.Step import Step
+
+from cli.helpers.build_saver import get_inventory_path_for_build
+from cli.helpers.build_saver import copy_files_recursively, copy_file
+from cli.helpers.build_saver import MANIFEST_FILE_NAME
+
+from cli.helpers.yaml_helpers import dump
+from cli.helpers.data_loader import load_yamls_file, load_yaml_obj, types as data_types
+from cli.helpers.doc_list_helpers import select_single, ExpectedSingleResultException
+
+from cli.engine.schema.SchemaValidator import SchemaValidator
+from cli.engine.schema.DefaultMerger import DefaultMerger
+
+from cli.engine.ansible.AnsibleCommand import AnsibleCommand
+from cli.engine.ansible.AnsibleRunner import AnsibleRunner
+
+
+class BackupRecoveryEngineBase(Step):
+    """Perform backup and recovery operations (abstract base class)."""
+
+    def __init__(self, input_data):
+        # super(BackupRecoveryEngineBase, self).__init__(__name__) needs to be called in any subclass
+        self.file = input_data.file
+        self.build_directory = input_data.build_directory
+        self.manifest_docs = list()
+        self.input_docs = list()
+        self.configuration_docs = list()
+        self.cluster_model = None
+        self.backup_doc = None
+        self.recovery_doc = None
+        self.ansible_command = AnsibleCommand()
+
+    def __enter__(self):
+        super().__enter__()
+        return self
+
+    def __exit__(self, exc_type, exc_value, traceback):
+        super().__exit__(exc_type, exc_value, traceback)
+
+    def _process_input_docs(self):
+        """Load, validate and merge (with defaults) input yaml documents."""
+
+        path_to_manifest = os.path.join(self.build_directory, MANIFEST_FILE_NAME)
+        if not os.path.isfile(path_to_manifest):
+            raise Exception('No manifest.yml inside the build folder')
+
+        # Get existing manifest config documents
+        self.manifest_docs = load_yamls_file(path_to_manifest)
+        self.cluster_model = select_single(self.manifest_docs, lambda x: x.kind == 'epiphany-cluster')
+
+        # Load backup / recovery configuration documents
+        self.input_docs = load_yamls_file(self.file)
+
+        # Validate input documents
+        with SchemaValidator(self.cluster_model, self.input_docs) as schema_validator:
+            schema_validator.run_for_individual_documents()
+
+        # Merge the input docs with defaults
+        with DefaultMerger(self.input_docs) as doc_merger:
+            self.input_docs = doc_merger.run()
+
+    def _process_configuration_docs(self):
+        """Populate input yaml documents with additional required ad-hoc data."""
+
+        # Seed the self.configuration_docs
+        self.configuration_docs = copy.deepcopy(self.input_docs)
+
+        # Please notice using DefaultMerger is not needed here, since it is done already at this point.
+        # We just check if documents are missing and insert default ones without the unneeded merge operation.
+        for kind in {'configuration/backup', 'configuration/recovery'}:
+            try:
+                # Check if the required document is in user inputs
+                document = select_single(self.configuration_docs, lambda x: x.kind == kind)
+            except ExpectedSingleResultException:
+                # If there is no document provided by the user, then fallback to defaults
+                document = load_yaml_obj(data_types.DEFAULT, 'common', kind)
+                # Inject the required "version" attribute
+                document['version'] = VERSION
+                # Copy the "provider" value from the cluster model
+                document['provider'] = self.cluster_model.provider
+                # Save the document for later use
+                self.configuration_docs.append(document)
+            finally:
+                # Copy the "provider" value to the specification as well
+                document.specification['provider'] = document['provider']
+
+    def _update_role_files_and_vars(self, action, document):
+        """Render mandatory vars files for backup/recovery ansible roles inside the existing build directory."""
+
+        self.logger.info(f'Updating {action} role files...')
+
+        # Copy role files
+        roles_build_path = os.path.join(self.build_directory, 'ansible/roles', action)
+        roles_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH, 'roles', action)
+        copy_files_recursively(roles_source_path, roles_build_path)
+
+        # Render role vars
+        vars_dir = os.path.join(roles_build_path, 'vars')
+        os.makedirs(vars_dir, exist_ok=True)
+        vars_file_path = os.path.join(vars_dir, 'main.yml')
+        with open(vars_file_path, 'w') as stream:
+            dump(document, stream)
+
+    def _update_playbook_files_and_run(self, action, component):
+        """Update backup/recovery ansible playbooks inside the existing build directory and run the provisioning."""
+
+        self.logger.info(f'Running {action} on {component}...')
+
+        # Copy playbook file
+        playbook_build_path = os.path.join(self.build_directory, 'ansible', f'{action}_{component}.yml')
+        playbook_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH, f'{action}_{component}.yml')
+        copy_file(playbook_source_path, playbook_build_path)
+
+        # Run the playbook
+        inventory_path = get_inventory_path_for_build(self.build_directory)
+        self.ansible_command.run_playbook(inventory=inventory_path, playbook_path=playbook_build_path)
diff --git a/core/src/epicli/cli/engine/PatchEngine.py b/core/src/epicli/cli/engine/PatchEngine.py
deleted file mode 100644
index 0d08e77b5f..0000000000
--- a/core/src/epicli/cli/engine/PatchEngine.py
+++ /dev/null
@@ -1,46 +0,0 @@
-import os
-
-from cli.helpers.Step import Step
-from cli.engine.ansible.AnsibleCommand import AnsibleCommand
-from cli.engine.ansible.AnsibleRunner import AnsibleRunner
-from cli.helpers.Config import Config
-from cli.helpers.build_saver import copy_files_recursively, copy_file, get_inventory_path_for_build
-
-
-class PatchEngine(Step):
-    def __init__(self, input_data):
-        super().__init__(__name__)
-        self.build_directory = input_data.build_directory
-        self.ansible_command = AnsibleCommand()
-
-    def __enter__(self):
-        super().__enter__()
-        return self
-
-    def __exit__(self, exc_type, exc_value, traceback):
-        super().__exit__(exc_type, exc_value, traceback)
-
-    def backup(self):
-        self.upgrade_patch_files_and_run('backup')
-        return 0
-
-    def recovery(self):
-        self.upgrade_patch_files_and_run('recovery')
-        return 0
-
-    def upgrade_patch_files_and_run(self, action):
-        self.logger.info(f'Running {action}...')
-
-        #copy role files
-        roles_build_path = os.path.join(self.build_directory, 'ansible/roles', action)
-        roles_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH, 'roles', action)
-        copy_files_recursively(roles_source_path, roles_build_path)
-
-        #copy playbook file
-        playbook_build_path = os.path.join(self.build_directory, 'ansible/') + action + '.yml'
-        playbook_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH) + action + '.yml'
-        copy_file(playbook_source_path, playbook_build_path)
-
-        #run the playbook
-        inventory_path = get_inventory_path_for_build(self.build_directory)
-        self.ansible_command.run_playbook(inventory=inventory_path, playbook_path=playbook_build_path)
\ No newline at end of file
diff --git a/core/src/epicli/cli/engine/RecoveryEngine.py b/core/src/epicli/cli/engine/RecoveryEngine.py
new file mode 100644
index 0000000000..9a2b12c41d
--- /dev/null
+++ b/core/src/epicli/cli/engine/RecoveryEngine.py
@@ -0,0 +1,28 @@
+from cli.helpers.doc_list_helpers import select_single
+from cli.engine.BackupRecoveryEngineBase import BackupRecoveryEngineBase
+
+
+class RecoveryEngine(BackupRecoveryEngineBase):
+    """Perform recovery operations."""
+
+    def __init__(self, input_data):
+        super(BackupRecoveryEngineBase, self).__init__(__name__)  # late call of the Step.__init__(__name__)
+        super(RecoveryEngine, self).__init__(input_data)
+
+    def recovery(self):
+        """Recover all enabled components."""
+
+        self._process_input_docs()
+        self._process_configuration_docs()
+
+        # Get recovery config document
+        recovery_doc = select_single(self.configuration_docs, lambda x: x.kind == 'configuration/recovery')
+
+        self._update_role_files_and_vars('recovery', recovery_doc)
+
+        # Execute all enabled component playbooks sequentially
+        for component_name, component_config in sorted(recovery_doc.specification.components.items()):
+            if component_config.enabled:
+                self._update_playbook_files_and_run('recovery', component_name)
+
+        return 0
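For context: the epicli.py hunk further down wires these engines to the new `epicli backup -f <config file> -b <build dir>` and `epicli recovery ...` commands, driving them as context managers. A minimal sketch of that call path, assuming a stand-in for the argparse Namespace with the `file` and `build_directory` attributes the parsers define (SimpleNamespace here is illustrative only):

    from types import SimpleNamespace
    from cli.engine.BackupEngine import BackupEngine

    # hypothetical stand-in for the Namespace built by epicli.py's backup_parser
    args = SimpleNamespace(file='backup.yml', build_directory='/abs/path/to/build')

    with BackupEngine(args) as engine:  # Step.__enter__/__exit__ wrap the run with logging/timing
        exit_code = engine.backup()     # returns 0; runs one playbook per enabled component

Note the deliberate `super(BackupRecoveryEngineBase, self).__init__(__name__)` in both subclasses: it skips the base class in the MRO and calls Step.__init__ directly, because BackupRecoveryEngineBase.__init__ itself only assigns fields and expects its subclass to have initialized Step first.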
diff --git a/core/src/epicli/cli/engine/schema/SchemaValidator.py b/core/src/epicli/cli/engine/schema/SchemaValidator.py
index 2d1e40906d..e132a785d7 100644
--- a/core/src/epicli/cli/engine/schema/SchemaValidator.py
+++ b/core/src/epicli/cli/engine/schema/SchemaValidator.py
@@ -13,13 +13,13 @@ def __init__(self, cluster_model, validation_docs):
         self.validation_docs = validation_docs
 
         base = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/base')
-        definitions = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/definitions')
+        self.definitions = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/definitions')
 
         self.base_schema = dict_to_objdict(deepcopy(base))
-        self.base_schema['definitions'] = definitions
+        self.base_schema['definitions'] = self.definitions
 
         self.base_schema_no_provider = dict_to_objdict(deepcopy(base))
-        self.base_schema_no_provider['definitions'] = definitions
+        self.base_schema_no_provider['definitions'] = self.definitions
 
         del self.base_schema_no_provider.required[0]
         del self.base_schema_no_provider.properties['provider']
@@ -32,6 +32,27 @@ def get_base_schema(self, kind):
         schema.properties.kind.pattern = '^(' + kind + ')$'
         return schema
 
+    def run_for_individual_documents(self):
+        for doc in self.validation_docs:
+            # Load document schema
+            schema = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, doc.kind)
+
+            # Include "definitions"
+            schema['definitions'] = self.definitions
+
+            # Warn the user about the missing validation
+            if hasattr(schema, '$ref'):
+                if schema['$ref'] == '#/definitions/unvalidated_specification':
+                    self.logger.warn('No specification validation for ' + doc.kind)
+
+            # Assert the schema
+            try:
+                validate(instance=objdict_to_dict(doc), schema=objdict_to_dict(schema))
+            except Exception as e:
+                self.logger.error(f'Failed validating: {doc.kind}')
+                self.logger.error(e)
+                raise Exception('Schema validation error, see the error above.')
+
     def run(self):
         for doc in self.validation_docs:
             self.logger.info(f'Validating: {doc.kind}')
@@ -46,4 +67,3 @@ def run(self):
             self.logger.error(f'Failed validating: {doc.kind}')
             self.logger.error(e)
             raise Exception('Schema validation error, see the error above.')
-
diff --git a/core/src/epicli/cli/epicli.py b/core/src/epicli/cli/epicli.py
index e4d5750250..99696be3fc 100644
--- a/core/src/epicli/cli/epicli.py
+++ b/core/src/epicli/cli/epicli.py
@@ -11,10 +11,11 @@
 import socket
 
 from cli.engine.ApplyEngine import ApplyEngine
-from cli.engine.PatchEngine import PatchEngine
+from cli.engine.BackupEngine import BackupEngine
 from cli.engine.DeleteEngine import DeleteEngine
 from cli.engine.InitEngine import InitEngine
 from cli.engine.PrepareEngine import PrepareEngine
+from cli.engine.RecoveryEngine import RecoveryEngine
 from cli.engine.UpgradeEngine import UpgradeEngine
 from cli.engine.TestEngine import TestEngine
 from cli.helpers.Log import Log
@@ -92,12 +93,11 @@ def debug_level(x):
     upgrade_parser(subparsers)
     delete_parser(subparsers)
     test_parser(subparsers)
-    '''
     validate_parser(subparsers)
+    '''
     backup_parser(subparsers)
     recovery_parser(subparsers)
-    '''
 
     # check if there were any variables and display full help
     if len(sys.argv) < 2:
@@ -260,36 +260,47 @@ def run_validate(args):
             return engine.validate()
     sub_parser.set_defaults(func=run_validate)
 
+'''
 
 def backup_parser(subparsers):
+    """Configure and execute backup of cluster components."""
+
     sub_parser = subparsers.add_parser('backup',
-                                       description='[Experimental]: Backups existing Epiphany Platform components.')
+                                       description='Create backup of cluster components.')
+    sub_parser.add_argument('-f', '--file', dest='file', type=str, required=True,
+                            help='Backup configuration definition file to use.')
     sub_parser.add_argument('-b', '--build', dest='build_directory', type=str, required=True,
-                            help='Absolute path to directory with build artifacts.')
+                            help='Absolute path to directory with build artifacts.',
+                            default=None)
 
     def run_backup(args):
-        experimental_query()
-        adjust_paths_from_build(args)
-        with PatchEngine(args) as engine:
+        adjust_paths_from_file(args)
+        with BackupEngine(args) as engine:
             return engine.backup()
 
     sub_parser.set_defaults(func=run_backup)
 
 
 def recovery_parser(subparsers):
-    sub_parser = subparsers.add_parser('recovery', description='[Experimental]: Recover from existing backup.')
+    """Configure and execute recovery of cluster components."""
+
+    sub_parser = subparsers.add_parser('recovery',
+                                       description='Recover from existing backup.')
+    sub_parser.add_argument('-f', '--file', dest='file', type=str, required=True,
+                            help='Recovery configuration definition file to use.')
     sub_parser.add_argument('-b', '--build', dest='build_directory', type=str, required=True,
-                            help='Absolute path to directory with build artifacts.')
+                            help='Absolute path to directory with build artifacts.',
+                            default=None)
 
     def run_recovery(args):
-        experimental_query()
-        adjust_paths_from_build(args)
-        with PatchEngine(args) as engine:
+        if not query_yes_no('Do you really want to perform recovery?'):
+            return 0
+        adjust_paths_from_file(args)
+        with RecoveryEngine(args) as engine:
            return engine.recovery()
 
     sub_parser.set_defaults(func=run_recovery)
 
-'''
 
 def experimental_query():
diff --git a/core/src/epicli/cli/helpers/Step.py b/core/src/epicli/cli/helpers/Step.py
index 7adf21be9a..b3c16d07fb 100644
--- a/core/src/epicli/cli/helpers/Step.py
+++ b/core/src/epicli/cli/helpers/Step.py
@@ -1,6 +1,6 @@
 import time
 from cli.helpers.Log import Log
-from abc import ABCMeta, abstractmethod
+from abc import ABCMeta
 
 
 class Step(metaclass=ABCMeta):
diff --git a/core/src/epicli/cli/helpers/doc_list_helpers.py b/core/src/epicli/cli/helpers/doc_list_helpers.py
index 926d9125d1..4e1d793f11 100644
--- a/core/src/epicli/cli/helpers/doc_list_helpers.py
+++ b/core/src/epicli/cli/helpers/doc_list_helpers.py
@@ -1,3 +1,9 @@
+
+class ExpectedSingleResultException(Exception):
+    """Raised when the query returns none or too many results."""
+    pass
+
+
 def select_first(documents, query):
     if documents is not None:
         for x in documents:
@@ -22,5 +28,5 @@ def select_single(documents, query):
         elements_count = len(results)
         if elements_count == 1:
             return results[0]
-        raise Exception("Expected one element but received: " + str(elements_count))
+        raise ExpectedSingleResultException("Expected one element but received: " + str(elements_count))
     return None
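A quick illustration of the select_single contract introduced above (plain dicts stand in for epicli's objdict documents, which are normally queried by their `kind` attribute):

    from cli.helpers.doc_list_helpers import select_single, ExpectedSingleResultException

    docs = [{'kind': 'configuration/backup'}, {'kind': 'configuration/recovery'}]

    # exactly one match -> the matching document is returned
    doc = select_single(docs, lambda d: d['kind'] == 'configuration/backup')

    # zero or multiple matches -> ExpectedSingleResultException
    try:
        select_single(docs, lambda d: d['kind'] == 'configuration/monitoring')
    except ExpectedSingleResultException as e:
        print(e)  # Expected one element but received: 0

This dedicated exception type is what lets _process_configuration_docs() in BackupRecoveryEngineBase fall back to the bundled default documents when the user's file does not contain a given document kind, instead of treating every bare Exception as "document missing".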
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup.yml b/core/src/epicli/data/common/ansible/playbooks/backup.yml
deleted file mode 100644
index 35575988d4..0000000000
--- a/core/src/epicli/data/common/ansible/playbooks/backup.yml
+++ /dev/null
@@ -1,9 +0,0 @@
----
-# Ansible playbook for backing up Kubernetes cluster
-
-- hosts: kubernetes_master
-  serial: 1
-  become: true
-  become_method: sudo
-  roles:
-    - backup
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_kubernetes.yml b/core/src/epicli/data/common/ansible/playbooks/backup_kubernetes.yml
new file mode 100644
index 0000000000..6922569b1d
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_kubernetes.yml
@@ -0,0 +1,13 @@
+---
+# Ansible playbook for backing up Kubernetes cluster
+
+- hosts: kubernetes_master[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.kubernetes.enabled | default(false)
+      block:
+        - import_role:
+            name: backup
+            tasks_from: kubernetes
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_load_balancer.yml b/core/src/epicli/data/common/ansible/playbooks/backup_load_balancer.yml
new file mode 100644
index 0000000000..baa0e9695b
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_load_balancer.yml
@@ -0,0 +1,16 @@
+---
+# Ansible playbook for backing up load_balancer config
+
+- hosts: haproxy[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.load_balancer.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/haproxy/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: load_balancer_haproxy_etc
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_logging.yml b/core/src/epicli/data/common/ansible/playbooks/backup_logging.yml
new file mode 100644
index 0000000000..c1252ec696
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_logging.yml
@@ -0,0 +1,37 @@
+---
+# Ansible playbook for backing up logging data
+
+- hosts: logging[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.logging.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/logging/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: logging_elasticsearch_snapshot
+        - import_role:
+            name: backup
+            tasks_from: logging_elasticsearch_etc
+
+- hosts: kibana[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.logging.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/kibana/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: logging_kibana_etc
+  vars:
+    snapshot_name: "{{ hostvars[groups.logging.0].snapshot_name }}"
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_monitoring.yml b/core/src/epicli/data/common/ansible/playbooks/backup_monitoring.yml
new file mode 100644
index 0000000000..32b2f61fcf
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_monitoring.yml
@@ -0,0 +1,37 @@
+---
+# Ansible playbook for backing up monitoring data
+
+- hosts: prometheus[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.monitoring.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/prometheus/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: monitoring_prometheus_snapshot
+        - import_role:
+            name: backup
+            tasks_from: monitoring_prometheus_etc
+
+- hosts: grafana[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.monitoring.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/grafana/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: monitoring_grafana_data
+  vars:
+    snapshot_name: "{{ hostvars[groups.prometheus.0].snapshot_name }}"
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_postgresql.yml b/core/src/epicli/data/common/ansible/playbooks/backup_postgresql.yml
new file mode 100644
index 0000000000..0d6394bd38
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_postgresql.yml
@@ -0,0 +1,17 @@
+---
+# Ansible playbook for backing up Postgresql database
+
+- hosts: postgresql
+  become: true
+  become_method: sudo
+  tasks:
+    - when: specification.components.postgresql.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/postgresql/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: postgresql
+  vars_files:
+    - roles/postgresql/defaults/main.yml
diff --git a/core/src/epicli/data/common/ansible/playbooks/backup_rabbitmq.yml b/core/src/epicli/data/common/ansible/playbooks/backup_rabbitmq.yml
new file mode 100644
index 0000000000..a15b4c692f
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/backup_rabbitmq.yml
@@ -0,0 +1,20 @@
+---
+# Ansible playbook for backing up rabbitmq config
+
+- hosts: rabbitmq[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.rabbitmq.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/rabbitmq/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: backup
+            tasks_from: rabbitmq_rabbitmq_definitions
+        - import_role:
+            name: backup
+            tasks_from: rabbitmq_rabbitmq_etc
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery.yml b/core/src/epicli/data/common/ansible/playbooks/recovery.yml
deleted file mode 100644
index 25281a0ae3..0000000000
--- a/core/src/epicli/data/common/ansible/playbooks/recovery.yml
+++ /dev/null
@@ -1,9 +0,0 @@
----
-# Ansible playbook for recovering Kubernetes cluster
-
-- hosts: kubernetes_master
-  serial: 1
-  become: true
-  become_method: sudo
-  roles:
-    - recovery
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery_load_balancer.yml b/core/src/epicli/data/common/ansible/playbooks/recovery_load_balancer.yml
new file mode 100644
index 0000000000..015c757dfe
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/recovery_load_balancer.yml
@@ -0,0 +1,16 @@
+---
+# Ansible playbook for recovering load_balancer config
+
+- hosts: haproxy[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.load_balancer.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/haproxy/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: load_balancer_haproxy_etc
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery_logging.yml b/core/src/epicli/data/common/ansible/playbooks/recovery_logging.yml
new file mode 100644
index 0000000000..796d1c0bae
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/recovery_logging.yml
@@ -0,0 +1,34 @@
+---
+# Ansible playbook for recovering logging data
+
+- hosts: logging[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.logging.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/logging/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: logging_elasticsearch_etc
+        - import_role:
+            name: recovery
+            tasks_from: logging_elasticsearch_snapshot
+
+- hosts: kibana[0]
+  gather_facts: true
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.logging.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/kibana/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: logging_kibana_etc
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery_monitoring.yml b/core/src/epicli/data/common/ansible/playbooks/recovery_monitoring.yml
new file mode 100644
index 0000000000..a0968eebfe
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/recovery_monitoring.yml
@@ -0,0 +1,35 @@
+---
+# Ansible playbook for recovering monitoring data
+
+- hosts: prometheus[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.monitoring.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/prometheus/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: monitoring_prometheus_etc
+        - import_role:
+            name: recovery
+            tasks_from: monitoring_prometheus_snapshot
+  vars_files:
+    - roles/prometheus/vars/main.yml
+
+- hosts: grafana[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.monitoring.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/grafana/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: monitoring_grafana_data
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery_postgresql.yml b/core/src/epicli/data/common/ansible/playbooks/recovery_postgresql.yml
new file mode 100644
index 0000000000..1e73865bca
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/recovery_postgresql.yml
@@ -0,0 +1,17 @@
+---
+# Ansible playbook for recovering Postgresql database
+
+- hosts: postgresql
+  become: true
+  become_method: sudo
+  tasks:
+    - when: specification.components.postgresql.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/postgresql/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: postgresql
+  vars_files:
+    - roles/postgresql/defaults/main.yml
diff --git a/core/src/epicli/data/common/ansible/playbooks/recovery_rabbitmq.yml b/core/src/epicli/data/common/ansible/playbooks/recovery_rabbitmq.yml
new file mode 100644
index 0000000000..799ebe6389
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/recovery_rabbitmq.yml
@@ -0,0 +1,30 @@
+---
+# Ansible playbook for recovering rabbitmq config
+
+- hosts: rabbitmq
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.rabbitmq.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/rabbitmq/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: rabbitmq_rabbitmq_etc
+
+- hosts: rabbitmq[0]
+  become: true
+  become_method: sudo
+  serial: 1
+  tasks:
+    - when: specification.components.rabbitmq.enabled | default(false)
+      block:
+        - include_vars:
+            file: roles/rabbitmq/vars/main.yml
+            name: component_vars
+        - import_role:
+            name: recovery
+            tasks_from: rabbitmq_rabbitmq_definitions
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/defaults/main.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/defaults/main.yml
index 7ae2f6b6dd..e28a0c933d 100644
--- a/core/src/epicli/data/common/ansible/playbooks/roles/backup/defaults/main.yml
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/defaults/main.yml
@@ -1,2 +1,6 @@
 ---
-backup_dir: /home/{{ admin_user.name }}/backupdir
+backup_dir: /epibackup
+backup_destination_dir: "{{ backup_dir }}/mounted"
+backup_destination_host: "{{ resolved_repository_hostname | default(groups.repository.0) }}"
+elasticsearch_snapshot_repository_name: epiphany
+elasticsearch_snapshot_repository_location: /var/lib/elasticsearch-snapshots
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_archive.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_archive.yml
new file mode 100644
index 0000000000..90be85bfd6
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_archive.yml
@@ -0,0 +1,66 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    snapshot_prefix: "haproxy_etc"
+#    snapshot_name: "20200526-102034"
+#    dirs_to_archive:
+#      - /etc/haproxy/
+#      - /etc/ssl/haproxy/
+#    files_to_archive:
+#      - /var/lib/rabbitmq/definitions/definitions-{{ snapshot_name }}.json
+
+- name: Assert that the snapshot_prefix fact is defined and valid
+  assert:
+    that:
+      - snapshot_prefix is defined
+      - snapshot_prefix is string
+      - snapshot_prefix | length > 0
+    fail_msg: The snapshot_prefix fact must be defined and must be a non-empty string.
+
+- name: Assert that the snapshot_name fact is defined and valid
+  assert:
+    that:
+      - snapshot_name is defined
+      - snapshot_name is string
+      - snapshot_name | length > 0
+    fail_msg: The snapshot_name fact must be defined and must be a non-empty string.
+
+- name: Reconstruct the paths_to_archive list
+  set_fact:
+    paths_to_archive: >-
+      {{ (dirs_to_archive_corrected + files_to_archive_corrected) | unique }}
+  vars:
+    # remove empty strings and make sure each path ends with single /
+    dirs_to_archive_corrected: >-
+      {{ dirs_to_archive | default([])
+         | map('regex_replace', '//*$', '')
+         | select
+         | map('regex_replace', '$', '/')
+         | list }}
+    # remove empty strings
+    files_to_archive_corrected: >-
+      {{ files_to_archive | default([])
+         | select
+         | list }}
+
+- name: Assert that the paths_to_archive list has at least one element
+  assert:
+    that:
+      - paths_to_archive | length > 0
+    fail_msg: The paths_to_archive list must contain at least one element.
+
+- name: Reconstruct the snapshot_path
+  set_fact:
+    snapshot_path: "{{ backup_dir }}/{{ snapshot_prefix }}_{{ snapshot_name }}.tar.gz"
+
+- name: Ensure backup directory exists
+  file:
+    path: "{{ backup_dir }}/"
+    state: directory
+
+- name: Create the archive
+  archive:
+    dest: "{{ snapshot_path }}"
+    path: "{{ paths_to_archive }}"
+    format: gz
+    force_archive: true
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_checksum.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_checksum.yml
new file mode 100644
index 0000000000..4460f815c8
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/create_snapshot_checksum.yml
@@ -0,0 +1,32 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    snapshot_path: "{{ backup_dir }}/{{ snapshot_prefix }}_{{ snapshot_name }}.tar.gz"
+
+- name: Assert that the snapshot_path fact is defined and valid
+  assert:
+    that:
+      - snapshot_path is defined
+      - snapshot_path is string
+      - snapshot_path | length > 0
+    fail_msg: The snapshot_path fact must be defined and must be a non-empty string.
+
+- name: Ensure backup directory exists
+  file:
+    path: "{{ backup_dir }}/"
+    state: directory
+
+- name: Calculate the checksum
+  stat:
+    path: "{{ snapshot_path }}"
+    get_attributes: false
+    get_checksum: true
+    get_mime: false
+    checksum_algorithm: sha1
+  register: stat_checksum
+
+- name: Save the checksum
+  copy:
+    dest: "{{ snapshot_path }}.sha1"
+    content: |
+      {{ stat_checksum.stat.checksum }} {{ snapshot_path | basename }}
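The .sha1 file written above pairs the hex digest with the archive's basename, the same shape as sha1sum output. A minimal Python sketch of the verification the recovery side can perform against such a file (chunked reading is an implementation choice, not from the patch):

    import hashlib
    from pathlib import Path

    def verify_snapshot(snapshot_path: str) -> bool:
        """Recompute the archive's SHA-1 and compare it to the stored .sha1 file."""
        digest = hashlib.sha1()
        with open(snapshot_path, 'rb') as stream:
            for chunk in iter(lambda: stream.read(1 << 20), b''):  # 1 MiB chunks
                digest.update(chunk)
        stored_digest = Path(snapshot_path + '.sha1').read_text().split()[0]
        return digest.hexdigest() == stored_digest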
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/download_via_rsync.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/download_via_rsync.yml
new file mode 100644
index 0000000000..768e055813
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/common/download_via_rsync.yml
@@ -0,0 +1,91 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    artifacts:
+#      - /tmp/artifact1
+#      - /tmp/artifact2
+
+- name: Assert that the "artifacts" fact is defined and valid
+  assert:
+    that:
+      - artifacts is defined
+      - artifacts is sequence
+      - artifacts | length > 0
+    fail_msg: The "artifacts" fact must be defined and must be a non-empty list.
+
+- name: Download artifacts to mounted storage
+
+  delegate_to: "{{ backup_destination_host }}"
+
+  always:
+    - name: Delete generated files
+      file:
+        path: "{{ item }}"
+        state: absent
+      loop:
+        - "{{ private_key_file.path }}"
+        - "{{ private_key_file.path }}.pub"
+
+    - delegate_to: "{{ inventory_hostname }}"  # cancel previous delegate_to
+      block:
+        - name: Remove public openssh key from admin's authorized_keys
+          authorized_key:
+            user: "{{ admin_user.name }}"
+            state: absent
+            key: >-
+              {{ openssh_keypair.public_key }}
+
+  block:
+    - name: Ensure that .ssh directory exists
+      file:
+        path: ~/.ssh/
+        state: directory
+
+    - name: Create a temporary file path to hold the private key in
+      tempfile:
+        path: ~/.ssh/
+        suffix: .tmp
+        state: file
+      register: private_key_file
+
+    - name: Generate openssh keypair for rsync over ssh
+      openssh_keypair:
+        path: "{{ private_key_file.path }}"
+        size: 2048
+        force: true
+      register: openssh_keypair
+
+    - delegate_to: "{{ inventory_hostname }}"  # cancel previous delegate_to
+      block:
+        - name: Add public openssh key to admin's authorized_keys
+          authorized_key:
+            user: "{{ admin_user.name }}"
+            state: present
+            key: >-
+              {{ openssh_keypair.public_key }}
+
+    - name: Ensure destination directory for artifacts exists
+      file:
+        path: "{{ backup_destination_dir }}/"
+        state: directory
+
+    - name: Use rsync to copy all artifacts
+      synchronize:
+        mode: pull
+        dest: "{{ backup_destination_dir }}"
+        src: "{{ item }}"
+        checksum: true
+        rsync_opts:
+          - --rsh={{ rsh }}
+      vars:
+        # this fixes / replaces incorrect path to the private key file that synchronize provides
+        # (setting private_key parameter has no effect whatsoever, looks like a bug tbh)
+        rsh: >-
+          /usr/bin/ssh -S none -i {{ private_key_file.path }} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
+      loop: "{{ artifacts }}"
+
+    - name: Remove copied artifacts from source
+      file:
+        path: "{{ item }}"
+        state: absent
+      loop: "{{ artifacts }}"
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/kubernetes.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/kubernetes.yml
new file mode 100644
index 0000000000..40c0141794
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/kubernetes.yml
@@ -0,0 +1,99 @@
+---
+- name: Set helper facts
+  set_fact:
+    snapshot_name: >-
+      {{ ansible_date_time.iso8601_basic_short | replace('T','-') }}
+
+- name: Ensure backup directory exists
+  file:
+    path: "{{ backup_dir }}/"
+    state: directory
+    mode: u=rwx,go=
+
+- name: Create temporary directory
+  tempfile:
+    path: "{{ backup_dir }}/"
+    suffix: .tmp
+    state: directory
+  register: backup_temp_dir
+
+- name: Save backup and cleanup afterwards
+  always:
+    - name: Delete temporary directory
+      file:
+        path: "{{ backup_temp_dir.path }}/"
+        state: absent
+
+  block:
+    - name: Get etcd image name
+      shell: |
+        kubectl get pods \
+          --all-namespaces \
+          --output jsonpath={{ jsonpath }}
+      vars:
+        jsonpath: >-
+          "{.items[*].spec.containers[?(@.name=='etcd')].image}"
+      environment:
+        KUBECONFIG: /etc/kubernetes/admin.conf
+      register: etcd_image_name
+
+    - name: Save etcd image name to a file
+      copy:
+        dest: "{{ backup_temp_dir.path }}/etcd_ver.txt"
+        content: |-
+          {{ etcd_image_name.stdout | trim }}
+
+    - name: Save kubernetes PKI
+      copy:
+        src: /etc/kubernetes/pki  # do not put / at the end here!
+        dest: "{{ backup_temp_dir.path }}/"
+        remote_src: true
+
+    - name: Save etcd snapshot
+      shell: |
+        docker run \
+          -v "{{ backup_temp_dir.path }}/:/backup/" \
+          --network host \
+          --env ETCDCTL_API=3 \
+          --rm "{{ etcd_image_name.stdout | trim }}" \
+          etcdctl \
+            --endpoints https://127.0.0.1:2379 \
+            --cacert /backup/pki/etcd/ca.crt \
+            --cert /backup/pki/etcd/healthcheck-client.crt \
+            --key /backup/pki/etcd/healthcheck-client.key \
+            snapshot save /backup/etcd-snapshot.db
+      args:
+        executable: /bin/bash
+
+    - name: Check if kubeadm configuration file exists
+      stat:
+        path: /etc/kubeadm/kubeadm-config.yml
+        get_attributes: false
+        get_checksum: false
+        get_mime: false
+      register: stat_kubeadm_config_yml
+
+    - when: stat_kubeadm_config_yml.stat.exists
+      block:
+        - name: Save kubeadm configuration file
+          copy:
+            src: "{{ stat_kubeadm_config_yml.stat.path }}"
+            dest: "{{ backup_temp_dir.path }}/"
+            remote_src: true
+
+    - name: Create snapshot archive
+      import_tasks: common/create_snapshot_archive.yml
+      vars:
+        snapshot_prefix: "k8s_snapshot"
+        dirs_to_archive:
+          - "{{ backup_temp_dir.path }}/"
+
+    - name: Create snapshot checksum
+      import_tasks: common/create_snapshot_checksum.yml
+
+    - name: Transfer artifacts via rsync
+      import_tasks: common/download_via_rsync.yml
+      vars:
+        artifacts:
+          - "{{ snapshot_path }}"
+          - "{{ snapshot_path }}.sha1"
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/load_balancer_haproxy_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/load_balancer_haproxy_etc.yml
new file mode 100644
index 0000000000..93d31e37e9
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/load_balancer_haproxy_etc.yml
@@ -0,0 +1,33 @@
+---
+- name: Set helper facts
+  set_fact:
+    snapshot_name: >-
+      {{ ansible_date_time.iso8601_basic_short | replace('T','-') }}
+
+- debug: var=snapshot_name
+
+- name: Create snapshot archive
+  import_tasks: common/create_snapshot_archive.yml
+  vars:
+    snapshot_prefix: "haproxy_etc"
+    dirs_to_archive: >-
+      {{ _dirs_to_archive_switch[ansible_os_family] | default(_dirs_to_archive_switch.default) }}
+    # Simulate some basic in-place switch/case expression using a dictionary.
+    _dirs_to_archive_switch:
+      RedHat:
+        - /etc/haproxy/
+        - /etc/ssl/haproxy/
+        - /etc/opt/rh/rh-haproxy18/haproxy/
+      default:
+        - /etc/haproxy/
+        - /etc/ssl/haproxy/
+
+- name: Create snapshot checksum
+  import_tasks: common/create_snapshot_checksum.yml
+
+- name: Transfer artifacts via rsync
+  import_tasks: common/download_via_rsync.yml
+  vars:
+    artifacts:
+      - "{{ snapshot_path }}"
+      - "{{ snapshot_path }}.sha1"
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_etc.yml
new file mode 100644
index 0000000000..b9e2bf79db
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_etc.yml
@@ -0,0 +1,27 @@
+---
+- name: Assert that the snapshot_name fact is defined and valid
+  assert:
+    that:
+      - snapshot_name is defined
+      - snapshot_name is string
+      - snapshot_name | length > 0
+    fail_msg: The snapshot_name fact must be defined and must be a non-empty string.
+
+- debug: var=snapshot_name
+
+- name: Create snapshot archive
+  import_tasks: common/create_snapshot_archive.yml
+  vars:
+    snapshot_prefix: "elasticsearch_etc"
+    dirs_to_archive:
+      - /etc/elasticsearch/
+
+- name: Create snapshot checksum
+  import_tasks: common/create_snapshot_checksum.yml
+
+- name: Transfer artifacts via rsync
+  import_tasks: common/download_via_rsync.yml
+  vars:
+    artifacts:
+      - "{{ snapshot_path }}"
+      - "{{ snapshot_path }}.sha1"
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_snapshot.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_snapshot.yml
new file mode 100644
index 0000000000..b6e44e0ada
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_elasticsearch_snapshot.yml
@@ -0,0 +1,84 @@
+---
+- name: Set helper facts
+  set_fact:
+    elasticsearch_endpoint: >-
+      https://{{ ansible_default_ipv4.address }}:9200
+    snapshot_name: >-
+      {{ ansible_date_time.iso8601_basic_short | replace('T','-') }}
+  vars:
+    uri_template: &uri
+      client_cert: /etc/elasticsearch/kirk.pem
+      client_key: /etc/elasticsearch/kirk-key.pem
+      validate_certs: false
+      body_format: json
+
+- debug: var=snapshot_name
+
+- name: Check cluster health
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_cluster/health"
+    method: GET
+  register: uri_response
+  until: uri_response is success
+  retries: 12
+  delay: 5
+
+- name: Ensure snapshot repository is defined
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}"
+    method: PUT
+    body:
+      type: fs
+      settings:
+        location: "{{ elasticsearch_snapshot_repository_location }}"
+        compress: true
+
+- name: Trigger snapshot creation
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}/{{ snapshot_name }}"
+    method: PUT
+
+- name: Wait (up to 12h) for snapshot completion
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}/{{ snapshot_name }}"
+    method: GET
+  register: uri_response
+  until: (uri_response.json.snapshots | selectattr('snapshot', 'equalto', snapshot_name) | first).state == "SUCCESS"
+  retries: "{{ (12 * 3600 // 10) | int }}"  # 12h
+  delay: 10
+
+- name: Find all snapshots
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}/_all"
+    method: GET
+  register: uri_response
+
+- name: Delete old snapshots
+  uri:
+    <<: *uri
+    url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}/{{ item }}"
+    method: DELETE
+  loop: >-
+    {{ uri_response.json.snapshots | map(attribute='snapshot') | reject('equalto', snapshot_name) | list }}
+
+- name: Create snapshot archive
+  import_tasks: common/create_snapshot_archive.yml
+  vars:
+    snapshot_prefix: "elasticsearch_snapshot"
+    dirs_to_archive:
+      - "{{ elasticsearch_snapshot_repository_location }}/"
+
+- name: Create snapshot checksum
+  import_tasks: common/create_snapshot_checksum.yml
+
+- name: Transfer artifacts via rsync
+  import_tasks: common/download_via_rsync.yml
+  vars:
+    artifacts:
+      - "{{ snapshot_path }}"
+      - "{{ snapshot_path }}.sha1"
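The tasks above drive the standard Elasticsearch snapshot API, authenticating with the Open Distro admin certificates referenced in the shared uri anchor. A rough Python equivalent of the repository-registration and snapshot calls (the node address, the `requests` library, and the example snapshot name are illustrative, not part of the patch):

    import requests

    es = 'https://10.0.0.5:9200'  # hypothetical logging VM address
    cert = ('/etc/elasticsearch/kirk.pem', '/etc/elasticsearch/kirk-key.pem')

    # register (or update) the filesystem snapshot repository
    requests.put(f'{es}/_snapshot/epiphany', cert=cert, verify=False,
                 json={'type': 'fs',
                       'settings': {'location': '/var/lib/elasticsearch-snapshots',
                                    'compress': True}})

    # trigger a snapshot, then poll its state until it reaches SUCCESS
    requests.put(f'{es}/_snapshot/epiphany/20200629-142909', cert=cert, verify=False)
    snapshots = requests.get(f'{es}/_snapshot/epiphany/20200629-142909',
                             cert=cert, verify=False).json()['snapshots']
    assert snapshots[0]['state'] == 'SUCCESS'

The Ansible version does the same polling with until/retries (up to 12 hours), then prunes all snapshots except the fresh one before archiving the repository directory.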
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_kibana_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_kibana_etc.yml
new file mode 100644
index 0000000000..4b774e7d4f
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/logging_kibana_etc.yml
@@ -0,0 +1,27 @@
+---
+- name: Assert that the snapshot_name fact is defined and valid
+  assert:
+    that:
+      - snapshot_name is defined
+      - snapshot_name is string
+      - snapshot_name | length > 0
+    fail_msg: The snapshot_name fact must be defined and must be a non-empty string.
+
+- debug: var=snapshot_name
+
+- name: Create snapshot archive
+  import_tasks: common/create_snapshot_archive.yml
+  vars:
+    snapshot_prefix: "kibana_etc"
+    dirs_to_archive:
+      - /etc/kibana/
+
+- name: Create snapshot checksum
+  import_tasks: common/create_snapshot_checksum.yml
+
+- name: Transfer artifacts via rsync
+  import_tasks: common/download_via_rsync.yml
+  vars:
+    artifacts:
+      - "{{ snapshot_path }}"
+      - "{{ snapshot_path }}.sha1"
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/main.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/main.yml
deleted file mode 100644
index 582290f177..0000000000
--- a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/main.yml
+++ /dev/null
@@ -1,69 +0,0 @@
----
-- name: Create a backup directory
-  file:
-    path: "{{ backup_dir }}"
-    state: directory
-
-# Ansible 2.8
-# - name: Backup certificates
-#   copy:
-#     src: /etc/kubernetes/pki
-#     dest: "{{ backup_dir }}/tmp"
-#     remote_src: yes
-
-# Ansible 2.7
-- name: Backup certificates
-  synchronize:
-    src: /etc/kubernetes/pki
-    dest: "{{ backup_dir }}/tmp"
-    recursive: yes
-  delegate_to: "{{ inventory_hostname }}"
-
-- name: Get etcd image name
-  environment:
-    KUBECONFIG: "/home/{{ admin_user.name }}/.kube/config"
-  shell: kubectl get pods --all-namespaces -o=jsonpath="{.items[*].spec.containers[?(@.name=='etcd')].image}"
-  register: etcd_image_name
-
-- name: Save etcd image name to file
-  copy:
-    content: "{{ etcd_image_name.stdout }}"
-    dest: "{{ backup_dir }}/tmp/etcd_ver.txt"
-
-- name: Create etcd snapshot
-  shell: >
-    docker run -v "{{ backup_dir }}/tmp":/backup \
-    --network host \
-    --env ETCDCTL_API=3 \
-    --rm {{ etcd_image_name.stdout }} \
-    etcdctl --endpoints=https://127.0.0.1:2379 \
-    --cacert=/backup/pki/etcd/ca.crt \
-    --cert=/backup/pki/etcd/healthcheck-client.crt \
-    --key=/backup/pki/etcd/healthcheck-client.key \
-    snapshot save /backup/etcd-snapshot.db
-
-- name: Check if kubeadm configuration file exists
-  stat:
-    path: /etc/kubeadm/kubeadm-config.yml
-  register: stat_result
-
-- name: Backup kubeadm configuration file
-  copy:
-    src: /etc/kubeadm/kubeadm-config.yml
-    dest: "{{ backup_dir }}/tmp"
-    remote_src: yes
-  when: stat_result.stat.exists
-
-- name: Set variable with current timestamp
-  set_fact: timestamp="{{ lookup('pipe', 'date +%Y%m%d%H%M%S') }}"
-
-- name: Create a tar gz archive
-  archive:
-    path: "{{ backup_dir }}/tmp/"
-    dest: "{{ backup_dir }}/k8s_backup_{{ timestamp }}.tar.gz"
-    format: gz
-
-- name: Clean temporary directory
-  file:
-    state: absent
-    path: "{{ backup_dir }}/tmp/"
snapshot_name fact must be defined and must be a non-empty string. + +- debug: var=snapshot_name + +- name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "grafana_data" + dirs_to_archive: + - "{{ component_vars.specification.grafana_data_dir }}/" + +- name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + +- name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_etc.yml new file mode 100644 index 0000000000..0eec76e2d8 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_etc.yml @@ -0,0 +1,27 @@ +--- +- name: Assert that the snapshot_name fact is defined and valid + assert: + that: + - snapshot_name is defined + - snapshot_name is string + - snapshot_name | length > 0 + fail_msg: The snapshot_name fact must be defined and must be a non-empty string. + +- debug: var=snapshot_name + +- name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "prometheus_etc" + dirs_to_archive: + - /etc/prometheus/ + +- name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + +- name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_snapshot.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_snapshot.yml new file mode 100644 index 0000000000..3f7a253423 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/monitoring_prometheus_snapshot.yml @@ -0,0 +1,52 @@ +--- +- name: Set helper facts + set_fact: + prometheus_endpoint: >- + http://{{ ansible_default_ipv4.address }}:9090 + snapshot_name: >- + {{ ansible_date_time.iso8601_basic_short | replace('T','-') }} + vars: + uri_template: &uri + body_format: json + +- debug: var=snapshot_name + +- name: Trigger snapshot creation + uri: + <<: *uri + url: "{{ prometheus_endpoint }}/api/v1/admin/tsdb/snapshot" + method: POST + register: uri_response + until: uri_response is success + retries: 12 + delay: 5 + +- name: Get the prometheus_snapshot_name + set_fact: + prometheus_snapshot_name: "{{ uri_response.json.data.name }}" + +- debug: var=prometheus_snapshot_name + +- name: Create, transfer and cleanup snapshot + always: + - name: Remove snapshot directory (cleanup) + file: + path: "{{ component_vars.specification.storage.data_directory }}/snapshots/{{ prometheus_snapshot_name }}/" + state: absent + block: + - name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "prometheus_snapshot" + dirs_to_archive: + - "{{ component_vars.specification.storage.data_directory }}/snapshots/{{ prometheus_snapshot_name }}/" + + - name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + + - name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" diff --git 
a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/postgresql.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/postgresql.yml new file mode 100644 index 0000000000..223381bde3 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/postgresql.yml @@ -0,0 +1,73 @@ +--- +- name: Set helper facts + set_fact: + snapshot_name: >- + {{ ansible_date_time.iso8601_basic_short | replace('T','-') }} + +- debug: + var: snapshot_name + +- name: Check if database is running on node0 database server + become: true + become_user: postgres + command: "{{ repmgr_pg_bindir[ansible_os_family] }}/pg_isready" + register: node0 + ignore_errors: True + when: groups['postgresql'][0] == inventory_hostname + +- name: Debug + debug: + var: hostvars[groups['postgresql'][0]]['node0'].rc + +- name: Create and export backup + block: + - name: Create temporary directories for backup files + file: + path: "/var/tmp/{{ snapshot_name }}/{{ item }}" + state: directory + mode: 0777 + loop: + - data + - configs + + - name: Create database snapshot + become: yes + become_user: postgres + command: "pg_dumpall -f /var/tmp/{{ snapshot_name }}/data/database_dump.sql" + + - name: Search for config files to back up + shell: "find *.conf" + args: + chdir: "{{ pg_config_dir[ansible_os_family] }}" + register: config_files + + - name: Copy config files into temporary location + copy: + src: "{{ pg_config_dir[ansible_os_family] }}/{{ item }}" + dest: "/var/tmp/{{ snapshot_name }}/configs" + remote_src: yes + loop: "{{ config_files.stdout_lines|flatten(levels=1) }}" + + - name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "postgresql" + dirs_to_archive: + - /var/tmp/{{ snapshot_name }}/ + + - name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + + - name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + + - name: Remove temporary files and content + file: + path: "/var/tmp/{{ snapshot_name }}/" + state: absent + when: (groups['postgresql'][0] == inventory_hostname and hostvars[groups['postgresql'][0]]['node0'].rc == 0) or + (groups['postgresql'][1] == inventory_hostname and hostvars[groups['postgresql'][0]]['node0'].rc != 0) diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_definitions.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_definitions.yml new file mode 100644 index 0000000000..e8e3bb80fb --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_definitions.yml @@ -0,0 +1,50 @@ +--- +- name: Set helper facts + set_fact: + snapshot_name: >- + {{ ansible_date_time.iso8601_basic_short | replace('T','-') }} + +- debug: var=snapshot_name + +- name: Ensure management api is enabled + shell: | + rabbitmq-plugins enable rabbitmq_management + args: + executable: /bin/bash + +- name: Ensure the rabbitmqadmin binary is installed + shell: | + curl -fsSL http://localhost:15672/cli/rabbitmqadmin \ + -o /usr/local/bin/rabbitmqadmin \ + && chmod +x /usr/local/bin/rabbitmqadmin + args: + creates: /usr/local/bin/rabbitmqadmin + executable: /bin/bash + +- name: Ensure the destination directory for definitions exists + file: + path: /var/lib/rabbitmq/definitions/ + state: directory + +- name: Save definitions in a json file + shell: | + /usr/local/bin/rabbitmqadmin 
export /var/lib/rabbitmq/definitions/definitions-{{ snapshot_name }}.json + args: + executable: /bin/bash + +- name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "rabbitmq_definitions" + files_to_archive: + - /var/lib/rabbitmq/definitions/definitions-{{ snapshot_name }}.json + +- name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + +- name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_etc.yml new file mode 100644 index 0000000000..728297b41c --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/backup/tasks/rabbitmq_rabbitmq_etc.yml @@ -0,0 +1,27 @@ +--- +- name: Assert that the snapshot_name fact is defined and valid + assert: + that: + - snapshot_name is defined + - snapshot_name is string + - snapshot_name | length > 0 + fail_msg: The snapshot_name fact must be defined and must be a non-empty string. + +- debug: var=snapshot_name + +- name: Create snapshot archive + import_tasks: common/create_snapshot_archive.yml + vars: + snapshot_prefix: "rabbitmq_etc" + dirs_to_archive: + - /etc/rabbitmq/ + +- name: Create snapshot checksum + import_tasks: common/create_snapshot_checksum.yml + +- name: Transfer artifacts via rsync + import_tasks: common/download_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/Debian.yml b/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/Debian.yml index 0d9659d943..e5e7519d0e 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/Debian.yml +++ b/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/Debian.yml @@ -27,6 +27,7 @@ - netcat - openssl - python-setuptools + - rsync - software-properties-common - sshpass - sysstat diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/RedHat.yml b/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/RedHat.yml index 5877657e8f..a00705e88e 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/RedHat.yml +++ b/core/src/epicli/data/common/ansible/playbooks/roles/common/tasks/RedHat.yml @@ -23,6 +23,7 @@ - net-tools - openssl - python-setuptools + - rsync - sysstat - tar - telnet diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/tasks/configure-es.yml b/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/tasks/configure-es.yml index c0c6d95729..6f73e83724 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/tasks/configure-es.yml +++ b/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/tasks/configure-es.yml @@ -1,4 +1,11 @@ --- +- name: Ensure snapshot folder exists + file: + path: "{{ specification.paths.repo }}/" + state: directory + owner: elasticsearch + group: elasticsearch + mode: u=rwx,go= - name: Create Elasticsearch configuration file template: @@ -19,4 +26,4 @@ systemd: name: elasticsearch state: started - enabled: yes \ No newline at end of file + enabled: yes diff --git 
a/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/templates/elasticsearch.yml.j2 b/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/templates/elasticsearch.yml.j2
index 49eb652f2d..b18e3a8cb7 100644
--- a/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/templates/elasticsearch.yml.j2
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/opendistro_for_elasticsearch/templates/elasticsearch.yml.j2
@@ -32,6 +32,10 @@ node.name: {{ansible_hostname}}
 #
 path.data: {{specification.paths.data}}
 #
+# Path to the snapshot repository directory:
+#
+path.repo: {{specification.paths.repo}}
+#
 # Path to log files:
 #
 path.logs: {{specification.paths.logs}}
@@ -52,7 +56,7 @@ path.logs: {{specification.paths.logs}}
 #
 # Set the bind address to a specific IP (IPv4 or IPv6):
 #
-network.host: {{ansible_hostname}} 
+network.host: {{ansible_hostname}}
 #
 # Set a custom port for HTTP:
 #
@@ -113,4 +117,4 @@ opendistro_security.check_snapshot_restore_write_privileges: true
 opendistro_security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
 cluster.routing.allocation.disk.threshold_enabled: false
 node.max_local_storage_nodes: 3
-######## End OpenDistro for Elasticsearch Security Demo Configuration ########
\ No newline at end of file
+######## End OpenDistro for Elasticsearch Security Demo Configuration ########
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/defaults/main.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/defaults/main.yml
index 7ae2f6b6dd..80fb354149 100644
--- a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/defaults/main.yml
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/defaults/main.yml
@@ -1,2 +1,6 @@
 ---
-backup_dir: /home/{{ admin_user.name }}/backupdir
+recovery_dir: /epibackup
+recovery_source_dir: "{{ recovery_dir }}/mounted"
+recovery_source_host: "{{ resolved_repository_hostname | default(groups.repository.0) }}"
+elasticsearch_snapshot_repository_name: epiphany
+elasticsearch_snapshot_repository_location: /var/lib/elasticsearch-snapshots
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/meta/main.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/meta/main.yml
new file mode 100644
index 0000000000..745ba4d956
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/meta/main.yml
@@ -0,0 +1,3 @@
+---
+dependencies:
+  - role: preflight_facts
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/clear_directories.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/clear_directories.yml
new file mode 100644
index 0000000000..9d79918862
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/clear_directories.yml
@@ -0,0 +1,45 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    dirs_to_clear:
+#      - /etc/haproxy/
+#      - /etc/ssl/haproxy/
+
+- name: Assert that the dirs_to_clear fact is defined and valid
+  assert:
+    that:
+      - dirs_to_clear is defined
+      - dirs_to_clear is sequence
+      - dirs_to_clear | length > 0
+    fail_msg: The dirs_to_clear fact must be defined and must be a non-empty list.
+
+- name: Assert that the dirs_to_clear fact does not contain empty strings
+  assert:
+    that:
+      - (dirs_to_clear | length) == (dirs_to_clear_cleaned | length)
+    fail_msg: The dirs_to_clear fact must not contain empty strings.
+  vars:
+    # remove empty strings
+    dirs_to_clear_cleaned: >-
+      {{ dirs_to_clear | select | list }}
+
+- name: Find everything in target directories
+  find:
+    paths: "{{ dirs_to_clear_corrected }}"
+    patterns: "*"
+    file_type: any
+    recurse: false
+  register: find_everything_in_target_directories
+  vars:
+    # make sure each path ends with single /
+    dirs_to_clear_corrected: >-
+      {{ dirs_to_clear | map('regex_replace', '//*$', '')
+                       | map('regex_replace', '$', '/')
+                       | list }}
+
+- name: Remove everything from target directories
+  file:
+    path: "{{ item }}"
+    state: absent
+  loop: >-
+    {{ find_everything_in_target_directories.files | map(attribute='path') | list }}
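Usage note: the helper above removes only the contents of each listed directory and keeps the directory itself, so top-level ownership and permissions survive a restore. A minimal invocation sketch (values taken from the helper's own example comment):

```
- name: Clear directories
  import_tasks: common/clear_directories.yml
  vars:
    dirs_to_clear:
      - /etc/haproxy/
      - /etc/ssl/haproxy/
```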
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/find_snapshot_archive.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/find_snapshot_archive.yml
new file mode 100644
index 0000000000..9755e39b66
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/find_snapshot_archive.yml
@@ -0,0 +1,56 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    snapshot_prefix: "rabbitmq_etc"
+#    snapshot_name: "20200526-102034"
+
+- name: Assert that the snapshot_prefix fact is defined and valid
+  assert:
+    that:
+      - snapshot_prefix is defined
+      - snapshot_prefix is string
+      - snapshot_prefix | length > 0
+    fail_msg: The snapshot_prefix fact must be defined and must be a non-empty string.
+
+- name: Assert that the snapshot_name fact is defined and valid
+  assert:
+    that:
+      - snapshot_name is defined
+      - snapshot_name is string
+      - snapshot_name | length > 0
+    fail_msg: The snapshot_name fact must be defined and must be a non-empty string.
+
+- debug: var=snapshot_name
+
+- name: Decide what should be the search pattern
+  set_fact:
+    search_pattern: >-
+      {{ (snapshot_name != "latest") | ternary(
+           snapshot_prefix ~ "_" ~ snapshot_name ~ ".tar.gz",
+           snapshot_prefix ~ "_" ~ "*-*" ~ ".tar.gz"
+         ) }}
+
+- debug: var=search_pattern
+
+- name: Find all matching archives
+  delegate_to: "{{ recovery_source_host }}"
+  find:
+    paths: "{{ recovery_source_dir }}/"
+    patterns: "{{ search_pattern }}"
+    file_type: file
+    recurse: false
+  register: find_archives
+
+- name: Assert that there are archives available
+  assert:
+    that: find_archives.matched > 0
+    fail_msg: No matching archives found.
+
+- name: Pick the newest archive (if many)
+  set_fact:
+    snapshot_path: >-
+      {{ find_archives.files | map(attribute='path') | max }}
+
+- name: Assert that the snapshot_path fact is not an empty string
+  assert:
+    that: snapshot_path | length > 0
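Usage note: snapshot archives embed a zero-padded ``YYYYMMDD-HHMMSS`` timestamp, so taking the lexicographic ``max`` of the matched paths above is equivalent to picking the most recent archive. A minimal invocation sketch (values taken from the helper's own example comment):

```
- name: Find snapshot archive
  import_tasks: common/find_snapshot_archive.yml
  vars:
    snapshot_prefix: "rabbitmq_etc"
    snapshot_name: "latest"  # or a concrete timestamp like "20200526-102034"
```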
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/upload_via_rsync.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/upload_via_rsync.yml
new file mode 100644
index 0000000000..427e8d2d1e
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/upload_via_rsync.yml
@@ -0,0 +1,81 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    artifacts:
+#      - {{ recovery_source_dir }}/artifact1
+#      - {{ recovery_source_dir }}/artifact2
+
+- name: Assert that the "artifacts" fact is defined and valid
+  assert:
+    that:
+      - artifacts is defined
+      - artifacts is sequence
+      - artifacts | length > 0
+    fail_msg: The "artifacts" fact must be defined and must be a non-empty list.
+
+- name: Upload artifacts from mounted storage
+
+  delegate_to: "{{ recovery_source_host }}"
+
+  always:
+    - name: Delete generated files
+      file:
+        path: "{{ item }}"
+        state: absent
+      loop:
+        - "{{ private_key_file.path }}"
+        - "{{ private_key_file.path }}.pub"
+
+    - delegate_to: "{{ inventory_hostname }}"  # cancel previous delegate_to
+      block:
+        - name: Remove public openssh key from admin's authorized_keys
+          authorized_key:
+            user: "{{ admin_user.name }}"
+            state: absent
+            key: >-
+              {{ openssh_keypair.public_key }}
+
+  block:
+    - name: Create a temporary file path to hold the private key in
+      tempfile:
+        path: ~/.ssh/
+        suffix: .tmp
+        state: file
+      register: private_key_file
+
+    - name: Generate openssh keypair for rsync over ssh
+      openssh_keypair:
+        path: "{{ private_key_file.path }}"
+        size: 2048
+        force: true
+      register: openssh_keypair
+
+    - delegate_to: "{{ inventory_hostname }}"  # cancel previous delegate_to
+      block:
+        - name: Add public openssh key to admin's authorized_keys
+          authorized_key:
+            user: "{{ admin_user.name }}"
+            state: present
+            key: >-
+              {{ openssh_keypair.public_key }}
+
+        - name: Ensure destination directory for artifacts exists
+          file:
+            path: "{{ recovery_dir }}/"
+            state: directory
+
+    - name: Use rsync to copy all artifacts
+      synchronize:
+        mode: push
+        dest: "{{ recovery_dir }}/"
+        src: "{{ item }}"
+        checksum: true
+        rsync_opts:
+          - --rsh={{ rsh }}
+      vars:
+        # This fixes / replaces the incorrect private key path that synchronize provides
+        # (setting the private_key parameter has no effect, which appears to be a bug).
+        rsh: >-
+          /usr/bin/ssh -S none -i {{ private_key_file.path }} -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
+      loop: >-
+        {{ artifacts }}
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/verify_snapshot_checksum.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/verify_snapshot_checksum.yml
new file mode 100644
index 0000000000..d7f55f4a50
--- /dev/null
+++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/common/verify_snapshot_checksum.yml
@@ -0,0 +1,31 @@
+---
+# Invoke with (example):
+#- set_fact:
+#    snapshot_path: "rabbitmq_etc_20200526-102034.tar.gz"
+
+- name: Assert that the snapshot_path fact is defined and valid
+  assert:
+    that:
+      - snapshot_path is defined
+      - snapshot_path is string
+      - snapshot_path | length > 0
+    fail_msg: The snapshot_path fact must be defined and must be a non-empty string.
+
+- name: Slurp checksum from file
+  slurp:
+    path: "{{ recovery_dir }}/{{ snapshot_path | basename }}.sha1"
+  register: slurp_checksum
+
+- name: Calculate archive checksum
+  stat:
+    path: "{{ recovery_dir }}/{{ snapshot_path | basename }}"
+    get_attributes: false
+    get_checksum: true
+    get_mime: false
+    checksum_algorithm: sha1
+  register: stat_archive
+
+- name: Compare checksums
+  assert:
+    that: (slurp_checksum.content | b64decode | trim).startswith(stat_archive.stat.checksum)
+    fail_msg: Checksums do not match.
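Usage note: the ``startswith`` comparison above assumes the ``.sha1`` sidecar holds ``sha1sum``-style output (``<hash>  <path>``), so only the leading hash is compared. A hypothetical sketch of producing such a sidecar on the source host, for illustration only (the actual logic lives in ``common/create_snapshot_checksum.yml``):

```
- name: Create snapshot checksum (sketch)
  shell: sha1sum "{{ snapshot_path }}" > "{{ snapshot_path }}.sha1"
  args:
    executable: /bin/bash
```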
diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/load_balancer_haproxy_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/load_balancer_haproxy_etc.yml new file mode 100644 index 0000000000..7633d9cce7 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/load_balancer_haproxy_etc.yml @@ -0,0 +1,47 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "haproxy_etc" + snapshot_name: "{{ specification.components.load_balancer.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop haproxy service + systemd: + name: haproxy + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: >- + {{ _dirs_to_clear_switch[ansible_os_family] | default(_dirs_to_clear_switch.default) }} + # Simulate some basic in-place switch/case expression using a dictionary. + _dirs_to_clear_switch: + RedHat: + - /etc/haproxy/ + - /etc/ssl/haproxy/ + - /etc/opt/rh/rh-haproxy18/haproxy/ + default: + - /etc/haproxy/ + - /etc/ssl/haproxy/ + +- name: Extract the archive + unarchive: + dest: /etc/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start haproxy service + systemd: + name: haproxy + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_etc.yml new file mode 100644 index 0000000000..7c81954bf5 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_etc.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "elasticsearch_etc" + snapshot_name: "{{ specification.components.logging.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop elasticsearch service + systemd: + name: elasticsearch + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - /etc/elasticsearch/ + +- name: Extract the archive + unarchive: + dest: /etc/elasticsearch/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start elasticsearch service + systemd: + name: elasticsearch + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_snapshot.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_snapshot.yml new file mode 100644 index 0000000000..1cdb00ab90 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_elasticsearch_snapshot.yml @@ -0,0 +1,117 @@ +--- +- name: Set helper facts + set_fact: + elasticsearch_endpoint: >- + https://{{ ansible_default_ipv4.address }}:9200 + vars: + uri_template: &uri + client_cert: /etc/elasticsearch/kirk.pem + client_key: /etc/elasticsearch/kirk-key.pem + validate_certs: false + 
body_format: json + +- name: Check cluster health + uri: + <<: *uri + url: "{{ elasticsearch_endpoint }}/_cluster/health" + method: GET + register: uri_response + until: uri_response is success + retries: 12 + delay: 5 + +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "elasticsearch_snapshot" + snapshot_name: "{{ specification.components.logging.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - "{{ elasticsearch_snapshot_repository_location }}/" + +- name: Extract the archive + unarchive: + dest: "{{ elasticsearch_snapshot_repository_location }}/" + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Change snapshot directory permissions + file: + path: "{{ elasticsearch_snapshot_repository_location }}/" + owner: elasticsearch + group: elasticsearch + recurse: true + +- name: Reconstruct the snapshot_name + set_fact: + snapshot_name: >- + {{ snapshot_path | basename | regex_replace('^elasticsearch_snapshot_(.*).tar.gz$', '\1') }} + +- debug: var=snapshot_name + +- name: Ensure all kibana and filebeat instances are stopped, then restore the snapshot + + always: + - name: Start all kibana instances + delegate_to: "{{ item }}" + systemd: + name: kibana + state: started + enabled: true + loop: "{{ groups.kibana | default([]) }}" + + - name: Start all filebeat instances + delegate_to: "{{ item }}" + systemd: + name: filebeat + state: started + enabled: true + loop: "{{ groups.filebeat | default([]) }}" + + block: + - name: Stop all kibana instances + delegate_to: "{{ item }}" + systemd: + name: kibana + state: stopped + enabled: false + loop: "{{ groups.kibana | default([]) }}" + + - name: Stop all filebeat instances + delegate_to: "{{ item }}" + systemd: + name: filebeat + state: stopped + enabled: false + loop: "{{ groups.filebeat | default([]) }}" + + - name: Close all indices + uri: + <<: *uri + url: "{{ elasticsearch_endpoint }}/_all/_close" + method: POST + + - name: Delete all indices + uri: + <<: *uri + url: "{{ elasticsearch_endpoint }}/_all" + method: DELETE + + - name: Restore the snapshot + uri: + <<: *uri + url: "{{ elasticsearch_endpoint }}/_snapshot/{{ elasticsearch_snapshot_repository_name }}/{{ snapshot_name }}/_restore" + method: POST diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_kibana_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_kibana_etc.yml new file mode 100644 index 0000000000..3792303795 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/logging_kibana_etc.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "kibana_etc" + snapshot_name: "{{ specification.components.logging.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop kibana service + systemd: + name: kibana + state: stopped + +- name: Clear directories + import_tasks: 
common/clear_directories.yml + vars: + dirs_to_clear: + - /etc/kibana/ + +- name: Extract the archive + unarchive: + dest: /etc/kibana/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start kibana service + systemd: + name: kibana + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/main.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/main.yml deleted file mode 100644 index cea8d59c93..0000000000 --- a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/main.yml +++ /dev/null @@ -1,132 +0,0 @@ ---- -- name: Reset kubeadm - shell: kubeadm reset -f - -- name: Create directory for certificates - file: - path: /etc/kubernetes/pki - state: directory - -- name: Create temporary directory - file: - path: "{{ backup_dir }}/tmp" - state: directory - -- name: Get files in a backup directory - find: - paths: "{{ backup_dir }}" - patterns: "k8s_backup_*.tar.gz" - register: found_files - -- name: Get latest file - set_fact: - latest_file: "{{ found_files.files | sort(attribute='mtime',reverse=true) | first }}" - -- name: Unarchive a tar gz archive - unarchive: - src: "{{ latest_file.path }}" - dest: "{{ backup_dir }}/tmp" - remote_src: yes - -# Ansible 2.8 -# - name: Restore certificates -# copy: -# src: "{{ backup_dir }}/tmp/pki/" -# dest: /etc/kubernetes/pki -# remote_src: yes - -# Ansible 2.7 -- name: Restore certificates - synchronize: - src: "{{ backup_dir }}/tmp/pki/" - dest: /etc/kubernetes/pki - recursive: yes - delegate_to: "{{ inventory_hostname }}" - -- name: Create data directory for etcd - file: - path: /var/lib/etcd - state: directory - -- name: Get etcd image name - shell: cat "{{ backup_dir }}/tmp/etcd_ver.txt" - register: etcd_image_name - -- name: Restore etcd backup - shell: > - docker run -v "{{ backup_dir }}/tmp":/backup \ - -v /var/lib/etcd:/var/lib/etcd \ - --env ETCDCTL_API=3 \ - --rm "{{ etcd_image_name.stdout }}" \ - /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot.db'; mv /default.etcd/member/ /var/lib/etcd/" - -- name: Check if kubeadm configuration file exists - stat: - path: "{{ backup_dir }}/tmp/kubeadm-config.yml" - register: stat_result - -- name: Create directory for kubeadm configuration file - file: - path: /etc/kubeadm - state: directory - when: stat_result.stat.exists - -- name: Restore kubeadm configuration file - copy: - src: "{{ backup_dir }}/tmp/kubeadm-config.yml" - dest: "/etc/kubeadm/kubeadm-config.yml" - remote_src: yes - when: stat_result.stat.exists - -- name: Initialize the master with backup including kubeadm configuration file - shell: kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd,NumCPU --config /etc/kubeadm/kubeadm-config.yml - when: stat_result.stat.exists - -- name: Initialize the master with backup - shell: kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd,NumCPU - when: not stat_result.stat.exists - -- name: Wait for all nodes to be ready - environment: - KUBECONFIG: "/home/{{ admin_user.name }}/.kube/config" - shell: kubectl get nodes -o json - register: output - until: output.stdout|from_json|json_query("items[*].status.conditions[?(@.type=='Ready')].status[]")|unique == ["True"] - retries: 120 - delay: 10 - -- name: Check cluster version - environment: - KUBECONFIG: "/home/{{ admin_user.name }}/.kube/config" - shell: kubectl version --short | grep -i server - register: cluster_version - -# https://github.com/kubernetes/kubeadm/issues/1471 Upgrading a 1.12 cluster thru 1.13 
to 1.14 fails - -- name: Validate whether current cluster is upgradeable (from ver. 1.13) - - block: - - name: Show upgrade plan - shell: kubeadm upgrade plan - when: '"1.13" in cluster_version.stdout' - - rescue: - - name: Find the existing etcd server certificates - find: - paths: /etc/kubernetes/pki/etcd - patterns: "*server.*" - register: files_to_delete - - - name: Remove the existing etcd server certificates - file: - path: "{{ item.path }}" - state: absent - with_items: "{{ files_to_delete.files }}" - - - name: Regenerate the etcd server certificates - shell: kubeadm init phase certs etcd-server - -- name: Clean temporary directory - file: - state: absent - path: "{{ backup_dir }}/tmp/" diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_grafana_data.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_grafana_data.yml new file mode 100644 index 0000000000..26e7d9477f --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_grafana_data.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "grafana_data" + snapshot_name: "{{ specification.components.monitoring.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop grafana service + systemd: + name: grafana-server + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - "{{ component_vars.specification.grafana_data_dir }}/" + +- name: Extract the archive + unarchive: + dest: "{{ component_vars.specification.grafana_data_dir }}/" + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start grafana service + systemd: + name: grafana-server + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_etc.yml new file mode 100644 index 0000000000..30c68c4992 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_etc.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "prometheus_etc" + snapshot_name: "{{ specification.components.monitoring.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop prometheus service + systemd: + name: prometheus + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - /etc/prometheus/ + +- name: Extract the archive + unarchive: + dest: /etc/prometheus/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start prometheus service + systemd: + name: prometheus + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_snapshot.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_snapshot.yml 
new file mode 100644 index 0000000000..fb693c26e6 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/monitoring_prometheus_snapshot.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "prometheus_snapshot" + snapshot_name: "{{ specification.components.monitoring.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop prometheus service + systemd: + name: prometheus + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - "{{ component_vars.specification.storage.data_directory }}/" + +- name: Extract the archive + unarchive: + dest: "{{ component_vars.specification.storage.data_directory }}/" + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start prometheus service + systemd: + name: prometheus + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/postgresql.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/postgresql.yml new file mode 100644 index 0000000000..caff1883c4 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/postgresql.yml @@ -0,0 +1,240 @@ +--- +# Actions taken only when replication with repmgr is enabled +- name: Stop repmgr service + block: + - name: Stop repmgr on standby node + service: + name: "{{ repmgr_service_name[ansible_os_family] }}" + state: stopped + when: + - groups['postgresql'][1] == inventory_hostname + + - name: Stop repmgr on primary node + service: + name: "{{ repmgr_service_name[ansible_os_family] }}" + state: stopped + when: + - groups['postgresql'][0] == inventory_hostname + when: + - component_vars.specification.extensions.replication.enabled is defined + - component_vars.specification.extensions.replication.enabled + - component_vars.specification.extensions.replication.use_repmgr is defined + - component_vars.specification.extensions.replication.use_repmgr + +- name: Copy and restore backup files + # Running on primary or only node + block: + - name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "postgresql" + snapshot_name: "{{ specification.components.postgresql.snapshot_name }}" + + - name: Transfer database backup via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + + - name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + + - name: Create temp directories + file: + path: "/var/tmp/{{ item }}" + state: directory + mode: 0777 + loop: + - postgresql_restore_source + - postgresql_temp_config + + - name: Extract backup file + unarchive: + dest: "/var/tmp/postgresql_restore_source/" + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + + - name: Cache existing configuration files + block: + - name: List existing configuration files + shell: "find *.conf" + args: + chdir: "{{ pg_config_dir[ansible_os_family] }}" + register: config_files + + - name: Copy existing configuration files + copy: + src: "{{ pg_config_dir[ansible_os_family] }}/{{ item }}" + dest: "/var/tmp/postgresql_temp_config/" + remote_src: yes + loop: "{{ 
config_files.stdout_lines|flatten(levels=1) }}"
+
+    - name: Stop database service
+      systemd:
+        name: "{{ pg_service_name[ansible_os_family] }}"
+        state: stopped
+
+    - name: Drop old database / delete main data directory
+      file:
+        path: "{{ pg_data_dir[ansible_os_family] }}/"
+        state: absent
+
+    - name: Initialize database (RedHat)
+      block:
+        - name: Ensure that data directory exists
+          file:
+            path: "{{ pg_data_dir[ansible_os_family] }}"
+            state: directory
+            owner: postgres
+            group: postgres
+
+        - name: Initialize database
+          command: "/usr/pgsql-10/bin/postgresql-10-setup initdb {{ pg_service_name[ansible_os_family] }}"
+      when:
+        - ansible_os_family == 'RedHat'
+
+    - name: Initialize database (Debian)
+      become: yes
+      become_user: postgres
+      command: "/usr/lib/postgresql/10/bin/initdb -D {{ pg_data_dir[ansible_os_family] }}"
+      when:
+        - ansible_os_family == 'Debian'
+
+    - name: Copy cached config files
+      copy:
+        src: "/var/tmp/postgresql_temp_config/"
+        dest: "{{ pg_config_dir[ansible_os_family] }}/"
+        owner: postgres
+        group: postgres
+        remote_src: yes
+
+    - name: Start Postgresql service
+      systemd:
+        name: "{{ pg_service_name[ansible_os_family] }}"
+        state: started
+
+    - name: Import database from dump file
+      become: yes
+      become_user: postgres
+      command: "psql -f /var/tmp/postgresql_restore_source/data/database_dump.sql postgres"
+
+    - name: Configure repmgr
+      # Repmgr on primary node
+      block:
+        - name: Register primary node in repmgr
+          become: yes
+          become_user: postgres
+          shell: "{{ repmgr_bindir[ansible_os_family] }}/repmgr -f {{ repmgr_config_dir[ansible_os_family] }}/repmgr.conf
+            --force --superuser={{ component_vars.specification.extensions.replication.priviledged_user_name }} primary register -F"
+
+        - name: Start repmgr on primary node
+          service:
+            name: "{{ repmgr_service_name[ansible_os_family] }}"
+            state: started
+      when:
+        - component_vars.specification.extensions.replication.enabled is defined
+        - component_vars.specification.extensions.replication.enabled
+        - component_vars.specification.extensions.replication.use_repmgr is defined
+        - component_vars.specification.extensions.replication.use_repmgr
+
+    - name: Remove created temporary files
+      file:
+        path: "{{ item }}"
+        state: absent
+      loop:
+        - "/var/tmp/postgresql_restore_source/"
+        - "/var/tmp/postgresql_temp_config/"
+        - "{{ recovery_dir }}/{{ snapshot_path | basename }}"
+        - "{{ recovery_dir }}/{{ snapshot_path | basename }}.sha1"
+  when:
+    - groups['postgresql'][0] == inventory_hostname
+
+- name: Configure repmgr on secondary node
+  block:
+    - name: Stop postgresql service
+      service:
+        name: "{{ pg_service_name[ansible_os_family] }}"
+        state: stopped
+
+    - name: Create temporary directory
+      file:
+        path: "/var/tmp/postgresql_temp_config"
+        state: directory
+        mode: 0666
+
+    - name: Cache existing configuration files
+      block:
+        - name: Search for existing configuration files
+          shell: "find *.conf"
+          args:
+            chdir: "{{ pg_config_dir[ansible_os_family] }}"
+          register: config_files
+
+        - name: Copy existing configuration files
+          copy:
+            src: "{{ pg_config_dir[ansible_os_family] }}/{{ item }}"
+            dest: "/var/tmp/postgresql_temp_config/"
+            remote_src: yes
+          loop: "{{ config_files.stdout_lines|flatten(levels=1) }}"
+
+    - name: Delete existing data directory before cloning from primary node
+      file:
+        path: "{{ pg_data_dir[ansible_os_family] }}/"
+        state: absent
+
+    - name: Ensure that data directory exists (RedHat)  # This needs to be checked on RedHat family systems since the location is not always created by init
+      file:
path: "{{ pg_data_dir[ansible_os_family] }}/" + state: directory + owner: postgres + group: postgres + when: + - ansible_os_family == 'RedHat' + + - name: Clone content from primary node using repmgr + become_user: postgres + shell: "{{ repmgr_bindir[ansible_os_family] }}/repmgr -f {{ repmgr_config_dir[ansible_os_family] }}/repmgr.conf -h {{ hostvars[groups['postgresql'][0]]['ansible_default_ipv4']['address'] }} -U {{ component_vars.specification.extensions.replication.priviledged_user_name }} -d {{ component_vars.specification.extensions.replication.repmgr_database }} -p 5432 -F standby clone" + + - name: Copy cached config files back to database configuration location + copy: + src: "/var/tmp/postgresql_temp_config/" + dest: "{{ pg_config_dir[ansible_os_family] }}/" + owner: postgres + group: postgres + remote_src: yes + + - name: Start postgresql service + service: + name: "{{ pg_service_name[ansible_os_family] }}" + state: restarted + + - name: Register secondary node to repmgr cluster + become_user: postgres + shell: "{{ repmgr_bindir[ansible_os_family] }}/repmgr -f {{ repmgr_config_dir[ansible_os_family] }}/repmgr.conf standby register -F" + + - name: Start repmgr service + service: + name: "{{ repmgr_service_name[ansible_os_family] }}" + state: started + + - name: Rejoin secondary node to repmgr cluster + become_user: postgres + shell: "{{ repmgr_bindir[ansible_os_family] }}/repmgr -f {{ repmgr_config_dir[ansible_os_family] }}/repmgr.conf standby follow -F" + when: + - component_vars.specification.extensions.replication.enabled is defined + - component_vars.specification.extensions.replication.enabled + - component_vars.specification.extensions.replication.use_repmgr is defined + - component_vars.specification.extensions.replication.use_repmgr + - groups['postgresql'][1] == inventory_hostname + +- name: Restart repmgrd service + service: + name: "{{ repmgr_service_name[ansible_os_family] }}" + state: restarted + when: + - component_vars.specification.extensions.replication.enabled is defined + - component_vars.specification.extensions.replication.enabled + - component_vars.specification.extensions.replication.use_repmgr is defined + - component_vars.specification.extensions.replication.use_repmgr + diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_definitions.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_definitions.yml new file mode 100644 index 0000000000..204e2b2d6f --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_definitions.yml @@ -0,0 +1,55 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "rabbitmq_definitions" + snapshot_name: "{{ specification.components.rabbitmq.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Ensure a folder to hold definitions in exists + file: + path: /var/lib/rabbitmq/definitions/ + state: directory + +- name: Extract the archive + unarchive: + dest: /var/lib/rabbitmq/definitions/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Ensure management api is enabled + shell: | + rabbitmq-plugins enable rabbitmq_management + args: + executable: /bin/bash + +- name: Ensure the 
rabbitmqadmin binary is installed + shell: | + curl -fsSL http://localhost:15672/cli/rabbitmqadmin \ + -o /usr/local/bin/rabbitmqadmin \ + && chmod +x /usr/local/bin/rabbitmqadmin + args: + creates: /usr/local/bin/rabbitmqadmin + executable: /bin/bash + +- name: Reconstruct the snapshot_name + set_fact: + snapshot_name: >- + {{ snapshot_path | basename | regex_replace('^rabbitmq_definitions_(.*).tar.gz$', '\1') }} + +- debug: var=snapshot_name + +- name: Import definitions json file + shell: | + /usr/local/bin/rabbitmqadmin import /var/lib/rabbitmq/definitions/definitions-{{ snapshot_name }}.json + args: + executable: /bin/bash diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_etc.yml b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_etc.yml new file mode 100644 index 0000000000..b191db3dd2 --- /dev/null +++ b/core/src/epicli/data/common/ansible/playbooks/roles/recovery/tasks/rabbitmq_rabbitmq_etc.yml @@ -0,0 +1,38 @@ +--- +- name: Find snapshot archive + import_tasks: common/find_snapshot_archive.yml + vars: + snapshot_prefix: "rabbitmq_etc" + snapshot_name: "{{ specification.components.rabbitmq.snapshot_name }}" + +- name: Transfer the archive via rsync + import_tasks: common/upload_via_rsync.yml + vars: + artifacts: + - "{{ snapshot_path }}" + - "{{ snapshot_path }}.sha1" + +- name: Verify snapshot checksum + import_tasks: common/verify_snapshot_checksum.yml + +- name: Stop rabbitmq service + systemd: + name: rabbitmq-server + state: stopped + +- name: Clear directories + import_tasks: common/clear_directories.yml + vars: + dirs_to_clear: + - /etc/rabbitmq/ + +- name: Extract the archive + unarchive: + dest: /etc/rabbitmq/ + src: "{{ recovery_dir }}/{{ snapshot_path | basename }}" + remote_src: true + +- name: Start rabbitmq service + systemd: + name: rabbitmq-server + state: started diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/centos-7/requirements.txt b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/centos-7/requirements.txt index cc88789405..42cddba64e 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/centos-7/requirements.txt +++ b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/centos-7/requirements.txt @@ -110,6 +110,10 @@ xorg-x11-font-utils # for grafana xorg-x11-server-utils # for grafana yum-plugin-versionlock yum-utils + +# to make remote-to-remote "synchronize" work in ansible +rsync + # K8s upgrade v1.12 kubeadm-1.12.10 kubectl-1.12.10 diff --git a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/redhat-7/requirements.txt b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/redhat-7/requirements.txt index 4a73b84f18..1a29ed4f1f 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/redhat-7/requirements.txt +++ b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/redhat-7/requirements.txt @@ -108,6 +108,10 @@ xorg-x11-font-utils # for grafana xorg-x11-server-utils # for grafana yum-plugin-versionlock yum-utils + +# to make remote-to-remote "synchronize" work in ansible +rsync + # K8s upgrade v1.12 kubeadm-1.12.10 kubectl-1.12.10 diff --git 
a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/ubuntu-18.04/requirements.txt b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/ubuntu-18.04/requirements.txt index 38cb37dfc5..ad4869781b 100644 --- a/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/ubuntu-18.04/requirements.txt +++ b/core/src/epicli/data/common/ansible/playbooks/roles/repository/files/download-requirements/ubuntu-18.04/requirements.txt @@ -60,6 +60,9 @@ tmux unzip vim +# to make remote-to-remote "synchronize" work in ansible +rsync + # for curl, issue #869 libcurl4 diff --git a/core/src/epicli/data/common/defaults/configuration/backup.yml b/core/src/epicli/data/common/defaults/configuration/backup.yml new file mode 100644 index 0000000000..2a5467b4ad --- /dev/null +++ b/core/src/epicli/data/common/defaults/configuration/backup.yml @@ -0,0 +1,19 @@ +kind: configuration/backup +title: Backup Config +name: default +specification: + components: + load_balancer: + enabled: false + logging: + enabled: false + monitoring: + enabled: false + postgresql: + enabled: false + rabbitmq: + enabled: false +# Kubernetes recovery is not supported by Epiphany at this point. +# You may create backup by enabling this below, but recovery should be done manually according to Kubernetes documentation. + kubernetes: + enabled: false diff --git a/core/src/epicli/data/common/defaults/configuration/feature-mapping.yml b/core/src/epicli/data/common/defaults/configuration/feature-mapping.yml index 21a7b0d3d8..c80090a713 100644 --- a/core/src/epicli/data/common/defaults/configuration/feature-mapping.yml +++ b/core/src/epicli/data/common/defaults/configuration/feature-mapping.yml @@ -141,4 +141,3 @@ specification: - node-exporter - filebeat - firewall - diff --git a/core/src/epicli/data/common/defaults/configuration/logging.yml b/core/src/epicli/data/common/defaults/configuration/logging.yml index d9abe3a020..cc34950f12 100644 --- a/core/src/epicli/data/common/defaults/configuration/logging.yml +++ b/core/src/epicli/data/common/defaults/configuration/logging.yml @@ -13,4 +13,5 @@ specification: clustered: True paths: data: /var/lib/elasticsearch + repo: /var/lib/elasticsearch-snapshots logs: /var/log/elasticsearch diff --git a/core/src/epicli/data/common/defaults/configuration/prometheus.yml b/core/src/epicli/data/common/defaults/configuration/prometheus.yml index b0036a42ec..74f6209276 100644 --- a/core/src/epicli/data/common/defaults/configuration/prometheus.yml +++ b/core/src/epicli/data/common/defaults/configuration/prometheus.yml @@ -15,6 +15,7 @@ specification: - "--web.console.libraries=/etc/prometheus/console_libraries" # Directory should be the same as "config_directory" - "--web.console.templates=/etc/prometheus/consoles" # Directory should be the same as "config_directory" - "--web.listen-address=0.0.0.0:9090" # Address that Prometheus console will be available + - "--web.enable-admin-api" # Enables administrative HTTP API metrics_path: "/metrics" scrape_interval : "15s" diff --git a/core/src/epicli/data/common/defaults/configuration/recovery.yml b/core/src/epicli/data/common/defaults/configuration/recovery.yml new file mode 100644 index 0000000000..fc6da4f091 --- /dev/null +++ b/core/src/epicli/data/common/defaults/configuration/recovery.yml @@ -0,0 +1,20 @@ +kind: configuration/recovery +title: Recovery Config +name: default +specification: + components: + load_balancer: + enabled: false + snapshot_name: latest + 
logging: + enabled: false + snapshot_name: latest + monitoring: + enabled: false + snapshot_name: latest + postgresql: + enabled: false + snapshot_name: latest + rabbitmq: + enabled: false + snapshot_name: latest diff --git a/core/src/epicli/data/common/validation/configuration/backup.yml b/core/src/epicli/data/common/validation/configuration/backup.yml new file mode 100644 index 0000000000..c61f718127 --- /dev/null +++ b/core/src/epicli/data/common/validation/configuration/backup.yml @@ -0,0 +1,82 @@ +type: object +required: + - name + - provider + - specification +properties: + name: + $ref: '#/definitions/name' + provider: + $ref: '#/definitions/provider' + title: + $ref: '#/definitions/title' + specification: + type: object + required: + - components + properties: + components: + type: object + additionalProperties: false + properties: + load_balancer: + "$id": "#/properties/specification/properties/components/properties/load_balancer" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/load_balancer/properties/enabled" + type: boolean + logging: + "$id": "#/properties/specification/properties/components/properties/logging" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/logging/properties/enabled" + type: boolean + monitoring: + "$id": "#/properties/specification/properties/components/properties/monitoring" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/monitoring/properties/enabled" + type: boolean + postgresql: + "$id": "#/properties/specification/properties/components/properties/postgresql" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/postgresql/properties/enabled" + type: boolean + rabbitmq: + "$id": "#/properties/specification/properties/components/properties/rabbitmq" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/rabbitmq/properties/enabled" + type: boolean + kubernetes: + "$id": "#/properties/specification/properties/components/properties/kubernetes" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/kubernetes/properties/enabled" + type: boolean + diff --git a/core/src/epicli/data/common/validation/configuration/recovery.yml b/core/src/epicli/data/common/validation/configuration/recovery.yml new file mode 100644 index 0000000000..ebdbd93092 --- /dev/null +++ b/core/src/epicli/data/common/validation/configuration/recovery.yml @@ -0,0 +1,91 @@ +type: object +required: + - name + - provider + - specification +properties: + name: + $ref: '#/definitions/name' + provider: + $ref: '#/definitions/provider' + title: + $ref: '#/definitions/title' + specification: + type: object + required: + - components + properties: + components: + type: object + additionalProperties: false + properties: + load_balancer: + "$id": "#/properties/specification/properties/components/properties/load_balancer" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": 
"#/properties/specification/properties/components/properties/load_balancer/properties/enabled" + type: boolean + snapshot_name: + "$id": "#/properties/specification/properties/components/properties/load_balancer/properties/snapshot_name" + type: string + pattern: "(^[0-9]{8}-[0-9]{6}$)|(^latest$)" + logging: + "$id": "#/properties/specification/properties/components/properties/logging" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/logging/properties/enabled" + type: boolean + snapshot_name: + "$id": "#/properties/specification/properties/components/properties/logging/properties/snapshot_name" + type: string + pattern: "(^[0-9]{8}-[0-9]{6}$)|(^latest$)" + monitoring: + "$id": "#/properties/specification/properties/components/properties/monitoring" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/monitoring/properties/enabled" + type: boolean + snapshot_name: + "$id": "#/properties/specification/properties/components/properties/monitoring/properties/snapshot_name" + type: string + pattern: "(^[0-9]{8}-[0-9]{6}$)|(^latest$)" + postgresql: + "$id": "#/properties/specification/properties/components/properties/postgresql" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/postgresql/properties/enabled" + type: boolean + snapshot_name: + "$id": "#/properties/specification/properties/components/properties/postgresql/properties/snapshot_name" + type: string + pattern: "(^[0-9]{8}-[0-9]{6}$)|(^latest$)" + rabbitmq: + "$id": "#/properties/specification/properties/components/properties/rabbitmq" + type: object + required: + - enabled + additionalProperties: false + properties: + enabled: + "$id": "#/properties/specification/properties/components/properties/rabbitmq/properties/enabled" + type: boolean + snapshot_name: + "$id": "#/properties/specification/properties/components/properties/rabbitmq/properties/snapshot_name" + type: string + pattern: "(^[0-9]{8}-[0-9]{6}$)|(^latest$)" diff --git a/core/src/epicli/tests/helpers/test_doc_list_helpers.py b/core/src/epicli/tests/helpers/test_doc_list_helpers.py index 02f1835b29..639ccefa05 100644 --- a/core/src/epicli/tests/helpers/test_doc_list_helpers.py +++ b/core/src/epicli/tests/helpers/test_doc_list_helpers.py @@ -1,4 +1,7 @@ +import pytest + from cli.helpers.doc_list_helpers import select_first, select_all +from cli.helpers.doc_list_helpers import select_single, ExpectedSingleResultException from cli.helpers.ObjDict import ObjDict DATA = [ObjDict({'index': 1, 'name': 'test-name-1'}), @@ -22,14 +25,14 @@ def test_select_first_should_return_first_matching_element_when_many_elements_ma def test_select_first_should_return_none_if_there_is_no_matching_elements(): - actual = select_first(DATA, lambda item: item.name == 'name-that-no-exists') + actual = select_first(DATA, lambda item: item.name == 'name-that-does-not-exist') assert(actual is None) def test_select_first_should_return_none_if_data_is_none(): - actual = select_first(None, lambda item: item.name == 'name-that-no-exists') + actual = select_first(None, lambda item: item.name == 'name-that-does-not-exist') assert(actual is None) @@ -43,6 +46,31 @@ def test_select_all_returns_all_matching_elements(): def 
diff --git a/core/src/epicli/tests/helpers/test_doc_list_helpers.py b/core/src/epicli/tests/helpers/test_doc_list_helpers.py
index 02f1835b29..639ccefa05 100644
--- a/core/src/epicli/tests/helpers/test_doc_list_helpers.py
+++ b/core/src/epicli/tests/helpers/test_doc_list_helpers.py
@@ -1,4 +1,7 @@
+import pytest
+
 from cli.helpers.doc_list_helpers import select_first, select_all
+from cli.helpers.doc_list_helpers import select_single, ExpectedSingleResultException
 from cli.helpers.ObjDict import ObjDict
 
 DATA = [ObjDict({'index': 1, 'name': 'test-name-1'}),
@@ -22,14 +25,14 @@ def test_select_first_should_return_first_matching_element_when_many_elements_ma
 
 
 def test_select_first_should_return_none_if_there_is_no_matching_elements():
 
-    actual = select_first(DATA, lambda item: item.name == 'name-that-no-exists')
+    actual = select_first(DATA, lambda item: item.name == 'name-that-does-not-exist')
 
     assert(actual is None)
 
 
 def test_select_first_should_return_none_if_data_is_none():
 
-    actual = select_first(None, lambda item: item.name == 'name-that-no-exists')
+    actual = select_first(None, lambda item: item.name == 'name-that-does-not-exist')
 
     assert(actual is None)
@@ -43,6 +46,31 @@ def test_select_all_returns_empty_list_if_there_is_no_matching_elements():
 
-    actual = select_all(DATA, lambda item: item.name == 'name-that-no-exists')
+    actual = select_all(DATA, lambda item: item.name == 'name-that-does-not-exist')
 
     assert(actual == [])
+
+
+def test_select_single_should_return_none_if_data_is_none():
+
+    actual = select_single(None, lambda item: item.name == 'name-that-does-not-exist')
+
+    assert(actual is None)
+
+
+def test_select_single_should_return_single_matching_element():
+
+    actual = select_single(DATA, lambda item: item.index == 2)
+
+    assert(isinstance(actual, ObjDict))
+    assert(actual.index == 2 and actual.name == DATA[actual.index].name)
+
+
+def test_select_single_should_raise_if_there_are_too_many_matching_elements():
+    with pytest.raises(ExpectedSingleResultException):
+        select_single(DATA, lambda item: item.name == 'test-name23')
+
+
+def test_select_single_should_raise_if_there_is_no_matching_element():
+    with pytest.raises(ExpectedSingleResultException):
+        select_single(DATA, lambda item: item.name == 'name-that-does-not-exist')
diff --git a/docs/home/HOWTO.md b/docs/home/HOWTO.md
index 02b6dc7594..b51e280ac3 100644
--- a/docs/home/HOWTO.md
+++ b/docs/home/HOWTO.md
@@ -77,6 +77,13 @@
     - [How to start working with Apache Ignite Stateful setup](./howto/DATABASES.md#how-to-start-working-with-apache-ignite-stateful-setup)
     - [How to start working with Apache Ignite Stateless setup](./howto/DATABASES.md#how-to-start-working-with-apache-ignite-stateless-setup)
 
+- [Backup and Recovery](./howto/BACKUP.md)
+  - [Epiphany backup and restore](./howto/BACKUP.md#epiphany-backup-and-restore)
+    - [How to perform backup](./howto/BACKUP.md#1-how-to-perform-backup)
+    - [How to store backup](./howto/BACKUP.md#2-how-to-store-backup)
+    - [How to perform recovery](./howto/BACKUP.md#3-how-to-perform-recovery)
+    - [How backup and recovery work](./howto/BACKUP.md#4-how-backup-and-recovery-work)
+
 - [Data and log retention](./howto/RETENTION.md)
   - [Elasticsearch](./howto/RETENTION.md#elasticsearch)
   - [Grafana](./howto/RETENTION.md#grafana)
diff --git a/docs/home/howto/BACKUP.md b/docs/home/howto/BACKUP.md
new file mode 100644
index 0000000000..39c0e057bd
--- /dev/null
+++ b/docs/home/howto/BACKUP.md
@@ -0,0 +1,155 @@
+## Epiphany backup and restore
+
+### Introduction
+
+Epiphany provides a solution for creating full or partial backups, and restoring them, for the following components:
+
+- [Load Balancer](#load-balancer)
+- [Logging](#logging)
+- [Monitoring](#monitoring)
+- [Postgresql](#postgresql)
+- [RabbitMQ](#rabbitmq)
+- [Kubernetes (only backup)](#kubernetes)
+
+A backup is created directly on the machine where the component is running and then moved to the ``repository`` host via rsync. On the ``repository`` host, backup files are stored under ``/epibackup/mounted``, a location mounted on the local filesystem.
+See the [How to store backup](#2-how-to-store-backup) chapter.
+
+## 1. How to perform backup
+
+### Backup configuration
+
+Copy the default backup configuration from ``defaults/configuration/backup.yml`` into a newly created ``backup.yml`` config file, and enable backup for the chosen components by setting the ``enabled`` parameter to ``true``.
+
+This config may also be attached to ``cluster-config.yml``.
+
+```
+kind: configuration/backup
+title: Backup Config
+name: default
+specification:
+  components:
+    load_balancer:
+      enabled: true
+    logging:
+      enabled: false
+    monitoring:
+      enabled: true
+    postgresql:
+      enabled: true
+    rabbitmq:
+      enabled: false
+# Kubernetes recovery is not supported at this point.
+# You may create a backup by enabling it below, but recovery has to be done manually according to the Kubernetes documentation.
+    kubernetes:
+      enabled: false
+```
+
+Run the ``epicli backup`` command:
+```
+epicli backup -f backup.yml -b build_folder
+```
+
+If the backup config is attached to ``cluster-config.yml``, pass that file instead of ``backup.yml``.
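For example, when the backup document has been appended to ``cluster-config.yml``, the invocation uses that file (``build_folder`` stands for your actual build directory):

```
epicli backup -f cluster-config.yml -b build_folder
```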
+
+## 2. How to store backup
+
+The backup location is defined in the ``backup`` role by the ``backup_destination_host`` and ``backup_destination_dir`` variables.
+The default backup location is ``/epibackup/mounted/`` on the ``repository`` host.
+Use the ``mounted`` location as a mount point and mount the storage you want to use there. This might be:
+- Azure Blob Storage
+- Amazon S3
+- NAS
+- Any other attached storage
+
+Ensure that the mounted location has enough space, is reliable and is well protected against disaster.
+
+**Note:** if you don't attach any storage to the mount point location, backups will be stored on the local machine, which is not recommended.
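For illustration only (any of the storage options listed above can be used), attaching an NFS export to the default backup location might look like the sketch below; the server and export path are placeholders:

```
# Hypothetical example: mount an NFS export at the default backup location.
# nas.example.com:/export/epibackup is a placeholder for your actual storage.
mount -t nfs nas.example.com:/export/epibackup /epibackup/mounted

# Optionally make the mount persistent across reboots:
echo 'nas.example.com:/export/epibackup /epibackup/mounted nfs defaults 0 0' >> /etc/fstab
```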
+
+## 3. How to perform recovery
+
+### Recovery configuration
+
+Copy the default configuration from ``defaults/configuration/recovery.yml`` into a newly created ``recovery.yml`` config file, and set the ``enabled`` parameter to ``true`` for each component to be recovered. A specific snapshot can be selected by passing the date-and-time part of its name. If no snapshot name is provided, the latest one will be restored.
+
+This config may also be attached to ``cluster-config.yml``.
+
+```
+kind: configuration/recovery
+title: Recovery Config
+name: default
+specification:
+  components:
+    load_balancer:
+      enabled: true
+      snapshot_name: latest          # restore the latest backup
+    logging:
+      enabled: true
+      snapshot_name: 20200604-150829 # restore the selected backup
+    monitoring:
+      enabled: false
+      snapshot_name: latest
+    postgresql:
+      enabled: false
+      snapshot_name: latest
+    rabbitmq:
+      enabled: false
+      snapshot_name: latest
+```
+
+Run the ``epicli recovery`` command:
+
+``epicli recovery -f recovery.yml -b build_folder``
+
+If the recovery config is attached to ``cluster-config.yml``, pass that file instead of ``recovery.yml``.
+
+## 4. How backup and recovery work
+
+### Load Balancer
+
+Load balancer backup includes:
+- Configuration files: ``/etc/haproxy/``
+- SSL certificates: ``/etc/ssl/haproxy/``
+
+Recovery restores all backed-up files.
+
+### Logging
+
+Logging backup includes:
+- Elasticsearch database snapshot
+- Elasticsearch configuration ``/etc/elasticsearch/``
+- Kibana configuration ``/etc/kibana/``
+
+Only single-node Elasticsearch backup is supported. A solution for multi-node Elasticsearch clusters will be added in a future release.
+
+### Monitoring
+
+Monitoring backup includes:
+- Prometheus data snapshot
+- Prometheus configuration ``/etc/prometheus/``
+- Grafana data snapshot
+
+Recovery restores all backed-up configurations and snapshots.
+
+### Postgresql
+
+Postgresql backup includes:
+- Database data and metadata dump created with ``pg_dumpall``
+- Configuration files: ``*.conf``
+
+When a multi-node configuration is used and a failover action has changed the database cluster state (one node down, switchover), it is still possible to create a backup. However, before the database is restored, the cluster needs to be recovered by running ``epicli apply``, followed by ``epicli recovery`` to restore the database data.
+By default, restoring the database configuration from backup is not supported, since this has to be done with ``epicli apply`` or manually, by copying the backed-up files according to the cluster state. The reason is that restoring configuration files across different database cluster configurations is very risky.
+
+### RabbitMQ
+
+RabbitMQ backup includes:
+- Message definitions
+- Configuration files: ``/etc/rabbitmq/``
+
+Backup does not include RabbitMQ messages.
+
+Recovery restores all backed-up files and configurations.
+
+### Kubernetes
+
+Kubernetes backup includes:
+- Etcd snapshot
+- Public Key Infrastructure ``/etc/kubernetes/pki``
+- Kubeadm configuration files
+
+Epiphany does not support Kubernetes cluster recovery; refer to the Kubernetes documentation for manual recovery.
\ No newline at end of file
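Since the Kubernetes backup contains an etcd snapshot, a manual restore would typically start from that snapshot. A minimal sketch, assuming ``etcdctl`` v3 and placeholder paths (this is not an Epiphany command; follow the upstream Kubernetes/etcd documentation for the full procedure):

```
# Hypothetical manual etcd restore from a backed-up snapshot; the snapshot
# file and data directory below are placeholders, not Epiphany defaults.
ETCDCTL_API=3 etcdctl snapshot restore /epibackup/mounted/etcd-snapshot.db \
  --data-dir /var/lib/etcd-from-backup
```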