Feature backup / restore (#1359)
* backup/recovery: uncommenting and refactoring legacy code (#1241)

* backup/recovery: adding ansible code for syncing files between hosts (#1257)

* backup/recovery: adding ansible code for syncing files between hosts

* backup/recovery: adding rsync to package requirements

* backup/recovery: fixing rsync code + adding random name for private key

* backup/recovery: adding ssh code (transfer via pipeline)

* backup/recovery/prometheus: enabling admin API + initial implementation

* backup/recovery: adding initial code for restoring monitoring snapshots (#1259)

* backup/recovery: moving sync code to roles + cleanups

* backup/recovery: adding initial code for restoring monitoring snapshots

* backup/recovery: adding kubernetes backup sync (copy-only)

* backup/recovery: checking if snapshots are available first

* backup/recovery: adding on-the-fly sha1 calculation (#1262)

* backup/recovery: adding on-the-fly sha1 calculation in the download_via_ssh copy method

* backup/recovery: fixing on-the-fly sha1 calculation

- monitoring: reverting invalid tar argument order
- download_via_ssh: making the inline_script fail properly

* Postgres Backup Initial

* postgres backup

* Add all config files to backup

* backup/recovery: monitoring refactor (#1270)

- switching from ssh to rsync
- adding grafana backup/restore
- adding prometheus etc backup/restore

* backup/recovery: adding loadbalancer (#1279)

* backup/recovery: logging (WIP) (#1277)

- work in progress, does not work with elasticsearch clusters yet

* backup/recovery: adding rabbitmq (without messages) (#1291)

* backup/recovery: adding rabbitmq (without messages)

* backup/recovery: adding rabbitmq (archive fix)

* Epicli backup/restore improvements and refactor (#1299)

* epicli: moving component parser to separate module

* backup/recovery: moving backup storage to repository hosts

* backup/recovery: PatchEngine refactor (adding yaml based config)

* backup/recovery: adding backup common tasks

* backup/recovery: adding recovery common tasks

* backup/recovery: kubernetes refactor

* backup/recovery: load_balancer refactor

* backup/recovery: logging refactor

* backup/recovery: monitoring refactor

* backup/recovery: rabbitmq refactor

* backup/recovery: postgresql refactor (unfinished)

* backup/recovery: adding optional -b build_directory parameter (#1301)

* merge, modify postgres to use new setup (#1308)

* Postgres recovery fix (#1339)

* merge, modify postgres to use new setup

* recovery fix

* backup/recovery: fixing user vs default config merge logic (or lack of it) (#1345)

* Modify backup / restore roles according to test results (#1356)

* merge, modify postgres to use new setup

* recovery fix

* centos testing changes

* fix after redhat tests

* merge

* Add provider to backup / restore

* Notes about kubernetes

* backup/recovery: adding missing provider value to role vars (fix) (#1358)

* backup/recovery: epicli invocation refactor + adding schema validations

* Change in Postgresql recovery task

* backup/recovery: moving schema validation to earlier stage

* backup/recovery: splitting PatchEngine into BackupEngine and RecoveryEngine

* SchemaValidator: adding ability to check individual documents (without the base schema)

* backup/recovery: reusing modified SchemaValidator

* backup/recovery: removing unneeded/broken kubernetes recovery code

* backup/recovery: removing unused code and trailing whitespace

* backup/recovery: removing abandoned component parser

* doc_list_helpers: adding missing unit tests for the "select_single" helper

* Backup / recovery documentation (#1377)

* Feature/restore and backup validation (#1379)

* Add recovery components validation

* Reformatting

* Update core/src/epicli/data/common/validation/configuration/recovery.yml

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Update core/src/epicli/data/common/validation/configuration/recovery.yml

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Update core/src/epicli/data/common/validation/configuration/recovery.yml

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Update core/src/epicli/data/common/validation/configuration/recovery.yml

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Update core/src/epicli/data/common/validation/configuration/recovery.yml

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Add regex for deeper validation

* Add validations config for backup manifest

* Fix true/false in additional values

Co-authored-by: Michał Opala <sk4zuzu@gmail.com>

* Fix postgres check for RedHat (#1389)

* Hotfix for the elasticsearch recovery procedure (#1381)

* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring kibana is always down
- deleting all indices prior to restore

* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring all filebeat instances are always down

* backup/recovery: logging/elasticsearch snapshot restore fix

- ensuring kibana and filebeat instances will not be started (via reboot) during restore

* backup/recovery: fixing load_balancer in RedHat (#1390)

* Fix database check (#1391)

Co-authored-by: Irek Głownia <48471627+plirglo@users.noreply.github.com>
Co-authored-by: Marcin Pyrka <pyrka.marcin@gmail.com>
3 people authored Jun 29, 2020
1 parent 42a8482 commit 6c4d140
Showing 71 changed files with 2,567 additions and 294 deletions.
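Several of the commits above add "on-the-fly" sha1 calculation to the download step, i.e. hashing the stream while it is being written instead of re-reading the file afterwards. A minimal sketch of that idea (the helper name is hypothetical; the real logic lives in the `download_via_ssh` copy method):

```python
import hashlib
import io

def stream_with_sha1(chunks, sink):
    """Write chunks to sink while updating a SHA-1 digest on the fly."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
        sink.write(chunk)
    return digest.hexdigest()

# Hash the data while "transferring" it into an in-memory buffer.
buffer = io.BytesIO()
checksum = stream_with_sha1([b"backup-", b"archive"], buffer)
```

The single pass matters for large backup archives piped over SSH: there is no second read of the data to compute the checksum.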
28 changes: 28 additions & 0 deletions core/src/epicli/cli/engine/BackupEngine.py
@@ -0,0 +1,28 @@
from cli.helpers.doc_list_helpers import select_single
from cli.engine.BackupRecoveryEngineBase import BackupRecoveryEngineBase


class BackupEngine(BackupRecoveryEngineBase):
"""Perform backup operations."""

def __init__(self, input_data):
super(BackupRecoveryEngineBase, self).__init__(__name__) # late call of the Step.__init__(__name__)
super(BackupEngine, self).__init__(input_data)

def backup(self):
"""Backup all enabled components."""

self._process_input_docs()
self._process_configuration_docs()

# Get backup config document
backup_doc = select_single(self.configuration_docs, lambda x: x.kind == 'configuration/backup')

self._update_role_files_and_vars('backup', backup_doc)

# Execute all enabled component playbooks sequentially
for component_name, component_config in sorted(backup_doc.specification.components.items()):
if component_config.enabled:
self._update_playbook_files_and_run('backup', component_name)

return 0
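The `backup()` method above dispatches one playbook per enabled component. Stripped of the engine plumbing, the loop reduces to this shape (plain dicts stand in for the objdict configuration):

```python
def run_enabled_components(components, runner):
    """Run `runner` for each enabled component, in deterministic (sorted) order."""
    executed = []
    for name, config in sorted(components.items()):
        if config.get("enabled"):
            runner(name)
            executed.append(name)
    return executed

order = run_enabled_components(
    {"postgresql": {"enabled": True},
     "monitoring": {"enabled": False},
     "logging": {"enabled": True}},
    runner=lambda name: None)
```

Sorting the component names keeps backup runs reproducible across invocations regardless of dict insertion order.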
120 changes: 120 additions & 0 deletions core/src/epicli/cli/engine/BackupRecoveryEngineBase.py
@@ -0,0 +1,120 @@
import os
import copy

from cli.version import VERSION
from cli.helpers.Step import Step

from cli.helpers.build_saver import get_inventory_path_for_build
from cli.helpers.build_saver import copy_files_recursively, copy_file
from cli.helpers.build_saver import MANIFEST_FILE_NAME

from cli.helpers.yaml_helpers import dump
from cli.helpers.data_loader import load_yamls_file, load_yaml_obj, types as data_types
from cli.helpers.doc_list_helpers import select_single, ExpectedSingleResultException

from cli.engine.schema.SchemaValidator import SchemaValidator
from cli.engine.schema.DefaultMerger import DefaultMerger

from cli.engine.ansible.AnsibleCommand import AnsibleCommand
from cli.engine.ansible.AnsibleRunner import AnsibleRunner


class BackupRecoveryEngineBase(Step):
"""Perform backup and recovery operations (abstract base class)."""

def __init__(self, input_data):
# super(BackupRecoveryEngineBase, self).__init__(__name__) needs to be called in any subclass
self.file = input_data.file
self.build_directory = input_data.build_directory
self.manifest_docs = list()
self.input_docs = list()
self.configuration_docs = list()
self.cluster_model = None
self.backup_doc = None
self.recovery_doc = None
self.ansible_command = AnsibleCommand()

def __enter__(self):
super().__enter__()
return self

def __exit__(self, exc_type, exc_value, traceback):
super().__exit__(exc_type, exc_value, traceback)

def _process_input_docs(self):
"""Load, validate and merge (with defaults) input yaml documents."""

path_to_manifest = os.path.join(self.build_directory, MANIFEST_FILE_NAME)
if not os.path.isfile(path_to_manifest):
raise Exception('No manifest.yml inside the build folder')

# Get existing manifest config documents
self.manifest_docs = load_yamls_file(path_to_manifest)
self.cluster_model = select_single(self.manifest_docs, lambda x: x.kind == 'epiphany-cluster')

# Load backup / recovery configuration documents
self.input_docs = load_yamls_file(self.file)

# Validate input documents
with SchemaValidator(self.cluster_model, self.input_docs) as schema_validator:
schema_validator.run_for_individual_documents()

# Merge the input docs with defaults
with DefaultMerger(self.input_docs) as doc_merger:
self.input_docs = doc_merger.run()

def _process_configuration_docs(self):
"""Populate input yaml documents with additional required ad-hoc data."""

# Seed the self.configuration_docs
self.configuration_docs = copy.deepcopy(self.input_docs)

# Please notice using DefaultMerger is not needed here, since it is done already at this point.
# We just check if documents are missing and insert default ones without the unneeded merge operation.
for kind in {'configuration/backup', 'configuration/recovery'}:
try:
# Check if the required document is in user inputs
document = select_single(self.configuration_docs, lambda x: x.kind == kind)
except ExpectedSingleResultException:
# If there is no document provided by the user, then fallback to defaults
document = load_yaml_obj(data_types.DEFAULT, 'common', kind)
# Inject the required "version" attribute
document['version'] = VERSION
# Copy the "provider" value from the cluster model
document['provider'] = self.cluster_model.provider
# Save the document for later use
self.configuration_docs.append(document)
finally:
# Copy the "provider" value to the specification as well
document.specification['provider'] = document['provider']

def _update_role_files_and_vars(self, action, document):
"""Render mandatory vars files for backup/recovery ansible roles inside the existing build directory."""

self.logger.info(f'Updating {action} role files...')

# Copy role files
roles_build_path = os.path.join(self.build_directory, 'ansible/roles', action)
roles_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH, 'roles', action)
copy_files_recursively(roles_source_path, roles_build_path)

# Render role vars
vars_dir = os.path.join(roles_build_path, 'vars')
os.makedirs(vars_dir, exist_ok=True)
vars_file_path = os.path.join(vars_dir, 'main.yml')
with open(vars_file_path, 'w') as stream:
dump(document, stream)

def _update_playbook_files_and_run(self, action, component):
"""Update backup/recovery ansible playbooks inside the existing build directory and run the provisioning."""

self.logger.info(f'Running {action} on {component}...')

# Copy playbook file
playbook_build_path = os.path.join(self.build_directory, 'ansible', f'{action}_{component}.yml')
playbook_source_path = os.path.join(AnsibleRunner.ANSIBLE_PLAYBOOKS_PATH, f'{action}_{component}.yml')
copy_file(playbook_source_path, playbook_build_path)

# Run the playbook
inventory_path = get_inventory_path_for_build(self.build_directory)
self.ansible_command.run_playbook(inventory=inventory_path, playbook_path=playbook_build_path)
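If the user did not supply a `configuration/backup` or `configuration/recovery` document, `_process_configuration_docs` above falls back to the bundled default and stamps `version` and `provider` onto it. The same fallback shape with plain data (hypothetical names, no yaml loading):

```python
import copy

def ensure_document(docs, kind, default_doc, version, provider):
    """Return the user's document of the given kind, falling back to a bundled default."""
    matches = [d for d in docs if d["kind"] == kind]
    if matches:
        document = matches[0]
    else:
        # No user-provided document: copy the default and stamp version/provider on it.
        document = copy.deepcopy(default_doc)
        document.update(kind=kind, version=version, provider=provider)
        docs.append(document)
    # Mirror the provider into the specification as well (the "finally" branch above).
    document.setdefault("specification", {})["provider"] = document["provider"]
    return document

docs = [{"kind": "configuration/backup", "provider": "azure", "specification": {}}]
recovery = ensure_document(docs, "configuration/recovery",
                           {"specification": {}}, "1.0.0", "azure")
```

Note that the provider-to-specification copy runs for both user-supplied and default documents, exactly as the `finally` clause does in the real code.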
46 changes: 0 additions & 46 deletions core/src/epicli/cli/engine/PatchEngine.py

This file was deleted.

28 changes: 28 additions & 0 deletions core/src/epicli/cli/engine/RecoveryEngine.py
@@ -0,0 +1,28 @@
from cli.helpers.doc_list_helpers import select_single
from cli.engine.BackupRecoveryEngineBase import BackupRecoveryEngineBase


class RecoveryEngine(BackupRecoveryEngineBase):
"""Perform recovery operations."""

def __init__(self, input_data):
super(BackupRecoveryEngineBase, self).__init__(__name__) # late call of the Step.__init__(__name__)
super(RecoveryEngine, self).__init__(input_data)

def recovery(self):
"""Recover all enabled components."""

self._process_input_docs()
self._process_configuration_docs()

# Get recovery config document
recovery_doc = select_single(self.configuration_docs, lambda x: x.kind == 'configuration/recovery')

self._update_role_files_and_vars('recovery', recovery_doc)

# Execute all enabled component playbooks sequentially
for component_name, component_config in sorted(recovery_doc.specification.components.items()):
if component_config.enabled:
self._update_playbook_files_and_run('recovery', component_name)

return 0
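Both engines rely on `super(BackupRecoveryEngineBase, self).__init__(__name__)` to reach past the intermediate base class and invoke `Step.__init__` directly — the "late call" mentioned in the comment. The mechanics in isolation (toy class names):

```python
class Step:
    def __init__(self, name):
        self.name = name

class Base(Step):
    def __init__(self, data):
        # Deliberately does NOT call Step.__init__; subclasses do it "late".
        self.data = data

class Engine(Base):
    def __init__(self, data):
        super(Base, self).__init__("engine")   # skips Base in the MRO, lands in Step
        super(Engine, self).__init__(data)     # normal call into Base

e = Engine({"file": "backup.yml"})
```

Passing the *intermediate* class to `super()` starts the method lookup after that class in the MRO, which is how both `Step.name` and `Base.data` end up initialized.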
28 changes: 24 additions & 4 deletions core/src/epicli/cli/engine/schema/SchemaValidator.py
@@ -13,13 +13,13 @@ def __init__(self, cluster_model, validation_docs):
self.validation_docs = validation_docs

base = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/base')
definitions = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/definitions')
self.definitions = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, 'core/definitions')

self.base_schema = dict_to_objdict(deepcopy(base))
self.base_schema['definitions'] = definitions
self.base_schema['definitions'] = self.definitions

self.base_schema_no_provider = dict_to_objdict(deepcopy(base))
self.base_schema_no_provider['definitions'] = definitions
self.base_schema_no_provider['definitions'] = self.definitions
del self.base_schema_no_provider.required[0]
del self.base_schema_no_provider.properties['provider']

@@ -32,6 +32,27 @@ def get_base_schema(self, kind):
schema.properties.kind.pattern = '^(' + kind + ')$'
return schema

def run_for_individual_documents(self):
for doc in self.validation_docs:
# Load document schema
schema = load_yaml_obj(types.VALIDATION, self.cluster_model.provider, doc.kind)

# Include "definitions"
schema['definitions'] = self.definitions

# Warn the user about the missing validation
if hasattr(schema, '$ref'):
if schema['$ref'] == '#/definitions/unvalidated_specification':
self.logger.warn('No specification validation for ' + doc.kind)

# Assert the schema
try:
validate(instance=objdict_to_dict(doc), schema=objdict_to_dict(schema))
except Exception as e:
self.logger.error(f'Failed validating: {doc.kind}')
self.logger.error(e)
raise Exception('Schema validation error, see the error above.')

def run(self):
for doc in self.validation_docs:
self.logger.info(f'Validating: {doc.kind}')
@@ -46,4 +67,3 @@ def run(self):
self.logger.error(f'Failed validating: {doc.kind}')
self.logger.error(e)
raise Exception('Schema validation error, see the error above.')
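Two details of the validator are easy to miss: `get_base_schema` pins the `kind` property with a `'^(' + kind + ')$'` pattern, and `run_for_individual_documents` warns when a document's schema merely points at the `unvalidated_specification` stub. Both checks restated with the stdlib (`re.escape` is an addition here; the original concatenates the kind string directly):

```python
import re

def kind_pattern(kind):
    """Mirror of get_base_schema: restrict the "kind" property to one exact value."""
    return "^(" + re.escape(kind) + ")$"

def is_unvalidated_stub(schema):
    """True when a schema delegates everything to the unvalidated_specification stub."""
    return schema.get("$ref") == "#/definitions/unvalidated_specification"

doc = {"kind": "configuration/backup"}
backup_matches = re.match(kind_pattern("configuration/backup"), doc["kind"]) is not None
stub_schema = {"$ref": "#/definitions/unvalidated_specification"}
```

The stub check is what produces the "No specification validation for ..." warning, so components without a real schema still pass but are flagged.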

39 changes: 25 additions & 14 deletions core/src/epicli/cli/epicli.py
@@ -11,10 +11,11 @@
import socket

from cli.engine.ApplyEngine import ApplyEngine
from cli.engine.PatchEngine import PatchEngine
from cli.engine.BackupEngine import BackupEngine
from cli.engine.DeleteEngine import DeleteEngine
from cli.engine.InitEngine import InitEngine
from cli.engine.PrepareEngine import PrepareEngine
from cli.engine.RecoveryEngine import RecoveryEngine
from cli.engine.UpgradeEngine import UpgradeEngine
from cli.engine.TestEngine import TestEngine
from cli.helpers.Log import Log
@@ -92,12 +93,11 @@ def debug_level(x):
upgrade_parser(subparsers)
delete_parser(subparsers)
test_parser(subparsers)

'''
validate_parser(subparsers)
'''
backup_parser(subparsers)
recovery_parser(subparsers)
'''

# check if there were any variables and display full help
if len(sys.argv) < 2:
@@ -260,36 +260,47 @@ def run_validate(args):
return engine.validate()
sub_parser.set_defaults(func=run_validate)
'''


def backup_parser(subparsers):
"""Configure and execute backup of cluster components."""

sub_parser = subparsers.add_parser('backup',
description='[Experimental]: Backups existing Epiphany Platform components.')
description='Create backup of cluster components.')
sub_parser.add_argument('-f', '--file', dest='file', type=str, required=True,
help='Backup configuration definition file to use.')
sub_parser.add_argument('-b', '--build', dest='build_directory', type=str, required=True,
help='Absolute path to directory with build artifacts.')
help='Absolute path to directory with build artifacts.',
default=None)

def run_backup(args):
experimental_query()
adjust_paths_from_build(args)
with PatchEngine(args) as engine:
adjust_paths_from_file(args)
with BackupEngine(args) as engine:
return engine.backup()

sub_parser.set_defaults(func=run_backup)


def recovery_parser(subparsers):
sub_parser = subparsers.add_parser('recovery', description='[Experimental]: Recover from existing backup.')
"""Configure and execute recovery of cluster components."""

sub_parser = subparsers.add_parser('recovery',
description='Recover from existing backup.')
sub_parser.add_argument('-f', '--file', dest='file', type=str, required=True,
help='Recovery configuration definition file to use.')
sub_parser.add_argument('-b', '--build', dest='build_directory', type=str, required=True,
help='Absolute path to directory with build artifacts.')
help='Absolute path to directory with build artifacts.',
default=None)

def run_recovery(args):
experimental_query()
adjust_paths_from_build(args)
with PatchEngine(args) as engine:
if not query_yes_no('Do you really want to perform recovery?'):
return 0
adjust_paths_from_file(args)
with RecoveryEngine(args) as engine:
return engine.recovery()

sub_parser.set_defaults(func=run_recovery)
'''


def experimental_query():
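The `backup_parser`/`recovery_parser` functions above share one argparse shape: a subcommand with required `-f/--file` and `-b/--build` options, and a run function attached via `set_defaults` (note that `required=True` makes the added `default=None` redundant but harmless). Reduced to a runnable sketch:

```python
import argparse

parser = argparse.ArgumentParser(prog="epicli")
subparsers = parser.add_subparsers()

backup = subparsers.add_parser("backup",
                               description="Create backup of cluster components.")
backup.add_argument("-f", "--file", dest="file", type=str, required=True,
                    help="Backup configuration definition file to use.")
backup.add_argument("-b", "--build", dest="build_directory", type=str, required=True,
                    help="Absolute path to directory with build artifacts.")
# Attach the handler; epicli later dispatches via args.func(args).
backup.set_defaults(func=lambda args: f"backing up from {args.file}")

args = parser.parse_args(["backup", "-f", "backup.yml", "-b", "/tmp/build"])
result = args.func(args)
```

The `set_defaults(func=...)` idiom is what lets the top-level entry point run any subcommand uniformly without a chain of if/elif branches.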
2 changes: 1 addition & 1 deletion core/src/epicli/cli/helpers/Step.py
@@ -1,6 +1,6 @@
import time
from cli.helpers.Log import Log
from abc import ABCMeta, abstractmethod
from abc import ABCMeta


class Step(metaclass=ABCMeta):
8 changes: 7 additions & 1 deletion core/src/epicli/cli/helpers/doc_list_helpers.py
@@ -1,3 +1,9 @@

class ExpectedSingleResultException(Exception):
"""Raised when the query returns none or too many results."""
pass


def select_first(documents, query):
if documents is not None:
for x in documents:
@@ -22,5 +28,5 @@ def select_single(documents, query):
elements_count = len(results)
if elements_count == 1:
return results[0]
raise Exception("Expected one element but received: " + str(elements_count))
raise ExpectedSingleResultException("Expected one element but received: " + str(elements_count))
return None
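The new `ExpectedSingleResultException` lets callers such as `_process_configuration_docs` catch the "no single match" case specifically instead of a bare `Exception`. The helpers' contract, restated as a self-contained sketch:

```python
class ExpectedSingleResultException(Exception):
    """Raised when the query returns none or too many results."""

def select_first(documents, query):
    """Return the first matching document, or None."""
    if documents is not None:
        for x in documents:
            if query(x):
                return x
    return None

def select_single(documents, query):
    """Return the single matching document; raise if there are zero or many."""
    if documents is not None:
        results = [x for x in documents if query(x)]
        if len(results) == 1:
            return results[0]
        raise ExpectedSingleResultException(
            "Expected one element but received: " + str(len(results)))
    return None

docs = [{"kind": "a"}, {"kind": "b"}, {"kind": "b"}]
```

`select_first` is lenient (duplicates allowed, None on no match), while `select_single` enforces uniqueness — which is why the engines use it to fetch the one `configuration/backup` or `configuration/recovery` document.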
9 changes: 0 additions & 9 deletions core/src/epicli/data/common/ansible/playbooks/backup.yml

This file was deleted.
