Provide a backup and restore utility for DataJoint pipelines #864

@guzman-raphael

Description

Feature Request

Problem

Currently, users (along with admins) do not have a simple, intuitive means to perform restricted backup and restore operations. Workarounds typically place a large burden on the user to parse the pipeline, or require server-side support. A possible solution could be to define methods such as:

dj.backup(backup_root_path, table)

This would define a working directory for the backup and a table as the 'anchor' for the backup. The table may carry a restriction condition that restricts the records in table and its descendants. From these records, the method would determine all child and parent dependencies (along with any forks resulting from Master-Part relationships). Once all records in the lineage are associated with table, they would be read and compressed into an appropriate file format, e.g. HDF5, NPZ, Parquet, etc. Additionally, a restore.py script could be generated that specifies the DataJoint table classes, with a final step that decompresses and ingests the resulting backup.
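The lineage step described above could be sketched as a plain graph walk. This is a hypothetical illustration only: the adjacency-dict representation and the `collect_lineage` helper are assumptions for this sketch, not part of the DataJoint API.

```python
# Hypothetical sketch of the lineage step in the proposed dj.backup:
# given the pipeline's dependency graph (parent -> children), collect
# the anchor table together with all of its descendants (which carry
# the restricted records) and ancestors (which supply the rows those
# records reference via foreign keys).

def collect_lineage(children, anchor):
    """Return the set of tables whose records must be backed up."""
    # Build the reverse (child -> parents) view of the graph.
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    def walk(graph, start):
        # Iterative depth-first traversal from the anchor.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, []))
        return seen

    return walk(children, anchor) | walk(parents, anchor)

# Example pipeline: subject -> session -> recording -> analysis.
pipeline = {
    "subject": ["session"],
    "session": ["recording"],
    "recording": ["analysis"],
}
lineage = collect_lineage(pipeline, "session")
# Anchoring at 'session' pulls in its ancestor and both descendants.
```

A real implementation would read this graph from the pipeline's foreign-key metadata rather than a hand-written dict.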

dj.restore(backup_root_path, database_prefix, connection=None)

This would define a working directory and a namespace (i.e. database_prefix) under which to 'load' all of the backup data. Specifying connection would set the target server but would default to dj.conn() if set to None.
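The namespace remapping implied by database_prefix could look like the sketch below. The name format and both helpers are assumptions for illustration; they are not existing DataJoint functions.

```python
# Hypothetical sketch of how the proposed dj.restore might remap
# backed-up schema names under the target database_prefix before
# creating schemas on the target connection.

def retarget_schema(original_schema, database_prefix):
    """Map an origin schema name into the restore namespace."""
    # e.g. 'lab_ephys' restored under 'restore_' -> 'restore_lab_ephys'
    return database_prefix + original_schema

def restore_plan(backup_schemas, database_prefix):
    """Pair each backed-up schema with its target schema name."""
    return {s: retarget_schema(s, database_prefix) for s in backup_schemas}

plan = restore_plan(["lab_ephys", "lab_behavior"], "restore_")
```

The actual routine would then declare each target schema on the chosen connection and ingest the decompressed records into it.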

These two routines also provide the mechanism for exporting/publishing data from any given DataJoint pipeline.

Requirements

  • Create a compressed representation of a DataJoint pipeline that can be restricted to a particular subset at the origin
  • The saved data must be self-describing and accessible by standard tools
  • Load data into a target database server under a specific schema prefix
  • Loading must work if the data is already partially loaded, allowing for simple synchronization of new data.
  • Maintain performance comparable to, or better than, 70% of mysqldump's runtime.
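The partial-load requirement above suggests an ingest step keyed on primary keys: only rows absent from the target are inserted, so repeated restores act as synchronization. The sketch below illustrates the idea with plain dicts; within DataJoint itself the same effect could plausibly be achieved with insert(..., skip_duplicates=True).

```python
# Stand-alone sketch of idempotent loading: insert only the backup
# rows whose primary key is not already present in the target.

def sync_rows(target, backup_rows, primary_key):
    """Append missing backup rows to target; return how many were added."""
    existing = {tuple(row[k] for k in primary_key) for row in target}
    new_rows = [r for r in backup_rows
                if tuple(r[k] for k in primary_key) not in existing]
    target.extend(new_rows)
    return len(new_rows)

# Target already holds session 1; the backup holds sessions 1 and 2.
target = [{"session_id": 1, "note": "a"}]
backup = [{"session_id": 1, "note": "a"}, {"session_id": 2, "note": "b"}]
added = sync_rows(target, backup, ["session_id"])
# Only session 2 is inserted; rerunning sync_rows adds nothing.
```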

Justification

  • Exposes functionality to the typical user looking to 'copy' a pipeline as a local, workable version.
  • Gives DataJoint admin-level functionality: a means to automate backups, define disaster-recovery processes, etc.
  • Provides an additional method for sharing data outside the data pipeline.

Alternative Considerations

The current workarounds involve manual routines by the user or server-side support (mysqldump, volume-based backups). Both present significant challenges for the typical user.

Additional Research and Context

  • Reference for NPZ files.
  • Reference for Parquet files.

Metadata

Labels

enhancement (Indicates new improvements)
