Update:
I used this for about a year. The tool itself works well enough, but keeping the exports for individual services working is a continual pain. Something is constantly breaking because of OS X adding new permission restrictions, or an app removing functionality I relied on, or whatever. I think it's mostly not worth the effort unless the data in a service is extremely important to me. And if you're only exporting from a handful of services, then a middleman like this tool is probably more trouble than it's worth.
This tool helps manage automatic exports/backups of personal data (for example, from cloud services like Todoist or Goodreads).
It handles:
- Running exports according to a schedule (note: export_manager itself must be run via a scheduler like cron/launchd/etc)
- Keeping data and logs organized
- Cleaning up old data
- Gathering metrics and generating reports to help you see whether your backups are working
Terminology:
- dataset: a collection of parcels from a particular source, and related config and metadata
- parcel: data exported on a single occasion (e.g. your Todoist data retrieved on 2020-02-15T01:02:03Z) and related metadata
Usage:
- Install python3 and pip
- `pip install export_manager`
- Use the tool to set up directories for each dataset:

```bash
mkdir ~/exports
export_manager init ~/exports/todoist
export_manager init ~/exports/goodreads
# etc. The names can be anything you want, those are just examples.
```

This will create a directory structure like this:

```
~/exports/
  todoist/
    config.toml
    metrics.csv
    data/
    incomplete/
    log/
  goodreads/
    config.toml
    metrics.csv
    data/
    incomplete/
    log/
```
- Edit the `config.toml` file in each subdirectory to specify the schedule and the command to run. For example, if you install `exporteer_todoist` you could use the following config to export data from Todoist:

```toml
# Command to invoke. $PARCEL_PATH will be set to ~/exports/todoist/data/DATETIME
# $DATASET_PATH is also available and will be set to ~/exports/todoist
cmd = "TODOIST_API_TOKEN=your_token exporteer_todoist full_sync > $PARCEL_PATH.json"

# Only get a new export if the last one is at least 1 day old.
interval = "1 day"

# Only keep the most recent 5 successful exports.
# If you don't specify this, export_manager will not delete old exports.
keep = 5
```
- Run `export_manager process ~/exports/*` to run all the exports and cleanups that are due. Set up a cron or launchd job to run this periodically; see the example below.
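For instance, a crontab entry along these lines would run it hourly (the path to export_manager depends on your installation; this is a sketch, not a drop-in line):

```bash
# Example crontab entry: run all due exports and cleanups once an hour.
# Adjust the path to export_manager for your installation.
0 * * * * /usr/local/bin/export_manager process "$HOME"/exports/*
```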
You shouldn't trust your backups unless you're testing them. Testing has to involve some manual action - if it were fully automated, you'd never know if the automation broke. But ideally, you'd automate the process of collecting all the evidence to prove that a backup is working, so that all you have to do is periodically look at it and say "yep, looks good."
export_manager helps with this by providing a report you can generate:

```bash
export_manager report ~/exports/*
```
This will:
- Warn you of any obvious problems, such as datasets that are overdue or failing.
- Tell you when each dataset was most recently exported successfully.
- Show you metrics about each dataset's most recent successful parcel, and compare them with metrics from 7 days and 180 days ago.
By default, the metrics include the number of bytes and number of files in the parcel when it was produced.
You can also define custom metrics for each dataset in the `config.toml` file.
For example, for JSON data, you might use jq to count certain elements of the JSON.
The following config creates a metric named "tasks" to track the number of tasks in the todoist exports configured above:

```toml
metrics.tasks.cmd = "jq '.items | length' $PARCEL_PATH"
```
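If you want to sanity-check a metric command by hand, you can run it directly against an existing parcel file (the path here is a placeholder; substitute a real file from your data/ directory):

```bash
# Run the metric command manually against a parcel file (placeholder path)
jq '.items | length' ~/exports/todoist/data/PARCEL.json
```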
The `process` command gathers all the configured metrics every time a new parcel is produced and stores them in `metrics.csv`.
If you make your dataset directory a git repo and set `git = true` in `config.toml`, then the data files of successful exports, as well as the `metrics.csv` file, will be committed after each change.
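For example, one way to set this up for the todoist dataset created above (assuming git is installed):

```bash
cd ~/exports/todoist
git init                          # make the dataset directory a git repo
echo 'git = true' >> config.toml  # or add the line to config.toml by hand
```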
Some services are prohibitively difficult to export from automatically, but do provide a way to manually export your data.
export_manager can keep track of those manual exports for you.
To minimize the amount of manual work, you can configure it to automatically pull in any files that match a certain path pattern, so that all you need to do is dump the manually exported file into that location.
For example, if you put the following in `config.toml`:

```toml
ingest.paths = ["/Users/me/Dropbox/*.opml"]
```
Then every time the `process` command is run, export_manager will check for files matching that pattern in `/Users/me/Dropbox`.
If it finds any, it will move them into the dataset directory, assign them a parcel ID, run metrics on them, etc.
(By default, the modification time of the file is used as the parcel ID. You can put `ingest.time_source = "now"` in your `config.toml` if you'd rather use the current date/time as the parcel ID.)
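Putting the two options together, a minimal ingest configuration might look like this:

```toml
ingest.paths = ["/Users/me/Dropbox/*.opml"]
ingest.time_source = "now"  # use the current time, rather than the file's mtime, as the parcel ID
```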
Help for each command is available on the command line and in the doc folder.
You can use export_manager programmatically. The most important module is `export_manager.dataset`.
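As a purely illustrative sketch (the class and method names below are assumptions, not the verified API; check the doc folder for the real interface):

```python
# Hypothetical usage sketch: the names below are assumptions, not the
# verified API of export_manager.dataset. See the doc folder for the
# actual interface.
from export_manager import dataset

ds = dataset.Dataset("~/exports/todoist")  # hypothetical constructor
ds.process()  # hypothetical: run any export/cleanup that is due for this dataset
```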
Development setup:
- Install python3 and pip
- Clone the repo
- I recommend creating a venv:

```bash
cd export_manager
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies:

```bash
pip install .
pip install -r requirements-dev.txt
```
To run unit tests:

```bash
PYTHONPATH=src pytest
```
(Overriding PYTHONPATH as shown ensures the tests run against the code in the src/ directory rather than the installed copy of the package.)
To run the CLI:

```bash
PYTHONPATH=src python -m export_manager ...
```
Bug reports and pull requests are welcome on GitHub at https://github.com/brokensandals/export_manager.
This is available as open source under the terms of the MIT License.