chore: migrate from christian-monch/compute
christian-monch committed Oct 22, 2024
2 parents 99f1399 + 81f08a3 commit 817db19
Showing 39 changed files with 2,271 additions and 39 deletions.
4 changes: 2 additions & 2 deletions .appveyor.yml
@@ -68,14 +68,14 @@ environment:
# Ubuntu core tests
- job_name: test-linux
APPVEYOR_BUILD_WORKER_IMAGE: Ubuntu2204
PY: 3.9
PY: 3.11
INSTALL_GITANNEX: git-annex -m snapshot

# same as 'test-linux', but TMPDIR is on a crippled filesystem, causing
# most, if not all test datasets to be created on that filesystem
- job_name: test-linux-crippled
APPVEYOR_BUILD_WORKER_IMAGE: Ubuntu2204
PY: 3.9
PY: 3.11
# datalad-annex git remote needs something after git-annex_8.20211x
INSTALL_GITANNEX: git-annex -m snapshot

2 changes: 1 addition & 1 deletion .github/workflows/conventional-commits.yml
@@ -20,4 +20,4 @@ jobs:
run: python -m pip install commitizen
- name: Run commit message checks
run: |
cz check --rev-range ${{ github.event.pull_request.base.sha }}..${{ github.event.pull_request.head.sha }}
cz check --rev-range 630c2f514fd8d42c4def3d7ee588487ffa64cc38..${{ github.event.pull_request.head.sha }}
27 changes: 27 additions & 0 deletions .github/workflows/docbuild.yml
@@ -0,0 +1,27 @@
name: docs

on: [push, pull_request]

jobs:
build:

runs-on: ubuntu-latest

steps:
- name: Set up environment
run: |
git config --global user.email "test@github.land"
git config --global user.name "GitHub Almighty"
- uses: actions/checkout@v4
- name: Set up Python 3.11
uses: actions/setup-python@v5
with:
python-version: 3.11
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements-devel.txt
pip install .
- name: Build docs
run: |
make -C docs html
8 changes: 5 additions & 3 deletions .github/workflows/mypy-pr.yml
@@ -31,11 +31,13 @@ jobs:
if: steps.changed-py-files.outputs.any_changed == 'true'
run: |
# get any type stubs that mypy thinks it needs
hatch run types:mypy --install-types --non-interactive --follow-imports skip ${{ steps.changed-py-files.outputs.all_changed_files }}
hatch run types:mypy --install-types --non-interactive --ignore-missing-imports --follow-imports skip ${{ steps.changed-py-files.outputs.all_changed_files }}
# run mypy on the modified files only, and do not even follow imports.
# this results in a fairly superficial test, but given the overall
# state of annotations, we strive to become more correct incrementally
# with focused error reports, rather than barfing a huge complaint
# that is unrelated to the changeset someone has been working on.
# run on the oldest supported Python version
hatch run types:mypy --python-version 3.9 --follow-imports skip --pretty --show-error-context ${{ steps.changed-py-files.outputs.all_changed_files }}
# run on the oldest supported Python version.
# specify `--ignore-missing-imports` until the datalad-packages have
# type stubs for all their modules.
hatch run types:mypy --python-version 3.11 --ignore-missing-imports --follow-imports skip --pretty --show-error-context ${{ steps.changed-py-files.outputs.all_changed_files }}
6 changes: 4 additions & 2 deletions .github/workflows/mypy-project.yml
@@ -25,5 +25,7 @@ jobs:
# get any type stubs that mypy thinks it needs
hatch run types:mypy --install-types --non-interactive --follow-imports skip datalad_core
# run mypy on the full project.
# run on the oldest supported Python version
hatch run types:mypy --python-version 3.9 --pretty --show-error-context datalad_core
# run on the oldest supported Python version.
# specify `--ignore-missing-imports` until the datalad-packages have
# type stubs for all their modules.
hatch run types:mypy --python-version 3.11 --ignore-missing-imports --pretty --show-error-context datalad_core
Empty file.
132 changes: 132 additions & 0 deletions README.md
@@ -6,6 +6,138 @@
[![Hatch project](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)


**This code is a POC**. Currently, that means:
- code does not thoroughly validate inputs
- names might be inconsistent
- few tests
- fewer docs
- no support for locking

This is a naive datalad compute extension that serves as a playground for
the datalad remake-project.

It contains an annex remote that can compute content on demand. It uses template
files that specify the operations. It encodes computation parameters in URLs
that are associated with annex keys, which allows dropped content to be
recomputed instead of fetched from some storage system. It also contains the new
datalad command `compute`, which can trigger the computation of content, generate
the parameterized URL, and associate this URL with the respective annex key.
This information can then be used by the annex remote to repeat the computation.
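
The idea of encoding computation parameters in a URL can be illustrated with a small Python sketch. This is not the extension's actual URL layout, which this README does not specify: the function names `build_compute_url` and `parse_compute_url` and the query keys `method`, `p`, and `o` are assumptions made for illustration. Only the scheme name `datalad-remake` is taken from `url_scheme` in `datalad_remake/__init__.py`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Scheme name as defined by `url_scheme` in datalad_remake/__init__.py
URL_SCHEME = 'datalad-remake'


def build_compute_url(method, parameters, outputs):
    """Encode a computation description as a URL (hypothetical layout)."""
    query = urlencode(
        [('method', method)]
        # one `p` entry per parameter, sorted for a stable URL
        + [('p', f'{name}={value}') for name, value in sorted(parameters.items())]
        # one `o` entry per declared output file
        + [('o', output) for output in outputs]
    )
    return f'{URL_SCHEME}:///?{query}'


def parse_compute_url(url):
    """Recover method name, parameters, and outputs from such a URL."""
    parts = urlsplit(url)
    if parts.scheme != URL_SCHEME:
        raise ValueError(f'unsupported URL scheme: {parts.scheme!r}')
    query = parse_qs(parts.query)
    parameters = dict(item.split('=', 1) for item in query.get('p', []))
    return query['method'][0], parameters, query.get('o', [])
```

Because the URL carries everything needed to describe the computation, a remote that receives it could repeat the computation without access to the original command line.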

## Installation

There is no PyPI package yet. To install the extension, clone the repository
and install it via `pip` (preferably in a virtual environment):

```bash
git clone https://github.com/christian-monch/datalad-compute.git
cd datalad-compute
pip install -r requirements-devel.txt
pip install .
```


## Example usage

Install the extension and create a dataset


```bash
> datalad create compute-test-1
> cd compute-test-1
```

Create the template directory and a template

```bash
> mkdir -p .datalad/compute/methods
> cat > .datalad/compute/methods/one-to-many <<EOF
inputs = ['first', 'second', 'output']
use_shell = 'true'
executable = 'echo'
arguments = [
"content: {first} > '{output}-1.txt';",
"echo content: {second} > '{output}-2.txt'",
]
EOF
> datalad save -m 'add `one-to-many` compute method'
```

Create a "compute" annex special remote:
```bash
> git annex initremote compute encryption=none type=external externaltype=compute
```

Execute a computation and save the result:
```bash
> datalad compute -p first=bob -p second=alice -p output=name -o name-1.txt \
-o name-2.txt one-to-many
```
The method `one-to-many` will create two files with the names `<output>-1.txt`
and `<output>-2.txt`. That is why the two files `name-1.txt` and `name-2.txt`
are listed as outputs in the command above.
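
How the template and the `-p` parameters might combine into a command line can be sketched in Python. This is an illustrative sketch, not the extension's implementation: the function name `render_command` is hypothetical, the template is given as an already-parsed dictionary, and it is assumed that placeholders are filled with `str.format` and that the executable and arguments are joined with spaces.

```python
# The `one-to-many` template from above, as an already-parsed dictionary
TEMPLATE = {
    'inputs': ['first', 'second', 'output'],
    'use_shell': 'true',
    'executable': 'echo',
    'arguments': [
        "content: {first} > '{output}-1.txt';",
        "echo content: {second} > '{output}-2.txt'",
    ],
}


def render_command(template, parameters):
    """Fill the template placeholders and return one shell command line."""
    # every name listed under `inputs` must be provided as a parameter
    missing = sorted(set(template['inputs']) - set(parameters))
    if missing:
        raise ValueError(f'missing parameters: {missing}')
    arguments = [arg.format(**parameters) for arg in template['arguments']]
    return ' '.join([template['executable'], *arguments])


command = render_command(
    TEMPLATE, {'first': 'bob', 'second': 'alice', 'output': 'name'}
)
print(command)
```

Under these assumptions, the parameters from the example yield the shell line `echo content: bob > 'name-1.txt'; echo content: alice > 'name-2.txt'`, which writes the two declared output files.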

Note that only output files that are defined by the `-o/--output` option will
be available in the dataset after `datalad compute`. Similarly, only the files
defined by `-i/--input` will be available as inputs to the computation (the
computation is performed in a "scratch" directory, so the input files must be
copied there and the output files must be copied back).
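
The scratch-directory behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the extension's code: `run_in_scratch` is a hypothetical helper, the command is run through a POSIX shell, and plain file copies stand in for whatever provisioning the extension actually performs.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_scratch(dataset_dir, command, inputs, outputs):
    """Run `command` in a temporary directory that holds only `inputs`,
    then copy only the declared `outputs` back into `dataset_dir`."""
    dataset_dir = Path(dataset_dir)
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)
        # only declared inputs are visible to the computation
        for name in inputs:
            target = scratch_dir / name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dataset_dir / name, target)
        # run the command with the scratch directory as working directory
        subprocess.run(command, shell=True, cwd=scratch_dir, check=True)
        # only declared outputs are copied back; everything else is discarded
        for name in outputs:
            target = dataset_dir / name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(scratch_dir / name, target)
```

A file that the command creates but that is not listed in `outputs` is deleted together with the scratch directory, which matches the note that undeclared outputs do not appear in the dataset.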

```bash
> cat name-1.txt
content: bob
> cat name-2.txt
content: alice
```

Drop the content of `name-1.txt`, verify it is gone, and recreate it via
`datalad get`, which "fetches" it from the compute remote:

```bash
> datalad drop name-1.txt
> cat name-1.txt
> datalad get name-1.txt
> cat name-1.txt
```

The command `datalad compute` also supports recording the parameters
that would lead to a certain computation without actually performing the
computation. We refer to this as *speculative computation*.

To use this feature, the following configuration value has to be set:

```bash
> git config annex.security.allow-unverified-downloads ACKTHPPT
```

Afterward, a speculative computation can be recorded by providing the `-u` option
(url-only) to `datalad compute`.

```bash
> datalad compute -p first=john -p second=susan -p output=person \
-o person-1.txt -o person-2.txt -u one-to-many
> cat person-1.txt # this will fail, because the computation has not yet been performed
```

`ls -l person-1.txt` will show a link to a not-downloaded URL-KEY.
`git annex whereis person-1.txt` will show the associated computation description URL.
No computation has been performed yet; `datalad compute` just creates a URL-KEY and
associates a computation description URL with it.

Use `datalad get` to perform the computation for the first time and receive the result:
```bash
> datalad get person-1.txt
> cat person-1.txt
```


# Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) if you are interested in internals or
contributing to the project.

## Acknowledgements

This development was supported by European Union’s Horizon research and
56 changes: 39 additions & 17 deletions datalad_remake/__init__.py
@@ -1,25 +1,47 @@
"""DataLad remake extension"""

from __future__ import annotations

from datalad_remake._version import __version__

__all__ = [
'__version__',
'command_suite',
]

# command_suite = (
# # description of the command suite, displayed in cmdline help
# "Demo DataLad command suite",
# [
# # specification of a command, any number of commands can be defined
# (
# # importable module that contains the command implementation
# 'datalad_remake.commands.compute_cmd',
# # name of the command class implementation in above module
# 'Compute',
# # optional name of the command in the cmdline API
# 'compute',
# # optional name of the command in the Python API
# 'compute'
# ),
# ]
# )

# Defines a datalad command suite.
# This variable must be bound as a setuptools entrypoint
# to be found by datalad
command_suite = (
# description of the command suite, displayed in cmdline help
'DataLad remake command suite',
[
# specification of a command, any number of commands can be defined
(
# importable module that contains the command implementation
'datalad_remake.commands.make_cmd',
# name of the command class implementation in above module
'Make',
# optional name of the command in the cmdline API
'make',
# optional name of the command in the Python API
'make',
),
(
# importable module that contains the command implementation
'datalad_remake.commands.provision_cmd',
# name of the command class implementation in above module
'Provision',
# optional name of the command in the cmdline API
'provision',
# optional name of the command in the Python API
'provision',
),
],
)


url_scheme = 'datalad-remake'
template_dir = '.datalad/make/methods'
specification_dir = '.datalad/make/specifications'
Empty file.