datalad · christian-monch · Nov 11, 2024 · Oct 25, 2024 · Oct 28, 2024 · Oct 28, 2024
diff --git a/README.md b/README.md
@@ -6,28 +6,32 @@
 [![Hatch project](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)
 
 
-**This code is a POC**, that means currently:
-- code does not thoroughly validate inputs
-- names might be inconsistent
-- few tests
-- fewer docs
-- no support for locking
-
-This is a naive datalad compute extension that serves as a playground for
-the datalad remake-project. 
-
-It contains an annex remote that can compute content on demand. It uses template
-files that specify the operations. It encodes computation parameters in URLs
-that are associated with annex keys, which allows to compute dropped content
-instead of fetching it from some storage system.  It also contains the new
-datalad command `compute` that
-can trigger the computation of content, generate the parameterized URLs, and
-associate this URL with the respective annex key. This information can then
-be used by the annex remote to repeat the computation.
+**NOTE:** This extension is currently work-in-progress!
+
+
+## About
+
+This extension equips DataLad with the functionality to (re)compute file
+content on demand, based on a specified set of instructions. In particular,
+it features a `datalad make` command for capturing instructions on how to
+compute a given file, allowing the file content to be safely removed. It also
+implements a git-annex special remote, which enables the (re)computation of
+the file content based on the captured instructions. This is particularly
+useful when the file content can be produced deterministically. If storing
+the file content is more expensive than (re)producing it, this functionality
+can lead to more effective resource utilization. Thus, this extension may be
+of interest to a wide, interdisciplinary audience, including researchers,
+data curators, and infrastructure administrators.
+
+
+## Requirements
+
+This extension requires Python >= `3.11`.
+
 
 ## Installation
 
-There is no pypi-package yet. To install the extension, clone the repository
+There is no PyPI package yet. To install the extension, clone the repository
 and install it via `pip` (preferably in a virtual environment):
 
 ```bash
@@ -37,18 +41,24 @@ and install it via `pip` (preferably in a virtual environment):
 > pip install .
 ```
 
+To check your installation, run:
+
+```bash
+> datalad make --help
+```
+
 
 ## Example usage
 
-Create a dataset
+Create a dataset:
 
 
 ```bash
 > datalad create remake-test-1
 > cd remake-test-1
 ```
 
-Create the template directory and a template
+Create a template and place it in the `.datalad/make/methods` directory:
 
 ```bash
 > mkdir -p .datalad/make/methods
@@ -63,25 +73,19 @@ EOF
 > datalad save -m "add `one-to-many` remake method"
 ```
 
-Create a "datalad-remake" annex special remote:
+Create a `datalad-remake` git-annex special remote:
 ```bash
 > git annex initremote datalad-remake encryption=none type=external externaltype=datalad-remake
 ```
 
 Execute a computation and save the result:
 ```bash
-> datalad make -p first=bob -p second=alice -p output=name -o name-1.txt \
--o name-2.txt one-to-many
+> datalad make -p first=bob -p second=alice -p output=name \
+-o name-1.txt -o name-2.txt one-to-many
 ```
 The method `one-to-many` will create two files with the names `<output>-1.txt`
-and `<output>-2.txt`. That is why the two files `name-1.txt` and `name-2.txt`
-are listed as outputs in the command above.
-
-Note that only output files that are defined by the `-o/--output` option will
-be available in the dataset after `datalad make`. Similarly, only the files
-defined by `-i/--input` will be available as inputs to the computation (the
-computation is performed in a "scratch" directory, so the input files must be
-copied there and the output files must be copied back).
+and `<output>-2.txt`. Thus, the two files `name-1.txt` and `name-2.txt` need to
+be specified as outputs in the command above.
 
 ```bash
 > cat name-1.txt
@@ -91,7 +95,7 @@ content: alice
 ```
 
 Drop the content of `name-1.txt`, verify it is gone, recreate it via
-`datalad get`, which "fetches" is from the compute remote:
+`datalad get`, which "fetches" it from the `datalad-remake` remote:
 
 ```bash
 > datalad drop name-1.txt
@@ -100,31 +104,45 @@ Drop the content of `name-1.txt`, verify it is gone, recreate it via
 > cat name-1.txt
 ``` 
 
-The command `datalad make` does also support to just record the parameters
-that would lead to a certain computation, without actually performing the
-computation. We refer to this as *speculative computation*.
-
-To use this feature, the following configuration value has to be set:
+The `datalad make` command can also be used to perform a *prospective
+computation*. To use this feature, the following configuration value 
+has to be set:
 
 ```bash
 > git config annex.security.allow-unverified-downloads ACKTHPPT
 ```
 
-Afterward, a speculative computation can be recorded by providing the `-u` option
-(url-only) to `datalad make`.
+Afterwards, a prospective computation can be initiated by using the 
+`-u / --url-only` option:
 
 ```bash
 > datalad make -p first=john -p second=susan -p output=person \
 -o person-1.txt -o person-2.txt -u one-to-many
-> cat person-1.txt    # this will fail, because the computation has not yet been performed
 ```
 
-`ls -l person-1.txt` will show a link to a not-downloaded URL-KEY.
-`git annex whereis person-1.txt` will show the associated computation description URL.
-No computation has been performed yet, `datalad make` just creates an URL-KEY and
-associates a computation description URL with the URL-KEY.
+The following command will fail, because no computation has been performed,
+and the file content is unavailable:
+
+```bash
+> cat person-1.txt
+```
+
+We can further inspect this with `git annex info`:
+
+```bash
+> git annex info person-1.txt
+```
+
+Similarly, `git annex whereis` will show the URL, that can be handled by the
+git-annex special remote:
+
+```bash
+> git annex whereis person-1.txt
+```
+
+Finally, `datalad get` can be used to produce the file content (for the first
+time!) based on the specified instructions:
 
-Use `datalad get` to perform the computation for the first time and receive the result::
 ```bash
 > datalad get person-1.txt
 > cat person-1.txt
@@ -136,7 +154,8 @@ Use `datalad get` to perform the computation for the first time and receive the
 See [CONTRIBUTING.md](CONTRIBUTING.md) if you are interested in internals or
 contributing to the project.
 
-## Acknowledgements
+
+# Acknowledgements
 
 This development was supported by European Union’s Horizon research and
 innovation programme under grant agreement [eBRAIN-Health