Transforms-only and apply-transforms modes #2207

Open · effigies opened this issue Jul 6, 2020 · 4 comments
Labels: derivatives, effort: high, feature, impact: medium, next
@effigies (Member) commented Jul 6, 2020

For very large datasets (10k+ subjects), the cost of storing even a minimal set of derivatives for each subject can become large. Internally, we delay transforming data in order to do as much as possible in a single shot, reducing interpolations. It should therefore not be very difficult to output all transforms, and few if any other derivatives, with something like a `--transforms-only` flag. The user could then construct the needed volumes and time series on the fly, or we could provide an `--apply-transforms` mode to fully populate a subject directory.
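The single-shot rationale can be sketched in a few lines: composing a transform chain into one matrix means the data are interpolated once, not once per step. A toy numpy illustration (the `hmc` and `reg` matrices below are made up for the example, not fMRIPrep outputs):

```python
import numpy as np

def compose(*affines):
    """Compose 4x4 affines into a single transform.

    Applying the composed matrix once avoids one interpolation
    step per transform in the chain.
    """
    out = np.eye(4)
    for aff in affines:
        out = aff @ out
    return out

# Hypothetical chain: head-motion correction followed by a
# BOLD-to-template registration (illustrative matrices only).
hmc = np.array([[1., 0., 0., 2.],
                [0., 1., 0., 0.],
                [0., 0., 1., 0.],
                [0., 0., 0., 1.]])
reg = np.array([[1., 0., 0., 0.],
                [0., 1., 0., -3.],
                [0., 0., 1., 0.],
                [0., 0., 0., 1.]])

xfm = compose(hmc, reg)
point = np.array([10., 10., 10., 1.])

# One matrix-vector product replaces two sequential resamplings.
assert np.allclose(xfm @ point, reg @ (hmc @ point))
```

With only the transforms stored, an `--apply-transforms` step would amount to replaying such chains against the raw data on demand.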

This would be enabled by the X5 transform format, allowing us to store the head-motion-correction transforms for an entire series as a step in a chain from BOLD to template space. I'm not sure if there's an existing format that something like antsApplyTransforms could use; we currently split, apply, and merge.

I list this as medium impact. I think it would be low value for moderately sized datasets, but extremely valuable for very large datasets.

cc @Shotgunosine @mih for thoughts.

@effigies effigies added feature derivatives impact: medium Estimated medium impact task effort: high Estimated high effort task labels Jul 6, 2020
@Shotgunosine (Contributor) commented Jul 6, 2020

Yeah, I think this would be really helpful for efforts to share fMRIPrep derivatives when people will download the data and run subsequent processing themselves. If subsequent processing happens in the cloud instead, the benefit will depend on the compute cost of the apply-transforms operation.

Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step? Could also be tricky with multi-echo.

@effigies (Member, Author) commented Jul 6, 2020

> If the use case is that subsequent processes will happen in the cloud, the benefit will depend on the processing requirements of the apply transforms operation.

It seems likely you'll want some level of caching, but

> Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step?

Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components (beyond one transform per time point) might not be specified yet.
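For intuition, "one transform per time point" could look like a `(T, 4, 4)` stack of per-volume affines composed against a single series-to-template affine in one vectorized step. The array layout here is purely illustrative, not the actual X5 specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical head-motion correction: one affine per time point
# (T volumes), stored as a (T, 4, 4) array.
T = 5
hmc = np.tile(np.eye(4), (T, 1, 1))
hmc[:, :3, 3] = rng.normal(scale=0.5, size=(T, 3))  # small translations

# Single BOLD-to-template affine shared by all volumes.
to_template = np.eye(4)
to_template[:3, 3] = [2.0, -1.0, 0.5]

# Broadcasted matmul composes the whole series in one shot:
# composed[t] == to_template @ hmc[t].
composed = to_template @ hmc  # shape (T, 4, 4)

assert composed.shape == (T, 4, 4)
assert np.allclose(composed[0], to_template @ hmc[0])
```

A chain like this would let a resampler produce every template-space volume with a single interpolation per volume.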

> Could also be tricky with multi-echo.

I suspect the combination could be represented as a voxel x echo weight matrix, which would make it not too far from a displacement field. But that's a guess based on a very qualitative understanding of ME.
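As a rough illustration of the weight-matrix idea, the common TE-weighted "optimal combination" can be written as a voxel x echo weight array applied by a weighted sum. The formula and shapes below are an example only, not necessarily what fMRIPrep computes:

```python
import numpy as np

# Weights w_e ∝ TE_e * exp(-TE_e / T2*), normalized to sum to 1
# across echoes (the usual "optimal combination" form).
tes = np.array([14.0, 38.0, 62.0])       # echo times, ms (illustrative)
t2star = np.full((4, 4, 4), 45.0)        # per-voxel T2* map, ms (toy)

w = tes * np.exp(-tes / t2star[..., np.newaxis])
w /= w.sum(axis=-1, keepdims=True)       # voxel x echo weight matrix

echoes = np.ones((4, 4, 4, 3))           # toy data: one volume per echo
combined = (echoes * w).sum(axis=-1)     # weighted sum across echoes

assert np.allclose(w.sum(axis=-1), 1.0)
assert combined.shape == (4, 4, 4)
```

Structurally this is a per-voxel linear operator over echoes, which is why it is "not too far from a displacement field" in terms of how it could be stored alongside spatial transforms.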

@oesteban (Member) commented Aug 9, 2020

> Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components (beyond one transform per time point) might not be specified yet.

Including STC at once would be really nice, but at this point I see it very far in the future. To include it directly in the resampling we would need a way of interpolating through time too, which is not currently available through scipy (and I honestly don't know which interpolating kernel you should use, right this minute).
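For intuition only, here is a toy example of resampling one voxel's series onto a common slice-timing grid with plain linear interpolation (real STC typically uses richer kernels, e.g. sinc or Fourier-based methods; this is just a stand-in to show the temporal-resampling step):

```python
import numpy as np

tr = 2.0
n_vols = 20
# This slice was acquired 0.5 s into each TR; we resample its
# series onto the reference (slice-0) acquisition times.
acq_times = np.arange(n_vols) * tr + 0.5
ref_times = np.arange(n_vols) * tr

signal = np.sin(acq_times / 3.0)            # synthetic voxel time series
corrected = np.interp(ref_times, acq_times, signal)

assert corrected.shape == signal.shape
```

Folding this into a spatial transform chain is exactly the open question above: the spatial steps compose as matrices or fields, while this step interpolates along the time axis.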

@oesteban oesteban added this to the 21.x milestone Jul 19, 2021
@Lestropie (Contributor) commented:

Posting to register personal investment in this functionality, and hopefully to prompt an update on what would be required to contribute, given any changes to transform handling since the issue was first posted.

@effigies effigies self-assigned this Sep 27, 2022
effigies added a commit that referenced this issue Feb 2, 2023
Back when I commenced some work on #2207 (from which I'll need to
familiarise myself with #2913), I started by looking at the CLI and
thinking about what the right interface would be for such a capability
(see e.g. #2805, #2806). I immediately found some command-line options
that really felt as though they were in the wrong place. So here I've
separated out the set of CLI changes that I'd generated in the course of
that little experiment.

The git diff is quite unclean, so I've produced the set of options that
would be moved and their corresponding old and new option groups.

| Option | Old option group | New option group |
|-----------------------------|-------------------------------|------------------------------------------------------|
| `--anat-only` | Options to handle performance | Options for performing only a subset of the workflow |
| `--boilerplate-only` | Options to handle performance | Options for performing only a subset of the workflow |
| `--md-only-boilerplate` | Options to handle performance | Options for modulating outputs |
| `--cifti-output` | Surface preprocessing options | Workflow configuration |
| `--error-on-aroma-warnings` | Options to handle performance | Specific options for running ICA_AROMA |
| `--verbose` | Options to handle performance | Other options |
| `--me-output-echos` | Workflow configuration | Options for modulating outputs |
| `--medial-surface-nan` | Workflow configuration | Options for modulating outputs |
| `--no-submm-recon` | Surface preprocessing options | Specific options for FreeSurfer processing |
| `--output-layout` | Other options | Options for modulating outputs |
| `--sloppy` | Other options | Options to handle performance |
| `--track-carbon` | Other options | Options for carbon tracking |
| `--country-code` | Other options | Options for carbon tracking |

New option groups:
-   "Options for performing only a subset of the workflow"
-   "Options for modulating outputs"
-   "Options for carbon tracking"

Deleted option groups:
-   "Surface preprocessing options"

In particular, "Options for performing only a subset of the workflow" is
what I was looking for in the context of #2207, where I had initially
added command-line options "`--fmri_withhold`" and "`--fmri_regenerate`"
(the naming of such options should perhaps be discussed in #2913).

Open to modification of proposed changes, or indeed addition of other
changes (since this would be a good opportunity to review placement of
the complete set of command-line options). For instance I'm currently
looking at `[ --output-spaces, --cifti-output, --me-output-echos ]` as a
candidate option group.
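The regrouping above maps naturally onto argparse argument groups. A minimal sketch of the idea follows; the parser contents are illustrative, not fMRIPrep's actual CLI code:

```python
import argparse

parser = argparse.ArgumentParser(prog="fmriprep")

# Proposed groups from the table above (subset of options shown).
subset = parser.add_argument_group(
    "Options for performing only a subset of the workflow")
subset.add_argument("--anat-only", action="store_true")
subset.add_argument("--boilerplate-only", action="store_true")

outputs = parser.add_argument_group("Options for modulating outputs")
outputs.add_argument("--md-only-boilerplate", action="store_true")
outputs.add_argument("--output-layout", choices=("bids", "legacy"))

carbon = parser.add_argument_group("Options for carbon tracking")
carbon.add_argument("--track-carbon", action="store_true")
carbon.add_argument("--country-code")

args = parser.parse_args(
    ["--anat-only", "--track-carbon", "--country-code", "CAN"])
assert args.anat_only and args.track_carbon
```

Argument groups only affect `--help` output, so a reshuffle like this is behavior-preserving for existing scripts.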
effigies added a commit that referenced this issue Aug 21, 2023
This begins on #2207.

Todo:

* [x] Validate correct operation of sMRIPrep integration with shim
workflow
* [x] Use smriprep fit and derivatives workflows separately, working
toward `--minimal` derivatives mode
* [x] Sink sdcflows' preprocessed fieldmap and fetch from derivatives
directory
* [x] Refactor BOLD stages to use the `input -> proc -> derivatives ->
buffer` pattern
* [x] Reduce fit stages to generate minimal files
* [ ] Thorough documentation of `--minimal`
effigies added a commit that referenced this issue Aug 24, 2023
…map (#3078)

## Changes proposed in this pull request

This PR continues work on #2207, addressing precomputed derivatives. The IO spec could be simplified, since we only work with affine transforms, and we need to use both the entities from the specific BOLD series to be corrected and the ID of the fieldmap used to correct it.
@effigies effigies added the next label Oct 11, 2023