[Voting closed] Proposal: General conventions for spatial derivatives #1602

Closed
PeerHerholz opened this issue Aug 28, 2023 · 49 comments
Labels
derivatives discussion ongoing discussion opinions wanted Please read and offer your opinion on this matter

Comments

@PeerHerholz
Member

PeerHerholz commented Aug 28, 2023

Note
Voting on this proposal is open from September 28, 2023 to October 12, 2023. Use the 👍/👎 responses to this comment to vote.

Your idea

Hello @bids-standard/maintainers, @bids-standard/steering & everyone,

we, @francopestilli, @arokem, @effigies, @oesteban and @PeerHerholz, would like to submit a proposal concerning spatial derivatives. We hope to engage in fruitful discussions with you all and further refine our proposal. If you have any questions, please don't hesitate to post them as well.

Abstract

In this issue, we propose a general principle for developing BIDS extension proposals for derivative data. The goal is to establish consensus so that parts of BEPs that propose terms in line with this proposal will be considered accepted in principle. The proposal is to ask for feedback from the community, provide a timeline for the discussion, and settle on a decision-making process. At the end of the timeline, we request a decision be reached. The proposal is RECOMMENDED, not REQUIRED, in that BEPs would be allowed to deviate when deemed necessary.

Problem statement

In working through BEPs 12 and 16, we have identified a repeated pattern in generating derivatives within several imaging modalities' workflows where:

  1. We require a reference map that is used to encode spatial features and parameters. There is an antecedent of this in BIDS with BEP23 (see below). In that BEP, the proposed naming takes the pattern _<suffix>ref (e.g., _boldref, _dwiref, etc.), and that solution has been suggested as a possibility in issue #1532 of the spec repository.

  2. We have derived data that are no longer of the same type as the original, but for which we would like to keep the notion of the modality from which this was derived while also signaling that it is derived (i.e., non-raw).

Proposal

Introduce a new suffix pattern: _<suffix>map, where <suffix> is a BIDS suffix used in the raw data (e.g., dwi or bold). For example, the proposed pattern produces the suffixes _dwimap or _boldmap. BEPs may use this suffix pattern under the conditions specified below and MUST specify the extension and metadata that are required with the suffix.

  1. The file descriptor does fall under one of the generic derivatives descriptors.
  2. No other descriptor exists in the BIDS spec. For example, statmap cannot be used, because it is already being used, or soon to be, for a different specification.
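As an illustrative sketch only, the suffix pattern could be expressed as below. The helper name, the list of raw suffixes, and the reserved set are assumptions made for this example; they are not part of the proposal, the BIDS schema, or any existing tool.

```python
# Hypothetical helper illustrating the proposed "_<suffix>map" pattern.
# RAW_SUFFIXES is an assumed, incomplete subset of raw-data suffixes.
RAW_SUFFIXES = {"bold", "dwi", "asl", "pet"}
# Suffixes assumed to be claimed by other specifications (e.g. statmap).
RESERVED_SUFFIXES = {"statmap"}


def spatial_map_suffix(raw_suffix: str) -> str:
    """Return the derivative suffix built from a raw-data suffix."""
    if raw_suffix not in RAW_SUFFIXES:
        raise ValueError(f"'{raw_suffix}' is not a known raw-data suffix")
    candidate = f"{raw_suffix}map"
    if candidate in RESERVED_SUFFIXES:
        raise ValueError(f"'{candidate}' is already used elsewhere")
    return candidate


print(spatial_map_suffix("dwi"))
print(spatial_map_suffix("bold"))
```

A BEP adopting the pattern would still need to state the extension and metadata that accompany each resulting suffix; the sketch only covers the naming rule.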

Motivation

Many users are not equipped to understand fine distinctions between different classes of derivatives (e.g., those that are produced by a model fit versus a direct computation).

This suffix pattern provides context through the concatenation of a raw data suffix and the word "map", which implies that the file still contains spatially contiguous information (in contrast to tabular/"tidy" data, with each row representing a brain region, for example).

Precedents and interactions with other BEPs

BEP 23: PET Derivatives

BEP 23 has introduced "maps" that correspond to the conventions introduced by BEP 001 (qMRI), such as T1map, T2map, etc. The following maps were introduced:

  • RDmap (receptor density map)
  • BPmap (binding potential map)
  • GEmap (genetic expression map)

These generally will be distributed as mean/standard-deviation pairs, for example: sub-01_stat-mean_desc-5HT_RDmap.nii.gz/sub-01_stat-std_desc-5HT_RDmap.nii.gz.

BEP 12: Functional MRI derivatives

BEP 12 proposes a collection of summary statistics, including mean, standard deviation, temporal SNR, regional homogeneity, etc. Following the example of BEP 23, it has adopted the proposal.

  • <source_entities>_stat-<mean|std|...>_boldmap.nii.gz
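A minimal sketch of how a tool might split such a filename into its entities, suffix, and extension. The function name and the example filename (including the task-rest entity) are made up for illustration, and the parsing is deliberately simplistic compared with a real BIDS validator.

```python
def parse_bids_name(filename: str):
    """Split a BIDS-style filename into (entities, suffix, extension)."""
    # Everything after the first dot is treated as the extension.
    stem, _, ext = filename.partition(".")
    extension = f".{ext}" if ext else ""
    # The last underscore-separated chunk is the suffix; the rest are
    # key-value entities of the form <key>-<value>.
    *entity_parts, suffix = stem.split("_")
    entities = dict(part.split("-", 1) for part in entity_parts)
    return entities, suffix, extension


# Hypothetical example filename following the pattern above.
entities, suffix, extension = parse_bids_name(
    "sub-01_task-rest_stat-mean_boldmap.nii.gz"
)
print(entities["stat"], suffix, extension)
```

With this layout, the statistic lives in the stat- entity while the suffix only signals "a spatial map derived from BOLD data".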

BEP 16: diffusion-weighted imaging derivatives

⚠️ We discuss this option as a considered alternative below

The current writing of the proposal follows the alternative listed below, where model fit and model-derived parameters are described:

  • <source_entities>_model.<extension>
  • <source_entities>_mdp.<extension>

This pattern is, in principle, more generalizable across the other ongoing BEPs and Derivatives in general:

  1. A data process might have generated primary parameters that are either 3D (x,y,z) or 4D (x,y,z,v). These parameters might be of help for further data analysis or data interpretation, and ultimately the data end user. Examples include "statistics" such as mean, std, etc., or model derivatives, such as DTI FA.

  2. At the same time, the process might have generated secondary parameters. These are not strictly necessary for further processing or data interpretation, but they can be potentially useful to interpret the outputs of the data process, to track history of the processing, for reproducibility and ultimately for debugging purposes of the developer/modeler of the code.

BEP 39: dimensionality reduction-based networks

The current version of the proposal uses a comparable pattern as outlined for BEP16:

  • <source_entities>_mdp.<extension>
  • <source_entities>_mfp.<extension>

Alternatives Considered

  1. Suffixes that distinguish between model-fit and model-derived parameters. This alternative is implemented in the current state of BEP16 and BEP39. We consider that this option should be rejected for the following reasons:

    1. This distinction does not seem useful for end users, who may or may not care about understanding it, and there is no precedent of its adoption by any neuroimaging sub-community.
    2. The distinction between model-fit and model-derived parameters is not always clear. To take one example, the eigenvalues and eigenvectors of the DTI tensor model could be seen as fit or derived. The utility of this high-level distinction is undermined if every such case is either left to the determination of the tool developer or requires an explicit declaration in the spec.
    3. For BIDS purposes, it is more important to state what something is than how it was derived.
    4. This is not something that any currently existing software does.
  2. For the word that modifies the <suffix>, the following options have been considered

    • tensor : this was deemed rejectable because while the fancy Google branding has run with it, it still means something in physics.
    • array : all non-scalar data may be considered an array, but it lacks the association with spatial meaning
    • image : this was deemed rejectable because in its common usage in neuroimaging software, it implies raw data (e.g., boldimage would most likely be read as an image containing BOLD data)
  3. Allowing each BEP to create separate suffixes that provide a good match to the use-case in that BEP. This is the status quo. It was deemed rejectable because a shared reference rule for future implementations makes both decision making and technical implementation simpler and avoids the proliferation of suffixes.

Decision making

As outlined above, we propose a two-stage decision-making process within a set timeline to reach a consensus. Furthermore, we aim to evaluate the feasibility of this process concerning other BIDS-related discussions, i.e., community-driven/guided decision-making.

Stage 1
In the first stage, comments from the entire community are solicited and discussed. We suggest a time period of 2 weeks, starting the day after the proposal was initially circulated/posted.

Stage 2
In the second stage, voting on the provided/proposed options (based on the Stage 1 outcomes) will take place. Here, we also suggest a time period of 2 weeks, starting the day after Stage 1 was finished.

After this time, this proposal will become part of the standard operating procedures of BIDS and be referenced in BEP development guidelines.

@francopestilli
Collaborator

@effigies effigies added derivatives opinions wanted Please read and offer your opinion on this matter discussion ongoing discussion labels Aug 29, 2023
@CPernet
Collaborator

CPernet commented Aug 30, 2023

Personally, I am against it, as this would create a plethora of Xmap suffixes of all sorts when meas- can take care of a lot of that; I'll let @mnoergaard elaborate for BEP23.

@effigies
Collaborator

@CPernet Can you give an example? It seems to produce a maximum of 5 suffixes (boldmap, cbfmap, dwimap, aslmap, petmap) that I can think of, and this would not be a requirement that they be used. It would say that, if a somewhat generic image derivative is needed for that modality, that would be an acceptable suffix.

In fact, this currently has reduced the number of suffixes for bold derivatives, as many have been moved into the stat- entity. (Which could be changed to meas- if that turns out to be the consensus.)

@mnoergaard
Collaborator

mnoergaard commented Aug 30, 2023

If the goal is to produce those five suffixes then all good, but that is not how it was phrased in the BEP23 section of your proposal, mentioning T1map, T2map, RDmap, BPmap and GEmap (suggesting a plethora of suffixes with these suggestions)?

@effigies
Collaborator

Those are examples of either existing or proposed Xmaps in various other BEPs, cited as precedent. Are you saying that RDmap etc are no longer proposed by bep23?

@mnoergaard
Collaborator

For BEP23, we now put the outcome measure e.g. binding potential (BP) into meas-BP, so there would be no need for it in the suffix. Furthermore, because this data belongs to PET data, it would always be placed inside a pet directory in the derivatives (i.e. derivatives/pipelinex/sub-XX/ses-XX/pet/), thus making the pet in map slightly redundant. Currently for BEP23, the notion of maps has not been implemented yet (we previously talked about molmap, mimap, etc), mostly because it was solved by other entities.

@effigies
Collaborator

effigies commented Aug 30, 2023

What suffix are you using for RDmaps now, if it's meas-RD_<suffix>? It appears to be missing from the BEP document.

@CPernet
Collaborator

CPernet commented Aug 30, 2023

@effigies

  • petmap is no good; mimap (molecular imaging map) was discussed and preferred because it serves a larger community than PET (but it is not used)
  • cbfmap, ok but let's then add cbvmap
  • dwimap really? diffusion of what? if we use modelling like NODDI you have three outputs, again using meas- makes more sense
  • etc ...

@mnoergaard
Collaborator

@effigies right now we don't have a suffix in line with the PET examples in BEP38 (https://docs.google.com/document/d/1RxW4cARr3-EiBEcXjLpSIVidvnUSHE7yJCUY91i5TfM/edit#heading=h.4k1noo90gelw). If a suffix is required, then I think the use of petmap is the most reasonable one.

@CPernet
Collaborator

CPernet commented Aug 30, 2023

some PET examples:

sub-1_seg-DKT_hemi-left_stat-mean_meas-BPND_pet.nii
sub-1_seg-DKT_hemi-left_stat-mean_meas-SUV_pet.nii

we just do not see the value in adding _petmap or _mimap

@effigies
Collaborator

Suffixes have been a required part of BIDS files up to this point. I guess I'll go comment on that thread; I don't understand what's being gained by removing suffixes.

@CPernet
Collaborator

CPernet commented Aug 30, 2023

"I don't understand what's being gained by removing suffixes": I'm not suggesting removing those that have been useful; I'm saying that a new one is not always needed. For example, we don't see the utility in PET of having e.g. petmap; as shown above, _pet as usual works just fine.

@oesteban
Collaborator

@effigies was confused because the examples didn't have suffixes before the edit in the comment.

That said, the rationale behind _*map is to say this originally was derived from a given modality (in the sense of BIDS suffix) but now it may represent something completely different, so it's not that modality anymore.

In a strict sense, sub-1_seg-DKT_hemi-left_stat-mean_meas-BPND_pet.nii is not PET anymore, so saying sub-1_seg-DKT_hemi-left_stat-mean_meas-BPND_PETmap.nii signals the fact that this file is directly linked to the original PET dataset, but it is not PET data (as it comes from the scanner, as per BIDS raw definitions) anymore.

Re: datatype folder - some other folders have been proposed, and it is not unimaginable we'll hit a use case where you may need a PETmap outside the pet/ folder.

@CPernet
Collaborator

CPernet commented Aug 30, 2023

reasonable argument

@Lestropie
Collaborator

Lestropie commented Sep 4, 2023

Regarding the proposal of "*_<modality>map.<ext>", I would consider the example:

... BEP 001 (qMRI), such as T1map, T2map, etc.

to provide an argument against. Those existing suffixes serve to distinguish eg. a non-quantitative image that is merely weighted by T1, ie. "T1w", from an image whose intensities reflect the quantitative parameter T1, ie. "map of T1" or "T1map". With something like "*_dwimap.<ext>", it's a quantitative map, but the first half of the suffix now encodes the modality involved (which is already indicated in the filesystem location) rather than the actual parameter encoded. So it's using the same suffix construction pattern, but the building blocks are different, which I would have thought would cause confusion.

Strawmanning the philosophy for a moment, a BIDS App quantifying relaxation tissue properties would be expected to produce "_anatmap.<ext>", yet is promoted to having more specific suffixes compared to other modalities only because of BEP001.


RE distinction between "model fit" and "model-derived" (regardless of what particular terms or suffixes might be involved):

This distinction does not seem useful for end users ...

Perhaps not currently, but it depends on how forward-thinking we want to be about it. In part, this statement may be expressed from the perspective of "I am a human, and I am reading filesystem names and trying to understand content". Personally I think that with derivatives, especially complex intermediate ones, machine interpretability and robust definition for utilisation in downstream analysis take precedence, whereas for analysis endpoints obviously human readability is again important.

Where I saw a prospect for benefit in the distinction:

  1. If either:

    1. Re-running a tool to generate additional derivatives (as came up recently in Inheritance principle clarification for exact matches #1583); or
    2. Using the output of one tool that has performed a model fit as the input to another tool that does something with that model fit;

    , then it is only the set of parameters that encode the model fit that are necessary to perform those calculations.

  2. There may be metadata that describe a given model, and parameters that dictate the way in which that model was fit to the input data. Some such metadata should be applicable to those parameters that encode the model fit to the data, but should not be applicable to parameters that are further downstream derivatives of such, whether because they are superseded by something else or because such information should be relegated to provenance.
    This is easy enough if the data encoding the fitting of the model to the data are expressed in a single data file: just put the corresponding metadata in the paired sidecar. Where it gets tricky is in situations where such model fit data must by necessity be split across multiple data files. In BEP016 this happens for anything more complex than a tensor or single-tissue CSD. There, what the distinction between model fit and model-derived hypothetically provides is the ability to utilise the inheritance principle to define such metadata just once, and have it be applicable to all data files that encode direct results of the model fit, but not be applicable to anything that is a subsequent derivative calculation of such. Unfortunately this plan doesn't work out-of-the-box because of the limitations of the inheritance principle ([ENH] Inheritance principle: Relaxation allowing multiple files per directory #1003).

The distinction between model-fit and model-derived parameters is not always clear. To take one example, the eigenvalues and eigenvectors of the DTI tensor model could be seen as fit or derived.

I would be interested if anyone can provide an example of a software that performs a direct optimisation of the eigenvalues and eigenvectors, rather than optimising the tensor coefficients and then performing an eigendecomposition of the result. In the absence of such I would state that these are unambiguously model-derived. It is the six coefficients of the symmetric rank-2 tensor that are the model fit parameters. I would hope (perhaps naively) that the presentation of such data in this way would be a net insight over time.

Maybe part of the problem here is a lack of consensus of the criteria by which that distinction is made. In my own formulation, this is not to do with what may be yielded by any given existing software tools, but with:

  1. The mathematical optimisation of the inverse problem that is defined by any given model;
  2. Directed dependencies, ie. you can calculate FA from the set of tensor coefficients, but you can't calculate the set of tensor coefficients from FA.
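The second criterion can be sketched with the standard closed-form definitions of MD and FA written in terms of the six unique tensor coefficients. The function names are made up for illustration and are not taken from any software package; the tensor values are arbitrary.

```python
import math


def mean_diffusivity(dxx, dyy, dzz, dxy, dxz, dyz):
    """MD is the trace of the tensor divided by three."""
    return (dxx + dyy + dzz) / 3.0


def fractional_anisotropy(dxx, dyy, dzz, dxy, dxz, dyz):
    """FA from the six unique coefficients of a symmetric rank-2 tensor."""
    md = mean_diffusivity(dxx, dyy, dzz, dxy, dxz, dyz)
    off_diag_sq = 2.0 * (dxy**2 + dxz**2 + dyz**2)
    # Squared Frobenius norm equals the sum of squared eigenvalues.
    norm_sq = dxx**2 + dyy**2 + dzz**2 + off_diag_sq
    if norm_sq == 0.0:
        return 0.0
    # Squared norm of the deviatoric part (tensor minus MD * identity).
    dev_sq = (dxx - md)**2 + (dyy - md)**2 + (dzz - md)**2 + off_diag_sq
    return math.sqrt(1.5 * dev_sq / norm_sq)


# Two different tensors (same shape, different orientation) share one FA,
# so the coefficients cannot be recovered from FA alone.
tensor_along_x = (3.0, 1.0, 1.0, 0.0, 0.0, 0.0)
tensor_along_y = (1.0, 3.0, 1.0, 0.0, 0.0, 0.0)
print(fractional_anisotropy(*tensor_along_x))
print(fractional_anisotropy(*tensor_along_y))
```

The mapping runs in one direction only: any tensor yields a single FA value, while a given FA value is compatible with infinitely many tensors.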

This is not something that any currently existing software does.

Depends on what you mean by "does". It would not be fair to base such an argument on eg. BIDS Apps; while there's insight to be gained from what BIDS Apps developers have encountered and the solutions they've come up with, BIDS Derivatives should IMO not be determined entirely based on the non-conformant derivatives currently being generated by BIDS Apps. Outside of BIDS Apps, in MRtrix3 land this is exactly what we do. dwi2tensor fits the tensor coefficients to the DWI data; tensor2metric derives parametric maps from this fit. Same for CSD, though more complex: dwi2fod performs the spherical deconvolution method, but from there there's all sorts of derivatives that can be computed using many different commands. We strive for modularity a la Unix philosophy. This might be a source of the different perspectives: BIDS Apps are more about pushing one button and getting a dense dataset of results. But I can foresee a future where the derivatives of BIDS Apps are used extensively as inputs to other BIDS Apps (technically it's already happening, just not in a way recognised by the standard / official BIDS Apps interface), and there a more modular approach is going to shine.

For the word that modifies the <suffix>, the following options have been considered

I had previously considered simply "parameter" / "param". That's entirely generic, and should apply reasonably to a broad spectrum of data types beyond images. But it's a little unwieldy. And I was dealing with data relating to fitting of models, and so wanted "model" to fit in there somewhere, but then what results from fitting a model to the data are themselves "parameters"... 🤕 eventually "mfp" and "mdp" arose as a way of getting both in there concisely and yielding the distinction I required for inheritance as above.

As I've stated previously, I'm not married to that proposal; I'm just yet to figure out / be presented with anything that I see fewer downsides to.

@oesteban
Collaborator

oesteban commented Sep 4, 2023

Regarding the proposal of "*_<modality>map.<ext>", I would consider the example:

... BEP 001 (qMRI), such as T1map, T2map, etc.

to provide an argument against. Those existing suffixes serve to distinguish eg. a non-quantitative image that is merely weighted by T1, ie. "T1w", from an image whose intensities reflect the quantitative parameter T1, ie. "map of T1" or "T1map". With something like "*_dwimap.<ext>", it's a quantitative map, but the first half of the suffix now encodes the modality involved (which is already indicated in the filesystem location) rather than the actual parameter encoded. So it's using the same suffix construction pattern, but the building blocks are different, which I would have thought would cause confusion.

Except for the final comment about confusion, this is a statement of a fact.

Regarding the confusion, I find this proposal less confusing than any other alternative described in the proposal, since these _*map suffixes will clearly always be in a derivatives folder, as opposed to their quantitative counterparts. Regarding BEP001, it is possible that clarifications are pushed upstream instead of downstream, avoiding _*map as the indicator of the quantitative nature of the contrast: _qT2 would be as informative as _T2map, while freeing space for a broader meaning in the derivatives folder. MP2RAGE images were originally encoded as T1w images and the transition to new suffixes was not traumatic, even if the BEP has been accepted already.


Re: model derived/fit

The following discussion should not be here, but rather in the main location where this is being discussed. Please let me know what is the target and I'll happily move the discussion there.

Perhaps not currently, but it depends on how forward-thinking we want to be about it. In part, this statement may be expressed from the perspective of "I am a human, and I am reading filesystem names and trying to understand content". Personally I think that with derivatives, especially complex intermediate ones, machine interpretability and robust definition for utilisation in downstream analysis take precedence, whereas for analysis endpoints obviously human readability is again important.

BIDS has been (to great success) pushed by the 80/20 principle. Attempting to capture such complexity with the filesystem organization falls in the 20% of the rule. If being able to represent the difference between derived and fit parameters is necessary, a better solution to BIDS such as NIDM should be considered.

I would be interested if anyone can provide an example of a software that performs a direct optimisation of the eigenvalues and eigenvectors, rather than optimising the tensor coefficients and then performing an eigendecomposition of the result.

BIDS should not suggest/support/indicate how things are implemented. Probably such a software doesn't exist but the point does not support a more favorable view of the proposal. The relationship between BIDS and practice should be the opposite - if there is a software that explicitly differentiates derived/fit in the naming at the output (therefore disallowing the user to choose their preference) because allowing the flexibility in naming would endanger the safe consumption of the results in further processing, then BIDS should maybe consider it. This is what is meant by no software "does" that (hereby the definition of "does" is complete).

@francopestilli
Collaborator

francopestilli commented Sep 4, 2023

Hi folks, thanks for the comments! This is a critical discussion that perhaps boils down to the following question: "What level of details (parameters, models, output, etc.) will we need in the future when generating derivatives?" If we avoid answering this important question now, we should be able to accept the current proposal. MRTrix3.0 should be able to generate derivative maps compatible with the current proposal.

@Lestropie is correct in saying that currently, we are not extremely modular in BIDS. My suggestion here is to avoid discussing modularity and the type and quantity of parameters, and instead to discuss those in the context of BIDS 2.0 and the BIDS Provenance BEP (which should be the one providing provenance of the parameters used to generate a dataset).

In other words, I am suggesting separating (A) the data generated (derivative)[to be addressed here] from (B) the model and parameters used [more appropriate to be discussed in the context of provenance and/or even BIDS 2.0]

@effigies @oesteban @arokem @dlevitas @PeerHerholz

@CPernet
Collaborator

CPernet commented Sep 7, 2023

I have been convinced by @oesteban's argument above that a _*map is not that modality anymore (it is a transform), but the issue is all the current _map suffixes, which kind of conflict with this.

@arokem
Collaborator

arokem commented Sep 7, 2023

Hello! @robertoostenveld asked that I point out this document: https://docs.google.com/document/d/1JtTu5u7XTkWxxnCIH6sxGajGn1qG_syJ-p14aejpk3E/edit, and asked that we consider how the principles laid out in that document relate to this proposal. In particular, one principle (not very explicitly laid out in the document, imho) is to try to use entities more and suffixes less. @robertoostenveld identified that as a potential issue with the present proposal being discussed in this issue. Maybe this principle should also be made more explicit in the document.

I will just weigh in that this is an objection similar to the one raised by @CPernet above, but as pointed out in previous comments (e.g., #1602 (comment)) the proposal creates a rather limited number of additional suffixes, so it might be in line with the principles.

@robertoostenveld
Collaborator

That said, the rationale behind _*map is to say this originally was derived from a given modality

So if I were to do a 2nd-level statistical analysis on 1st-level contrasts from bold data, would I get a _boldmapmap?

@tsalo
Member

tsalo commented Sep 7, 2023

So if I were to do a 2nd-level statistical analysis on 1st-level contrasts from bold data, would I get a _boldmapmap?

Why not _statmap as proposed in #887?

@effigies
Collaborator

effigies commented Sep 7, 2023

First, on the specific proposal:

if I were to do a 2nd-level statistical analysis on 1st-level contrasts from bold data, would I get a _boldmapmap?

GLM outputs are not currently in anybody's proposal. As noted by @tsalo, _statmap is being used for that purpose in some projects until some BEP is accepted that would require a different suffix.

The goal of this proposal is more along the lines of first-order (or close enough to make little difference), voxelwise derivatives from a variety of modalities. These could be MD or FA in diffusion, cortical thickness in structural, regional homogeneity in functional. There just isn't a good term for all of these, so we could split them into a large number of suffixes to say "These must be treated as different measures", but that's excessive. The things they have in common are 1) being voxelwise derivatives; 2) generally being an input to some further processing, not an end goal (although they could be in some contexts!).

I am not married to _boldmap; it just seemed to cover the case as well as anything else I could think of and it seemed to have some precedent in other BEPs, and I'm happy to consider alternatives. I'd even be happy to step out of the discussion, let everybody else consider the alternatives and I will just implement it when you're done.


On the proposal process:

I'm concerned though that we are letting the perfect be the enemy of the good, and the desire many of us have to establish some principles that will apply in all cases is halting progress. This proposal is specifically worded not to be a binding decision on all future BEPs, but as a good-enough solution to a proximal problem seen in multiple BEPs. What I would like to see, and what I heard from Franco and Peer that they would like to see, is some decision that we can go back to our BEPs and implement and not have to relitigate this separately in each BEP and again when it's time for community review.

And I want to stress that we are not asking the steering group to join the debate and then decide for us. The steering group as the final authority might need to decide on contentious spec-level decisions, but the thing that needs steering is the community, not the individual decisions in each BEP. What we need is a way to establish community consensus without waiting to the end of the BEP process, and I think the steering group can play a role in creating that process and giving it legitimacy. If, at the end of that process the community consensus is clear, I would hope the steering group would announce that consensus, regardless of the consensus (or lack thereof) within the steering group itself.

@Lestropie
Collaborator

The goal of this proposal is more along the lines of first-order (or close enough to make little difference), voxelwise derivatives from a variety of modalities.

This fairly nicely seeds a different perspective by which I wanted to look at this one.

Firstly, not sure if it's intentional, but to me, for a broad-reaching proposal like this I wouldn't be constraining to specifically voxel-wise maps. Any format that defines a spatial embedding of data can have stored within it data corresponding to any parameter. I don't see that "map" applies to voxels but not to eg. vertices or electrodes.

On one hand that could be a good thing; but conversely it leads into my concern. It's not clear what the scope / limiting principle of the proposal is; where "map" is no longer applicable and something else is either preferable or required. If the result of computing some statistic across volumes is a "map", and a voxel-wise parameter estimated from a biophysical model is a "map", then what are the criteria by which something is not a "map"?

I'll derive an example of this from BEP016, since it's where most of my thinking on the topic has been.

For things such as MD or FA calculated from the tensor model fit, I can absolutely see the appeal of those being broadcast to users as "maps". Indeed even something like NODDI's kappa parameter, more commonly transformed into the Orientation Dispersion Index (ODI), which is directly optimised as part of the model fitting procedure, is entirely reasonably sold as a "map". A key attribute that these have in common is that they are scalars per voxel. However in BEP016 there are many more of what are currently called data representations (not sold on that name), which reflect the fact that in diffusion MRI we have orientationally-dependent information per voxel, and there are many different mathematical forms of such. So the question is: should all of these simply be "_dwimap.*"? For things like tensors or FODs, there's probably still some reasonable sense to it, though it feels less intuitive. Where I think it struggles is something like, say, fixel azimuth-inclination angles. My own subjective sense is that "map" conveys with it a certain expectation of compatibility with trivial user interaction / visualisation, whereas for anything where interpretation of the data is more complex than just "an ND block of data within a spatial embedding", the linguistic correspondence with "map" feels more strained.

Linking back to the whole MDP / MFP thing, my observation was that often times we'll fit a complex orientationally-dependent model and then derive some scalar parameter from it since it's better for visualisation / analysis than the whole model fit result, but conversely, there are some models for which there is a scalar parameter that is genuinely a component of the model fitting procedure (eg. NODDI's kappa). For that reason I moved the distinction between scalar & more complex data representations to metadata (also some require additional metadata for interpretation, so not the only reason). But the suffixes needed to be compatible with the situation also. I figured that all of these are "parameters" so went with that; but I also saw some merit in the fit vs. derived distinction, hence the acronyms. By erasing the fit / derived distinction, and reverting from "parameters", "map" re-escalates the scalar-vs-more-complex confound. It may be the case that it's acceptable to most to go "map" across the board, I just think it's worth elucidating.

@Lestropie
Collaborator

For the word that modifies the <suffix>, the following options have been considered

For BIDS purposes, it is more important to state what something is than how it was derived

Extending your list by one would be "scalar". This is potentially reduced in scope compared to "map" (depending on what you think the scope of such should be), but to me is more precise in terms of both its scope and "stating what something is". It's kind of TRX-like, where for streamlines tractography data "data per vertex", "data per streamline" and "data per group (of streamlines)" are split at the filesystem level.

In BEP016-land, this would involve giving each "data representation" its own suffix (eg. "dec", "tensor", "sh", "polar", "3vector"; up for debate). This change alone would not resolve the complex inheritance problem, as some of these representations would still have compulsory metadata fields. But the number of new suffices required would be < 10, which is far less than giving every possible derivative image contrast its own suffix.

@effigies
Collaborator

effigies commented Sep 8, 2023

I do not have much of a position about what to do with non-scalar "maps". The closest that BEP12 comes to that is per-voxel regressors or spatiotemporal decompositions, and we have given them different suffixes (_timeseries and _{mixing,components}, though I do need to catch up with what's going on with BEP39 for the latter...). I don't see a problem with having a different suffix for per-voxel tensors, if we're really stretching the concept of "map" too far. If we end up with 5-7 new suffixes that cover all of the cases well enough, that seems like a reasonable outcome for a BEP, to my mind.

And I do agree that I would extend map to per-vertex scalars as well. I would defer to tractography experts about what to call per-streamline or per-group metrics.

I'm not 100% sure you're proposing _dwiscalar and _boldscalar instead of _*map, but in case you are, _boldscalar would indicate to me a single value, not a single value per-voxel. Though perhaps _boldscalar.nii.gz would make it clear enough that I've probably got 3D data. It still reads weirdly to me, so I prefer map, but I'm not 100% opposed.

@Lestropie
Collaborator

Lestropie commented Sep 8, 2023

I'm not 100% sure you're proposing _dwiscalar and _boldscalar instead of _*map

Probably moreso "_scalar" rather than "_<modality>scalar", as per interleaved discussion in this thread.

Doesn't resolve the issues with "map" regarding applicability / scope (including to data for which more specific suffices already exist), and it'd be a sizeable strategic sidestep in terms of what attributes determine suffices. So I'm not a massive fan myself either. But it's nevertheless an option that was historically considered in this context.


Alternatively, we could try massaging "map" a little. I mentioned I'd contemplated "parameter". In the context of statistical outputs, "_spm" would be applicable for statistical parametric maps, even if there's a software package that took its name from that acronym. And obviously we want something that applies to things that aren't statistical derivatives. Can "_parmap (parametric map)" work? It removes ambiguities around alternative meanings of "map", and to me subjectively it lessens the issue of data called "map" being trivially user navigable; a complex model fit output is still a spatial "map" of that "parameter", just there's no guarantee of any given parameter being human-readable. There's precedent for contraction & concatenation to form suffices; it immediately reminds me of BEP017 "_relmat".

@CPernet
Collaborator

CPernet commented Sep 8, 2023

It may seem trivial discussing even the name, but we don't want to reiterate the 'atlas' naming issue.

@effigies is however right, I think we should 1. Agree on some naming now (map, parmap, whatever) 2. Let the BEPs work with that naming for a while so we are moving forward 3. Re-evaluate and see how that works for everyone.

@oesteban
Collaborator

oesteban commented Sep 8, 2023

You write _*map

Yes, but this is not written in the proposal. I asked about the proposal. My elaboration later on with _*map was intended to communicate with @CPernet, and it seemed to work toward that effect.

So the question remains: where in the proposal is this suggested?

If the proposal were to now add a limited number of well-defined _xxxmap and _yyymap suffixes rather than a wildcard _*map, I don't see the problem so much.

I believe the proposal does not give space for the user to arbitrarily create new suffixes.

I see the plethora of suffixes as a problem because the specification needs to be explicit about them, which means every extra suffix needs a BIDS version bump, and every suffix needs to be incorporated in all tooling. Values represented in <entity>-<value> allow either rigidity (limited list) or flexibility (anything goes as long as the format is consistent) and are IMHO at the moment better dealt with in the specification (here and here). As far as I know, we don't have a human-readable overview of all suffixes. We do have this list that is part of the schema. FYI, we now have 103 suffixes, some of which are of the format *map.

BIDS should try to offer a compromise between flexibility and user-friendliness. If flexibility is afforded by means of entity-label pairs, user-friendliness goes rapidly down because it implements a strategy for users to keep doing exactly what they were doing, just it now looks "bidsy". If that is intended, I would prefer everyone keeps naming things the way they like but make the data NIDM compliant.

The suffix and the prefix are traditionally the most relevant parts of names in practice. Making unspecific suffixes will inevitably lead to discussing strong ordering criteria for entities, because humans will automatically drop the unspecific suffix from the filename and focus on the next bit.

And again, let me repeat that the spirit aspect of the proposal is also very relevant.

@robertoostenveld
Collaborator

So the question remains: where in the proposal is this suggested?

Here it states: "Introduce a new suffix pattern: _<suffix>map, where <suffix> is a BIDS suffix used in the raw data (e.g., dwi or bold). For example, the proposed pattern produces the suffices _dwimap or _boldmap."

This proposal makes _boldmap a suffix, and hence by recursion _boldmapmap also becomes a possible suffix. Unless recursion is excluded (which it is not in the proposal), this leads to an infinite number of suffixes.

@oesteban
Collaborator

oesteban commented Sep 8, 2023

Although I disagree that recursion can be implied, further down it says:

This suffix pattern provides context through the concatenation of a raw data suffix and the word "map"

So, the only situation where you could get a mapmap is if the original raw data suffix had map in it such as those explained above.

This is probably something to consider when developing the derivatives of those particular suffixes, and this proposal is also made as a recommendation. It could have some language anticipating this case and excluding existing raw suffixes ending in map from it.

Finally, as a redux, this proposal could have brought into the light that currently existing "map" suffixes may require some discussion and allow alternatives that allow the "map" extension in derivatives, if "mapmap" is sufficiently annoying (it is a little to me, tbh).
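The recursion concern and the suggested exclusion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RAW_SUFFIXES` is a small hand-picked sample rather than the BIDS schema, and `derivative_map_suffix` is a hypothetical helper name, not part of any BIDS tooling.

```python
# Hypothetical sketch: build a derivative suffix from a raw-data suffix.
# RAW_SUFFIXES is a small illustrative sample, NOT the full BIDS schema.
RAW_SUFFIXES = {"bold", "dwi", "T1w", "fieldmap"}


def derivative_map_suffix(raw_suffix: str) -> str:
    """Return '<suffix>map' for a raw suffix, refusing recursive use."""
    # Only raw-data suffixes may be extended, so 'boldmap' (a derivative
    # suffix) can never become 'boldmapmap'.
    if raw_suffix not in RAW_SUFFIXES:
        raise ValueError(f"{raw_suffix!r} is not a raw-data suffix")
    # Assumed exclusion: raw suffixes that already end in 'map' are left
    # to their own BEPs instead of producing a '...mapmap' suffix.
    if raw_suffix.endswith("map"):
        raise ValueError(f"{raw_suffix!r} already ends in 'map'")
    return f"{raw_suffix}map"


print(derivative_map_suffix("bold"))  # boldmap
print(derivative_map_suffix("dwi"))   # dwimap
```

Under these two checks the set of possible suffixes stays finite: at most one derivative suffix per raw suffix.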

@dorahermes
Member

I would like to see a clearer definition of _*map.ext added to the proposal such that I can better understand the implications. In my current understanding the suffixes determine what type of metadata should be present with a file and restrict which extensions are allowed. For example, the suffixes _bold.ext, _eeg.ext, _meg.ext and _ieeg.ext indicate which .json and .tsv files MUST be present. I don't yet understand how that will be applied in the general _*map.ext. Say I create an _ieegmap.ext, I want to know which metadata I should save for interpretation and how I should, for example, save the type of units contained in this file.

  1. What type of metadata and extension go with an _*map.ext? Will this be defined in each extension proposal?

  2. Is _*map.ext intended to be restricted to nifti volumes with values of a certain quantity and units? Can I even use this proposal for electrophysiology data? Would this proposal allow me to save bold responses/values around iEEG electrodes and create a _boldmap.ext, but these data would be of an electrophysiology data type (maybe this example does not fall in the 80/20 rule, but trying to come up with a potentially confusing example on purpose to help clarify/restrict the proposal).

@oesteban
Collaborator

@dorahermes I think finding counter-examples that would discourage applying this proposal would be very useful. We did not spend so much time on that, so I agree it's worth attempting. @francopestilli, @arokem, @PeerHerholz and @effigies please correct me if I say something imprecise about the proposal:

  1. What type of metadata and extension go with an _*map.ext? Will this be defined in each extension proposal?

This particular proposal does not prescribe extensions for _<suffix>map (please, let's avoid the star notation _*map as it seems more confusing than useful).

2. Is _*map.ext intended to be restricted to nifti volumes with values of a certain quantity and units? Can I even use this proposal for electrophysiology data?

It's not restricted to NIfTI (e.g., I can see it used with GIFTI and with CIFTI), nor does it impose any other metadata, such as units. Yes, this proposal wants to remain sufficiently flexible so other BEPs can easily adopt or ignore it if it doesn't apply.

@dorahermes
Member

Just as a note, I am not trying to discourage this proposal in general, but some edits and clarifications would be very helpful. Specifically, I would like to see more clearly described how each BEP using _<suffix>map (may/should/must) specify the metadata and extensions allowed for the _<suffix>map .

@oesteban
Collaborator

I am not trying to discourage this proposal in general

Apologies if my response can be interpreted in this way. I meant that finding counter-examples is indeed a good idea in this particular case.

how each BEP using _<suffix>map (may/should/must) specify the metadata and extensions allowed for the _<suffix>map .

I believe the proposal is very open-ended, so I don't think the goal was to limit what metadata and extensions should/must be used. As said above, I agree the proposal could be more explicit about what may be used.

Perhaps the most direct examples could come from what should not be done with this proposal:

  • Name a derivative image&meta corresponding to an original image (e.g., say some RSfMRI _bold.ext) that remains "the same modality". For example, after head-motion realignment, our RSfMRI timeseries is still a _bold.ext and not a _boldmap.ext

@francopestilli
Collaborator

@dorahermes do you see specific issues? Do you think you can provide an example?

@dorahermes
Member

The specific issue I have is that the proposal section is too vague. I would suggest adding something like what I put between []:

"Introduce a new suffix pattern: _<suffix>map, where <suffix> is a BIDS suffix used in the raw data (e.g., dwi or bold). For example, the proposed pattern produces the suffices _dwimap or _boldmap. [BEPs may use this suffix pattern under the conditions specified below and MUST specify the extension and metadata that are required with the suffix.]"

@francopestilli would it be possible to add a section with 'Conditions under which BEPs may and may not use the suffix pattern?'

For examples, would e.g. tmap or betamap be allowed under this pattern? Betamap seems particularly underspecified and could refer to a statistic OR a beta oscillation map of some kind without further specification.

@francopestilli
Collaborator

Hi @dorahermes what about something like the following?

Introduce a new suffix pattern: _<suffix>map, where suffix is a BIDS suffix used in the raw data (e.g., dwi or bold). For example, the proposed pattern produces the suffices _dwimap or _boldmap. BEPs may use this suffix pattern under the conditions specified below and MUST specify the extension and metadata that are required with the suffix.

(1) The file descriptor does fall under one of the generic derivatives descriptors.
(2) No other descriptor exists in the BIDS spec. For example, statsmap cannot be used, because it is already being used, or soon to be, for a different specification.

@oesteban @effigies @arokem @robertoostenveld @CPernet

I asked @PeerHerholz to edit the original post
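Condition (2) and the MUST clause in the wording above could be checked mechanically, roughly as sketched below. Everything here is hypothetical: `MapSuffixDeclaration`, `check_declaration` and `RESERVED_SUFFIXES` are illustrative names, the reserved set holds only the `statsmap` example from this comment, and treating an empty list as "not specified" is an assumption. Condition (1) is not modelled.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: suffixes already claimed elsewhere in the spec; only the
# 'statsmap' example from the comment is listed here.
RESERVED_SUFFIXES = {"statsmap"}


@dataclass
class MapSuffixDeclaration:
    """What a BEP would declare when adopting the _<suffix>map pattern."""
    raw_suffix: str
    extensions: List[str] = field(default_factory=list)
    required_metadata: List[str] = field(default_factory=list)

    @property
    def suffix(self) -> str:
        return f"{self.raw_suffix}map"


def check_declaration(decl: MapSuffixDeclaration) -> List[str]:
    """Return a list of problems; an empty list means no objection found."""
    problems = []
    if decl.suffix in RESERVED_SUFFIXES:
        problems.append(f"'{decl.suffix}' is already used elsewhere")
    # Assumption: an empty list means the BEP has not specified anything.
    if not decl.extensions:
        problems.append("the BEP MUST specify the allowed extension(s)")
    if not decl.required_metadata:
        problems.append("the BEP MUST specify the required metadata")
    return problems


ok = MapSuffixDeclaration("dwi", [".nii.gz"], ["Units"])
print(check_declaration(ok))                             # []
print(check_declaration(MapSuffixDeclaration("stats")))  # three problems
```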

@CPernet
Collaborator

CPernet commented Sep 26, 2023

FYI 'we' would rather not use _petmap because many similar results can be obtained from sources other than PET, and _mimap would be preferred (molecular imaging map). We'll have to think about whether a similar _*map applies across electrophys.

@effigies
Collaborator

@francopestilli That sounds good to me.

@CPernet I suppose I consider mimap "in the spirit" of this proposal. Again, the point is not to force any other BEPs to adopt this convention precisely. It's to get a rough consensus that it's good enough to at least use across a couple BEPs and, if others find it good enough within their own BEP communities, then those of us on the outside won't nitpick it when it comes time for broad review.

Speaking as someone with no electrophys experience, I did not really anticipate that they would use it, as it doesn't really seem to fit the shape of their data. That said, I could see a route to a _<X>map that would follow this logic: Suppose you project your EEG data onto the cortical surface and end up with _hemi-<LR>_space-fsnative_eeg.func.gii files. Then, if you follow the lead of BEP012 in its current (unmerged) state, you might describe the temporal mean of that as _hemi-<LR>_space-fsnative_stat-mean_eegmap.func.gii. Not that this is necessarily a useful thing to do...
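To make the filename in that example concrete, here is a minimal Python sketch that assembles such a name from entity-label pairs. `bids_name` is a hypothetical helper, the `sub-01` label is made up for illustration, the `stat` entity comes from the unmerged BEP012 draft mentioned above, and the entity order is simply whatever the caller passes in (no BIDS ordering rules are enforced).

```python
from typing import Mapping


def bids_name(entities: Mapping[str, str], suffix: str, extension: str) -> str:
    """Join entity-label pairs, a suffix and an extension into a filename."""
    # Each entity becomes 'key-label'; order follows the mapping as given.
    parts = [f"{key}-{label}" for key, label in entities.items()]
    return "_".join(parts + [suffix]) + extension


entities = {"sub": "01", "hemi": "L", "space": "fsnative"}

# Surface-projected EEG data (the raw-like file in the example).
print(bids_name(entities, "eeg", ".func.gii"))
# sub-01_hemi-L_space-fsnative_eeg.func.gii

# Temporal mean of that data, named with the proposed pattern.
print(bids_name({**entities, "stat": "mean"}, "eegmap", ".func.gii"))
# sub-01_hemi-L_space-fsnative_stat-mean_eegmap.func.gii
```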

@CPernet
Collaborator

CPernet commented Sep 26, 2023

totally agree @effigies that is there just FYI and for not being stuck on _<modality>map ; @oesteban convinced me it was useful so I have no real concern but those raised about limiting somehow to not see an explosion of suffixes

@dorahermes
Member

@francopestilli that sounds good, thank you!

@robertoostenveld
Collaborator

I am fine with this as a proposal to the BIDS extensions guidelines as documented on https://github.com/bids-standard/bids-extensions.

@arokem
Collaborator

arokem commented Sep 28, 2023

The originally-posited discussion period is now over, and we would like to move to a vote of support for this proposal, as indicated in the OP. Please indicate your support by adding a 👍 to this comment or your objection by adding a 👎. If the yays exceed the nays, we can close this and move on to create a PR that will add the language that has been settled through this discussion on to the bids extensions guidelines. The voting on this will close in 2 weeks, October 12th at 9am PT.

@effigies changed the title from "Proposal: General conventions for spatial derivatives" to "[Voting open] Proposal: General conventions for spatial derivatives" on Sep 28, 2023
@arokem
Collaborator

arokem commented Oct 10, 2023

A reminder that voting on this issue will close in two days.

@arokem
Collaborator

arokem commented Oct 12, 2023

OK. Voting is now closed and the proposal is accepted.

@PeerHerholz : when you get a chance, could you please work to incorporate this in https://github.com/bids-standard/bids-extensions? I'll post an issue linking back here, so we can keep track of this.

@arokem closed this as completed on Oct 12, 2023
@arokem changed the title from "[Voting open] Proposal: General conventions for spatial derivatives" to "[Voting closed] Proposal: General conventions for spatial derivatives" on Oct 12, 2023
PeerHerholz added a commit to PeerHerholz/bids-extensions that referenced this issue Oct 19, 2023
This PR and the respective commits aim to introduce a first draft of a new page/section concerning general conventions for BEP development. This was discussed [here](bids-standard/bids-specification#1602) and addresses [this issue](bids-standard#24).

To this end, a new page called "General conventions" is introduced within/from which specific conventions are included/linked. 
In the current form, parts of the original discussion/proposal were copy-pasted and adapted within a section called "general conventions for spatial derivatives".