Adding a degree of freedom (dof) flag (--n-independent-echos) to allow users to set the dof used in fstat calculations #1177

Open
wants to merge 19 commits into
base: main
Changes from 17 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -123,3 +123,4 @@ ENV/

# vim swap files
*.swp

8 changes: 8 additions & 0 deletions tedana/decomposition/pca.py
@@ -55,6 +55,7 @@ def tedpca(
adaptive_mask,
io_generator,
tes,
echo_dof=None,
algorithm="aic",
kdaw=10.0,
rdaw=1.0,
@@ -79,6 +80,10 @@ def tedpca(
The output generation object for this workflow
tes : :obj:`list`
List of echo times associated with `data_cat`, in milliseconds
echo_dof : :obj:`int`, optional
Degree of freedom to use in goodness of fit metrics (fstat).
Primarily used for EPTI acquisitions.
If None, number of echoes will be used. Default is None.
algorithm : {'kundu', 'kundu-stabilize', 'mdl', 'aic', 'kic', float}, optional
Method with which to select components in TEDPCA. PCA
decomposition with the mdl, kic and aic options are based on a Moving Average
@@ -355,6 +360,7 @@ def tedpca(
mixing=comp_ts,
adaptive_mask=adaptive_mask,
tes=tes,
echo_dof=echo_dof,
io_generator=io_generator,
label="PCA",
external_regressors=None,
@@ -377,6 +383,7 @@ def tedpca(
component_table, metric_metadata = kundu_tedpca(
component_table,
n_echos,
echo_dof,
kdaw,
rdaw,
stabilize=False,
@@ -385,6 +392,7 @@ def tedpca(
component_table, metric_metadata = kundu_tedpca(
component_table,
n_echos,
echo_dof,
kdaw,
rdaw,
stabilize=True,
18 changes: 16 additions & 2 deletions tedana/metrics/collect.py
@@ -29,6 +29,7 @@ def generate_metrics(
mixing: npt.NDArray,
adaptive_mask: npt.NDArray,
tes: Union[List[int], List[float], npt.NDArray],
echo_dof: int = None,
io_generator: io.OutputGenerator,
label: str,
external_regressors: Union[pd.DataFrame, None] = None,
@@ -53,6 +54,10 @@ def generate_metrics(
For more information on thresholding, see `make_adaptive_mask`.
tes : list
List of echo times associated with `data_cat`, in milliseconds
echo_dof : int
Degree of freedom to use in goodness of fit metrics (fstat).
Primarily used for EPTI acquisitions.
If None, number of echoes will be used. Default is None.
io_generator : tedana.io.OutputGenerator
The output generator object for this workflow
label : str in ['ICA', 'PCA']
@@ -196,6 +201,7 @@ def generate_metrics(
mixing=mixing,
adaptive_mask=adaptive_mask,
tes=tes,
echo_dof=echo_dof,
)
metric_maps["map FT2"] = m_t2
metric_maps["map FS0"] = m_s0
@@ -224,7 +230,11 @@ def generate_metrics(

if "map FT2 clusterized" in required_metrics:
LGR.info("Calculating T2* F-statistic maps")
f_thresh, _, _ = getfbounds(len(tes))
if echo_dof is None:
f_thresh, _, _ = getfbounds(len(tes))
else:
f_thresh, _, _ = getfbounds(echo_dof)
Comment on lines +233 to +236
Member

It would be simpler to set echo_dof to len(tes) at the top of this function when it isn't provided.

However, I am concerned that setting the degrees of freedom based on the number of echoes is already a bug (see #811): it doesn't account for the fact that the degrees of freedom vary across voxels due to the adaptive mask.

Member

I agree with this change.
Addressing #811 might be useful, but it seems beyond the scope of this PR. If we clarify which inputs are supposed to be DOF, #811 might be easier to address.
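
The default suggested in this thread could be resolved once, near the top of `generate_metrics`, so that later calls need no `if`/`else`. A minimal sketch, assuming a hypothetical helper named `resolve_echo_dof` (it is not part of tedana):

```python
from typing import Optional, Sequence


def resolve_echo_dof(echo_dof: Optional[int], tes: Sequence[float]) -> int:
    """Return echo_dof, falling back to the number of echoes when it is None.

    Hypothetical helper for illustration only; not part of tedana.
    """
    if echo_dof is None:
        return len(tes)
    return echo_dof


tes = [15.4, 29.7, 44.0, 58.3, 72.6]
default_dof = resolve_echo_dof(None, tes)  # falls back to len(tes)
explicit_dof = resolve_echo_dof(4, tes)  # user-supplied value is kept
```

With the value resolved up front, both `getfbounds` calls in this function could take the same variable.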


metric_maps["map FT2 clusterized"] = dependence.threshold_map(
maps=metric_maps["map FT2"],
mask=mask,
@@ -234,7 +244,11 @@ def generate_metrics(

if "map FS0 clusterized" in required_metrics:
LGR.info("Calculating S0 F-statistic maps")
f_thresh, _, _ = getfbounds(len(tes))
if echo_dof is None:
f_thresh, _, _ = getfbounds(len(tes))
else:
f_thresh, _, _ = getfbounds(echo_dof)

metric_maps["map FS0 clusterized"] = dependence.threshold_map(
maps=metric_maps["map FS0"],
mask=mask,
15 changes: 13 additions & 2 deletions tedana/metrics/dependence.py
@@ -136,6 +136,7 @@ def calculate_f_maps(
mixing: np.ndarray,
adaptive_mask: np.ndarray,
tes: np.ndarray,
echo_dof=None,
f_max: float = 500,
) -> typing.Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
"""Calculate pseudo-F-statistic maps for TE-dependence and -independence models.
@@ -153,6 +154,10 @@ def calculate_f_maps(
"good signal". Limited to masked voxels.
tes : (E) array_like
Echo times in milliseconds, in the same order as the echoes in data_cat.
echo_dof : int
Degree of freedom to use in goodness of fit metrics (fstat).
Primarily used for EPTI acquisitions.
If None, number of echoes will be used. Default is None.
f_max : float, optional
Maximum F-statistic, used to crop extreme values. Values in the
F-statistic maps greater than this value are set to it.
@@ -201,7 +206,10 @@ def calculate_f_maps(
pred_s0 = x1[:j_echo, :] * np.tile(coeffs_s0, (j_echo, 1))
sse_s0 = (comp_betas[:j_echo] - pred_s0) ** 2
sse_s0 = sse_s0.sum(axis=0) # (S,) prediction error map
f_s0 = (alpha - sse_s0) * (j_echo - 1) / (sse_s0)
if echo_dof is None:
f_s0 = (alpha - sse_s0) * (j_echo - 1) / (sse_s0)
else:
f_s0 = (alpha - sse_s0) * (echo_dof - 1) / (sse_s0)
f_s0[f_s0 > f_max] = f_max
f_s0_maps[mask_idx, i_comp] = f_s0[mask_idx]

@@ -212,7 +220,10 @@ def calculate_f_maps(
pred_t2 = x2[:j_echo] * np.tile(coeffs_t2, (j_echo, 1))
sse_t2 = (comp_betas[:j_echo] - pred_t2) ** 2
sse_t2 = sse_t2.sum(axis=0)
f_t2 = (alpha - sse_t2) * (j_echo - 1) / (sse_t2)
if echo_dof is None:
f_t2 = (alpha - sse_t2) * (j_echo - 1) / (sse_t2)
else:
f_t2 = (alpha - sse_t2) * (echo_dof - 1) / (sse_t2)
f_t2[f_t2 > f_max] = f_max
f_t2_maps[mask_idx, i_comp] = f_t2[mask_idx]

1 change: 1 addition & 0 deletions tedana/selection/component_selector.py
@@ -505,6 +505,7 @@ def select(
f"Step {self.current_node_idx_}: Running function {node['functionname']} "
f"with parameters: {all_params}"
)

# run the decision node function
self = fcn(self, **params, **kwargs)

23 changes: 21 additions & 2 deletions tedana/selection/selection_nodes.py
@@ -714,10 +714,19 @@ def calc_kappa_elbow(
This also means the kappa elbow should be calculated before those two other functions
are called
"""
if (
"echo_dof" in selector.cross_component_metrics_.keys()
and selector.cross_component_metrics_["echo_dof"]
):
echo_dof = selector.cross_component_metrics_["echo_dof"]
else:
# DOF is number of echoes if not otherwise specified
echo_dof = selector.cross_component_metrics_["n_echos"]
Comment on lines +723 to +724
Member

Just to clarify: the DOF wouldn't be the number of echoes, right? It would be the number of echoes minus 1.

Member

Ugh. I'm fairly sure that every calculation that uses echo_dof actually uses echo_dof - 1. The math is correct, but the terminology is wrong. This also raises a question for @katielamar: if echo_dof is the true DOF, should we run the calculations with echo_dof rather than echo_dof - 1?

I think a terminology-consistent solution would be to make echo_dof the actual DOF. When we calculate it from the number of echoes, we would set echo_dof = n_echoes - 1, and we'd then need to update every calculation that uses these values so that it doesn't subtract another 1.

Member

Your comment below about changing echo_dof to n_independent_sources is another solution, and it would not require changing the formulas.

Author

Hmm, yeah, I'm not sure. I'm going to check with Tom on this and I'll get back to you.

Member

I do think @tsalo is correct here. My proposal is:

  • For any stat function, like getfbounds, the parameter should be n_independent_sources
  • For any function that is specifically about the number of echoes (i.e., the input to tedana.py), the parameter should be n_independent_echoes, n_indep_echoes, or n_indie_echoes

This would mean that the user-facing parameter has a clearer meaning that more accurately matches what the code currently does. This might break existing scripts that Katie & Charles are using, but hopefully that won't be a huge problem.

@katielamar I know you're preparing your talk right now. Tell me if you want me to make this change & open another PR on your branch for you to review.
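
The two naming options discussed in this thread give the same denominator degrees of freedom; they differ only in where the `- 1` lives. A sketch under that assumption, with hypothetical function names and made-up `alpha` and `sse` values. The pseudo-F expression mirrors the one in `calculate_f_maps` in this diff:

```python
def dfd_from_n_independent_echoes(n_independent_echoes: int) -> int:
    """Naming option 1: the caller passes a count and the statistic subtracts 1."""
    return n_independent_echoes - 1


def dfd_from_dof(dof: int) -> int:
    """Naming option 2: the caller passes the true DOF and nothing is subtracted."""
    return dof


def pseudo_f(alpha: float, sse: float, dfd: int) -> float:
    """Pseudo-F in the form used in the diff: (alpha - sse) * dfd / sse."""
    return (alpha - sse) * dfd / sse


n_echos = 5
dfd_count = dfd_from_n_independent_echoes(n_echos)  # count in, minus 1 inside
dfd_true = dfd_from_dof(n_echos - 1)  # minus 1 done by the caller

# Same fit, evaluated with 5 echoes and with 4 independent echoes (illustrative values).
f_five = pseudo_f(alpha=10.0, sse=2.0, dfd=dfd_count)
f_four = pseudo_f(alpha=10.0, sse=2.0, dfd=dfd_from_n_independent_echoes(4))
```

For the same fit, lowering the count from 5 to 4 scales the pseudo-F by 3/4, so the choice of name does not change the math as long as exactly one place subtracts 1.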

outputs = {
"decision_node_idx": selector.current_node_idx_,
"node_label": None,
"n_echos": selector.cross_component_metrics_["n_echos"],
"echo_dof": echo_dof,
"used_metrics": {"kappa"},
"calc_cross_comp_metrics": [
"kappa_elbow_kundu",
@@ -777,7 +786,7 @@ def calc_kappa_elbow(
outputs["varex_upper_p"],
) = kappa_elbow_kundu(
selector.component_table_,
selector.cross_component_metrics_["n_echos"],
echo_dof,
comps2use=comps2use,
)
selector.cross_component_metrics_["kappa_elbow_kundu"] = outputs["kappa_elbow_kundu"]
@@ -846,10 +855,20 @@ def calc_rho_elbow(
f"It is {rho_elbow_type} "
)

if (
"echo_dof" in selector.cross_component_metrics_.keys()
and selector.cross_component_metrics_["echo_dof"]
):
echo_dof = selector.cross_component_metrics_["echo_dof"]
else:
# DOF is number of echoes if not otherwise specified
echo_dof = selector.cross_component_metrics_["n_echos"]

outputs = {
"decision_node_idx": selector.current_node_idx_,
"node_label": None,
"n_echos": selector.cross_component_metrics_["n_echos"],
"echo_dof": echo_dof,
"calc_cross_comp_metrics": [
elbow_name,
"rho_allcomps_elbow",
@@ -904,7 +923,7 @@ def calc_rho_elbow(
outputs["elbow_f05"],
) = rho_elbow_kundu_liberal(
selector.component_table_,
selector.cross_component_metrics_["n_echos"],
echo_dof,
rho_elbow_type=rho_elbow_type,
comps2use=comps2use,
subset_comps2use=subset_comps2use,
25 changes: 16 additions & 9 deletions tedana/selection/selection_utils.py
@@ -580,7 +580,7 @@ def getelbow(arr, return_val=False):
return k_min_ind


def kappa_elbow_kundu(component_table, n_echos, comps2use=None):
def kappa_elbow_kundu(component_table, echo_dof, comps2use=None):
"""
Calculate an elbow for kappa.

@@ -592,8 +592,10 @@ def kappa_elbow_kundu(component_table, n_echos, comps2use=None):
Component metric table. One row for each component, with a column for
each metric. The index should be the component number.
Only the 'kappa' column is used in this function
n_echos : :obj:`int`
The number of echos in the multi-echo data
echo_dof : :obj:`int`
Degree of freedom to use in goodness of fit metrics (fstat).
Typically the number of echos in the multi-echo data
May be a lower value for EPTI acquisitions.
comps2use : :obj:`list[int]`
A list of component indices used to calculate the elbow
default=None which means use all components
@@ -633,7 +635,7 @@ def kappa_elbow_kundu(component_table, n_echos, comps2use=None):
kappas2use = component_table.loc[comps2use, "kappa"].to_numpy()

# low kappa threshold
_, _, f01 = getfbounds(n_echos)
_, _, f01 = getfbounds(echo_dof)
# get kappa values for components below a significance threshold
kappas_nonsig = kappas2use[kappas2use < f01]

@@ -670,7 +672,11 @@ def kappa_elbow_kundu(component_table, n_echos, comps2use=None):


def rho_elbow_kundu_liberal(
component_table, n_echos, rho_elbow_type="kundu", comps2use=None, subset_comps2use=-1
component_table,
echo_dof,
rho_elbow_type="kundu",
comps2use=None,
subset_comps2use=-1,
):
"""
Calculate an elbow for rho.
@@ -684,8 +690,10 @@ def rho_elbow_kundu_liberal(
Component metric table. One row for each component, with a column for
each metric. The index should be the component number.
Only the 'kappa' column is used in this function
n_echos : :obj:`int`
The number of echos in the multi-echo data
echo_dof : :obj:`int`
Degree of freedom to use in goodness of fit metrics (fstat).
Typically the number of echos in the multi-echo data
May be a lower value for EPTI acquisitions.
rho_elbow_type : :obj:`str`
The algorithm used to calculate the rho elbow. Current options are
'kundu' and 'liberal'.
@@ -753,8 +761,7 @@ def rho_elbow_kundu_liberal(
].tolist()

# One rho elbow threshold set just on the number of echoes
elbow_f05, _, _ = getfbounds(n_echos)

elbow_f05, _, _ = getfbounds(echo_dof)
# One rho elbow threshold set using all componets in comps2use
rhos_comps2use = component_table.loc[comps2use, "rho"].to_numpy()
rho_allcomps_elbow = getelbow(rhos_comps2use, return_val=True)
11 changes: 9 additions & 2 deletions tedana/selection/tedpca.py
@@ -15,7 +15,7 @@
F_MAX = 500


def kundu_tedpca(component_table, n_echos, kdaw=10.0, rdaw=1.0, stabilize=False):
def kundu_tedpca(component_table, n_echos, echo_dof=None, kdaw=10.0, rdaw=1.0, stabilize=False):
"""Select PCA components using Kundu's decision tree approach.

Parameters
@@ -25,6 +25,10 @@ def kundu_tedpca(component_table, n_echos, kdaw=10.0, rdaw=1.0, stabilize=False)
variance explained. Component number should be the index.
n_echos : :obj:`int`
Number of echoes in dataset.
echo_dof : int
Degree of freedom to use in goodness of fit metrics (fstat).
Primarily used for EPTI acquisitions.
If None, number of echoes will be used. Default is None.
kdaw : :obj:`float`, optional
Kappa dimensionality augmentation weight. Must be a non-negative float,
or -1 (a special value). Default is 10.
@@ -59,8 +63,11 @@ def kundu_tedpca(component_table, n_echos, kdaw=10.0, rdaw=1.0, stabilize=False)
+ 1
]
varex_norm_cum = np.cumsum(component_table["normalized variance explained"])
if echo_dof is None:
fmin, fmid, fmax = getfbounds(n_echos)
else:
fmin, fmid, fmax = getfbounds(echo_dof)
Comment on lines +66 to +69
Member

Suggested change
if echo_dof is None:
fmin, fmid, fmax = getfbounds(n_echos)
else:
fmin, fmid, fmax = getfbounds(echo_dof)
echo_dof = echo_dof or n_echos
fmin, fmid, fmax = getfbounds(n_echos)
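
One caveat with the `or` idiom in the suggestion: it falls back whenever `echo_dof` is falsy, so `0` is treated like `None`, and the resolved value only has an effect if it is the one passed on to `getfbounds`. A small sketch of the difference, using hypothetical helper names that are not part of tedana:

```python
def resolve_with_or(echo_dof, n_echos):
    """Fallback based on truthiness: None and 0 both yield n_echos."""
    return echo_dof or n_echos


def resolve_with_is_none(echo_dof, n_echos):
    """Fallback only when echo_dof is None; 0 is passed through unchanged."""
    return n_echos if echo_dof is None else echo_dof


# Both forms agree when the value is missing or a positive integer.
same_for_none = resolve_with_or(None, 5) == resolve_with_is_none(None, 5)
same_for_four = resolve_with_or(4, 5) == resolve_with_is_none(4, 5)

# They differ only for a falsy, non-None input such as 0.
or_on_zero = resolve_with_or(0, 5)
is_none_on_zero = resolve_with_is_none(0, 5)
```

Since a value of 0 is not meaningful here, either form would probably behave the same in practice; the explicit check mainly makes an invalid input easier to catch with validation.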


fmin, fmid, fmax = getfbounds(n_echos)
if int(kdaw) == -1:
lim_idx = (
utils.andb([component_table["kappa"] < fmid, component_table["kappa"] > fmin]) == 2
14 changes: 8 additions & 6 deletions tedana/stats.py
@@ -11,24 +11,26 @@
RepLGR = logging.getLogger("REPORT")


def getfbounds(n_echos):
def getfbounds(echo_dof):
"""
Get F-statistic boundaries based on number of echos.

Parameters
----------
n_echos : :obj:`int`
Number of echoes
echo_dof : :obj:`int`
Degree of freedom to use in goodness of fit metrics (fstat).
Typically the number of echos in the multi-echo data
May be a lower value for EPTI acquisitions.

Returns
-------
fmin, fmid, fmax : :obj:`float`
F-statistic thresholds for alphas of 0.05, 0.025, and 0.01,
respectively.
"""
f05 = stats.f.ppf(q=(1 - 0.05), dfn=1, dfd=(n_echos - 1))
f025 = stats.f.ppf(q=(1 - 0.025), dfn=1, dfd=(n_echos - 1))
f01 = stats.f.ppf(q=(1 - 0.01), dfn=1, dfd=(n_echos - 1))
f05 = stats.f.ppf(q=(1 - 0.05), dfn=1, dfd=(echo_dof - 1))
f025 = stats.f.ppf(q=(1 - 0.025), dfn=1, dfd=(echo_dof - 1))
f01 = stats.f.ppf(q=(1 - 0.01), dfn=1, dfd=(echo_dof - 1))
Comment on lines +31 to +33
Member

I think it's misleading to call the parameter dof and then subtract 1, unless I'm misunderstanding the math. We should change the function to use the degrees of freedom and pass in n_echos - 1 as needed.

Member

Or just change the echo_dof variable to n_independent_sources/n_independent_echos.

return f05, f025, f01


3 changes: 3 additions & 0 deletions tedana/tests/test_integration.py
@@ -126,11 +126,13 @@ def test_integration_five_echo(skip_integration):
suffix = ".sm.nii.gz"
datalist = [prepend + str(i + 1) + suffix for i in range(5)]
echo_times = [15.4, 29.7, 44.0, 58.3, 72.6]
# also adding echo_dof=4 to make sure all workflow code using echo_dof is executed
tedana_cli.tedana_workflow(
data=datalist,
tes=echo_times,
ica_method="robustica",
n_robust_runs=4,
echo_dof=4,
out_dir=out_dir,
tedpca=0.95,
fittype="curvefit",
@@ -631,6 +633,7 @@ def test_integration_t2smap(skip_integration):
+ [str(te) for te in echo_times]
+ ["--out-dir", out_dir, "--fittype", "curvefit"]
+ ["--masktype", "dropout", "decay"]
+ ["--n-independent-echos", "4"]
)
t2smap_cli._main(args)
