Run getelbow only if there are enough components #809

handwerkerd · 2021-09-23T15:40:35Z

Closes #808.

Changes proposed in this pull request:

Don't run getelbow if there are fewer than 2 non-significant kappa components and return nan. It previously wouldn't run if there was 0 components, but would run and give a divide by zero error if there was one component.
Add an error if getelbow is run with 0 or 1 components (previously only gave an error for 0 components)
Update warning and error messages accordingly
Added lgr.info output to say which kappa threshold was used and to output the kappa and rho elbow values (I don't think they're documented anywhere else)

As discussed in #808 it is mathematically problematic to calculate an elbow on only 2 components, but, for the way this is used in the code, this is the most conservative option. A kappa elbow based on the non-significant components can only lower the kappa threshold used for component selection which would mean fewer components would be removed as noise. That makes this choice the least aggressive (most conservative) denoising option.

I'm not sure how often this an elbow is changed based on the elbow for 2-5 non-significant components being lower than the elbow for all components. Therefore I added some info to the logger output which may help us track how often this actually happens.

tsalo · 2021-09-23T17:06:40Z

tedana/selection/tedica.py

+        LGR.info(
+            f"Kappa elbow is {kappa_elbow} and calculated based on {kappas_nonsig.size}"
+            f" components with non-significant kappa fits"
+        )


Suggested change

LGR.info(

f"Kappa elbow is {kappa_elbow} and calculated based on {kappas_nonsig.size}"

f" components with non-significant kappa fits"

)

LGR.info(

f"Kappa elbow is {kappa_elbow} and calculated based on {kappas_nonsig.size} "

"components with non-significant kappa fits"

)

No need for the f-string in the second one. Also, I think that trailing spaces are easier to read that leading ones.

On a related note, is this actually true? We're using the minimum between the two values. While it's hard to imagine a situation where the elbow from the nonsignificant Kappas would be higher than the elbow from all Kappas, there are more straightforward ways to write the elbow selection if we did expect this to always be the case.

nonsignificant refers to a the fit to the kappa model, not necessarily the magnitude of the magnitude of the kappa values. My assumption is that either could be higher, but I don't actually know because we never logged this info... until now!

Given that the degrees of freedom are the same across components, wouldn't nonsignificant kappa values have to be smaller than significant ones?

You're fitting a line with 3-5 data points. The magnitude of the fit could be higher but the residual could be even larger. Therefore, you can have a higher kappa magnitude that isn't significant. I'd have to double-check the code to confirm this, but that's my memory.

I relooked at the code & you're correct. The nonsig threshold is made purely based on the raw kappa value. That's odd on many levels (which is probably why I was mis-remembering it to be more logical)
https://github.com/handwerkerd/tedana/blob/e6f714cadaf8eecaf2e43e879d362e876523c86a/tedana/selection/tedica.py#L261

If the kappa values roughly follow the assumed elbow shape, I agree that you can't have nonsig kappa elbow ever higher than the kappa elbow using all the components. If the distribution of kappa values is weirder, I think it can happen. I've been playing with artificial values to as input to to getelbow and I can't create an immediate demo, so maybe it's not possible in practice?

In all, there are the following scenarios:

There aren't enough nonsig kappa values. This code will tell the user that and use the elbow from all kappas

There are enough nonsig kappa values and the user will see that the nonsig kappa elbow is used.

If there's a questionable case where there are enough nonsig values, but the kappa elbow is lower, then the output would show the full kappa elbow was used, but there isn't a message about too few nonsig kappas.

At minimum that means, this code will let us know what is currently happening.

This all said, this is making me want to increase the nonsig threshold. If the nonsig threshold is nearly always used when it exists, then it should include enough data points to represent an actual elbow. 5? Thoughts?

The kappa values should be (averaged) F-statistics though, rather than parameter estimates, so we evaluate their significance based on a single threshold.

I think that the degrees of freedom probably vary slightly across components, since we're averaging across voxels using component-specific weights, but I think that's something we'd need to add to our calculated metrics and then account for when we call getfbounds to get our F-statistic significance threshold.

Ah, sorry, my response was to your earlier comment. I guess you posted while I was writing my own.

I keep flipping on this. I'm again leaning towards keeping this PR as is and calculating an elbow with just two components. This is wrong, but it's shouldn't change existing results. This part of the code deserves serious rethinking (hopefully after decision tree modularization) but an algorithmic change probably shouldn't be mixed up with this minor bug fix. Perhaps move this discussion to a new issue so we remember to revisit?

I just opened #810 to separate this discussion from the specific change proposed in the PR.

codecov · 2021-09-23T17:20:30Z

Codecov Report

Merging #809 (fddefd0) into main (5399ca5) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
+ Coverage   93.27%   93.28%   +0.01%     
==========================================
  Files          27       27              
  Lines        2215     2219       +4     
==========================================
+ Hits         2066     2070       +4     
  Misses        149      149

Impacted Files	Coverage Δ
tedana/selection/_utils.py	`100.00% <100.00%> (ø)`
tedana/selection/tedica.py	`91.82% <100.00%> (+0.21%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5399ca5...fddefd0. Read the comment docs.

Co-authored-by: Taylor Salo <tsalo006@fiu.edu>

…d/tedana into getelbow-only-from-more-comps

handwerkerd · 2021-09-30T19:16:12Z

I just decided to change the code so that an elbow will only be calculated with a minimum of 5 values. This is now a breaking change because it will alter the kappa elbow in some cases, but it will reduce that chance that we get a meaningless elbow based on the lowest 2-4 kappa values. Happy to discuss more & change back, if others prefer.

jbteves

I think this is a good change. If people run into issues then we'll get more data on what's going on. LGMT, thanks @handwerkerd.

handwerkerd · 2021-11-17T14:53:46Z

Based on the discussion at the October Dev call #816, we've decided to keep this bug in place for now.

Currently, the kappa elbow is almost always calculated on just the nonsignificant components, which will always be a more liberal threshold to keep components. This is true even when there are only 2 nonsig components and an elbow calculation is meaningless. There is a similar issue with the rho elbow currently sometimes being too aggressive. This bug should be fixed, but fixing it now will almost definitely mean more components are being incorrectly rejected.

The plan is to iclude this fix inmerge metric modularization #756 so that it's easier to have multiple decision trees and elbow metrics. At that point it should be possible to fix this bug and also have functional and stable decision trees that aren't affected by this bug. For example current minimal tree in #756 already uses a bug-fixed kappa elbow calculation and an added conservative step that says, if kappa>3*rho then keep the component without regard to the rho elbow value.

handwerkerd · 2022-01-04T19:20:31Z

Since we are not going to directly merge this, I am closing this PR.

handwerkerd added 2 commits September 23, 2021 10:49

getelbow for min 2 components

63493f5

Added lgr info wtih elbow information

3e4a4ee

handwerkerd requested review from tsalo and jbteves September 23, 2021 15:41

fixed style issue

78c8dfb

tsalo reviewed Sep 23, 2021

View reviewed changes

handwerkerd and others added 2 commits September 23, 2021 13:48

Remove unnecessary fstring 'f'

e6f714c

Co-authored-by: Taylor Salo <tsalo006@fiu.edu>

Adjusted info message wording

84a7133

handwerkerd mentioned this pull request Sep 24, 2021

Give elbow criteria a closer look #810

Open

tsalo mentioned this pull request Sep 24, 2021

Calculate component-wise kappa/rho degrees of freedom based on weight maps #811

Open

handwerkerd added 2 commits September 30, 2021 14:40

Set nonsig threshold to 5

98fefc6

added a space

6929ba2

handwerkerd added the breaking change WIll make a non-trivial change to outputs label Sep 30, 2021

Merge branch 'getelbow-only-from-more-comps' of github.com:handwerker…

fddefd0

…d/tedana into getelbow-only-from-more-comps

jbteves approved these changes Sep 30, 2021

View reviewed changes

handwerkerd mentioned this pull request Oct 12, 2021

October Developers' call Friday, 10/15 #816

Closed

handwerkerd mentioned this pull request Nov 17, 2021

[REF] Decision Tree Modularization #756

Merged

22 tasks

handwerkerd closed this Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run getelbow only if there are enough components #809

Run getelbow only if there are enough components #809

handwerkerd commented Sep 23, 2021

tsalo Sep 23, 2021

tsalo Sep 23, 2021

handwerkerd Sep 23, 2021

tsalo Sep 23, 2021

handwerkerd Sep 23, 2021

handwerkerd Sep 23, 2021

tsalo Sep 23, 2021

tsalo Sep 23, 2021

handwerkerd Sep 23, 2021

handwerkerd Sep 24, 2021

codecov bot commented Sep 23, 2021 •

edited

Loading

handwerkerd commented Sep 30, 2021

jbteves left a comment

handwerkerd commented Nov 17, 2021

handwerkerd commented Jan 4, 2022

Run getelbow only if there are enough components #809

Run getelbow only if there are enough components #809

Conversation

handwerkerd commented Sep 23, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Sep 23, 2021 • edited Loading

Codecov Report

handwerkerd commented Sep 30, 2021

jbteves left a comment

Choose a reason for hiding this comment

handwerkerd commented Nov 17, 2021

handwerkerd commented Jan 4, 2022

codecov bot commented Sep 23, 2021 •

edited

Loading