Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dynamic run index <<>> option #195

Merged
merged 7 commits into from
Aug 18, 2023

Conversation

dzemanov
Copy link

@dzemanov dzemanov commented Aug 9, 2023

Feature request: dynamic run index naming

Hello,

I would like to suggest a small extension to the way how dynamic run indexes are handled. Right now, with option run <<1>>, each filename will get run keyval, even when it doesn’t need it when no more runs are present. It would be amazing if there was an option to automatically increment run index while not to include it in the filename if it is not needed, as that pollutes a bit the filenames.

I suggest a solution which preserves the original behavior for dynamic <<1>> run index, but for <<>> empty run index files are named without run keyval. Only when file with the same name is found, bidscoin adds run-2 index to its filename and retrospectively adds run-1 to the corresponding other file with which it has matched. Next found run gets run-3, run-4 etc.

What do you think about the solution?
I might have missed some errors this could introduce, although I tested the results.

As an alternative approach, I've been thinking about making a script that would run after bidscoin, renaming the files. But that would also mean all references to the files would need to be renamed.

@marcelzwiers
Copy link
Collaborator

marcelzwiers commented Aug 14, 2023

Hi, thanks for your suggestion. I considered this in the past but, without making things much more complicated, I figured you may end up with folders such as:

bids/sub-001/func/                                                                                                                            
|-- sub-001_task-rest_bold.json                                                                                         
|-- sub-001_task-rest.nii.gz                                                                                       
|-- sub-001_task-rest_run-2_bold.json                                                                                       
`-- sub-001_task-rest_run-2_bold.nii.gz                                                                                     

I expected that may be confusing for bids apps, when they encounter a run-2 image, but cannot find a run-1 image (you'd have to rename the run-less image into a run-1 image). I'm not even sure if that is BIDS compliant (haven't tested it with bids-validator). I therefore decided it was better to be always explicit about the run-index, unless you are sure there is always only 1 run (in which case you can just delete <<1>> run-index to have it removed from the filename)

@marcelzwiers
Copy link
Collaborator

I see you wrote code to retrospectively rename the run-1 images, I'll have a look and see if it doesn't complicate things too much

@marcelzwiers marcelzwiers self-requested a review August 14, 2023 10:05
bidscoin/bids.py Outdated
@@ -6,6 +6,7 @@

@author: Marcel Zwiers
"""
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace the obsolete os.path usage with the (much better) pathlib usage (Path is already imported, i.e. this is an unnecesary import)

bidscoin/bids.py Outdated
bidsvalue = run['datasource'].dynamicvalue(bidsvalue, cleanup=True, runtime=runtime)
if bidsvalue:
if cleanup:
bidsvalue = cleanup_value(bidsvalue)
if not bidsvalue:
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are a few too many nested if statements now, i.e. the first if bidsvalue: can be discarded and then unindent and this and change it to if bidsvalue: bidsname = .. (i.e. reverse the order of the two original if statements)

bidscoin/bids.py Outdated
old_bidsname = remove_run_keyval(bidsname)
new_bidsname = insert_bidskeyval(bidsname, 'run', '1', False)
scanpath = outfolder.relative_to(bidsses)
for ext in ('.nii.gz', '.nii', '.json', '.tsv', '.tsv.gz', '.bval', '.bvec'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to not hardcode BIDS specifications but instead to use something generic like outfolder.glob(bidsname + '.*') and then determine the extension from the result

@@ -14,6 +14,7 @@
from typing import Union
from pathlib import Path
from bidscoin import bids
from bidscoin.bids import add_run1_keyval
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discarding this import statement and using bids.add_run1_keyval() below is more clean / readable. Idem for other plugins

bidscoin/bids.py Outdated
new_bidsname = insert_bidskeyval(bidsname, 'run', '1', False)
scanpath = outfolder.relative_to(bidsses)
for ext in ('.nii.gz', '.nii', '.json', '.tsv', '.tsv.gz', '.bval', '.bvec'):
if os.path.exists((outfolder / old_bidsname).with_suffix(ext)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pathlib's .is_file() method

bidscoin/bids.py Outdated
scanpath = outfolder.relative_to(bidsses)
for ext in ('.nii.gz', '.nii', '.json', '.tsv', '.tsv.gz', '.bval', '.bvec'):
if os.path.exists((outfolder / old_bidsname).with_suffix(ext)):
os.rename((outfolder / old_bidsname).with_suffix(ext), (outfolder / new_bidsname).with_suffix(ext))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pathlib's .rename() method

bidscoin/bids.py Outdated
@@ -1869,25 +1873,64 @@ def insert_bidskeyval(bidsfile: Union[str, Path], bidskey: str, newvalue: str, v
return newbidsfile


def remove_run_keyval(bidsname: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is not needed and can be achieved by calling insert_keyval(bidsname, 'run', '', False)

@marcelzwiers
Copy link
Collaborator

If you have some subjects with one run and others with multiple runs, you do get inconsistent naming of the first run when using <<>>. It can be desirable, just saying:

bids/sub-001/func/                                                                                                                            
   |-- sub-001_task-rest_bold.json                                                                                       
   `-- sub-001_task-rest_bold.nii.gz                                                                                     
bids/sub-002/func/                                                                                                                            
   |-- sub-002_task-rest_run-1_bold.json                                                                                       
   |-- sub-002_task-rest_run-1_bold.nii.gz                                                                                       
   |-- sub-002_task-rest_run-2_bold.json                                                                                       
   `-- sub-002_task-rest_run-2_bold.nii.gz                                                                                     

@marcelzwiers
Copy link
Collaborator

Your solution looks good to me, I might even make the new <<>> run-index the default. Would you happen to have time to implement my suggestions?

@marcelzwiers marcelzwiers merged commit ed0deef into Donders-Institute:master Aug 18, 2023
@marcelzwiers
Copy link
Collaborator

Perfect, thanks a lot! I like the new <<>> option more than I expected and will make it the new default :-)

marcelzwiers added a commit that referenced this pull request Aug 18, 2023
- Move core dynamic run value code from the plugins to the bids module
- Move `add_run1_keyval()` code to `increment_runindex` (as they should always be executed in conjunction)
Bugfix dcm2niix (add_run1_keyval: bidsname -> newbidsname)
@marcelzwiers
Copy link
Collaborator

I refactored (3c9b896) your PR a bit, i.e. I moved code that is central to bidscoin out of the plugins. But now I messed up the tests a bit... I'm not really familiar with unittest mocking, and I'm struggling to get it all right. It would be really nice if you can fix your tests and help me out :-)

marcelzwiers added a commit that referenced this pull request Aug 21, 2023
marcelzwiers added a commit that referenced this pull request Aug 21, 2023
@marcelzwiers
Copy link
Collaborator

Ok, I rewrote your mock tests because I think that mocking system libraries is fragile and a bad idea

@dzemanov
Copy link
Author

dzemanov commented Aug 21, 2023

Ok, I rewrote your mock tests because I think that mocking system libraries is fragile and a bad idea

Hello, sorry for replying so late.
I have time tomorrow evening to rewrite the tests but if you are content with the ones you wrote that is fine by me :).

Mocking is good for test isolation, when a test fails we know that this concrete tested function failed and not other external factor e.g function that it used. Furthermore, one doesn't need to actually create the resources.

I can see that you want to use code that you fully understand so it is easier to maintain :).

@marcelzwiers
Copy link
Collaborator

marcelzwiers commented Aug 21, 2023

You don't need to rewrite your tests, but if you like you can review the new tests to see if it misses anything (you'll see I need less and easier code to generate some temporary files than you did to mock pathlib.glob).

I think mocking can be suitable for testing code in cases where you don't have control over certain functions (such as functions that read out sensors in an industrial process, or functions that communicate with web servers). Otherwise, I think you should stay away from mocking if you can -- especially from mocking common system libraries. The reason is that mocking is very fragile: you are mocking the functions in the code you like to test, but in fact, mocking can have side effects outside the scope of the code you like to test. In other words, if your mocking works today, it may be broken tomorrow if a dependency of the code you want to test gets updated and also starts using the function that you mocked. For instance, you are mocking pathlib and then one of your dependencies switches from using os.path to the more modern pathlib functions. Perfectly fine and sensible update, except that your test may now fail (while it shouldn't)...

@dzemanov
Copy link
Author

Yeah, you are right that my tests were a bit tightly coupled to the implementation and the new tests look much cleaner and cover everything needed :).

@marcelzwiers
Copy link
Collaborator

Thanks again for your contribution, I like it so much that I made it the new default :-)

@dzemanov
Copy link
Author

I am happy I could contribute a bit :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants