-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compression fails if filename contains $ #749
Comments
@pwrightkcl as a temporary fix, you could run |
In pre-history, dcm2niix was very restrictive for filenaming characters (similar to BIDS) but a consensus was formed to allow all characters that were legal for file names in all operating systems. On the other hand, David Wheeler's section 7 notes we should be cautious about a wider range of control characters: While the discussion is lost (as we moved dcm2niix from my private github to a github organization), the community did decide that dcm2niix should try to preserve as much information as possible, at the risk of outcomes like yours. So I think this is a result of a known tradeoff that the community made in the design of dcm2niix. In my memory @satra made a strong case for not censoring filenames. |
I don't think the $ should be censored (having ready the previous discussion), but maybe inputs to pigz can be single-quoted or escaped if they contain characters like this. |
@pwrightkcl why don't you try my latest commit to the development branch. It uses single rather than double quotes to wrap filenames. This works on my MacOS computer. I am on the road, but we should make sure this also works for Windows computers that might have different conventions. |
@neurolabusc I compiled on my Ubuntu 22.04 machine and it successfully compressed the file with ./dcm2niix -o $output -f test_%t_%s_%d -z y $input
|
@pwrightkcl that sounds like an unintended consequence of handling issue 743. I have very limited access to Philips MR data - is it possible to share a troublesome dataset with my institutional email? |
@neurolabusc link sent. |
@pwrightkcl can you test the latest commit to the development branch. |
@pwrightkcl can you confirm that the latest commit resolves your issues? |
Sorry for delay. Yes, @neurolabusc the new commit converts the DICOMs with |
Unfortunately, the change introduces an error with a different dataset that includes single quotes in the series description and therefore in the filename:
This affected only 7 scans in the 1.2 million I'm processing right now. I wonder if the radiographer is like this Mom: |
@sarta as you were a prime advocate for current permissive file naming scheme. Do you have your thoughts about this? Do we forbid |
I may be wrong, but based on the edits I've done to my own scripts processing these images, I think as long as you properly escape single quotes, then single quoting the rest of the filename will handle any other control characters. |
I agree. However, if we do not censor these characters one fears unintended consequences for subsequent tools and scripts. For example, the BIDS standard strictly limits the character set for entities, as they MUST only consist of letters and/or numbers (other characters, including spaces and underscores, are not allowed. I worry that letting the troublesome |
I agree, but I think it's the responsibility of whoever writes subsequent tools and scripts to handle dodgy inputs. I expect you have more specific examples in mind, though, and have a much broader knowledge than I do about the possibities, so this is just my limited view. In my use case, I know my clinical source data will have illegal characters and missing essential data. I wouldn't expect any off-the-shelf code to be able to make these BIDS-compliant, which is why I had to write my own. |
@pwrightkcl please test the latest commit. My original change had unintended consequences as for the Windows Command Prompt, only double quotes are interpreted correctly. Therefore, I have reverted to double quotes and the dollar sign becomes a forbidden character (it will be replaced with an underscore). For future reference, one can replicate this issue into any DICOM by adding a dollar sign in the description and then asking dcm2niix to use the description in the filename:
|
* tag 'v1.0.20240202': (135 commits) Update submodules Refactor (rordenlab#791) gantry tilt tolerance (rordenlab#791) GE step description (rordenlab#790) PatientOrient -> Patient Position (rordenlab#786) Prevent shell expansion (rordenlab#789) Replace presumably accidental bitwise AND operations Update JasPer API calls for compatibility with newer versions of the library Update divest logic, reducing duplication and supporting new mode of operation Fix PhaseEncodingDirectionDisplayed for GE Update date GE Diffusion issue rordenlab#777 minor GE Diffusion issue rordenlab#777 Kludge for issue 775 (rordenlab#775) add docstrings better python wrapper I/O issue 769: Mildly relax the check for bvec components > 1. PRs (rordenlab#745; rordenlab#768) Edge cases (rordenlab#763, rordenlab#749) Code spell ...
Describe the bug
Using
dcm2niix -z y
produced a .nii file, not .nii.gz as expected.To reproduce
Steps to reproduce the behavior:
dcm2niix -o $output -f my-prefix_%d -z y $input
Compress: "/usr/bin/pigz" -b 960 -n -f -6 "/my/path/my_prefixBrain_T2W_TSE$1.nii"
my_prefixBrain_T2W_TSE$1.nii
notmy_prefixBrain_T2W_TSE$1.nii.gz
Expected behavior
The file should be compressed.
Output log
(File details changed to remove potential identifiers.)
Version
v1.0.20230411
Troubleshooting
The
$1
part of the filename is apparently evaluated as a variable, which is empty, and hence replaced with the empty string. The radiographer who set the series description to "T2W/TSE$1" clearly didn't envisage me batch processing these images in using Linux years later. In my case, this occurred 3 times in 1293231 conversions, which makes this a very edgy edge case. With these numbers I can fix it manually. I'm raising an issue in case the information is useful.The text was updated successfully, but these errors were encountered: