Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compression fails if filename contains $ #749

Closed
pwrightkcl opened this issue Sep 4, 2023 · 16 comments
Closed

Compression fails if filename contains $ #749

pwrightkcl opened this issue Sep 4, 2023 · 16 comments

Comments

@pwrightkcl
Copy link

Describe the bug

Using dcm2niix -z y produced a .nii file, not .nii.gz as expected.

To reproduce

Steps to reproduce the behavior:

  1. Run the command dcm2niix -o $output -f my-prefix_%d -z y $input
  2. Output includes: Compress: "/usr/bin/pigz" -b 960 -n -f -6 "/my/path/my_prefixBrain_T2W_TSE$1.nii"
  3. Output directory contains my_prefixBrain_T2W_TSE$1.nii not my_prefixBrain_T2W_TSE$1.nii.gz

Expected behavior

The file should be compressed.

Output log

dcm2niix -o "$output" -f my-prefix_%t_%s_%d -z y "$input"

(File details changed to remove potential identifiers.)

Chris Rorden's dcm2niiX version v1.0.20230411  (JP2:OpenJPEG) (JP-LS:CharLS) GCC8.4.0 x86-64 (64-bit Linux)
Found 22 DICOM file(s)
Image Decompression is new: please validate conversions
Philips Scaling Values RS:RI:SS = 2.43858:0:0.0062341 (see PMC3998685)
Convert 22 DICOM as /my/output/my-prefix_20220202090000_501_Brain_T2W_TSE$1 (256x256x22x1)
pigz: skipping: /my/output/my-prefix_20220202090000_501_Brain_T2W_TSE.nii does not exist
Compress: "/usr/bin/pigz" -b 960 -n -f -6 "/my/output/my-prefix_20220202090000_501_Brain_T2W_TSE$1.nii"
Conversion required 0.262101 seconds (0.180816 for core code).

Version

v1.0.20230411

Troubleshooting

The $1 part of the filename is apparently evaluated as a variable, which is empty, and hence replaced with the empty string. The radiographer who set the series description to "T2W/TSE$1" clearly didn't envisage me batch processing these images in using Linux years later. In my case, this occurred 3 times in 1293231 conversions, which makes this a very edgy edge case. With these numbers I can fix it manually. I'm raising an issue in case the information is useful.

@neurolabusc
Copy link
Collaborator

@pwrightkcl as a temporary fix, you could run -z i instead of -z y to use the internal miniz or zlib compressor rather than the external pigz compression. @satra this is related to issue 398 as well as earlier discussions where dcm2niix's censoring of file naming was made more permissive. Do you think filenaming should be modified to censor the $ character?

@neurolabusc
Copy link
Collaborator

In pre-history, dcm2niix was very restrictive for filenaming characters (similar to BIDS) but a consensus was formed to allow all characters that were legal for file names in all operating systems. On the other hand, David Wheeler's section 7 notes we should be cautious about a wider range of control characters: $*?:[]"<>|(){}&'!\;.

While the discussion is lost (as we moved dcm2niix from my private github to a github organization), the community did decide that dcm2niix should try to preserve as much information as possible, at the risk of outcomes like yours. So I think this is a result of a known tradeoff that the community made in the design of dcm2niix. In my memory @satra made a strong case for not censoring filenames.

@pwrightkcl
Copy link
Author

I don't think the $ should be censored (having ready the previous discussion), but maybe inputs to pigz can be single-quoted or escaped if they contain characters like this.

neurolabusc added a commit that referenced this issue Sep 4, 2023
@neurolabusc
Copy link
Collaborator

@pwrightkcl why don't you try my latest commit to the development branch. It uses single rather than double quotes to wrap filenames. This works on my MacOS computer. I am on the road, but we should make sure this also works for Windows computers that might have different conventions.

@pwrightkcl
Copy link
Author

@neurolabusc I compiled on my Ubuntu 22.04 machine and it successfully compressed the file with $1 in the name. It added 'r4527.34' to the end, though. Is that because I'm using the development version?

./dcm2niix -o $output -f test_%t_%s_%d -z y $input
Chris Rorden's dcm2niiX version v1.0.20230904  (JP2:OpenJPEG) (JP-LS:CharLS) GCC11.4.0 x86-64 (64-bit Linux)
Found 22 DICOM file(s)
Image Decompression is new: please validate conversions
Philips Scaling Values RS:RI:SS = 2.46496:0:0.00581155 (see PMC3998685)
Convert 22 DICOM as /data/tmp/dcm2niix_dollar_test/test_20220202090000_501_Brain_T2W_TSE$1_r4527.34 (256x256x22x1)
Compress: "/usr/local/bin/pigz" -b 960 -n -f -6 '/data/tmp/dcm2niix_dollar_test/test_20220202090000_501_Brain_T2W_TSE$1_r4527.34.nii'
Conversion required 1.628203 seconds (0.452984 for core code).

@neurolabusc
Copy link
Collaborator

@pwrightkcl that sounds like an unintended consequence of handling issue 743. I have very limited access to Philips MR data - is it possible to share a troublesome dataset with my institutional email?

@pwrightkcl
Copy link
Author

@neurolabusc link sent.

@neurolabusc
Copy link
Collaborator

@pwrightkcl can you test the latest commit to the development branch.

@neurolabusc
Copy link
Collaborator

@pwrightkcl can you confirm that the latest commit resolves your issues?

@pwrightkcl
Copy link
Author

Sorry for delay. Yes, @neurolabusc the new commit converts the DICOMs with $ in the series description with compression and does not add the r... suffix to the end.

@pwrightkcl
Copy link
Author

Unfortunately, the change introduces an error with a different dataset that includes single quotes in the series description and therefore in the filename:

sh: 1: Syntax error: Unterminated quoted string
Compress: "/usr/local/bin/pigz" -b 960 -n -f -6 './test_anondate_16_Sag_T2_frFSE_POST_GAD'.nii'

This affected only 7 scans in the 1.2 million I'm processing right now. I wonder if the radiographer is like this Mom:

https://xkcd.com/327/

@neurolabusc
Copy link
Collaborator

@sarta as you were a prime advocate for current permissive file naming scheme. Do you have your thoughts about this? Do we forbid $ as it can be interpreted as a control character, or do we allow it but forbid single quotes, or do we add a lot of extra logic to wrap control characters?

@pwrightkcl
Copy link
Author

I may be wrong, but based on the edits I've done to my own scripts processing these images, I think as long as you properly escape single quotes, then single quoting the rest of the filename will handle any other control characters.

@neurolabusc
Copy link
Collaborator

I agree. However, if we do not censor these characters one fears unintended consequences for subsequent tools and scripts. For example, the BIDS standard strictly limits the character set for entities, as they MUST only consist of letters and/or numbers (other characters, including spaces and underscores, are not allowed. I worry that letting the troublesome $ character through can cause unintended consequences down stream.

@pwrightkcl
Copy link
Author

I agree, but I think it's the responsibility of whoever writes subsequent tools and scripts to handle dodgy inputs. I expect you have more specific examples in mind, though, and have a much broader knowledge than I do about the possibities, so this is just my limited view.

In my use case, I know my clinical source data will have illegal characters and missing essential data. I wouldn't expect any off-the-shelf code to be able to make these BIDS-compliant, which is why I had to write my own.

@neurolabusc
Copy link
Collaborator

@pwrightkcl please test the latest commit. My original change had unintended consequences as for the Windows Command Prompt, only double quotes are interpreted correctly. Therefore, I have reverted to double quotes and the dollar sign becomes a forbidden character (it will be replaced with an underscore).

For future reference, one can replicate this issue into any DICOM by adding a dollar sign in the description and then asking dcm2niix to use the description in the filename:

dcmodify -m '(0008,103E)=de$cription' file.dcm
./dcm2niix -f %d -z y /path/to/file.dcm

yarikoptic added a commit to neurodebian/dcm2niix that referenced this issue Apr 29, 2024
* tag 'v1.0.20240202': (135 commits)
  Update submodules
  Refactor (rordenlab#791)
  gantry tilt tolerance (rordenlab#791)
  GE step description (rordenlab#790)
  PatientOrient -> Patient Position (rordenlab#786)
  Prevent shell expansion (rordenlab#789)
  Replace presumably accidental bitwise AND operations
  Update JasPer API calls for compatibility with newer versions of the library
  Update divest logic, reducing duplication and supporting new mode of operation
  Fix PhaseEncodingDirectionDisplayed for GE
  Update date
  GE Diffusion issue rordenlab#777 minor
  GE Diffusion issue rordenlab#777
  Kludge for issue 775 (rordenlab#775)
  add docstrings
  better python wrapper I/O
  issue 769: Mildly relax the check for bvec components > 1.
  PRs (rordenlab#745; rordenlab#768)
  Edge cases (rordenlab#763, rordenlab#749)
  Code spell
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants