Recompress compressed DICOM images after redaction #1040

Open
niwilso opened this issue Feb 23, 2023 · 0 comments
Labels: bug (Something isn't working), image-anonymization

Comments

Collaborator

niwilso commented Feb 23, 2023

Describe the bug
When redaction is run on compressed pixel data, the returned pixel data is uncompressed. This happens because DicomImageRedactorEngine._add_redact_box draws the boxes on the loaded DICOM instance's .pixel_array, which is always decoded (uncompressed), unlike its .PixelData.

We are still able to redact correctly, but we are then unable to save the redacted instance as a .dcm file.

Side note: If an error occurs while trying to write out the pixel data post-redaction, then gdcm may need to be installed.

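Whether the pixel data has been lossy-compressed can be checked via the DICOM tag (0028, 2110), Lossy Image Compression. If the value is '01', the image has undergone lossy compression. Note that this tag does not flag lossless compression, so the Transfer Syntax UID is the more reliable indicator of whether the stored pixel data is currently compressed.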

if redacted_instance[0x0028, 0x2110].value == '01':
    compression_method = instance.file_meta.TransferSyntaxUID
    print(f'Pixel data is compressed with Transfer Syntax UID: {compression_method}')
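
Because (0028, 2110) only records lossy compression, a check based on the transfer syntax may be more robust. The sketch below is a minimal, pydicom-free illustration: the helper name `is_compressed_transfer_syntax` is hypothetical, and it assumes that every transfer syntax other than the four native ones listed stores encapsulated (compressed) pixel data.

```python
# Transfer syntaxes that store Pixel Data natively (not encapsulated).
# Assumption: any other transfer syntax UID is treated as compressed.
NATIVE_TRANSFER_SYNTAXES = frozenset({
    "1.2.840.10008.1.2",       # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian
    "1.2.840.10008.1.2.1.99",  # Deflated Explicit VR Little Endian
    "1.2.840.10008.1.2.2",     # Explicit VR Big Endian
})


def is_compressed_transfer_syntax(transfer_syntax_uid) -> bool:
    """Return True if Pixel Data under this transfer syntax is encapsulated.

    Hypothetical helper; accepts a plain string or any object whose str()
    is the UID.
    """
    return str(transfer_syntax_uid) not in NATIVE_TRANSFER_SYNTAXES


if __name__ == "__main__":
    # JPEG Baseline (compressed) vs. Explicit VR Little Endian (native)
    print(is_compressed_transfer_syntax("1.2.840.10008.1.2.4.50"))
    print(is_compressed_transfer_syntax("1.2.840.10008.1.2.1"))
```

With pydicom, the same information should be available directly as instance.file_meta.TransferSyntaxUID.is_compressed.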

To Reproduce
Steps to reproduce the behavior:

import pydicom
from presidio_image_redactor import DicomImageRedactorEngine

# Redact text PHI
engine = DicomImageRedactorEngine()
instance = pydicom.dcmread(PATH_TO_DICOM_FILE)
redacted_instance = engine.redact(instance)

# Calculate bytes
rows = instance[0x0028, 0x0010].value
columns = instance[0x0028, 0x0011].value
samples_per_pixel = instance[0x0028, 0x0002].value
bits_allocated = instance[0x0028, 0x0100].value
try:
    number_of_frames = int(instance[0x0028, 0x0008].value)
except KeyError:
    # Number of Frames (0028, 0008) is absent for single-frame images
    number_of_frames = 1
expected_num_bytes = rows * columns * number_of_frames * samples_per_pixel * (bits_allocated/8)

print(f"Expected (no compression): {int(expected_num_bytes)}")
print(f"Actual, pre-redaction: {len(instance[0x7fe0, 0x0010].value)}")
print(f"Actual, post-redaction: {len(redacted_instance[0x7fe0, 0x0010].value)}")

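Note that native support for compression in pydicom is limited: at the time of writing, Dataset.compress() appears to support only RLE Lossless (pydicom 2.2 and later). The following line would be ideal, but it throws an error when the original transfer syntax is not supported.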

redacted_instance.compress(transfer_syntax_uid=compression_method, encoding_plugin='gdcm')

Expected behavior
With the above, we would ideally see about the same number of bytes pre- and post-redaction. Because compression is not re-applied to previously compressed pixel data, however, the post-redaction byte count instead equals the count expected with no compression.
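
As a worked example of the byte count that uncompressed pixel data should have, here is a small stand-alone sketch. The function name `expected_uncompressed_bytes` is hypothetical, and it assumes no chroma subsampling (for example, it ignores YBR_FULL_422) and that the value is padded to an even length as DICOM requires.

```python
def expected_uncompressed_bytes(rows, columns, samples_per_pixel,
                                bits_allocated, number_of_frames=1,
                                pad_to_even=True):
    """Estimate the size in bytes of native (uncompressed) Pixel Data.

    Hypothetical helper; assumes every sample occupies `bits_allocated` bits
    and that no chroma subsampling is in effect.
    """
    total_bits = (rows * columns * samples_per_pixel
                  * number_of_frames * bits_allocated)
    num_bytes = (total_bits + 7) // 8  # round up to whole bytes
    if pad_to_even and num_bytes % 2:
        num_bytes += 1  # DICOM element values must have even length
    return num_bytes


if __name__ == "__main__":
    # 512 x 512, single-frame, 16-bit grayscale image
    print(expected_uncompressed_bytes(512, 512, 1, 16))  # 524288
```

If the post-redaction length of (7fe0, 0010) matches this figure while the pre-redaction length is much smaller, the pixel data was most likely decompressed during redaction.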

If we run redacted_instance.save_as('FILE_NAME_HERE.dcm'), then we get the following error (which we want to avoid):

ValueError: With tag (7fe0, 0010) got exception: (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required. See pydicom.encaps.encapsulate() for more information
Traceback (most recent call last):
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/tag.py", line 28, in tag_in_exception
    yield
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/filewriter.py", line 662, in write_dataset
    write_data_element(fp, dataset.get_item(tag), dataset_encoding)
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/filewriter.py", line 579, in write_data_element
    raise ValueError(
ValueError: (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required. See pydicom.encaps.encapsulate() for more information
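
Until recompression is available, one possible workaround is to store the redacted pixels as native (uncompressed) data so the file can at least be saved. The sketch below is an assumption-laden illustration, not the library's fix: `store_as_uncompressed` is a hypothetical helper, `ds` is assumed to be a pydicom Dataset read from a compressed file, and `pixel_array` is assumed to be the redacted array (anything with a `.tobytes()` method). Other attributes, such as Photometric Interpretation, may also need updating and are not handled here.

```python
EXPLICIT_VR_LITTLE_ENDIAN = "1.2.840.10008.1.2.1"  # native transfer syntax
PIXEL_DATA_TAG = (0x7FE0, 0x0010)


def store_as_uncompressed(ds, pixel_array):
    """Write pixel_array into ds as native Pixel Data (hypothetical helper)."""
    raw = pixel_array.tobytes()
    if len(raw) % 2:
        raw += b"\x00"  # DICOM element values must have even length
    ds.PixelData = raw
    # The element read from a compressed file still carries the
    # undefined-length flag, which is what triggers the ValueError on save.
    ds[PIXEL_DATA_TAG].is_undefined_length = False
    # Declare that the pixel data is no longer encapsulated.
    ds.file_meta.TransferSyntaxUID = EXPLICIT_VR_LITTLE_ENDIAN
    return ds
```

Under these assumptions, calling store_as_uncompressed(redacted_instance, redacted_instance.pixel_array) before save_as should avoid the error above, at the cost of a larger file.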

Additional context
Potentially helpful resources:
