Merge pull request #126 from BodenmillerGroup/develop
Allow handling of MCD files with missing channel labels
nilseling authored Mar 8, 2023

2 parents f318a90 + bbc329a commit 2609835
Showing 16 changed files with 396 additions and 271 deletions.
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
+[flake8]
+max-line-length = 88
+extend-ignore = E203
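The 88-character limit matches black's default line length, and E203 ("whitespace before ':'") is ignored because black deliberately formats complex slices that way. A small illustrative snippet, not part of the repository, showing the kind of line this combination accepts:

```python
# Illustration only: black formats the slice below with spaces around ":",
# which flake8 would flag as E203 unless that check is ignored.
def trim(values: list, offset: int) -> list:
    return values[offset + 1 : len(values) - offset]
```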
12 changes: 6 additions & 6 deletions .github/workflows/docs.yml
@@ -1,6 +1,6 @@
-on:
-  push:
-    branches: [main]
+on:
+  push:
+    branches: [main]
   pull_request:
     branches: [main]

@@ -10,9 +10,9 @@ jobs:
   deploy:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - uses: actions/setup-python@v2
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
         with:
-          python-version: 3.x
+          python-version: "3.x"
       - run: pip install mkdocs-material
       - run: mkdocs gh-deploy --force
2 changes: 2 additions & 0 deletions .isort.cfg
@@ -0,0 +1,2 @@
+[settings]
+profile=black
43 changes: 43 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,43 @@
+exclude: ^(\.vscode/.*|scripts/.*|mkdocs.yml|docs/.*)$
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+      - id: check-added-large-files
+      - id: check-case-conflict
+      - id: check-docstring-first
+      - id: check-executables-have-shebangs
+      - id: check-merge-conflict
+      - id: check-shebang-scripts-are-executable
+      - id: check-toml
+      - id: check-yaml
+      - id: debug-statements
+      - id: end-of-file-fixer
+      - id: requirements-txt-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/PyCQA/isort
+    rev: "5.12.0"
+    hooks:
+      - id: isort
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.0.1
+    hooks:
+      - id: autoflake
+        args: [--in-place, --remove-all-unused-imports]
+  - repo: https://github.com/psf/black
+    rev: '23.1.0'
+    hooks:
+      - id: black
+  - repo: https://github.com/PyCQA/flake8
+    rev: "6.0.0"
+    hooks:
+      - id: flake8
+        additional_dependencies: [flake8-typing-imports]
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v0.991
+    hooks:
+      - id: mypy
+        additional_dependencies: [types-requests, types-PyYAML]
+ci:
+  autoupdate_branch: develop
+  skip: [flake8, mypy]
25 changes: 20 additions & 5 deletions CHANGELOG.md
@@ -1,5 +1,23 @@
 # Changelog

+## [3.6, 08-03-2023]
+
+- allow handling MCD files with missing channel label entries
+- updated links to raw data on Zenodo
+- switched from `MCDFile.metadata` to `MCDFile.schema_xml` to keep up with the latest version of `readimc`
+
+## [3.5, 07-11-2022]
+
+- exclude hidden files from processing
+
+## [3.4, 02-06-2022]
+
+- removed `tifffile` version pinning
+
+## [3.3, 27-04-2022]
+
+- fixed `tifffile` version
+
 ## [3.2]

 - sort channels by metal tag when creating the ilastik and full stacks
@@ -20,22 +38,19 @@
 - segmentation masks are directly written out to `cpout/masks` in the second pipeline and read in as objects in the last pipeline
 - pixel probabilities are downscaled in the second pipeline and directly written into `cpout/probabilites`
 - cell segmentation is performed on downscaled pixel probabilities

 ## [2.3]

 - Bugfixes: `1_prepare_ilastik`: Removed special characters from pipeline comments as this caused encoding issues.

 ## [2.1]

 - Bugfixes: `1_prepare_ilastik`: Fix range to 0-1 for mean image, preventing out of range errors

 ## [2.0]

 - Change to imctools v2: Changes the structure of the folder to the new format, changing the naming of the .ome.tiff files
 - Change to Cellprofiler v4: Requires the use of the ImcPluginsCP master branch or a release > v.4.1
 - Updated documentation
 - Adds var_Cells.csv containing metadata for the measurements
 - Adds panel to cpout folder
-
-
-
38 changes: 17 additions & 21 deletions README.md
@@ -3,14 +3,11 @@

 ## Introduction

-The pipeline is based on [CellProfiler](http://cellprofiler.org/) (tested v4.2.1) for segmentation and [Ilastik](http://ilastik.org/) (tested v1.3.3post3) for pixel classification.
-It is streamlined by using the `imcsegpipe` python package available via this repository as well as custom CellProfiler modules ([ImcPluginsCP](https://github.com/BodenmillerGroup/ImcPluginsCP), release v4.2.1).
+The pipeline is based on [CellProfiler](http://cellprofiler.org/) (tested v4.2.1) for segmentation and [Ilastik](http://ilastik.org/) (tested v1.3.3post3) for pixel classification. It is streamlined by using the `imcsegpipe` python package available via this repository as well as custom CellProfiler modules ([ImcPluginsCP](https://github.com/BodenmillerGroup/ImcPluginsCP), release v4.2.1).

-This repository showcases the basis of the workflow with step-by-step instructions.
-As an alternative and dockerized version of the pipeline, check out [steinbock](https://github.com/BodenmillerGroup/steinbock).
+This repository showcases the basis of the workflow with step-by-step instructions. As an alternative and dockerized version of the pipeline, check out [steinbock](https://github.com/BodenmillerGroup/steinbock).

-This pipeline was developed in the Bodenmiller laboratory at the University of Zurich ([www.bodenmillerlab.com](https://www.bodenmillerlab.com/)) to segment hundreds of highly multiplexed imaging mass cytometry (IMC) images.
-The concepts applied here to IMC data can also be transfered to data generated by other highly multiplexed imaging modalities.
+This pipeline was developed in the Bodenmiller laboratory at the University of Zurich ([www.bodenmillerlab.com](https://www.bodenmillerlab.com/)) to segment hundreds of highly multiplexed imaging mass cytometry (IMC) images. The concepts applied here to IMC data can also be transfered to data generated by other highly multiplexed imaging modalities.

 For a general overview on IMC as technology and data processing tasks, please refer to [bodenmillergroup.github.io/IMCWorkflow](https://bodenmillergroup.github.io/IMCWorkflow/).

@@ -22,13 +19,13 @@ Before being able to pre-process the data, you will need to setup the environmen

 1. [Install conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)

-2. Clone the repository:
+2. Clone the repository:

 ```bash
 git clone --recursive https://github.com/BodenmillerGroup/ImcSegmentationPipeline.git
 ```

-3. Setup the conda environment:
+3. Setup the conda environment:

 ```bash
 cd ImcSegmentationPipeline
@@ -44,15 +41,14 @@ conda activate imcsegpipe
 jupyter-lab
 ```

-This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
-From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.
+This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser. From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.

 In brief, the main analysis steps include:

-1. Pre-processing of the raw images to create `.ome.tiffs` and `.tiff` stacks for ilastik training and measurement (python).
-2. Ilastik pixel classification based on random crops of the images (CellProfiler, Ilastik).
-3. Image segmentation based on the classification probabilities (CellProfiler).
-4. Measurement and export of cell-specific features, such as marker expression (CellProfiler).
+1. Pre-processing of the raw images to create `.ome.tiffs` and `.tiff` stacks for ilastik training and measurement (python).
+2. Ilastik pixel classification based on random crops of the images (CellProfiler, Ilastik).
+3. Image segmentation based on the classification probabilities (CellProfiler).
+4. Measurement and export of cell-specific features, such as marker expression (CellProfiler).

 ## Example data

@@ -69,21 +65,22 @@ The slides briefly explain why we chose this approach to image segmentation and
 ## Changelog

 For changes in specific releases, please refer to the [CHANGELOG](CHANGELOG.md).

 ## License

-We [freely share](LICENSE) this pipeline in the hope that it will be useful for others to perform high quality image segmentation and serve as a basis to develop more complicated open source IMC image processing workflows.
-In return we would like you to be considerate and give us and others feedback if you find a bug/issue and [raise a GitHub Issue](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/issues) on the affected projects or on this page.
+We [freely share](LICENSE) this pipeline in the hope that it will be useful for others to perform high quality image segmentation and serve as a basis to develop more complicated open source IMC image processing workflows. In return we would like you to be considerate and give us and others feedback if you find a bug/issue and [raise a GitHub Issue](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/issues) on the affected projects or on this page.

 ## Contributing

 To contribute to this work, please fork the repository, make changes to it and open a pull request.

 ## Contributors

-**Creator:** Vito Zanotelli
-**Contributor:** Jonas Windhager, Nils Eling
-**Maintainer:** Nils Eling
+**Creator:** Vito Zanotelli
+
+**Contributor:** Jonas Windhager, Nils Eling
+
+**Maintainer:** Nils Eling

 ## Citation

@@ -100,4 +97,3 @@ If you use this workflow for your research, please cite us:
 url = {https://doi.org/10.5281/zenodo.3841961}
 }
 ```
-
28 changes: 17 additions & 11 deletions docs/index.md
@@ -40,25 +40,31 @@ Furthermore, before running the analysis, you will need to setup a `conda` envir

 2. Clone the repository:

-```bash
-git clone --recursive https://github.com/BodenmillerGroup/ImcSegmentationPipeline.git
-```
+```
+git clone --recursive https://github.com/BodenmillerGroup/ImcSegmentationPipeline.git
+```
 3. Setup the conda environment:
-```bash
-cd ImcSegmentationPipeline
-conda env create -f environment.yml
-```
+```
+cd ImcSegmentationPipeline
+```
+```
+conda env create -f environment.yml
+```
 4. Configure CellProfiler to use the plugins by opening the CellProfiler GUI, selecting `Preferences` and setting the `CellProfiler plugins directory` to `path/to/ImcSegmentationPipeline/resources/ImcPluginsCP/plugins` and **restart CellProfiler**. Alternatively you can clone the `ImcPluginsCP` repository individually and set the path correctly in CellProfiler.
 5. Activate the environment created in 3. and start a jupyter instance
-```bash
-conda activate imcsegpipe
-jupyter-lab
-```
+```
+conda activate imcsegpipe
+```
+```
+jupyter-lab
+```
 This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
 From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.
4 changes: 2 additions & 2 deletions mkdocs.yml
@@ -22,11 +22,11 @@ nav:
   - Cell segmentation: segmentation.md
   - Cell measurement: measurement.md
   - Output files: output.md

 markdown_extensions:
   - footnotes
   - attr_list
   - md_in_html
   - pymdownx.emoji:
       emoji_index: !!python/name:materialx.emoji.twemoji
-      emoji_generator: !!python/name:materialx.emoji.to_svg
+      emoji_generator: !!python/name:materialx.emoji.to_svg
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,3 +1,3 @@
 [build-system]
-requires = ["setuptools", "wheel"]
+requires = ["setuptools>=64", "wheel"]
 build-backend = "setuptools.build_meta"
224 changes: 121 additions & 103 deletions scripts/download_examples.ipynb

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions scripts/download_examples.py
@@ -17,23 +17,23 @@
 for example_file_name, example_file_url in [
     (
         "Patient1.zip",
-        "https://zenodo.org/record/5949116/files/Patient1.zip",
+        "https://zenodo.org/record/7575859/files/Patient1.zip",
     ),
     (
         "Patient2.zip",
-        "https://zenodo.org/record/5949116/files/Patient2.zip",
+        "https://zenodo.org/record/7575859/files/Patient2.zip",
     ),
     (
         "Patient3.zip",
-        "https://zenodo.org/record/5949116/files/Patient3.zip",
+        "https://zenodo.org/record/7575859/files/Patient3.zip",
     ),
     (
         "Patient4.zip",
-        "https://zenodo.org/record/5949116/files/Patient4.zip",
+        "https://zenodo.org/record/7575859/files/Patient4.zip",
     ),
     (
         "panel.csv",
-        "https://zenodo.org/record/5949116/files/panel.csv",
+        "https://zenodo.org/record/7575859/files/panel.csv",
     )
 ]:
     example_file = raw_folder / example_file_name
@@ -48,7 +48,7 @@
 # Sample metadata
 sample_metadata = Path("..") / "sample_metadata.xlsx"
 if not sample_metadata.exists():
-    request.urlretrieve("https://zenodo.org/record/5949116/files/sample_metadata.xlsx", sample_metadata)
+    request.urlretrieve("https://zenodo.org/record/7575859/files/sample_metadata.csv", sample_metadata)

 # %%
 # !conda list
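For readers who want to fetch the example data outside the notebook, the download logic of this script boils down to the pattern sketched below. This is an illustrative condensation, not part of the repository: `raw_folder` stands in for the directory the script creates, and only files that are not already present locally are fetched from the updated Zenodo record.

```python
from pathlib import Path
from urllib import request

raw_folder = Path("analysis/raw")  # assumed location; the script defines its own raw_folder
raw_folder.mkdir(parents=True, exist_ok=True)

for name in ["Patient1.zip", "Patient2.zip", "Patient3.zip", "Patient4.zip", "panel.csv"]:
    target = raw_folder / name
    if not target.exists():  # skip files that were already downloaded
        request.urlretrieve(f"https://zenodo.org/record/7575859/files/{name}", target)
```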
232 changes: 135 additions & 97 deletions scripts/imc_preprocessing.ipynb

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions scripts/imc_preprocessing.py
@@ -134,9 +134,11 @@
             imcsegpipe.extract_zip_file(zip_file, temp_dir.name)
 acquisition_metadatas = []
 for raw_dir in raw_dirs + [Path(temp_dir.name) for temp_dir in temp_dirs]:
-    mcd_files = list(raw_dir.rglob("[!.]*.mcd"))
+    mcd_files = list(raw_dir.rglob("*.mcd"))
+    mcd_files=[(i) for i in mcd_files if not i.stem.startswith('.')]
     if len(mcd_files) > 0:
-        txt_files = list(raw_dir.rglob("[!.]*.txt"))
+        txt_files = list(raw_dir.rglob("*.txt"))
+        txt_files=[(i) for i in txt_files if not i.stem.startswith('.')]
         matched_txt_files = imcsegpipe.match_txt_files(mcd_files, txt_files)
         for mcd_file in mcd_files:
             acquisition_metadata = imcsegpipe.extract_mcd_file(
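The change above swaps the `[!.]*.mcd` glob for a plain `*.mcd` glob followed by an explicit filter on the file stem, so hidden files (for example macOS `._*` sidecar files) are still skipped while the matching itself stays simple. A self-contained sketch of the same idea; the `find_visible_files` helper is illustrative and not part of `imcsegpipe`:

```python
from pathlib import Path
from typing import List


def find_visible_files(raw_dir: Path, pattern: str) -> List[Path]:
    """Recursively collect files matching `pattern`, dropping hidden files
    whose stem starts with '.', e.g. macOS '._Patient1.mcd' sidecar files."""
    return [f for f in raw_dir.rglob(pattern) if not f.stem.startswith(".")]


# Mirrors the updated script:
# mcd_files = find_visible_files(raw_dir, "*.mcd")
# txt_files = find_visible_files(raw_dir, "*.txt")
```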
8 changes: 2 additions & 6 deletions setup.cfg
@@ -4,11 +4,11 @@ version = 1.0.0

 [options]
 zip_safe = True
-install_requires =
+install_requires =
     imageio
     numpy
     pandas
-    readimc
+    readimc>=0.6.2
     scipy
     tifffile
     xtiff>=0.7.8
@@ -19,7 +19,3 @@ packages = find:

 [options.packages.find]
 where = src
-
-[flake8]
-max-line-length = 88
-extend-ignore = E203
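The new `readimc>=0.6.2` pin matches the API change applied elsewhere in this commit, where the raw MCD metadata is read from `MCDFile.schema_xml` instead of the older `MCDFile.metadata` attribute. A hedged sketch of the newer usage ("example.mcd" is a placeholder path, not a file shipped with the repository):

```python
from readimc import MCDFile

# Illustration only: "example.mcd" is a placeholder path.
with MCDFile("example.mcd") as f:
    schema_xml = f.schema_xml  # full schema XML, previously exposed as `metadata`
    print(schema_xml[:200])
```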
4 changes: 0 additions & 4 deletions setup.py

This file was deleted.

24 changes: 17 additions & 7 deletions src/imcsegpipe/_imcsegpipe.py
@@ -28,7 +28,7 @@ def match_txt_files(
     mcd_files: Sequence[Union[str, PathLike]], txt_files: Sequence[Union[str, PathLike]]
 ) -> Dict[Union[str, PathLike], List[Path]]:
     unmatched_txt_files = list(txt_files)
-    matched_txt_files: Dict[Union[str, PathLike], List[Union[str, PathLike]]] = {}
+    matched_txt_files: Dict[Union[str, PathLike], List[Path]] = {}
     for mcd_file in sorted(mcd_files, key=lambda x: Path(x).stem, reverse=True):
         matched_txt_files[mcd_file] = []
         i = 0
@@ -80,7 +80,7 @@ def extract_mcd_file(
             acquisition_is_valid = _extract_acquisition(
                 f_mcd, acquisition, acquisition_img_file, acquisition_channels_file
             )
-            if not acquisition_is_valid:
+            if not acquisition_is_valid and txt_files is not None:
                 acquisition_txt_files = [
                     txt_file
                     for txt_file in txt_files
@@ -173,10 +173,13 @@ def export_to_histocat(
         histocat_img_dir.mkdir(exist_ok=True)
         for channel_index, row in acquisition_channels.iterrows():
             acquisition_channel_img: np.ndarray = acquisition_img[channel_index]
-            channel_label = re.sub("[^a-zA-Z0-9()]", "-", row["channel_label"])
             channel_name = row["channel_name"]
+            channel_label = row["channel_label"]
+            if not pd.isnull(channel_label) and not channel_label:
+                channel_label = re.sub("[^a-zA-Z0-9()]", "-", channel_label)
             tifffile.imwrite(
-                histocat_img_dir / f"{channel_label}_{channel_name}.tiff",
+                histocat_img_dir
+                / f"{channel_label or channel_name}_{channel_name}.tiff",
                 data=acquisition_channel_img,
                 imagej=True,
             )
@@ -197,7 +200,7 @@ def export_to_histocat(
 def _extract_schema(mcd_file_handle: MCDFile, schema_xml_file: Path) -> bool:
     try:
         with schema_xml_file.open("w") as f:
-            f.write(mcd_file_handle.metadata)
+            f.write(mcd_file_handle.schema_xml)
         return True
     except Exception as e:
         logging.error(
@@ -218,6 +221,7 @@ def _extract_slide(
         logging.error(
             f"Error reading slide {slide.id} from file {mcd_file_handle.path.name}: {e}"
         )
+        return False


 def _extract_panorama(
@@ -292,13 +296,19 @@ def _write_acquisition_image(
     acquisition_img_file: Path,
     acquisition_channels_file: Path,
 ) -> None:
+    channel_labels_or_names = [
+        channel_label or channel_name
+        for channel_name, channel_label in zip(
+            acquisition.channel_names, acquisition.channel_labels
+        )
+    ]
     xtiff.to_tiff(
         acquisition_img,
         acquisition_img_file,
         ome_xml_fun=get_acquisition_ome_xml,
-        channel_names=acquisition.channel_labels,
+        channel_names=channel_labels_or_names,
         channel_fluors=acquisition.channel_names,
-        xml_metadata=mcd_file_handle.metadata.replace("\r\n", ""),
+        xml_metadata=mcd_file_handle.schema_xml.replace("\r\n", ""),
     )
     pd.DataFrame(
         data={
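The `channel_label or channel_name` fallback above is the core of this commit: when an acquisition channel carries no label in the MCD file, its channel (metal) name is used instead, so OME-TIFF channel names and the per-acquisition channel tables never end up empty. A minimal, self-contained illustration with invented values:

```python
# Illustration only: the channel names and labels below are made up.
channel_names = ["Ir191", "Yb176", "Pt195"]
channel_labels = ["DNA1", None, ""]  # missing or empty labels occur in some MCD files

channel_labels_or_names = [
    channel_label or channel_name
    for channel_name, channel_label in zip(channel_names, channel_labels)
]
assert channel_labels_or_names == ["DNA1", "Yb176", "Pt195"]
```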
