Commit

Fixed typos and clarifications
nilseling committed Mar 31, 2022
1 parent 3d998e8 commit 3c4639a
Showing 14 changed files with 26 additions and 22 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ conda activate imcsegpipe
jupyter-lab
```

-This will automatically open a jupyter instance at `http://localhost:8888` in your browser.
+This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.

In brief, the main analysis steps include:
2 changes: 1 addition & 1 deletion docs/ilastik.md
@@ -1,6 +1,6 @@
# Ilastik pixel classification

-In this setp, we use [Ilastik](https://www.ilastik.org/) to label pixels and train a random-forrest classifier for semantic segmentation.
+In this setp, we use [Ilastik](https://www.ilastik.org/) to label pixels and train a random-forest classifier for semantic segmentation.
This means that each pixel will be classified as "nuclear", "cytoplasmic" or "background".
The probability of each pixel belonging to one of these classes will be used for image segmentation (see [cell segmentation](segmentation.md)).
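
A minimal sketch of how such per-pixel class probabilities could be read, assuming a NumPy array of shape `(height, width, 3)` with channels ordered nuclear, cytoplasmic, background. The array layout, the channel order and the function names are assumptions made for illustration; they are not taken from Ilastik or from the pipeline code.

```python
import numpy as np

# Assumed channel order; check the label order of your own Ilastik project.
CLASS_NAMES = ("nuclear", "cytoplasmic", "background")


def most_likely_class(probabilities):
    """Return the index of the most probable class for every pixel."""
    return np.argmax(probabilities, axis=-1)


def full_cell_probability(probabilities):
    """Sum nuclear and cytoplasmic probabilities into a single cell channel."""
    return probabilities[..., 0] + probabilities[..., 1]


# Toy image with a single row of two pixels.
toy = np.array([[[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]])
labels = most_likely_class(toy)
cell = full_cell_probability(toy)
```

The hard labels are only a convenient way to inspect the classifier output; segmentation works on the probabilities themselves, for example on the summed nuclear and cytoplasmic channel.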

Binary file modified docs/img/overview.png
Binary file modified docs/img/segmentation.png
6 changes: 3 additions & 3 deletions docs/index.md
@@ -10,7 +10,7 @@ For a more detailed introduction to IMC as technolgy and common data analysis st

The [steinbock](https://github.com/BodenmillerGroup/steinbock) framework offers a dockerized version of the pipeline and extends the segmentation approach by [deepcell](https://github.com/vanvalenlab/intro-to-deepcell) segmentation.

-This site gives detailed explanations of the individual steps of the pipeline ([see below](#overview)) to generate single-cell measurements from raw imag ing data.
+This site gives detailed explanations of the individual steps of the pipeline ([see below](#overview)) to generate single-cell measurements from raw imaging data.

## Scope

@@ -60,7 +60,7 @@ conda activate imcsegpipe
jupyter-lab
```

-This will automatically open a jupyter instance at `http://localhost:8888` in your browser.
+This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.

## Image data types
@@ -105,7 +105,7 @@ For downstream analysis in `R`, please refer to the [IMC Data Analysis](https://
## Contributors

**Creator:** Vito Zanotelli [:fontawesome-brands-github:](https://github.com/votti) [:fontawesome-brands-twitter:](https://twitter.com/ZanotelliVRT)
-**Contributors:** Jonas Windhager [:fontawesome-brands-github:](https://github.com/jwindhager) [:fontawesome-brands-twitter:](https://twitter.com/JonasWindhager) Nils Eling [:fontawesome-brands-github:](https://github.com/nilseling) [:fontawesome-brands-twitter:](https://twitter.com/NilsEling)
+**Contributors:** Jonas Windhager [:fontawesome-brands-github:](https://github.com/jwindhager) [:fontawesome-brands-twitter:](https://twitter.com/JonasWindhager), Nils Eling [:fontawesome-brands-github:](https://github.com/nilseling) [:fontawesome-brands-twitter:](https://twitter.com/NilsEling)
**Maintainer:** Nils Eling

## Citation
2 changes: 1 addition & 1 deletion docs/measurement.md
@@ -27,7 +27,7 @@ The following steps are part of the pipeline:
- the intensity values are all scaled by a scaling factor corresponding to the bit depth. This scaling factor can be found in the `Image.csv` file in the `Scaling_FullStack` column. For 16-bit unsigned integer images (`uint16`) as we use them here the values are divided by `2**16 - 1 = 65535`.
- The channel identifier `_c1`, `_c2`, `_c3`, ... corresponds to the position in the `..._full.csv` files found in the `analysis/cpout/images` folder.
- The original acquisition description, acquisition frequencies, acquisition name, etc. can be found in the `Image.csv` output file as `Metdata_...` columns.
-11. The cell-cell neighbor information detected in step 4 are exported as `.csv` containing an edge list.
+11. The cell-cell neighbor information detected in step 4 are exported as `.csv` file containing an edge list.
12. The final output are `.csv` files that contain additional metadata per measured feature. For the cell features the following information is written out: `category` (e.g. Intensity), `image_name` (e.g. FullStack), `object_name`, `feature_name` (e.g. MeanIntensity), `channel` (e.g. 1), `parameters`, `channel_id` (e.g. Ir191) and `data_type` (e.g. float)
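
The intensity scaling described in the steps above can be sketched as follows, assuming `uint16` images and therefore a scaling factor of `2**16 - 1`. The function names are hypothetical; in practice the factor should be read from the `Scaling_FullStack` column of `Image.csv` rather than hard-coded.

```python
# Maximum value of a 16-bit unsigned integer image.
UINT16_SCALING = 2**16 - 1  # 65535


def to_scaled_intensity(raw_value, scaling_factor=UINT16_SCALING):
    """Scale a raw intensity the way the exported features are scaled."""
    return raw_value / scaling_factor


def to_raw_intensity(scaled_value, scaling_factor=UINT16_SCALING):
    """Convert a scaled intensity back to the original value range."""
    return scaled_value * scaling_factor
```

Multiplying an exported mean intensity by the scaling factor recovers the value on the original intensity scale.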

## Output
4 changes: 2 additions & 2 deletions docs/output.md
@@ -59,7 +59,7 @@ Here `XYZ` indicates the sample name.
The `cpout` folder contains all relevant output files:

* `cpout/images`: contains the hot pixel filtered full stacks for analysis as well as `.csv` files indicating the channel order.
-* `cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented object.
+* `cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented objects.
* `cpout/probabilities`: contains 3 channel images in 16-bit `.tiff` format representing the downscaled pixel probabilities after Ilastik pixel classification.
* `cpout/cell.csv`: contains features (columns) for each cell (rows).
* `cpout/Experiment.csv`: contains metadata related to the CellProfiler version used.
@@ -86,7 +86,7 @@ The following folders contain files for Ilastik pixel classification:

## Image data folders

-The follwoing folders contain data in different formats for use with other software or [histoCAT](https://bodenmillergroup.github.io/histoCAT/).
+The following folders contain data in different formats for use with other software or [histoCAT](https://bodenmillergroup.github.io/histoCAT/).

* `analysis/ometiff`: contains individual folders (one per sample) of which each contains multiple `.ome.tiff` files (one per acquisition).
* `analysis/histocat`: contains individual folders (one per acquisition) of which each contains multiple single-channel `.tiff` files for upload to histoCAT.
8 changes: 4 additions & 4 deletions docs/prepro.md
@@ -51,15 +51,15 @@ When going through the [preprocessing script](https://github.com/BodenmillerGrou

## Example data

-We provide raw IMC example data at [zenodo.org/record/5949116](https://zenodo.org/record/5949116). This dataset contains 4 `.zip` archives each of which holds one `.mcd` and multiple `.txt` files. The data was acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive **CAN**cer patient cohorts (IMMUcan) project [immucan.eu](https://immucan.eu) using the [Hyperion imaging syste](https://www.fluidigm.com/products-services/instruments/hyperion). Data of 4 patients with different cancer types are provided. To download the raw data together with the panel file, sample metadata and a pre-trained Ilastik classifier, please follow the [download script](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/download_examples.ipynb)
+We provide raw IMC example data at [zenodo.org/record/5949116](https://zenodo.org/record/5949116). This dataset contains 4 `.zip` archives each of which holds one `.mcd` and multiple `.txt` files. The data was acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive **CAN**cer patient cohorts (IMMUcan) project [immucan.eu](https://immucan.eu) using the [Hyperion imaging system](https://www.fluidigm.com/products-services/instruments/hyperion). Data of 4 patients with different cancer types are provided. To download the raw data together with the panel file, sample metadata and a pre-trained Ilastik classifier, please follow the [download script](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/download_examples.ipynb).

## Conversion fom .mcd to .ome.tiff files

In the first step of the pipeline, raw `.mcd` files are converted into `.ome.tiff` files[^fn2].
This serves the purpose to allow vendor independent downstream analysis and visualization of the images.
For in-depth information of the `.ome.tiff` file format see [here](https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome.html).
Each `.mcd` file can contain multiple acquisitions. This means that multiple multi-channel `.ome.tiff` files per `.mcd` file are produced.
-The `Fluor` and `Name` of each channel is set.
+The `Fluor` and `Name` entries of each channel are set.
Here `Name` contains the actual name of the antibody as defined in the panel file and `Fluor` contains the metal tag of the antibody.
For IMC data, the metal tag is defined as: `(IsotopeShortname)(Mass)`, e.g. Ir191 for Iridium
isotope 191.
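
The `(IsotopeShortname)(Mass)` convention can be illustrated with a small parser. This is a sketch under the assumption that the short name is a one- or two-letter element symbol followed directly by an integer mass; the regular expression and the function name are hypothetical and not part of the pipeline.

```python
import re

# Assumed pattern: element symbol (e.g. "Ir") followed by the isotope mass (e.g. "191").
METAL_TAG_PATTERN = re.compile(r"^([A-Z][a-z]?)(\d+)$")


def parse_metal_tag(tag):
    """Split a metal tag such as 'Ir191' into (isotope short name, mass)."""
    match = METAL_TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a valid metal tag: {tag!r}")
    return match.group(1), int(match.group(2))


symbol, mass = parse_metal_tag("Ir191")
```

For `Ir191` this yields the symbol `Ir` and the mass `191`; a tag that does not follow the convention raises a `ValueError` instead of being silently accepted.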
@@ -107,13 +107,13 @@ For downstream analysis and Ilastik pixel classification, the `.ome.tiff` files

**1. Full stack:** The full stack contains all channels specified by the "1" entries in the `full` column of the panel file. This stack will be later used to measure cell-specific expression features of the selected channels.

-**2. Ilastik stack:** The Ilastik stack contains all channels specified by the "1" entries in the `ilastik` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](ilastik.md)).
+**2. Ilastik stack:** The Ilastik stack contains all channels specified by the "1" entries in the `ilastik` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background pixel probabilities (see [Ilastik training](ilastik.md)).

Additional image stacks can be generated by adapting the panel file and specifying the suffix of the file name.
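
How the "1" entries select channels for a stack can be sketched with the standard `csv` module. Only the `full` and `ilastik` column names come from the text; the `Metal Tag` and `Target` columns, the example rows and the function name are hypothetical.

```python
import csv
import io

# Hypothetical panel content; only the `full` and `ilastik` columns are named in the docs.
PANEL_CSV = """Metal Tag,Target,full,ilastik
Ir191,DNA1,1,1
Yb173,CD45,1,0
Ar80,Argon,0,0
"""


def select_channels(panel_text, column):
    """Return the metal tags of all channels marked with "1" in `column`."""
    reader = csv.DictReader(io.StringIO(panel_text))
    return [row["Metal Tag"] for row in reader if row[column].strip() == "1"]


full_stack = select_channels(PANEL_CSV, "full")
ilastik_stack = select_channels(PANEL_CSV, "ilastik")
```

An additional stack would follow the same pattern: add another 0/1 column to the panel file and select on that column.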

**Hot pixel filtering:** Each pixel intensity is compared against the maximum intensity of the 3x3 neighboring pixels. If the difference is larger than a specified threshold, the pixel intensity is clipped to the maximum intensity in the 3x3 neighborhood. Setting `hpf=None` disables hot pixel filtering in this conversion step.
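
The filtering rule can be sketched in plain NumPy. This is an illustration of the rule as described, not the pipeline's own implementation; it assumes that the neighborhood maximum is taken over the 8 surrounding pixels (excluding the pixel itself), and the function and parameter names are hypothetical.

```python
import numpy as np


def filter_hot_pixels(img, threshold):
    """Clip pixels that exceed the maximum of their 8 neighbors by more than `threshold`."""
    height, width = img.shape
    # Pad with the image minimum so border pixels are compared only against real neighbors.
    padded = np.pad(img, 1, mode="constant", constant_values=img.min())
    # Collect the 8 shifted views around each pixel (the center itself is excluded).
    neighbors = [
        padded[dy:dy + height, dx:dx + width]
        for dy in range(3)
        for dx in range(3)
        if (dy, dx) != (1, 1)
    ]
    neighbor_max = np.stack(neighbors).max(axis=0)
    # Compute the difference in float to avoid unsigned integer wrap-around.
    difference = img.astype(np.float64) - neighbor_max.astype(np.float64)
    return np.where(difference > threshold, neighbor_max, img)


image = np.full((3, 3), 10, dtype=np.uint16)
image[1, 1] = 1000  # a single hot pixel
filtered = filter_hot_pixels(image, threshold=50)
```

In the toy image the isolated value of 1000 is clipped to the neighborhood maximum of 10, while all other pixels and the `uint16` data type are left unchanged.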

-By default the hot pixel filtered full stack is written out to the `analysis/cpout/images` folder and the Ilastik stack is written out to the `analysis/ilastik` folder.
+By default the hot pixel filtered full stack is written out to the `analysis/cpout/images` folder and the hot pixel filtered Ilastik stack is written out to the `analysis/ilastik` folder.

The `analysis/ilastik` folder contains files such as:

4 changes: 2 additions & 2 deletions docs/segmentation.md
@@ -18,7 +18,7 @@ The following steps are part of the pipeline:
4. The nulcear and cytoplasmic channels are summed up to form a single channel indicating the full cell probability.
5. The nuclear probabilities are smoothed using a gaussian filter. This step can be adjusted or removed to increase segmentation success.
6. The `IdentifyPrimaryObjects` module is crucial to correctly identifying nuclei. Use the test mode and enable the "eye" icon next to the module to observe if nuclei are correctly segmented. The advanced settings can be adjusted to improve segmentation.
-7. The `MeasureObjectSizeShape` module measures the size of the nuclei and the `FilterObjects` module filters nuclei below a specified thresholds.
+7. The `MeasureObjectSizeShape` module measures the size of the nuclei and the `FilterObjects` module filters nuclei below a specified threshold.
8. The `IdentifySecondaryObjects` module expands from the identified nuclei to the border of the full cell probability generated in step 3 or until touching the neighboring cell.
9. The segmentation masks are converted to 16-bit images.
10. The segmentation masks are written out as 16-bit, single-channel `.tiff` images to the `analysis/cpout/masks` folder.
@@ -28,5 +28,5 @@ The following steps are part of the pipeline:

After image segmentation the following files have been generated:

-* `analysis/cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented object
+* `analysis/cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented objects.
* `analysis/cpout/probabilities`: contains 3 channel images in 16-bit `.tiff` format representing the downscaled pixel probabilities after Ilastik pixel classification.
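
Because non-zero mask values are object IDs, basic per-object statistics can be derived directly from a mask. The sketch below uses a small hand-made array in place of a mask loaded from `analysis/cpout/masks`; the function name and the toy data are assumptions for illustration.

```python
import numpy as np


def object_areas(mask):
    """Map each object ID (non-zero label) to its area in pixels."""
    ids, counts = np.unique(mask[mask != 0], return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


# Toy 16-bit mask: 0 is background, 1 and 2 are two segmented objects.
toy_mask = np.array(
    [[0, 1, 1],
     [0, 2, 2],
     [0, 0, 2]],
    dtype=np.uint16,
)
areas = object_areas(toy_mask)
```

The keys of the returned dictionary are the object IDs present in the mask and the values are pixel counts, which makes it easy to check that a mask contains the expected number of cells.
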
2 changes: 1 addition & 1 deletion resources/pipelines/1_prepare_ilastik.cppipe
@@ -84,7 +84,7 @@ StackImages:[module_num:7|svn_version:'Unknown'|variable_revision_number:2|show_
Image name:ScaledMean
Image name:Ilastik

-Resize:[module_num:8|svn_version:'Unknown'|variable_revision_number:4|show_window:False|notes:['Images are upscaled by a factor of 2. This approach facilitates pixel labelling using ilastik. Downscaling is perfomed in the following pipelines.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
+Resize:[module_num:8|svn_version:'Unknown'|variable_revision_number:4|show_window:False|notes:['Images are upscaled by a factor of 2. This approach facilitates pixel labelling using ilastik. Downscaling is perfomed in the following pipeline.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
Select the input image:IlastikExp
Name the output image:Ilastik2x
Resizing method:Resize by a fraction or multiple of the original size
2 changes: 1 addition & 1 deletion resources/pipelines/2_segment_ilastik.cppipe
@@ -215,7 +215,7 @@ IdentifySecondaryObjects:[module_num:12|svn_version:'Unknown'|variable_revision_
# of deviations:2
Thresholding method:Otsu

-ConvertObjectsToImage:[module_num:13|svn_version:'Unknown'|variable_revision_number:1|show_window:False|notes:['The downscaled segmentation masks are converted into objects.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
+ConvertObjectsToImage:[module_num:13|svn_version:'Unknown'|variable_revision_number:1|show_window:False|notes:['The segmentation masks are converted into images.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
Select the input objects:Cells
Name the output image:CellImage
Select the color format:uint16
2 changes: 1 addition & 1 deletion scripts/download_examples.ipynb
@@ -233,7 +233,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.10"
+"version": "3.9.12"
}
},
"nbformat": 4,
8 changes: 5 additions & 3 deletions scripts/imc_preprocessing.ipynb
@@ -112,7 +112,7 @@
"* `cellprofiler_output_dir`: all files written out by CellProfiler (default `analysis/cpout`)\n",
"* `histocat_dir`: folders containing single-channel images for histoCAT upload (default `analysis/histocat`)\n",
"\n",
-"Within the `cellprofiler_output_dir` three subfolder are created storing the final images:\n",
+"Within the `cellprofiler_output_dir` three subfolders are created storing the final images:\n",
"\n",
"* `final_images_dir`: stores the hot pixel filtered multi-channel images containing selected channels (default `analysis/cpout/images`)\n",
"* `final_masks_dir`: stores the final cell segmentation masks (default `analysis/cpout/masks`)\n",
@@ -179,7 +179,7 @@
"source": [
"## Convert `.mcd` files to `.ome.tiff` files\n",
"\n",
-"In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquied panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session. \n",
+"In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquired panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session. \n",
"At this stage, only `.zip` files specified by `file_regex` will be processed.\n",
"\n",
"In the following chunk, individual acquisition metadata are written out as `acquisition_metadata.csv` file in the `cellprofiler_output_dir` folder. "
@@ -288,6 +288,8 @@
"\n",
"**2. Ilastik stack:** The ilastik stack contains all channels specified by the \"1\" entries in the `panel_ilastik_col` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](https://bodenmillergroup.github.io/ImcSegmentationPipeline/ilastik.html)).\n",
"\n",
+"**Of note:** Both image stacks are now by default hot pixel filtered (see below). To write out the raw image data without filtering set `hpf=None`.\n",
+"\n",
"The `create_analysis_stacks` function takes several arguments:\n",
"\n",
"* `acquisition_dir`: specifies the folder containing the `.ome.tiff` files. \n",
@@ -577,7 +579,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.10"
+"version": "3.9.12"
}
},
"nbformat": 4,
6 changes: 4 additions & 2 deletions scripts/imc_preprocessing.py
@@ -71,7 +71,7 @@
# * `cellprofiler_output_dir`: all files written out by CellProfiler (default `analysis/cpout`)
# * `histocat_dir`: folders containing single-channel images for histoCAT upload (default `analysis/histocat`)
#
-# Within the `cellprofiler_output_dir` three subfolder are created storing the final images:
+# Within the `cellprofiler_output_dir` three subfolders are created storing the final images:
#
# * `final_images_dir`: stores the hot pixel filtered multi-channel images containing selected channels (default `analysis/cpout/images`)
# * `final_masks_dir`: stores the final cell segmentation masks (default `analysis/cpout/masks`)
@@ -114,7 +114,7 @@
# %% [markdown]
# ## Convert `.mcd` files to `.ome.tiff` files
#
-# In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquied panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.
+# In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquired panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.
# At this stage, only `.zip` files specified by `file_regex` will be processed.
#
# In the following chunk, individual acquisition metadata are written out as `acquisition_metadata.csv` file in the `cellprofiler_output_dir` folder.
@@ -176,6 +176,8 @@
#
# **2. Ilastik stack:** The ilastik stack contains all channels specified by the "1" entries in the `panel_ilastik_col` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](https://bodenmillergroup.github.io/ImcSegmentationPipeline/ilastik.html)).
#
+# **Of note:** Both image stacks are now by default hot pixel filtered (see below). To write out the raw image data without filtering set `hpf=None`.
+#
# The `create_analysis_stacks` function takes several arguments:
#
# * `acquisition_dir`: specifies the folder containing the `.ome.tiff` files.
