Fixed typos and clarifications

BodenmillerGroup · Mar 31, 2022 · 3c4639a
1 parent 3d998e8
commit 3c4639a

Showing 14 changed files with 26 additions and 22 deletions.
diff --git a/README.md b/README.md
@@ -44,7 +44,7 @@ conda activate imcsegpipe
 jupyter-lab
 ```
 
-This will automatically open a jupyter instance at `http://localhost:8888` in your browser.
+This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
 From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.
 
 In brief, the main analysis steps include:

diff --git a/docs/ilastik.md b/docs/ilastik.md
@@ -1,6 +1,6 @@
 # Ilastik pixel classification
 
-In this setp, we use [Ilastik](https://www.ilastik.org/) to label pixels and train a random-forrest classifier for semantic segmentation. 
+In this setp, we use [Ilastik](https://www.ilastik.org/) to label pixels and train a random-forest classifier for semantic segmentation. 
 This means that each pixel will be classified as "nuclear", "cytoplasmic" or "background".
 The probability of each pixel belonging to one of these classes will be used for image segmentation (see [cell segmentation](segmentation.md)).
 

diff --git a/docs/img/overview.png b/docs/img/overview.png
diff --git a/docs/img/segmentation.png b/docs/img/segmentation.png
diff --git a/docs/index.md b/docs/index.md
@@ -10,7 +10,7 @@ For a more detailed introduction to IMC as technolgy and common data analysis st
 
 The [steinbock](https://github.com/BodenmillerGroup/steinbock) framework offers a dockerized version of the pipeline and extends the segmentation approach by [deepcell](https://github.com/vanvalenlab/intro-to-deepcell) segmentation. 
 
-This site gives detailed explanations of the individual steps of the pipeline ([see below](#overview)) to generate single-cell measurements from raw imag   ing data. 
+This site gives detailed explanations of the individual steps of the pipeline ([see below](#overview)) to generate single-cell measurements from raw imaging data. 
 
 ## Scope
 
@@ -60,7 +60,7 @@ conda activate imcsegpipe
 jupyter-lab
 ```
 
-This will automatically open a jupyter instance at `http://localhost:8888` in your browser.
+This will automatically open a jupyter instance at `http://localhost:8888/lab` in your browser.
 From there, you can open the `scripts/imc_preprocessing.ipynb` notebook and start the data pre-processing.
 
 ## Image data types
@@ -105,7 +105,7 @@ For downstream analysis in `R`, please refer to the [IMC Data Analysis](https://
 ## Contributors
 
 **Creator:** Vito Zanotelli [:fontawesome-brands-github:](https://github.com/votti) [:fontawesome-brands-twitter:](https://twitter.com/ZanotelliVRT)    
-**Contributors:** Jonas Windhager [:fontawesome-brands-github:](https://github.com/jwindhager) [:fontawesome-brands-twitter:](https://twitter.com/JonasWindhager) Nils Eling [:fontawesome-brands-github:](https://github.com/nilseling) [:fontawesome-brands-twitter:](https://twitter.com/NilsEling)  
+**Contributors:** Jonas Windhager [:fontawesome-brands-github:](https://github.com/jwindhager) [:fontawesome-brands-twitter:](https://twitter.com/JonasWindhager), Nils Eling [:fontawesome-brands-github:](https://github.com/nilseling) [:fontawesome-brands-twitter:](https://twitter.com/NilsEling)  
 **Maintainer:** Nils Eling
 
 ## Citation

diff --git a/docs/measurement.md b/docs/measurement.md
@@ -27,7 +27,7 @@ The following steps are part of the pipeline:
     - the intensity values are all scaled by a scaling factor corresponding to the bit depth. This scaling factor can be found in the `Image.csv` file in the `Scaling_FullStack` column. For 16-bit unsigned integer images (`uint16`) as we use them here the values are divided by `2**16 - 1 = 65535`.
     - The channel identifier `_c1`, `_c2`, `_c3`, ... corresponds to the position in the `..._full.csv` files found in the `analysis/cpout/images` folder.
     - The original acquisition description, acquisition frequencies, acquisition name, etc. can be found in the `Image.csv` output file as `Metdata_...` columns.
-11. The cell-cell neighbor information detected in step 4 are exported as `.csv` containing an edge list.
+11. The cell-cell neighbor information detected in step 4 are exported as `.csv` file containing an edge list.
 12. The final output are `.csv` files that contain additional metadata per measured feature. For the cell features the following information is written out: `category` (e.g. Intensity), `image_name` (e.g. FullStack), `object_name`, `feature_name` (e.g. MeanIntensity), `channel` (e.g. 1), `parameters`, `channel_id` (e.g. Ir191) and `data_type` (e.g. float)
 
 ## Output

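Reviewer note on `docs/measurement.md`: the docs state that `uint16` intensities are divided by `2**16 - 1 = 65535`, with the factor recorded in the `Scaling_FullStack` column of `Image.csv`. The following is a minimal NumPy sketch of undoing that scaling. It assumes the scaled features have already been loaded into an array (for example, one `MeanIntensity` column from `cell.csv`). `unscale_intensities` is a hypothetical helper and is not part of the pipeline.

```python
import numpy as np

# Scaling factor for 16-bit unsigned integer images, as stated in the docs:
# 2**16 - 1 = 65535. In practice, read it from `Scaling_FullStack` in Image.csv.
UINT16_SCALE = 2**16 - 1


def unscale_intensities(values, scale=UINT16_SCALE):
    """Map CellProfiler-scaled intensities back onto the raw uint16 range.

    `values` is any array-like of scaled intensities; `scale` should match
    the bit depth of the measured image stack.
    """
    return np.asarray(values, dtype=np.float64) * scale


# Hypothetical example values, not taken from real pipeline output.
scaled = np.array([0.0, 0.5, 1.0])
raw = unscale_intensities(scaled)
print(raw)
```

The result is kept as `float64` because mean intensities are generally not integers, even after rescaling.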
diff --git a/docs/output.md b/docs/output.md
@@ -59,7 +59,7 @@ Here `XYZ` indicates the sample name.
 The `cpout` folder contains all relevant output files:
 
 * `cpout/images`: contains the hot pixel filtered full stacks for analysis as well as `.csv` files indicating the channel order. 
-* `cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented object. 
+* `cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented objects. 
 * `cpout/probabilities`: contains 3 channel images in 16-bit `.tiff` format representing the downscaled pixel probabilities after Ilastik pixel classification.
 * `cpout/cell.csv`: contains features (columns) for each cell (rows).
 * `cpout/Experiment.csv`: contains metadata related to the CellProfiler version used.
@@ -86,7 +86,7 @@ The following folders contain files for Ilastik pixel classification:
 
 ## Image data folders
 
-The follwoing folders contain data in different formats for use with other software or [histoCAT](https://bodenmillergroup.github.io/histoCAT/).
+The following folders contain data in different formats for use with other software or [histoCAT](https://bodenmillergroup.github.io/histoCAT/).
 
 * `analysis/ometiff`: contains individual folders (one per sample) of which each contains multiple `.ome.tiff` files (one per acquisition).  
 * `analysis/histocat`: contains individual folders (one per acquisition) of which each contains multiple single-channel `.tiff` files for upload to histoCAT.  
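Reviewer note on the mask convention in `docs/output.md`: masks are single-channel 16-bit images in which background is 0 and every non-zero value is an object ID. The following sketch shows how that convention can be used to count objects and their pixel areas. It assumes a mask is already loaded as a NumPy array (for example, with `tifffile.imread` on a file from `cpout/masks`). `object_areas` is a hypothetical helper used only for illustration.

```python
import numpy as np


def object_areas(mask):
    """Return {object_id: pixel_count} for a segmentation mask.

    Background is 0; every non-zero value is treated as one object ID.
    """
    mask = np.asarray(mask)
    ids, counts = np.unique(mask[mask > 0], return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


# Toy 16-bit mask with two objects (IDs 1 and 2) on a background of 0.
toy_mask = np.array(
    [[0, 1, 1, 0],
     [0, 1, 0, 2],
     [0, 0, 2, 2]],
    dtype=np.uint16,
)
areas = object_areas(toy_mask)
print(areas)  # {1: 3, 2: 3}
```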
diff --git a/docs/prepro.md b/docs/prepro.md
@@ -51,15 +51,15 @@ When going through the [preprocessing script](https://github.com/BodenmillerGrou
 
 ## Example data
 
-We provide raw IMC example data at [zenodo.org/record/5949116](https://zenodo.org/record/5949116). This dataset contains 4 `.zip` archives each of which holds one `.mcd` and multiple `.txt` files. The data was acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive **CAN**cer patient cohorts (IMMUcan) project [immucan.eu](https://immucan.eu) using the [Hyperion imaging syste](https://www.fluidigm.com/products-services/instruments/hyperion). Data of 4 patients with different cancer types are provided. To download the raw data together with the panel file, sample metadata and a pre-trained Ilastik classifier, please follow the [download script](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/download_examples.ipynb)
+We provide raw IMC example data at [zenodo.org/record/5949116](https://zenodo.org/record/5949116). This dataset contains 4 `.zip` archives each of which holds one `.mcd` and multiple `.txt` files. The data was acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive **CAN**cer patient cohorts (IMMUcan) project [immucan.eu](https://immucan.eu) using the [Hyperion imaging system](https://www.fluidigm.com/products-services/instruments/hyperion). Data of 4 patients with different cancer types are provided. To download the raw data together with the panel file, sample metadata and a pre-trained Ilastik classifier, please follow the [download script](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/download_examples.ipynb).
 
 ## Conversion fom .mcd to .ome.tiff files
 
 In the first step of the pipeline, raw `.mcd` files are converted into `.ome.tiff` files[^fn2].
 This serves the purpose to allow vendor independent downstream analysis and visualization of the images.
 For in-depth information of the `.ome.tiff` file format see [here](https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome.html). 
 Each `.mcd` file can contain multiple acquisitions. This means that multiple multi-channel `.ome.tiff` files per `.mcd` file are produced. 
-The `Fluor` and `Name` of each channel is set.
+The `Fluor` and `Name` entries of each channel are set.
 Here `Name` contains the actual name of the antibody as defined in the panel file and `Fluor` contains the metal tag of the antibody.
 For IMC data, the metal tag is defined as: `(IsotopeShortname)(Mass)`, e.g. Ir191 for Iridium
 isotope 191.
@@ -107,13 +107,13 @@ For downstream analysis and Ilastik pixel classification, the `.ome.tiff` files
 
 **1. Full stack:** The full stack contains all channels specified by the "1" entries in the `full` column of the panel file. This stack will be later used to measure cell-specific expression features of the selected channels.
 
-**2. Ilastik stack:** The Ilastik stack contains all channels specified by the "1" entries in the `ilastik` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](ilastik.md)).
+**2. Ilastik stack:** The Ilastik stack contains all channels specified by the "1" entries in the `ilastik` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background pixel probabilities (see [Ilastik training](ilastik.md)).
 
 Additional image stacks can be generated by adapting the panel file and specifying the suffix of the file name. 
 
 **Hot pixel filtering:** Each pixel intensity is compared against the maximum intensity of the 3x3 neighboring pixels. If the difference is larger than a specified threshold, the pixel intensity is clipped to the maximum intensity in the 3x3 neighborhood. Setting `hpf=None` disables hot pixel filtering in this conversion step.
 
-By default the hot pixel filtered full stack is written out to the `analysis/cpout/images` folder and the Ilastik stack is written out to the `analysis/ilastik` folder.
+By default the hot pixel filtered full stack is written out to the `analysis/cpout/images` folder and the hot pixel filtered Ilastik stack is written out to the `analysis/ilastik` folder.
 
 The `analysis/ilastik` folder contains files such as:
 

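Reviewer note on the hot pixel filtering paragraph in `docs/prepro.md`: the rule compares each pixel with the maximum of its 3x3 neighborhood and clips it to that maximum when the difference exceeds a threshold. The following is a minimal NumPy sketch of that rule for one 2D channel. It assumes "neighboring pixels" means the 8 surrounding pixels (center excluded) and that border pixels only consider neighbors that exist. `filter_hot_pixels` and `thres` are illustrative names, and the pipeline's actual implementation may differ in dtype or border handling.

```python
import numpy as np


def filter_hot_pixels(img, thres):
    """Clip hot pixels in a single-channel 2D image (sketch).

    A pixel is "hot" if it exceeds the maximum of its 8 neighbors by more
    than `thres`; hot pixels are set to that neighborhood maximum.
    """
    img = np.asarray(img, dtype=np.float64)
    # Pad with -inf so border pixels only see their real neighbors.
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    h, w = img.shape
    neighbor_max = np.full_like(img, -np.inf)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            shifted = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
            neighbor_max = np.maximum(neighbor_max, shifted)
    # Guard against pixels without any neighbor (e.g. a 1x1 image).
    is_hot = np.isfinite(neighbor_max) & ((img - neighbor_max) > thres)
    return np.where(is_hot, neighbor_max, img)


# Toy channel with one isolated bright pixel in the center.
toy = np.array(
    [[1, 2, 1],
     [2, 100, 2],
     [1, 2, 1]],
)
filtered = filter_hot_pixels(toy, thres=50)
print(filtered)
```

For a multi-channel stack, the same function would be applied per channel. Skipping the call entirely corresponds to the `hpf=None` behavior described in the docs.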
diff --git a/docs/segmentation.md b/docs/segmentation.md
@@ -18,7 +18,7 @@ The following steps are part of the pipeline:
 4. The nulcear and cytoplasmic channels are summed up to form a single channel indicating the full cell probability.  
 5. The nuclear probabilities are smoothed using a gaussian filter. This step can be adjusted or removed to increase segmentation success. 
 6. The `IdentifyPrimaryObjects` module is crucial to correctly identifying nuclei. Use the test mode and enable the "eye" icon next to the module to observe if nuclei are correctly segmented. The advanced settings can be adjusted to improve segmentation.  
-7. The `MeasureObjectSizeShape` module measures the size of the nuclei and the `FilterObjects` module filters nuclei below a specified thresholds. 
+7. The `MeasureObjectSizeShape` module measures the size of the nuclei and the `FilterObjects` module filters nuclei below a specified threshold. 
 8. The `IdentifySecondaryObjects` module expands from the identified nuclei to the border of the full cell probability generated in step 3 or until touching the neighboring cell. 
 9. The segmentation masks are converted to 16-bit images. 
 10. The segmentation masks are written out as 16-bit, single-channel `.tiff` images to the `analysis/cpout/masks` folder.
@@ -28,5 +28,5 @@ The following steps are part of the pipeline:
 
 After image segmentation the following files have been generated:
 
-* `analysis/cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented object 
+* `analysis/cpout/masks`: contains single-channel segmentation masks in 16-bit `.tiff` format. Segmentation masks are single-channel images that match the input images in size, with non-zero grayscale values indicating the IDs of segmented objects.
 * `analysis/cpout/probabilities`: contains 3 channel images in 16-bit `.tiff` format representing the downscaled pixel probabilities after Ilastik pixel classification.
diff --git a/resources/pipelines/1_prepare_ilastik.cppipe b/resources/pipelines/1_prepare_ilastik.cppipe
@@ -84,7 +84,7 @@ StackImages:[module_num:7|svn_version:'Unknown'|variable_revision_number:2|show_
     Image name:ScaledMean
     Image name:Ilastik
 
-Resize:[module_num:8|svn_version:'Unknown'|variable_revision_number:4|show_window:False|notes:['Images are upscaled by a factor of 2. This approach facilitates pixel labelling using ilastik. Downscaling is perfomed in the following pipelines.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
+Resize:[module_num:8|svn_version:'Unknown'|variable_revision_number:4|show_window:False|notes:['Images are upscaled by a factor of 2. This approach facilitates pixel labelling using ilastik. Downscaling is perfomed in the following pipeline.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
     Select the input image:IlastikExp
     Name the output image:Ilastik2x
     Resizing method:Resize by a fraction or multiple of the original size

diff --git a/resources/pipelines/2_segment_ilastik.cppipe b/resources/pipelines/2_segment_ilastik.cppipe
@@ -215,7 +215,7 @@ IdentifySecondaryObjects:[module_num:12|svn_version:'Unknown'|variable_revision_
     # of deviations:2
     Thresholding method:Otsu
 
-ConvertObjectsToImage:[module_num:13|svn_version:'Unknown'|variable_revision_number:1|show_window:False|notes:['The downscaled segmentation masks are converted into objects.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
+ConvertObjectsToImage:[module_num:13|svn_version:'Unknown'|variable_revision_number:1|show_window:False|notes:['The segmentation masks are converted into images.']|batch_state:array([], dtype=uint8)|enabled:True|wants_pause:False]
     Select the input objects:Cells
     Name the output image:CellImage
     Select the color format:uint16

diff --git a/scripts/download_examples.ipynb b/scripts/download_examples.ipynb
@@ -233,7 +233,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.10"
+   "version": "3.9.12"
   }
  },
  "nbformat": 4,

diff --git a/scripts/imc_preprocessing.ipynb b/scripts/imc_preprocessing.ipynb
@@ -112,7 +112,7 @@
     "* `cellprofiler_output_dir`: all files written out by CellProfiler (default `analysis/cpout`)\n",
     "* `histocat_dir`: folders containing single-channel images for histoCAT upload (default `analysis/histocat`)\n",
     "\n",
-    "Within the `cellprofiler_output_dir` three subfolder are created storing the final images:\n",
+    "Within the `cellprofiler_output_dir` three subfolders are created storing the final images:\n",
     "\n",
     "* `final_images_dir`: stores the hot pixel filtered multi-channel images containing selected channels (default `analysis/cpout/images`)\n",
     "* `final_masks_dir`: stores the final cell segmentation masks (default `analysis/cpout/masks`)\n",
@@ -179,7 +179,7 @@
    "source": [
     "## Convert `.mcd` files to `.ome.tiff` files\n",
     "\n",
-    "In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquied panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.  \n",
+    "In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquired panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.  \n",
     "At this stage, only `.zip` files specified by `file_regex` will be processed.\n",
     "\n",
     "In the following chunk, individual acquisition metadata are written out as `acquisition_metadata.csv` file in the `cellprofiler_output_dir` folder. "
@@ -288,6 +288,8 @@
     "\n",
     "**2. Ilastik stack:** The ilastik stack contains all channels specified by the \"1\" entries in the `panel_ilastik_col` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](https://bodenmillergroup.github.io/ImcSegmentationPipeline/ilastik.html)).\n",
     "\n",
+    "**Of note:** Both image stacks are now by default hot pixel filtered (see below). To write out the raw image data without filtering set `hpf=None`.\n",
+    "\n",
     "The `create_analysis_stacks` function takes several arguments:\n",
     "\n",
     "* `acquisition_dir`: specifies the folder containing the `.ome.tiff` files.  \n",
@@ -577,7 +579,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.10"
+   "version": "3.9.12"
   }
  },
  "nbformat": 4,

diff --git a/scripts/imc_preprocessing.py b/scripts/imc_preprocessing.py
@@ -71,7 +71,7 @@
 # * `cellprofiler_output_dir`: all files written out by CellProfiler (default `analysis/cpout`)
 # * `histocat_dir`: folders containing single-channel images for histoCAT upload (default `analysis/histocat`)
 #
-# Within the `cellprofiler_output_dir` three subfolder are created storing the final images:
+# Within the `cellprofiler_output_dir` three subfolders are created storing the final images:
 #
 # * `final_images_dir`: stores the hot pixel filtered multi-channel images containing selected channels (default `analysis/cpout/images`)
 # * `final_masks_dir`: stores the final cell segmentation masks (default `analysis/cpout/masks`)
@@ -114,7 +114,7 @@
 # %% [markdown]
 # ## Convert `.mcd` files to `.ome.tiff` files
 #
-# In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquied panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.  
+# In the first step, the `.zip` archives containing `.mcd` files are converted to folders, which contain `.ome.tiff` files, channel metadata files, panoramas and slide overviews. The `.ome.tiff` files can be read in by commercial and open-source software such as `ImageJ` using the BioFormats importer. The `.csv` files contain the order of the channels as well as the antibody names. The `_pano.png` contain the acquired panoramas; the `_slide.png` contains the slide overview. The `_schema.xml` contains metadata regarding the acquisition session.  
 # At this stage, only `.zip` files specified by `file_regex` will be processed.
 #
 # In the following chunk, individual acquisition metadata are written out as `acquisition_metadata.csv` file in the `cellprofiler_output_dir` folder. 
@@ -176,6 +176,8 @@
 #
 # **2. Ilastik stack:** The ilastik stack contains all channels specified by the "1" entries in the `panel_ilastik_col` column of the panel file. This stack will be used to perform the ilastik training to generate cell, cytoplasm and background probability masks (see [Ilastik training](https://bodenmillergroup.github.io/ImcSegmentationPipeline/ilastik.html)).
 #
+# **Of note:** Both image stacks are now by default hot pixel filtered (see below). To write out the raw image data without filtering set `hpf=None`.
+#
 # The `create_analysis_stacks` function takes several arguments:
 #
 # * `acquisition_dir`: specifies the folder containing the `.ome.tiff` files.