Mph norm workflow #125

Merged (27 commits, Jun 15, 2022)

Commits
972f9b3
framework for channel counts function
ngreenwald Jan 31, 2022
85e6305
testing for first function
ngreenwald Feb 1, 2022
3d89a9e
notebook
ngreenwald Feb 8, 2022
cdd74d4
Merge branch 'main' into mph_norm_workflow
ngreenwald Feb 8, 2022
72d3ec3
updated naming
ngreenwald Feb 9, 2022
4080bd1
first part of full workflow
ngreenwald Feb 22, 2022
12e876c
Merge branch 'main' into mph_norm_workflow
ngreenwald May 26, 2022
75c746c
move to FOV-based normalization scheme
ngreenwald May 27, 2022
ea1160b
update workflow
ngreenwald May 27, 2022
4ac8b66
fix tests
ngreenwald May 27, 2022
6592a8a
add logging
ngreenwald May 28, 2022
9f8726a
standardize notebook naming
ngreenwald May 28, 2022
2c7cb0d
check for missing normalization function
ngreenwald May 28, 2022
1c387ce
pycodestyle
ngreenwald May 28, 2022
45bb0e3
Update README.md
ngreenwald May 28, 2022
2e145fe
style
ngreenwald May 28, 2022
beb8d3e
Merge branch 'mph_norm_workflow' of https://github.com/angelolab/toff…
ngreenwald May 28, 2022
d79adc5
Mph norm workflow curve (#123)
ngreenwald Jun 7, 2022
314514b
merge conflict
ngreenwald Jun 7, 2022
3847322
fixed merges
ngreenwald Jun 7, 2022
19fcd50
code review comments
ngreenwald Jun 10, 2022
9eec96c
add more explanation to notebook on normalization
ngreenwald Jun 10, 2022
20cc284
Merge branch 'main' into mph_norm_workflow
ngreenwald Jun 11, 2022
550f630
typo in environment.yaml
ngreenwald Jun 13, 2022
d1c397f
switch default to 2nd degree polynomial
ngreenwald Jun 13, 2022
c0c4781
merge
ngreenwald Jun 13, 2022
f0f32e0
update default value and testing
ngreenwald Jun 15, 2022
7 changes: 3 additions & 4 deletions README.md
@@ -35,11 +35,10 @@ There are a number of different computational tasks to complete once a MIBI run
- 3a: real time monitoring. The [MIBI monitoring](./templates/3a_monitor_MIBI_run.ipynb) notebook will monitor an ongoing MIBI run, and begin processing the image data as soon as it is generated. This notebook is continually being updated as we move more of our processing pipeline to run in real time as the data is generated.
- 3b: post-run monitoring. For each step in the monitoring notebook, we have a dedicated notebook that can perform the same tasks once a run is complete. This includes [the image extraction notebook](./templates/extract_bin_file.ipynb) and the [qc metrics notebook](./templates/3b_generate_qc_metrics.ipynb).

### 4. Processing MIBI data
Once your run has finished, you can begin to process the data to make it ready for analysis. To remove background signal contamination, as well as compensate for channel crosstalk, you can use the [compensation](./templates/4a_compensate_image_data.ipynb) notebook. This will guide you through the Rosetta algorithm, which uses a flow-cytometry style compensation approach to remove spurious signal.

### 4. Processing MIBI Data
Once your run has finished, you can begin to process the data to make it ready for analysis. To remove background signal contamination, as well as compensate for channel crosstalk, you can use the [compensation](./templates/4_compensate_image_data.ipynb) notebook. This will guide you through the Rosetta algorithm, which uses a flow-cytometry style compensation approach to remove spurious signal.

Following compensation, you will want to normalize your images to ensure consistent intensity across the run. This functionality is currently in the works, and we'll have a beta version available to test soon.
Following compensation, you will want to normalize your images to ensure consistent intensity across the run. You can use the [normalization](./templates/4b_normalize_image_data.ipynb) notebook to perform this step.

## Installation
In order to get toffy working, you'll need to first install the repo.
5 changes: 3 additions & 2 deletions environment.yml
@@ -3,10 +3,11 @@ dependencies:
- python=3.8
- pip
- pip:
- git+https://github.com/angelolab/mibi-bin-tools.git@02b4549731c2204e727073d6f23d3a91123e69d8
- git+https://github.com/angelolab/ark-analysis.git@master
- git+https://github.com/angelolab/ark-analysis.git@02b4549731c2204e727073d6f23d3a91123e69d8
- git+https://github.com/angelolab/mibi-bin-tools.git@master
- jupyter>=1.0.0,<2
- jupyter_contrib_nbextensions>=0.5.1,<1
- jupyterlab>=3.1.5,<4
- watchdog>=2.1.6,<3
- natsort >= 0.8
- numpy>=1.22,<2
1 change: 1 addition & 0 deletions requirements.txt
@@ -3,6 +3,7 @@ mibi-bin-tools @ git+https://github.com/angelolab/mibi-bin-tools.git@master
jupyter>=1.0.0,<2
jupyter_contrib_nbextensions>=0.5.1,<1
jupyterlab>=3.1.5,<4
natsort>=0.8
numpy>=1.22,<2
watchdog>=2.1.6,<3
traitlets==5.2.2.post1
100 changes: 94 additions & 6 deletions templates/1_set_up_toffy.ipynb
@@ -15,9 +15,10 @@
"id": "e36293c5-aa89-4029-a3fa-e8ea841bb8b5",
Contributor:

Can't the sensitivity curve generation go in 4b, since it's only used there? Is the idea that putting this here will encourage people to run a sweep before data acquisition?

Member Author:

No, it's because it only needs to happen once per instrument, not separately for each run. So having it here means it won't be present in the notebook each time people are normalizing.

Contributor:

Can't this be done programmatically here?

Member Author:

Sweeps aren't created in their own folder; they're just put into the main /Data folder. So you need to separately identify each FOV from the sweep, which are given generic names. This was initially what the find_detector_sweeps function was for, but then Erin had a couple of sweeps where FOVs were missing, so it gave an error.

We could change it to give a warning when an FOV is missing, rather than an error, and then ask people to list the first and last FOV of their sweep so it finds the rest, but at that point the solution started to feel almost as complicated as the problem. Up to you though, it would be an easy change.

"metadata": {},
"source": [
"There are two parts to this notebook. \n",
"There are three parts to this notebook. \n",
"1. The first part creates the necessary folders that toffy is expecting, and only needs to be run the first time you install it on a new CAC. \n",
"2. The second part updates the co-registration parameters between the slide image (optical image) and the stage coordinates. This needs to be run anytime Ionpath changes the co-registration"
"2. The second part updates the co-registration parameters between the slide image (optical image) and the stage coordinates. This needs to be run anytime Ionpath changes the co-registration\n",
"3. The third part generates a tuning curve to correct for shifts in instrument sensitivity, and only needs to be run once per instrument"
]
},
{
@@ -37,7 +38,8 @@
"import os\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"from toffy import tiling_utils"
"from toffy import tiling_utils, normalize\n",
"from ark.utils import io_utils"
]
},
{
@@ -55,9 +57,12 @@
"metadata": {},
"outputs": [],
"source": [
"folders = ['D:\\\\Extracted_Images', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\run_metrics', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\watcher_logs',\n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\tiled_run_jsons', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\autolabeled_tma_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\panel_files']\n",
"folders = ['D:\\\\Extracted_Images', 'D:\\\\Rosetta_Compensated_Images', 'D:\\\\Normalized_Images', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\run_metrics', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\watcher_logs',\n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\tiled_run_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\autolabeled_tma_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\panel_files', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\mph_files']\n",
"\n",
"for folder in folders:\n",
" if not os.path.exists(folder):\n",
@@ -178,6 +183,89 @@
"source": [
"tiling_utils.save_coreg_params(coreg_params)"
]
},
{
"cell_type": "markdown",
"id": "14c82566-d6f5-4096-a249-92fae371ab39",
"metadata": {},
"source": [
"## 3. Generate sensitivity tuning curve\n",
"Depending on when an FOV was acquired with respect to the last time the detector was tuned, you will see variable levels of antibody signal. These differences in sensitivity can result in differences in marker intensity, when in fact there is no underlying biological difference in the real signal. In order to correct for this, and ensure that samples which have the same expression levels of a given marker record the same intensity, we need to normalize the images. \n",
"\n",
"The normalization process relies on constructing a tuning curve, which accurately tracks the relationship between increasing detector gain and antibody signal. We can use a detector sweep to figure out this relationship. We can then correct each image to ensure that there are consistent levels of antibody signal. "
]
},
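As a rough sketch of the idea described above (not the actual `normalize.create_tuning_function` implementation; the voltages, counts, and variable names are invented for illustration): fit a low-degree polynomial relating detector gain to observed signal across the sweep, then rescale each FOV by the predicted sensitivity at its recorded gain. This workflow defaults to a 2nd-degree polynomial.

```python
# Illustrative sketch only; the real logic lives in toffy.normalize.
# All numbers below are made up for demonstration.
import numpy as np

# detector voltages and median counts observed across a detector sweep
voltages = np.array([2450.0, 2500.0, 2550.0, 2600.0, 2650.0])
counts = np.array([180.0, 260.0, 360.0, 480.0, 620.0])

# fit a 2nd-degree polynomial relating detector gain to signal
tuning = np.poly1d(np.polyfit(voltages, counts, deg=2))

# rescale an FOV acquired at one voltage to a common reference voltage
fov_voltage, reference_voltage = 2575.0, 2500.0
scale = tuning(reference_voltage) / tuning(fov_voltage)
example_image = np.random.rand(32, 32)   # stand-in for a single channel image
normalized_image = example_image * scale
```

The point of fitting a smooth curve rather than using the raw sweep values is that any acquisition voltage, not just the ones swept, can be mapped to an expected sensitivity.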
{
"cell_type": "markdown",
"id": "b4170c03-1619-479c-99f0-ea03ac13d76b",
"metadata": {},
"source": [
"### Identify detector sweep\n",
"The first step is selecting a detector sweep. The goal is for this sweep to cover the range of values most often seen during image acqusition. Therefore, it's best to pick a sweep where the suggested change in voltage following the sweep was less than 50V."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "525de367-672a-416c-9c1a-1cb20a397cda",
"metadata": {},
"outputs": [],
"source": [
"# pick a name for the sweep, such as the date it was run\n",
"sweep_name = '20220417_pmma'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e661bba8-6657-4f9f-ba8b-ebb9eab874bc",
"metadata": {},
"outputs": [],
"source": [
"# create a new folder with the sweep name\n",
"normalization_dir = 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve'\n",
"sweep_path = os.path.join(normalization_dir, sweep_name)\n",
"os.makedirs(sweep_path)"
]
},
{
"cell_type": "markdown",
"id": "9cab012d-9092-4136-b81b-9c8b3a969e15",
"metadata": {},
"source": [
"Now, copy all of the FOVs from the sweep into the newly created folder, which can be found in *C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve*"
]
},
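If you prefer to do this copy step in code rather than by hand, a minimal sketch is below. It assumes the sweep FOVs sit directly in D:\\Data under generic fov-N-scan-1 names; adjust data_dir and sweep_fovs to match your own sweep.

```python
# Optional: copy the sweep FOVs programmatically instead of by hand.
# data_dir and the FOV names are assumptions about your layout; edit as needed.
import os
import shutil

data_dir = 'D:\\Data'
sweep_fovs = ['fov-1-scan-1', 'fov-2-scan-1', 'fov-3-scan-1']

for fov in sweep_fovs:
    for ext in ['.bin', '.json']:
        shutil.copy(os.path.join(data_dir, fov + ext),
                    os.path.join(sweep_path, fov + ext))
```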
{
"cell_type": "markdown",
"id": "86941169-09a1-43fe-9299-f9f1ca8766b3",
"metadata": {},
"source": [
"### Create tuning curve\n",
"We'll then use these FOVs in order to create the curve"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7d0aa94-a481-4628-b41a-fa2b74d489d9",
"metadata": {},
"outputs": [],
"source": [
"# define masses to use\n",
"normalize.create_tuning_function(sweep_path=sweep_path)"
]
},
{
"cell_type": "markdown",
"id": "762d6e3b-41d1-4064-9578-2bae599e3f99",
"metadata": {},
"source": [
"Your curve should look like the image below. It's okay if your values are a bit different, but the shape of the curve should be qualitatively the same. The curve will be saved in the *sweep_path* folder you defined above\n",
"<div>\n",
" <img src=\"img/tuning_curve.png\" width=\"500\"/>\n",
"</div>\n"
]
}
],
"metadata": {
@@ -34,7 +34,7 @@
"id": "d1db8364",
"metadata": {},
"source": [
"### First, make a folder to hold all of the files related to rosetta processing, and put the full path below"
"### First, make a folder for evaluating rosetta normalization, and put the full path below"
]
},
{
@@ -54,7 +54,7 @@
"source": [
"### Next, copy over the .bin files for the ~10 FOVs will you use for testing. In addition to the .bin files, make sure to copy over the .JSON files with the same name into this folder. Place them in a folder named *example_bins*.\n",
"\n",
"#### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc"
"### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc"
]
},
{
@@ -200,7 +200,7 @@
"id": "bd4f51e0-8bcc-4b5b-b2d3-6bc00140e0ca",
"metadata": {},
"source": [
"### Once you're satisfied that the Rosetta is working appropriately, you can use it to process your entire dataset"
"### Once you're satisfied that the Rosetta is working appropriately, you can use it to process your run. First select the run you want to process, and define the relevant top-level folders"
]
},
{
@@ -210,13 +210,25 @@
"metadata": {},
"outputs": [],
"source": [
"# Specify necessary folders\n",
"# Put the name of your run here\n",
"run_name = '20220101_my_run'\n",
"\n",
"# This should be a folder of run folders. Each folder within bin_file_dir should contain all of the .bin and .json files for that run\n",
"bin_file_dir = 'path/to/cohort/all_runs'\n",
"# The path to the folder containing raw run data\n",
"bin_file_dir = 'D:\\\\Data'\n",
"\n",
"# This folder is where all of the extracted images will get saved\n",
"extracted_image_dir = 'path/to/cohort/extracted_runs'"
"extracted_image_dir = 'D:\\\\Extracted_Images'\n",
"\n",
"# This folder will hold the post-rosetta images\n",
"rosetta_image_dir = 'D:\\\\Rosetta_Compensated_Images'"
]
},
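Before extracting, an optional sanity check is to confirm that the run name you typed actually exists under bin_file_dir. This is a minimal sketch using only the folders defined in the cell above.

```python
# Optional sanity check: make sure the raw data for run_name is where we expect it.
import os

run_path = os.path.join(bin_file_dir, run_name)
if not os.path.isdir(run_path):
    raise FileNotFoundError('No run folder named {} found in {}'.format(run_name, bin_file_dir))
```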
{
"cell_type": "markdown",
"id": "7f6161fb-3050-44e1-ab9a-7b9c59675d89",
"metadata": {},
"source": [
"### Prior to running compensation, you'll need to extract your data if you haven't already"
]
},
{
@@ -226,36 +238,27 @@
"metadata": {},
"outputs": [],
"source": [
"# If you only want to extract a subset of your runs, specify their names here; otherwise, leave as None\n",
"runs = None\n",
"if runs is None:\n",
" runs = list_folders(bin_file_dir)\n",
"# set run-specific folders\n",
"run_bin_dir = os.path.join(bin_file_dir, run_name)\n",
"run_extracted_dir = os.path.join(extracted_image_dir, run_name)\n",
"if not os.path.exists(run_extracted_dir):\n",
" os.makedirs(run_extracted_dir)\n",
"\n",
"for run in runs:\n",
" print(\"processing run {}\".format(run))\n",
" current_bin = os.path.join(bin_file_dir, run)\n",
" current_out = os.path.join(extracted_image_dir, run)\n",
" if not os.path.exists(current_out):\n",
" os.makedirs(current_out)\n",
" \n",
" # extract bins and replace gold image\n",
" bin_files.extract_bin_files(current_bin, current_out, panel=panel, intensities=['Au', 'chan_39'])\n",
" rosetta.replace_with_intensity_image(run_dir=current_out, channel='Au')\n",
" rosetta.replace_with_intensity_image(run_dir=current_out, channel='chan_39')\n",
" \n",
" # clean up dirs\n",
" rosetta.remove_sub_dirs(run_dir=current_out, sub_dirs=['intensities', 'intensity_times_width'])"
"# extract bins\n",
"bin_files.extract_bin_files(run_bin_dir, run_extracted_dir, panel=panel, intensities=['Au', 'chan_39'])\n",
"rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='Au')\n",
"rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='chan_39')\n",
"\n",
"# clean up dirs\n",
"rosetta.remove_sub_dirs(run_dir=run_extracted_dir, sub_dirs=['intensities', 'intensity_times_width'])"
]
},
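The cell above extracts a single run. If you need to batch several runs, a minimal sketch that simply wraps the same calls in a loop is below; the run names are placeholders, and panel, bin_file_dir, and extracted_image_dir are assumed to be defined as in the preceding cells.

```python
# Optional: extract several runs by looping the single-run logic above.
# The run names are placeholders; replace them with your own.
run_names = ['20220101_my_run', '20220102_my_other_run']

for run in run_names:
    run_bin_dir = os.path.join(bin_file_dir, run)
    run_extracted_dir = os.path.join(extracted_image_dir, run)
    os.makedirs(run_extracted_dir, exist_ok=True)

    # extract bins and replace gold images with intensity versions
    bin_files.extract_bin_files(run_bin_dir, run_extracted_dir, panel=panel,
                                intensities=['Au', 'chan_39'])
    rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='Au')
    rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='chan_39')

    # clean up intermediate directories
    rosetta.remove_sub_dirs(run_dir=run_extracted_dir,
                            sub_dirs=['intensities', 'intensity_times_width'])
```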
{
"cell_type": "code",
"execution_count": null,
"id": "ce583c8d-2fff-4a87-9d41-3c02246eb56d",
"cell_type": "markdown",
"id": "c4df6224-9839-43f7-9122-62bca0668fbc",
"metadata": {},
"outputs": [],
"source": [
"# specify path to save rosetta images\n",
"rosetta_image_dir = base_dir + 'rosetta_run_output'"
"### Then, you can compensate the data using rosetta"
]
},
{
@@ -266,14 +269,12 @@
"outputs": [],
"source": [
"# Perform rosetta on extracted images\n",
"for run in runs:\n",
" print(\"processing run {}\".format(run))\n",
" raw_img_dir = os.path.join(extracted_image_dir, run)\n",
" out_dir = os.path.join(rosetta_image_dir, run)\n",
" if not os.path.exists(out_dir):\n",
" os.makedirs(out_dir)\n",
" rosetta.compensate_image_data(raw_data_dir=raw_img_dir, comp_data_dir=out_dir, \n",
" comp_mat_path=rosetta_mat_path, panel_info=panel, batch_size=1)"
"run_rosetta_dir = os.path.join(rosetta_image_dir, run_name)\n",
"if not os.path.exists(run_rosetta_dir):\n",
" os.makedirs(run_rosetta_dir)\n",
"\n",
"rosetta.compensate_image_data(raw_data_dir=run_extracted_dir, comp_data_dir=run_rosetta_dir, \n",
" comp_mat_path=rosetta_mat_path, panel_info=panel, batch_size=1)"
]
}
],
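As an optional final check after compensation, you can confirm that output was written for each FOV; this minimal sketch assumes run_rosetta_dir is defined as in the compensation cell above.

```python
# Optional: quick check that compensated output exists for each FOV.
import os

compensated_fovs = sorted(os.listdir(run_rosetta_dir))
print('Found compensated output for {} FOVs'.format(len(compensated_fovs)))
```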