Mph norm workflow #125

Merged (27 commits, Jun 15, 2022)

Commits
972f9b3
framework for channel counts function
ngreenwald Jan 31, 2022
85e6305
testing for first function
ngreenwald Feb 1, 2022
3d89a9e
notebook
ngreenwald Feb 8, 2022
cdd74d4
Merge branch 'main' into mph_norm_workflow
ngreenwald Feb 8, 2022
72d3ec3
updated naming
ngreenwald Feb 9, 2022
4080bd1
first part of full workflow
ngreenwald Feb 22, 2022
12e876c
Merge branch 'main' into mph_norm_workflow
ngreenwald May 26, 2022
75c746c
move to FOV-based normalization scheme
ngreenwald May 27, 2022
ea1160b
update workflow
ngreenwald May 27, 2022
4ac8b66
fix tests
ngreenwald May 27, 2022
6592a8a
add logging
ngreenwald May 28, 2022
9f8726a
standardize notebook naming
ngreenwald May 28, 2022
2c7cb0d
check for missing normalization function
ngreenwald May 28, 2022
1c387ce
pycodestyle
ngreenwald May 28, 2022
45bb0e3
Update README.md
ngreenwald May 28, 2022
2e145fe
style
ngreenwald May 28, 2022
beb8d3e
Merge branch 'mph_norm_workflow' of https://github.com/angelolab/toff…
ngreenwald May 28, 2022
d79adc5
Mph norm workflow curve (#123)
ngreenwald Jun 7, 2022
314514b
merge conflict
ngreenwald Jun 7, 2022
3847322
fixed merges
ngreenwald Jun 7, 2022
19fcd50
code review comments
ngreenwald Jun 10, 2022
9eec96c
add more explanation to notebook on normalization
ngreenwald Jun 10, 2022
20cc284
Merge branch 'main' into mph_norm_workflow
ngreenwald Jun 11, 2022
550f630
typo in environment.yaml
ngreenwald Jun 13, 2022
d1c397f
switch default to 2nd degree polynomial
ngreenwald Jun 13, 2022
c0c4781
merge
ngreenwald Jun 13, 2022
f0f32e0
update default value and testing
ngreenwald Jun 15, 2022
7 changes: 3 additions & 4 deletions README.md
@@ -35,11 +35,10 @@ There are a number of different computational tasks to complete once a MIBI run
- 3a: real time monitoring. The [MIBI monitoring](./templates/3a_monitor_MIBI_run.ipynb) notebook will monitor an ongoing MIBI run, and begin processing the image data as soon as it is generated. This notebook is continually being updated as we move more of our processing pipeline to run in real time as the data is generated.
- 3b: post-run monitoring. For each step in the monitoring notebook, we have a dedicated notebook that can perform the same tasks once a run is complete. This includes [the image extraction notebook](./templates/extract_bin_file.ipynb) and the [qc metrics notebook](./templates/3b_generate_qc_metrics.ipynb).

### 4. Processing MIBI data
Once your run has finished, you can begin to process the data to make it ready for analysis. To remove background signal contamination, as well as compensate for channel crosstalk, you can use the [compensation](./templates/4a_compensate_image_data.ipynb) notebook. This will guide you through the Rosetta algorithm, which uses a flow-cytometry style compensation approach to remove spurious signal.

### 4. Processing MIBI Data
Once your run has finished, you can begin to process the data to make it ready for analysis. To remove background signal contamination, as well as compensate for channel crosstalk, you can use the [compensation](./templates/4_compensate_image_data.ipynb) notebook. This will guide you through the Rosetta algorithm, which uses a flow-cytometry style compensation approach to remove spurious signal.

Following compensation, you will want to normalize your images to ensure consistent intensity across the run. This functionality is currently in the works, and we'll have a beta version available to test soon.
Following compensation, you will want to normalize your images to ensure consistent intensity across the run. You can use the [normalization](./templates/4b_normalize_image_data.ipynb) notebook to perform this step.

## Installation
In order to get toffy working, you'll need to first install the repo.
5 changes: 3 additions & 2 deletions environment.yml
@@ -3,10 +3,11 @@ dependencies:
- python=3.8
- pip
- pip:
- git+https://github.com/angelolab/mibi-bin-tools.git@02b4549731c2204e727073d6f23d3a91123e69d8
- git+https://github.com/angelolab/ark-analysis.git@master
- git+https://github.com/angelolab/ark-analysis.git@02b4549731c2204e727073d6f23d3a91123e69d8
- git+https://github.com/angelolab/mibi-bin-tools.git@master
- jupyter>=1.0.0,<2
- jupyter_contrib_nbextensions>=0.5.1,<1
- jupyterlab>=3.1.5,<4
- watchdog>=2.1.6,<3
- natsort >= 0.8
- numpy>=1.22,<2
1 change: 1 addition & 0 deletions requirements.txt
@@ -3,6 +3,7 @@ mibi-bin-tools @ git+https://github.com/angelolab/mibi-bin-tools.git@master
jupyter>=1.0.0,<2
jupyter_contrib_nbextensions>=0.5.1,<1
jupyterlab>=3.1.5,<4
natsort>=0.8
numpy>=1.22,<2
watchdog>=2.1.6,<3
traitlets==5.2.2.post1
100 changes: 94 additions & 6 deletions templates/1_set_up_toffy.ipynb
@@ -15,9 +15,10 @@
"id": "e36293c5-aa89-4029-a3fa-e8ea841bb8b5",
Contributor:

Can't the sensitivity curve generation go in 4b, since it's only used there? Is the idea that putting this here will encourage people to run a sweep before data acquisition?

Member Author:

No, it's because it only needs to happen once per instrument, not separately for each run. So having it here means it won't be present in the notebook each time people are normalizing.

Contributor:

Can't this be done programmatically here?

Member Author:

Sweeps aren't created in their own folder; they're just put into the main /Data folder. So you need to separately identify each FOV from the sweep, which are given generic names. This was initially what the find_detector_sweeps function was for, but then Erin had a couple of sweeps where FOVs were missing, so it gave an error.

We could change it to give a warning when an FOV is missing, rather than an error, and then ask people to list the first and last FOV of their sweep so it finds the rest, but at that point the solution started to feel almost as complicated as the problem. Up to you though, it would be an easy change.

"metadata": {},
"source": [
"There are two parts to this notebook. \n",
"There are three parts to this notebook. \n",
"1. The first part creates the necessary folders that toffy is expecting, and only needs to be run the first time you install it on a new CAC. \n",
"2. The second part updates the co-registration parameters between the slide image (optical image) and the stage coordinates. This needs to be run anytime Ionpath changes the co-registration"
"2. The second part updates the co-registration parameters between the slide image (optical image) and the stage coordinates. This needs to be run anytime Ionpath changes the co-registration\n",
"3. The third part generates a tuning curve to correct for shifts in instrument sensitivity, and only needs to be run once per instrument"
]
},
{
@@ -37,7 +38,8 @@
"import os\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"from toffy import tiling_utils"
"from toffy import tiling_utils, normalize\n",
"from ark.utils import io_utils"
]
},
{
@@ -55,9 +57,12 @@
"metadata": {},
"outputs": [],
"source": [
"folders = ['D:\\\\Extracted_Images', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\run_metrics', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\watcher_logs',\n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\tiled_run_jsons', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\autolabeled_tma_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\panel_files']\n",
"folders = ['D:\\\\Extracted_Images', 'D:\\\\Rosetta_Compensated_Images', 'D:\\\\Normalized_Images', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\run_metrics', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\watcher_logs',\n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\tiled_run_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\autolabeled_tma_jsons', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\panel_files', 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve', \n",
" 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\mph_files']\n",
"\n",
"for folder in folders:\n",
" if not os.path.exists(folder):\n",
@@ -178,6 +183,89 @@
"source": [
"tiling_utils.save_coreg_params(coreg_params)"
]
},
{
"cell_type": "markdown",
"id": "14c82566-d6f5-4096-a249-92fae371ab39",
"metadata": {},
"source": [
"## 3. Generate sensitivity tuning curve\n",
"Depending on when an FOV was acquired with respect to the last time the detector was tuned, you will see variable levels of antibody signal. These differences in sensitivity can result in differences in marker intensity, when in fact there is no underlying biological difference in the real signal. In order to correct for this, and ensure that samples which have the same expression levels of a given marker record the same intensity, we need to normalize the images. \n",
"\n",
"The normalization process relies on constructing a tuning curve, which accurately tracks the relationship between increasing detector gain and antibody signal. We can use a detector sweep to figure out this relationship. We can then correct each image to ensure that there are consistent levels of antibody signal. "
]
},
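As a rough sketch of the idea described above (not the actual `normalize.create_tuning_function` implementation; the voltages, counts, and variable names are invented for illustration): fit a low-degree polynomial relating detector gain to observed signal across the sweep, then rescale each FOV by the predicted sensitivity at its recorded gain. This workflow defaults to a 2nd-degree polynomial.

```python
# Illustrative sketch only; the real logic lives in toffy.normalize.
# All numbers below are made up for demonstration.
import numpy as np

# detector voltages and median counts observed across a detector sweep
voltages = np.array([2450.0, 2500.0, 2550.0, 2600.0, 2650.0])
counts = np.array([180.0, 260.0, 360.0, 480.0, 620.0])

# fit a 2nd-degree polynomial relating detector gain to signal
tuning = np.poly1d(np.polyfit(voltages, counts, deg=2))

# rescale an FOV acquired at one voltage to a common reference voltage
fov_voltage, reference_voltage = 2575.0, 2500.0
scale = tuning(reference_voltage) / tuning(fov_voltage)
example_image = np.random.rand(32, 32)   # stand-in for a single channel image
normalized_image = example_image * scale
```

The point of fitting a smooth curve rather than using the raw sweep values is that any acquisition voltage, not just the ones swept, can be mapped to an expected sensitivity.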
{
"cell_type": "markdown",
"id": "b4170c03-1619-479c-99f0-ea03ac13d76b",
"metadata": {},
"source": [
"### Identify detector sweep\n",
"The first step is selecting a detector sweep. The goal is for this sweep to cover the range of values most often seen during image acqusition. Therefore, it's best to pick a sweep where the suggested change in voltage following the sweep was less than 50V."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "525de367-672a-416c-9c1a-1cb20a397cda",
"metadata": {},
"outputs": [],
"source": [
"# pick a name for the sweep, such as the date it was run\n",
"sweep_name = '20220417_pmma'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e661bba8-6657-4f9f-ba8b-ebb9eab874bc",
"metadata": {},
"outputs": [],
"source": [
"# create a new folder with the sweep name\n",
"normalization_dir = 'C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve'\n",
"sweep_path = os.path.join(normalization_dir, sweep_name)\n",
"os.makedirs(sweep_path)"
]
},
{
"cell_type": "markdown",
"id": "9cab012d-9092-4136-b81b-9c8b3a969e15",
"metadata": {},
"source": [
"Now, copy all of the FOVs from the sweep into the newly created folder, which can be found in *C:\\\\Users\\\\Customer.ION\\\\Documents\\\\normalization_curve*"
]
},
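If you prefer to do this copy step in code rather than by hand, a minimal sketch is below. It assumes the sweep FOVs sit directly in D:\\Data under generic fov-N-scan-1 names; adjust data_dir and sweep_fovs to match your own sweep.

```python
# Optional: copy the sweep FOVs programmatically instead of by hand.
# data_dir and the FOV names are assumptions about your layout; edit as needed.
import os
import shutil

data_dir = 'D:\\Data'
sweep_fovs = ['fov-1-scan-1', 'fov-2-scan-1', 'fov-3-scan-1']

for fov in sweep_fovs:
    for ext in ['.bin', '.json']:
        shutil.copy(os.path.join(data_dir, fov + ext),
                    os.path.join(sweep_path, fov + ext))
```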
{
"cell_type": "markdown",
"id": "86941169-09a1-43fe-9299-f9f1ca8766b3",
"metadata": {},
"source": [
"### Create tuning curve\n",
"We'll then use these FOVs in order to create the curve"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7d0aa94-a481-4628-b41a-fa2b74d489d9",
"metadata": {},
"outputs": [],
"source": [
"# define masses to use\n",
"normalize.create_tuning_function(sweep_path=sweep_path)"
]
},
{
"cell_type": "markdown",
"id": "762d6e3b-41d1-4064-9578-2bae599e3f99",
"metadata": {},
"source": [
"Your curve should look like the image below. It's okay if your values are a bit different, but the shape of the curve should be qualitatively the same. The curve will be saved in the *sweep_path* folder you defined above\n",
"<div>\n",
" <img src=\"img/tuning_curve.png\" width=\"500\"/>\n",
"</div>\n"
]
}
],
"metadata": {
@@ -34,7 +34,7 @@
"id": "d1db8364",
"metadata": {},
"source": [
"### First, make a folder to hold all of the files related to rosetta processing, and put the full path below"
"### First, make a folder for evaluating rosetta normalization, and put the full path below"
]
},
{
@@ -54,7 +54,7 @@
"source": [
"### Next, copy over the .bin files for the ~10 FOVs will you use for testing. In addition to the .bin files, make sure to copy over the .JSON files with the same name into this folder. Place them in a folder named *example_bins*.\n",
"\n",
"#### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc"
"### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc"
]
},
{
@@ -200,7 +200,7 @@
"id": "bd4f51e0-8bcc-4b5b-b2d3-6bc00140e0ca",
"metadata": {},
"source": [
"### Once you're satisfied that the Rosetta is working appropriately, you can use it to process your entire dataset"
"### Once you're satisfied that the Rosetta is working appropriately, you can use it to process your run. First select the run you want to process, and define the relevant top-level folders"
]
},
{
@@ -210,13 +210,25 @@
"metadata": {},
"outputs": [],
"source": [
"# Specify necessary folders\n",
"# Put the name of your run here\n",
"run_name = '20220101_my_run'\n",
"\n",
"# This should be a folder of run folders. Each folder within bin_file_dir should contain all of the .bin and .json files for that run\n",
"bin_file_dir = 'path/to/cohort/all_runs'\n",
"# The path to the folder containing raw run data\n",
"bin_file_dir = 'D:\\\\Data'\n",
"\n",
"# This folder is where all of the extracted images will get saved\n",
"extracted_image_dir = 'path/to/cohort/extracted_runs'"
"extracted_image_dir = 'D:\\\\Extracted_Images'\n",
"\n",
"# This folder will hold the post-rosetta images\n",
"rosetta_image_dir = 'D:\\\\Rosetta_Compensated_Images'"
]
},
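Before extracting, an optional sanity check is to confirm that the run name you typed actually exists under bin_file_dir. This is a minimal sketch using only the folders defined in the cell above.

```python
# Optional sanity check: make sure the raw data for run_name is where we expect it.
import os

run_path = os.path.join(bin_file_dir, run_name)
if not os.path.isdir(run_path):
    raise FileNotFoundError('No run folder named {} found in {}'.format(run_name, bin_file_dir))
```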
{
"cell_type": "markdown",
"id": "7f6161fb-3050-44e1-ab9a-7b9c59675d89",
"metadata": {},
"source": [
"### Prior to running compensation, you'll need to extract your data if you haven't already"
]
},
{
@@ -226,36 +238,27 @@
"metadata": {},
"outputs": [],
"source": [
"# If you only want to extract a subset of your runs, specify their names here; otherwise, leave as None\n",
"runs = None\n",
"if runs is None:\n",
" runs = list_folders(bin_file_dir)\n",
"# set run-specific folders\n",
"run_bin_dir = os.path.join(bin_file_dir, run_name)\n",
"run_extracted_dir = os.path.join(extracted_image_dir, run_name)\n",
"if not os.path.exists(run_extracted_dir):\n",
" os.makedirs(run_extracted_dir)\n",
"\n",
"for run in runs:\n",
" print(\"processing run {}\".format(run))\n",
" current_bin = os.path.join(bin_file_dir, run)\n",
" current_out = os.path.join(extracted_image_dir, run)\n",
" if not os.path.exists(current_out):\n",
" os.makedirs(current_out)\n",
" \n",
" # extract bins and replace gold image\n",
" bin_files.extract_bin_files(current_bin, current_out, panel=panel, intensities=['Au', 'chan_39'])\n",
" rosetta.replace_with_intensity_image(run_dir=current_out, channel='Au')\n",
" rosetta.replace_with_intensity_image(run_dir=current_out, channel='chan_39')\n",
" \n",
" # clean up dirs\n",
" rosetta.remove_sub_dirs(run_dir=current_out, sub_dirs=['intensities', 'intensity_times_width'])"
"# extract bins\n",
"bin_files.extract_bin_files(run_bin_dir, run_extracted_dir, panel=panel, intensities=['Au', 'chan_39'])\n",
"rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='Au')\n",
"rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='chan_39')\n",
"\n",
"# clean up dirs\n",
"rosetta.remove_sub_dirs(run_dir=run_extracted_dir, sub_dirs=['intensities', 'intensity_times_width'])"
]
},
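The cell above extracts a single run. If you need to batch several runs, a minimal sketch that simply wraps the same calls in a loop is below; the run names are placeholders, and panel, bin_file_dir, and extracted_image_dir are assumed to be defined as in the preceding cells.

```python
# Optional: extract several runs by looping the single-run logic above.
# The run names are placeholders; replace them with your own.
run_names = ['20220101_my_run', '20220102_my_other_run']

for run in run_names:
    run_bin_dir = os.path.join(bin_file_dir, run)
    run_extracted_dir = os.path.join(extracted_image_dir, run)
    os.makedirs(run_extracted_dir, exist_ok=True)

    # extract bins and replace gold images with intensity versions
    bin_files.extract_bin_files(run_bin_dir, run_extracted_dir, panel=panel,
                                intensities=['Au', 'chan_39'])
    rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='Au')
    rosetta.replace_with_intensity_image(run_dir=run_extracted_dir, channel='chan_39')

    # clean up intermediate directories
    rosetta.remove_sub_dirs(run_dir=run_extracted_dir,
                            sub_dirs=['intensities', 'intensity_times_width'])
```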
{
"cell_type": "code",
"execution_count": null,
"id": "ce583c8d-2fff-4a87-9d41-3c02246eb56d",
"cell_type": "markdown",
"id": "c4df6224-9839-43f7-9122-62bca0668fbc",
"metadata": {},
"outputs": [],
"source": [
"# specify path to save rosetta images\n",
"rosetta_image_dir = base_dir + 'rosetta_run_output'"
"### Then, you can compensate the data using rosetta"
]
},
{
@@ -266,14 +269,12 @@
"outputs": [],
"source": [
"# Perform rosetta on extracted images\n",
"for run in runs:\n",
" print(\"processing run {}\".format(run))\n",
" raw_img_dir = os.path.join(extracted_image_dir, run)\n",
" out_dir = os.path.join(rosetta_image_dir, run)\n",
" if not os.path.exists(out_dir):\n",
" os.makedirs(out_dir)\n",
" rosetta.compensate_image_data(raw_data_dir=raw_img_dir, comp_data_dir=out_dir, \n",
" comp_mat_path=rosetta_mat_path, panel_info=panel, batch_size=1)"
"run_rosetta_dir = os.path.join(rosetta_image_dir, run_name)\n",
"if not os.path.exists(run_rosetta_dir):\n",
" os.makedirs(run_rosetta_dir)\n",
"\n",
"rosetta.compensate_image_data(raw_data_dir=run_extracted_dir, comp_data_dir=run_rosetta_dir, \n",
" comp_mat_path=rosetta_mat_path, panel_info=panel, batch_size=1)"
]
}
],
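As an optional final check after compensation, you can confirm that output was written for each FOV; this minimal sketch assumes run_rosetta_dir is defined as in the compensation cell above.

```python
# Optional: quick check that compensated output exists for each FOV.
import os

compensated_fovs = sorted(os.listdir(run_rosetta_dir))
print('Found compensated output for {} FOVs'.format(len(compensated_fovs)))
```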