Mph plots #101

Merged
merged 111 commits into from
Jul 12, 2022
111 commits
fde8a5c
setup branch and notebook
camisowers Apr 28, 2022
993c1a3
it works
camisowers Apr 29, 2022
6b67313
format and comments
camisowers May 5, 2022
d3b4d0e
regression line plot
camisowers May 5, 2022
ec9b6b3
Merge branch 'main' into MPH_plots
camisowers May 5, 2022
1ac57d8
move notebook functionality to helper functions
camisowers May 7, 2022
4a65409
Merge remote-tracking branch 'origin/MPH_plots' into MPH_plots
camisowers May 7, 2022
4618eeb
pycode style
camisowers May 10, 2022
51538d7
add mass functionality
camisowers May 11, 2022
407bcd9
fix existing csv issue
camisowers May 16, 2022
4ed3a60
add estimated time to mph functions and csvs
camisowers May 16, 2022
35540d6
add time axis to plots
camisowers May 16, 2022
64090d2
stupid character limit
camisowers May 16, 2022
aaa3e4b
add default mass_range
camisowers May 17, 2022
3f3a29f
comments
camisowers May 17, 2022
3725bc3
path validation
camisowers May 17, 2022
f53db8e
change mass_range to mass_start and mass_stop
camisowers May 17, 2022
f92ee51
get total_count for one fov at a time
camisowers May 17, 2022
f62d216
fix background issue
camisowers May 17, 2022
fabe5ed
validate paths issue
camisowers May 18, 2022
8aec085
cumulative total counts list
camisowers May 18, 2022
45fcb63
enumerate
camisowers May 18, 2022
0d0e5a7
compute_mph_metrics work one fov at a time
camisowers May 19, 2022
4f1e0d0
remove return functionality
camisowers May 19, 2022
e161b1e
Merge branch 'MPH_plots' into time_axis
camisowers May 19, 2022
2c8d9c0
typo
camisowers May 19, 2022
6b1d450
fix axis merge cumulative count
camisowers May 19, 2022
6cc4fe1
save figure change
camisowers May 19, 2022
e55d320
save figure change
camisowers May 19, 2022
885ac1f
fix time axis to have standard ticks
camisowers May 20, 2022
fbb9ed4
Merge branch 'MPH_plots' into time_axis
camisowers May 20, 2022
d5569d7
fix time axis to have standard ticks
camisowers May 20, 2022
7ab1267
Merge remote-tracking branch 'origin/time_axis' into time_axis
camisowers May 20, 2022
60ba28a
update images
camisowers May 20, 2022
b5fd28a
Merge pull request #105 from angelolab/time_axis
camisowers May 24, 2022
1dba497
delete and generate new csv for combine_mph_metrics
camisowers May 24, 2022
ca7cc8e
delete and generate new jpg for visualize_mph
camisowers May 24, 2022
47445ab
fix time retrieval to work one fov at a time
camisowers May 24, 2022
2169179
tests for get_estimated_time
camisowers May 24, 2022
fdbb60e
tests for compute_mph_metrics
camisowers May 25, 2022
fbbaf10
raise error for bad json and add test
camisowers May 26, 2022
e09b447
tests for combine_mph_metrics and add os.remove for test csvs
camisowers May 26, 2022
d0f9ef2
remove unecessary plt.show()
camisowers May 26, 2022
1285b3c
add directory tests to combine_mph
camisowers May 26, 2022
1673e40
tests for visualize_mph
camisowers May 26, 2022
9434570
add return functionality & test for combine_mph_metrics
camisowers May 31, 2022
400d12d
condense data retrieval from csv
camisowers May 31, 2022
40f0564
Merge branch 'main' into MPH_plots
camisowers May 31, 2022
ee4da48
csv_dir to save fov csvs to, remove bin_file_path arg for combine_mph
camisowers May 31, 2022
45b50d8
add watcher callbacks
camisowers May 31, 2022
7d01b37
update requirements.txt
camisowers Jun 1, 2022
b430c24
add test prefix
camisowers Jun 2, 2022
3ace025
fix path issues
camisowers Jun 2, 2022
adbd0ff
really fix the path issues this time
camisowers Jun 2, 2022
e1ab03d
fix json test error
camisowers Jun 2, 2022
b43a0a8
really fix json test error
camisowers Jun 2, 2022
07955bc
use TemporaryDirectory instead of NamedTemporaryFile
camisowers Jun 6, 2022
13885ca
fix watcher argument names
camisowers Jun 6, 2022
ca59fca
argument name change to bin_file_dir and comments
camisowers Jun 7, 2022
82f8639
remove excess path validations
camisowers Jun 7, 2022
4e3aa10
tims axis from ms to hours
camisowers Jun 7, 2022
277294b
function for sample mph data
camisowers Jun 7, 2022
19068d4
watcher tests
camisowers Jun 8, 2022
e1dc805
some case name changing
camisowers Jun 9, 2022
a900e20
take in all args
camisowers Jun 9, 2022
c8a1f93
add required args to tests
camisowers Jun 13, 2022
eaf9e69
remove target argument
camisowers Jun 14, 2022
a2f201b
watcher tests are working omg
camisowers Jun 14, 2022
2277278
Merge branch 'main' into MPH_plots
camisowers Jun 14, 2022
0585aef
add mph to WatcherCases
camisowers Jun 14, 2022
bf600a5
Merge remote-tracking branch 'origin/MPH_plots' into MPH_plots
camisowers Jun 14, 2022
19c132e
set defaults for mass specs
camisowers Jun 14, 2022
5af5de8
mph_output_dir --> mph_out_dir
camisowers Jun 14, 2022
408c9ff
out_dir --> img_out_dir
camisowers Jun 14, 2022
275f942
update watcher notebook
camisowers Jun 14, 2022
d269df3
set default regression=False
camisowers Jun 14, 2022
804739a
comments
camisowers Jun 14, 2022
751c591
regression defualt in tests
camisowers Jun 15, 2022
ef2cb06
fix failing tests
camisowers Jun 15, 2022
886c203
fix failing tests 2
camisowers Jun 15, 2022
031926c
update test_watcher
camisowers Jun 16, 2022
1a17736
adam is the goat
camisowers Jun 16, 2022
4860e43
fix double points
camisowers Jun 22, 2022
8a35f00
clean notebook
camisowers Jun 22, 2022
4ff4ca4
Merge branch 'main' into MPH_plots
camisowers Jun 22, 2022
9f9d48d
name change and mph_dir
camisowers Jun 22, 2022
11dba1d
pulse_height.csv -> mph_pulse.csv
camisowers Jun 22, 2022
b9bdae1
img_out_dir -> plot_dir
camisowers Jun 22, 2022
9fa212f
Merge remote-tracking branch 'origin/MPH_plots' into MPH_plots
camisowers Jun 22, 2022
1505791
img_out_dir -> plot_dir 2
camisowers Jun 23, 2022
445a428
csv renaming
camisowers Jun 23, 2022
7f9d141
Merge branch 'main' into MPH_plots
camisowers Jun 27, 2022
8f0a7cb
new data and fov sortiing
camisowers Jun 27, 2022
6857087
time axis accurate
camisowers Jun 28, 2022
016356e
stray pulse_height.csv
camisowers Jun 28, 2022
5614cb6
separate generate_time_ticks() function
camisowers Jun 29, 2022
9d83c0e
test_generate_time_ticks
camisowers Jun 29, 2022
ef54b4a
Merge branch 'main' into MPH_plots
camisowers Jun 29, 2022
28f4395
Merge remote-tracking branch 'origin/MPH_plots' into MPH_plots
camisowers Jun 29, 2022
0292f79
use normalize.compute_run_metrics, changes to mph_pulse_combined.csv
camisowers Jun 29, 2022
3b8d738
edit notebook for new csv name
camisowers Jun 29, 2022
f47ad6e
Merge branch 'main' into MPH_plots
camisowers Jun 29, 2022
e5d99b2
pycodestyle
camisowers Jun 29, 2022
e0ebefb
Merge branch 'main' into MPH_plots
camisowers Jun 30, 2022
f436f7f
Merge remote-tracking branch 'origin/MPH_plots' into MPH_plots
camisowers Jun 30, 2022
23c1ff2
pycodestyle
camisowers Jun 30, 2022
08a2e68
notebook tested on data and comments
camisowers Jul 8, 2022
4d285d8
notebook updated
camisowers Jul 11, 2022
52dbe1e
natsort and readme update
camisowers Jul 12, 2022
6139219
notebook renaming
camisowers Jul 12, 2022
0f9b8c2
broken links fix
camisowers Jul 12, 2022
5 changes: 4 additions & 1 deletion README.md
@@ -34,7 +34,10 @@ The [second notebook](./templates/2_create_tma_mibi_run.ipynb) is for TMAs. This
 There are a number of different computational tasks to complete once a MIBI run has finished to ensure everything went smoothly.

 - 3a: real time monitoring. The [MIBI monitoring](./templates/3a_monitor_MIBI_run.ipynb) notebook will monitor an ongoing MIBI run, and begin processing the image data as soon as it is generated. This notebook is being continually be updated as we move more of our processing pipeline to happen in real time as the data is generated.
-- 3b: post-run monitoring. For each step in the monitoring notebook, we have a dedicated notebook that can perform the same tasks once a run is complete. This includes [the image extraction notebook](./templates/extract_bin_file.ipynb) and the [qc metrics notebook](./templates/3b_generate_qc_metrics.ipynb).
+- 3b - 3d: post-run monitoring. For each step in the monitoring notebook, we have a dedicated notebook that can perform the same tasks once a run is complete.
+- 3b: [the image extraction notebook](./templates/extract_bin_file.ipynb) will extract images from bin files that have not already been processed
+- 3c: [qc metrics notebook](./templates/3c_generate_qc_metrics.ipynb) computes and visualizes the QC metrics for the images
+- 3d: [median pulse heights notebook](./templates/3d_compute_median_pulse_height.ipynb) generates plots showing median pulse heights for each FOV, along with estimated run time

 ### 4. Processing MIBI data
 Once your run has finished, you can begin to process the data to make it ready for analysis. To remove background signal contamination, as well as compensate for channel crosstalk, you can use the [compensation](./templates/4a_compensate_image_data.ipynb) notebook. This will guide you through the Rosetta algorithm, which uses a flow-cytometry style compensation approach to remove spurious signal.
26 changes: 20 additions & 6 deletions templates/3a_monitor_MIBI_run.ipynb
@@ -1,13 +1,20 @@
 {
 "cells": [
 {
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+},
+{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Monitoring an ongoing MIBI run\n",
 "\n",
 "This notebook can be run alongside an active MIBIScope run. As images are generated, this notebook will automatically pass the data through pre-specified functions, like tiff \n",
-"extraction, qc metric computation, etc. Eventually, all of the processing steps in the toffy repo will be incorporated into the Watcher functionality here. For now, this notebook will automatically extract tifs and generate QC plots of your data. \n",
+"extraction, qc metric computation, etc. Eventually, all of the processing steps in the toffy repo will be incorporated into the Watcher functionality here. For now, this notebook will automatically extract tifs and generate the QC and MPH plots of your data. \n",
 "\n",
 "## This notebook is an example: create a copy before running it or you will get merge conflicts!"
 ]
@@ -91,7 +98,11 @@
 " - `axes_size`\n",
 " - `wrap`\n",
 " - `dpi`\n",
-" - `save_dir`"
+" - `save_dir`\n",
+" \n",
+"The `plot_mph_metrics` run callback will compute the median pulse height data for each \n",
+"FoV, and plot the results once the run has completed. Additional arguments are:\n",
+"`regression` which when set to True will also plot the linear regression line for the data."
 ]
 },
 {
@@ -104,12 +115,15 @@
 "qc_dir = os.path.join('C:\\\\Users\\\\Customer.ION\\\\Documents\\\\run_metrics', run_name)\n",
 "\n",
 "fov_callback, run_callback = build_callbacks(\n",
-" run_callbacks = ['plot_qc_metrics'],\n",
+" run_callbacks = ['plot_qc_metrics', 'plot_mph_metrics'],\n",
 " fov_callbacks = ['extract_tiffs'],\n",
 " tiff_out_dir=extraction_dir,\n",
 " qc_out_dir=qc_dir,\n",
+" mph_out_dir=qc_dir,\n",
+" plot_dir=qc_dir,\n",
 " panel=panel,\n",
 " intensities=False,\n",
+" regression=False,\n",
 ")"
 ]
 },
@@ -125,9 +139,9 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "toffy_env",
 "language": "python",
-"name": "python3"
+"name": "toffy_env"
 },
 "language_info": {
 "codemirror_mode": {
@@ -139,7 +153,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.12"
+"version": "3.8.13"
 }
 },
 "nbformat": 4,
192 changes: 192 additions & 0 deletions templates/3d_compute_median_pulse_height.ipynb

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions toffy/fov_watcher_test.py
@@ -67,10 +67,14 @@ def test_watcher(run_cbs, fov_cbs, kwargs, validators, add_blank):

 tiff_out_dir = os.path.join(tmpdir, 'cb_0', RUN_DIR_NAME)
 qc_out_dir = os.path.join(tmpdir, 'cb_1', RUN_DIR_NAME)
+mph_out_dir = os.path.join(tmpdir, 'cb_2', RUN_DIR_NAME)
+plot_dir = os.path.join(tmpdir, 'cb_3', RUN_DIR_NAME)

 # add directories to kwargs
 kwargs['tiff_out_dir'] = tiff_out_dir
 kwargs['qc_out_dir'] = qc_out_dir
+kwargs['mph_out_dir'] = mph_out_dir
+kwargs['plot_dir'] = plot_dir

 run_data = os.path.join(tmpdir, 'test_run')
 log_out = os.path.join(tmpdir, 'log_output')
188 changes: 188 additions & 0 deletions toffy/mph_comp.py
@@ -0,0 +1,188 @@
import os
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
from natsort import natsort_keygen

from mibi_bin_tools import bin_files
from ark.utils import io_utils
from toffy.normalize import combine_run_metrics


def get_estimated_time(bin_file_dir, fov):
    """Retrieve the estimated run time for a single FOV from its json file

    Args:
        bin_file_dir (str): path to the FOV bin and json files
        fov (str): name of fov to get estimated time for

    Returns:
        int: estimated run time for the given fov, in milliseconds
    """

    # path validation
    io_utils.validate_paths(bin_file_dir)

    # get fov json file in bin_file_dir
    json_file = io_utils.list_files(bin_file_dir, fov+".json")
    if len(json_file) == 0:
        raise FileNotFoundError(f"The FOV name supplied doesn't have a JSON file: {fov}")

    # retrieve estimated time (frameSize**2 pixels x per-pixel dwell time in ms)
    with open(os.path.join(bin_file_dir, json_file[0])) as file:
        run_metadata = json.load(file)

    try:
        size = run_metadata.get('frameSize')
        time = run_metadata.get('dwellTimeMillis')
        estimated_time = int(size**2 * time)
    except TypeError:
        # .get() returned None for a missing key, so the arithmetic failed
        raise KeyError("The FOV json file is missing one of the necessary keys "
                       "(frameSize or dwellTimeMillis)")

    return estimated_time
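As an illustrative sketch (not code from this PR): get_estimated_time treats a FOV's run time as the number of pixels in the frame (frameSize squared) multiplied by the per-pixel dwell time, so the result is in milliseconds. The block below reproduces only that arithmetic on an in-memory dict, skipping the file lookup; the helper name `estimated_time_ms` and the metadata values are hypothetical, and it uses an explicit None check where the PR code catches TypeError.

```python
import json

MS_PER_HOUR = 3600 * 1000


def estimated_time_ms(metadata: dict) -> int:
    """Estimated FOV time in ms: frameSize**2 pixels x per-pixel dwell time (ms)."""
    size = metadata.get("frameSize")
    dwell = metadata.get("dwellTimeMillis")
    if size is None or dwell is None:
        # same failure mode get_estimated_time reports as a KeyError
        raise KeyError("missing frameSize or dwellTimeMillis")
    return int(size ** 2 * dwell)


# hypothetical FOV json contents; real files carry many more keys
fov_json = '{"frameSize": 1024, "dwellTimeMillis": 2}'
ms = estimated_time_ms(json.loads(fov_json))
hours = ms / MS_PER_HOUR
print(ms, round(hours, 2))  # 2097152 0.58
```

Dividing by 3600 * 1000 is the same conversion the plotting code uses to turn the cumulative time column into hours.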


def generate_time_ticks(mph_df):
    """Create a time axis for median pulse heights with ticks at approximately 6-hour increments

    Args:
        mph_df (pd.DataFrame): mph data; requires the cum_total_count and
            cum_total_time (in ms) columns

    Returns:
        list: two lists, the tick locations (cumulative count in millions)
        and the tick labels (hours)
    """

    # determine number of ticks and what the labels should be based on total run time
    sub_df = mph_df[['cum_total_count', 'cum_total_time']]
    total_time = sub_df['cum_total_time'].iloc[-1]
    tick_num = int(total_time / (6*(3600*1000)))
    tick_labels = [i * 6 for i in range(0, tick_num+1)]
    time_ticks = [tick*(3600*1000) for tick in tick_labels[1:]]

    # find count value associated with the time closest to each tick
    tick_locations = [0]
    for tick in time_ticks:
        # positional index of the FOV whose cumulative time is nearest to the tick
        closest = (sub_df['cum_total_time'] - tick).abs().to_numpy().argmin()
        count_tick = sub_df['cum_total_count'].iloc[closest]
        tick_locations.append(int(count_tick)/1000000)

    return [tick_locations, tick_labels]
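As an illustrative sketch (not code from this PR): generate_time_ticks labels the top (time) axis every 6 hours of cumulative estimated time and positions each label at the cumulative count of the FOV whose cumulative time is closest. The pure-Python version below assumes times in milliseconds, as produced by get_estimated_time; `time_ticks` and the sample run are hypothetical.

```python
from typing import List, Tuple

MS_PER_HOUR = 3600 * 1000


def time_ticks(cum_counts: List[int], cum_times_ms: List[float],
               step_hours: int = 6) -> Tuple[List[float], List[int]]:
    """Return (tick locations in millions of counts, tick labels in hours)."""
    n_ticks = int(cum_times_ms[-1] / (step_hours * MS_PER_HOUR))
    labels = [i * step_hours for i in range(n_ticks + 1)]
    locations = [0.0]
    for hours in labels[1:]:
        target = hours * MS_PER_HOUR
        # position of the FOV whose cumulative time is nearest to this tick
        nearest = min(range(len(cum_times_ms)),
                      key=lambda i: abs(cum_times_ms[i] - target))
        locations.append(cum_counts[nearest] / 1_000_000)
    return locations, labels


# hypothetical run: four FOVs finishing at 3, 7, 13 and 19 cumulative hours
counts = [1_000_000, 4_000_000, 9_000_000, 15_000_000]
times = [h * MS_PER_HOUR for h in (3, 7, 13, 19)]
print(time_ticks(counts, times))  # ([0.0, 4.0, 9.0, 15.0], [0, 6, 12, 18])
```

Snapping each label to the nearest observed FOV, rather than interpolating between FOVs, keeps every tick on a real data point, at the cost of ticks that are only approximately 6 hours apart.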


def compute_mph_metrics(bin_file_dir, csv_dir, fov, mass=98, mass_start=97.5, mass_stop=98.5):
    """Retrieves total counts, pulse heights, & estimated time for a given FOV

    Args:
        bin_file_dir (str): path to the FOV bin and json files
        csv_dir (str): path of the directory to save the FOV csv to
        fov (str): name of fov bin file without the extension
        mass (float): mass for the panel
        mass_start (float): beginning of mass integration range
        mass_stop (float): end of mass integration range
    """

    # single-row panel describing the mass window to integrate
    target = None
    panel = pd.DataFrame([{
        'Mass': mass,
        'Target': target,
        'Start': mass_start,
        'Stop': mass_stop,
    }])

    # retrieve the data from bin file and output to individual csv
    pulse_height_file = fov + '-mph_pulse.csv'

    try:
        median = bin_files.get_median_pulse_height(bin_file_dir, fov,
                                                   target, panel)
        count_dict = bin_files.get_total_counts(bin_file_dir, [fov])
    except FileNotFoundError:
        raise FileNotFoundError(f"The FOV name supplied doesn't have a JSON file: {fov}")

    count = count_dict[fov]
    time = get_estimated_time(bin_file_dir, fov)

    out_df = pd.DataFrame({
        'fov': [fov],
        'MPH': [median],
        'total_count': [count],
        'time': [time]})

    # saves individual .csv files to csv_dir
    out_df.to_csv(os.path.join(csv_dir, pulse_height_file), index=False)
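As an illustrative sketch (not code from this PR): each call to compute_mph_metrics leaves one single-row csv per FOV in csv_dir, named `<fov>-mph_pulse.csv` with the columns fov, MPH, total_count and time. combine_mph_metrics later passes the 'mph_pulse' string to combine_run_metrics, which presumably collects these files by name. The standard-library block below mimics that per-FOV file layout; `write_fov_mph_csv` is a hypothetical helper and the values are placeholders, not measurements.

```python
import csv
import os
import tempfile


def write_fov_mph_csv(csv_dir: str, fov: str, mph: float,
                      total_count: int, time_ms: int) -> str:
    """Write a one-row '<fov>-mph_pulse.csv' with the columns compute_mph_metrics uses."""
    path = os.path.join(csv_dir, f"{fov}-mph_pulse.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fov", "MPH", "total_count", "time"])
        writer.writerow([fov, mph, total_count, time_ms])
    return path


with tempfile.TemporaryDirectory() as tmp:
    # placeholder values only
    p = write_fov_mph_csv(tmp, "fov-1", 4800.0, 1_000_000, 2_097_152)
    with open(p, newline="") as f:
        rows = list(csv.DictReader(f))
    print(os.path.basename(p), rows[0]["MPH"])  # fov-1-mph_pulse.csv 4800.0
```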


def combine_mph_metrics(csv_dir, return_data=False):
    """Combines data from individual csvs into one

    Args:
        csv_dir (str): path where FOV mph data csvs are stored
        return_data (bool): whether to return dataframe with mph metrics, default False

    Returns:
        pd.DataFrame: combined mph data for all FOVs, only if return_data is True
    """

    # path validation checks
    io_utils.validate_paths(csv_dir)

    # combine individual csv files
    combine_run_metrics(csv_dir, 'mph_pulse')

    # sort FOVs in natural order, then calculate cumulative sums of total counts and time
    combined_df = pd.read_csv(os.path.join(csv_dir, 'mph_pulse_combined.csv'))
    combined_df = combined_df.sort_values(by="fov", key=natsort_keygen())
    combined_df['cum_total_count'] = combined_df['total_count'].cumsum()
    combined_df['cum_total_time'] = combined_df['time'].cumsum()

    combined_df.to_csv(os.path.join(csv_dir, 'mph_pulse_combined.csv'), index=False)

    # return data
    if return_data:
        return combined_df
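As an illustrative sketch (not code from this PR): combine_mph_metrics sorts the combined rows with natsort_keygen before taking cumulative sums, because a plain lexicographic sort would place fov-10 before fov-2 and scramble the cumulative count and time columns. The block below swaps in a small regex-based natural key and itertools.accumulate in place of natsort and pandas cumsum; it illustrates the ordering, not natsort's exact rules. `natural_key` and the rows are hypothetical.

```python
import re
from itertools import accumulate


def natural_key(name: str):
    """Split out digit runs so 'fov-10' sorts after 'fov-2' (stand-in for natsort_keygen)."""
    # re.split with a capture group alternates text (even index) and digits (odd index),
    # so ints are only ever compared with ints
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


rows = [
    {"fov": "fov-10", "total_count": 300, "time": 30},
    {"fov": "fov-2", "total_count": 200, "time": 20},
    {"fov": "fov-1", "total_count": 100, "time": 10},
]
rows.sort(key=lambda r: natural_key(r["fov"]))
cum_counts = list(accumulate(r["total_count"] for r in rows))
cum_times = list(accumulate(r["time"] for r in rows))
print([r["fov"] for r in rows], cum_counts, cum_times)
# ['fov-1', 'fov-2', 'fov-10'] [100, 300, 600] [10, 30, 60]
```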


def visualize_mph(mph_df, out_dir, regression: bool = False):
    """Create a scatterplot visualizing median pulse heights by FOV cumulative count

    Args:
        mph_df (pd.DataFrame): data detailing total counts and pulse heights
        out_dir (str): path of directory to save plot to; if None, the plot is not saved
        regression (bool): whether to plot regression line, default is False
    """

    # path validation checks
    if out_dir is not None:
        io_utils.validate_paths(out_dir)

    # visualize the median pulse heights
    plt.style.use('dark_background')
    # plt.title('FOV total counts vs median pulse height')
    fig = plt.figure()
    ax1 = fig.add_subplot(111)
    x = mph_df['cum_total_count']/1000000
    y = mph_df['MPH']
    ax1.set_xlabel('FOV cumulative count (in millions)')
    ax1.set_ylabel('median pulse height')
    ax1.scatter(x, y)
    ax2 = ax1.twiny()
    ax2.set_xlabel('estimated time (hours)')

    # create time axis aligned with the cumulative count axis
    new_ticks = generate_time_ticks(mph_df)
    tick_locations = new_ticks[0]
    tick_labels = new_ticks[1]
    ax2.set_xlim(ax1.get_xlim())
    ax2.set_xticks(tick_locations)
    ax2.set_xticklabels(tick_labels)
    plt.gcf().set_size_inches(18.5, 10.5)

    # plot regression line
    if regression:
        # least-squares line (degree-1 polynomial fit) through the same points
        x2 = np.array(mph_df['cum_total_count']/1000000)
        y2 = np.array(mph_df['MPH'])
        m, b = np.polyfit(x2, y2, 1)
        ax1.plot(x2, m * x2 + b)

    # save figure, replacing any existing plot
    if out_dir is not None:
        file_path = os.path.join(out_dir, 'fov_vs_mph.jpg')
        if os.path.exists(file_path):
            os.remove(file_path)
        plt.savefig(file_path)
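As an illustrative sketch (not code from this PR): when regression=True, visualize_mph draws the line returned by np.polyfit(x2, y2, 1), i.e. the ordinary least-squares fit of MPH against cumulative count (in millions). For a degree-1 fit that line has the closed form slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The pure-Python `fit_line` below is a hypothetical stand-in that computes it without NumPy, returning values in the same [m, b] order as polyfit.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept, matching np.polyfit(x, y, 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# hypothetical points lying exactly on y = 2x + 1
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```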