v0.3.2 #368

Merged
merged 24 commits into from
Jun 24, 2024
Commits
a3420fd
Update to v0.3.0 (#237)
sjfleming Aug 6, 2023
02dd763
Add Nature Methods citation and update docs
sjfleming Aug 7, 2023
da9fabb
Merge branch 'master' into dev
sjfleming Aug 28, 2023
e358888
Add WDL input to set number of retries. (#247)
kshakir Oct 31, 2023
7a834a4
Move hash computation so that it is recomputed on retry, and now-inva…
alecw Oct 31, 2023
cf71148
Bug fix for WDL using MTX input (#246)
sjfleming Oct 31, 2023
322971d
Memory-efficient posterior generation (#263)
sjfleming Oct 31, 2023
12b4758
Fix posterior and estimator integer overflow bugs on Windows (#259)
sjfleming Oct 31, 2023
be36006
Move from setup.py to pyproject.toml (#240)
sjfleming Oct 31, 2023
084194b
Fix bugs with report generation across platforms (#302)
sjfleming Oct 31, 2023
a4da38a
Merge branch 'master' into dev
sjfleming Oct 31, 2023
97c1d40
Fix major bug in v0.3.1: negative counts (#347)
sjfleming Apr 11, 2024
d0cfc36
Merge branch 'master' into dev
sjfleming Apr 19, 2024
578c214
Tweak readme appearance
sjfleming Apr 19, 2024
80514f0
Add newer Cell Ranger feature types (#339)
jpintar Apr 22, 2024
72e6a3f
Fix test_elbo bug when retrying with more memory from checkpoint (#345)
alecw Apr 22, 2024
a71e183
Retry: prevent ZeroDivisionError if initial test ELBO is the best tes…
alecw Apr 23, 2024
f3c8865
Merge branch 'master' into dev
sjfleming Jun 21, 2024
2c3dbe8
Azurize CellBender to run on ToA (#367)
aawdeh Jun 24, 2024
569b7b1
Updates to benchmarking scripts
sjfleming Jun 24, 2024
31130fd
Update documentation in anticipation of v0.3.2
sjfleming Jun 24, 2024
1198d8f
Update wdl for benchmarking compatibility
sjfleming Jun 24, 2024
87f33e1
Merge branch 'dev' of github.com:broadinstitute/CellBender into dev
sjfleming Jun 24, 2024
d7d228b
Bump version to v0.3.2
sjfleming Jun 24, 2024
2 changes: 1 addition & 1 deletion cellbender/VERSION.txt
@@ -1 +1 @@
0.3.1.dev0
0.3.2
12 changes: 10 additions & 2 deletions cellbender/remove_background/cli.py
@@ -117,14 +117,22 @@ def validate_args(args) -> argparse.Namespace:
args.fpr = fpr_list_correct_dtypes

# Ensure that "exclude_features" specifies allowed features.
# As of CellRanger 6.0, the possible features are:
# As of CellRanger 7.2, the possible features are:
# Gene Expression
# Antibody Capture
# CRISPR Guide Capture
# Custom
# Peaks
# Multiplexing Capture
# VDJ
# VDJ-T
# VDJ-T-GD
# VDJ-B
# Antigen Capture
allowed_features = ['Gene Expression', 'Antibody Capture',
'CRISPR Guide Capture', 'Custom', 'Peaks']
'CRISPR Guide Capture', 'Custom', 'Peaks',
'Multiplexing Capture', 'VDJ', 'VDJ-T',
'VDJ-T-GD', 'VDJ-B', 'Antigen Capture']
for feature in args.exclude_features:
if feature not in allowed_features:
sys.stdout.write(f"Specified '{feature}' using --exclude-feature-types, "
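
The expanded list is easiest to reason about as a membership check. The sketch below is a minimal illustration, not CellBender's implementation: `ALLOWED_FEATURES` and `find_disallowed_features` are hypothetical names, and only the feature-type strings are taken from the diff.

```python
from typing import Iterable, List

# Feature-type strings copied from the diff (CellRanger 7.2 naming).
ALLOWED_FEATURES = [
    'Gene Expression', 'Antibody Capture', 'CRISPR Guide Capture',
    'Custom', 'Peaks', 'Multiplexing Capture', 'VDJ', 'VDJ-T',
    'VDJ-T-GD', 'VDJ-B', 'Antigen Capture',
]


def find_disallowed_features(exclude_features: Iterable[str]) -> List[str]:
    """Return the requested feature types that are not in ALLOWED_FEATURES.

    Hypothetical helper for illustration; it is not part of CellBender.
    """
    allowed = set(ALLOWED_FEATURES)
    return [f for f in exclude_features if f not in allowed]
```

With this list, `find_disallowed_features(['VDJ-T', 'Antigen Capture'])` returns an empty list, whereas both names would have been rejected against the previous five-entry list.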
@@ -141,7 +141,9 @@ def cromshell_submit(wdl: str,
submit_cmd = ['cromshell', 'submit',
tmp_wdl,
inputs,
'--options-json',
options,
'--dependencies-zip',
dependencies_zip]

# submit job
@@ -69,7 +69,7 @@ def get_cromshell_output_h5(workflow: str, grep: str = '_out.h5') -> Union[str,
"""Use cromshell list-outputs to get the relevant file gsURL"""

output = grep_from_command(['cromshell', 'list-outputs', workflow], grep=grep)
out = output[:-1].decode().split('\n')
out = output.decode().lstrip('run_cellbender_benchmark.h5_array: ').rstrip('\n').split('\n')
if len(out) > 1:
return out
else:
@@ -95,18 +95,18 @@ def metadata_from_workflow_id(workflow: str) -> Tuple[str, str, Optional[str]]:
# git hash
output = grep_from_command(['cromshell', 'metadata', workflow],
grep='"git_hash":')
git_hash = output[17:-3].decode()
git_hash = output.decode().split('"git_hash": ')[-1].lstrip('"').split('"')[0]

# input file
output = grep_from_command(['cromshell', 'metadata', workflow],
grep='run_cellbender_benchmark.cb.input_file_unfiltered')
input_file = output[58:-3].decode()
input_file = 'gs://' + output.decode().split('gs://')[-1].split('"')[0]

# truth file
output = grep_from_command(['cromshell', 'metadata', workflow],
grep='run_cellbender_benchmark.cb.truth_file')
if 'null' not in output.decode():
truth_file = output[47:-3].decode()
truth_file = 'gs://' + output.decode().split('gs://')[-1].split('"')[0]
else:
truth_file = None
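
A note on the new `lstrip` call in `get_cromshell_output_h5`: `str.lstrip` treats its argument as a set of characters to drop, not as a literal prefix. For `gs://` URLs the result is the same, because `g` is not among the characters of the prefix, but other values could be over-stripped. Below is a minimal sketch of a prefix-safe alternative, alongside the `gs://` extraction pattern used in this hunk. The names `strip_prefix` and `parse_gs_url` and the example paths are hypothetical.

```python
PREFIX = 'run_cellbender_benchmark.h5_array: '


def strip_prefix(line: str, prefix: str = PREFIX) -> str:
    """Remove `prefix` once from the start of `line`, if it is present."""
    # str.removeprefix does the same on Python 3.9+; slicing keeps this portable.
    return line[len(prefix):] if line.startswith(prefix) else line


def parse_gs_url(metadata_line: str) -> str:
    """Extract the last gs:// URL from a JSON-like metadata line."""
    # Same pattern as the diff: take the text after the last 'gs://'
    # and cut it at the closing double quote.
    return 'gs://' + metadata_line.split('gs://')[-1].split('"')[0]
```

For example, `(PREFIX + 'bench/out.h5').lstrip(PREFIX)` yields `'/out.h5'`, because every character of `bench` also occurs in the prefix, while `strip_prefix(PREFIX + 'bench/out.h5')` yields `'bench/out.h5'`.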

37 changes: 20 additions & 17 deletions cellbender/remove_background/train.py
@@ -152,7 +152,6 @@ def run_training(model: RemoveBackgroundPyroModel,

# Initialize train and test ELBO with empty lists.
train_elbo = []
test_elbo = []
lr = []
epoch_checkpoint_freq = 1000 # a large number... it will be recalculated

@@ -212,16 +211,15 @@ def run_training(model: RemoveBackgroundPyroModel,
if epoch % test_freq == 0:
model.eval()
total_epoch_loss_test = evaluate_epoch(svi, test_loader)
test_elbo.append(-total_epoch_loss_test)
model.loss['test']['epoch'].append(epoch)
model.loss['test']['elbo'].append(-total_epoch_loss_test)
logger.info("[epoch %03d] average test loss: %.4f"
% (epoch, total_epoch_loss_test))

# Check whether test ELBO has spiked beyond specified conditions.
if (epoch_elbo_fail_fraction is not None) and (len(test_elbo) > 2):
current_diff = max(0., test_elbo[-2] - test_elbo[-1])
overall_diff = np.abs(test_elbo[-2] - test_elbo[0])
if (epoch_elbo_fail_fraction is not None) and (len(model.loss['test']['elbo']) > 2):
current_diff = max(0., model.loss['test']['elbo'][-2] - model.loss['test']['elbo'][-1])
overall_diff = np.abs(model.loss['test']['elbo'][-2] - model.loss['test']['elbo'][0])
fractional_spike = current_diff / overall_diff
if fractional_spike > epoch_elbo_fail_fraction:
raise ElboException(
@@ -245,15 +243,20 @@ def run_training(model: RemoveBackgroundPyroModel,

# Check on the final test ELBO to see if it meets criteria.
if final_elbo_fail_fraction is not None:
best_test_elbo = max(test_elbo)
if test_elbo[-1] < best_test_elbo:
final_best_diff = best_test_elbo - test_elbo[-1]
initial_best_diff = best_test_elbo - test_elbo[0]
if (final_best_diff / initial_best_diff) > final_elbo_fail_fraction:
best_test_elbo = max(model.loss['test']['elbo'])
if model.loss['test']['elbo'][-1] < best_test_elbo:
final_best_diff = best_test_elbo - model.loss['test']['elbo'][-1]
initial_best_diff = best_test_elbo - model.loss['test']['elbo'][0]
if initial_best_diff == 0:
raise ElboException(
f'Training failed because final test loss {test_elbo[-1]:.2f} '
f"Training failed because there was no improvement from the initial test loss {model.loss['test']['elbo'][0]:.2f}. "
f"Final test loss was {model.loss['test']['elbo'][-1]}"
)
elif (final_best_diff / initial_best_diff) > final_elbo_fail_fraction:
raise ElboException(
f"Training failed because final test loss {model.loss['test']['elbo'][-1]:.2f} "
f'is not sufficiently close to best test loss {best_test_elbo:.2f}, '
f'compared to the initial test loss {test_elbo[0]:.2f}. '
f"compared to the initial test loss {model.loss['test']['elbo'][0]:.2f}. "
f'Fractional difference is {final_best_diff / initial_best_diff:.2f}, '
f'which is > specified final_elbo_fail_fraction {final_elbo_fail_fraction:.2f}'
)
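
The final-ELBO check in this hunk can be read in isolation as below. This is a sketch under assumptions: `check_final_elbo` is a hypothetical standalone function, `ElboException` is a stand-in class, the messages are shortened, and the empty-list guard is an addition that is not present in the diff.

```python
from typing import List, Optional


class ElboException(Exception):
    """Stand-in for CellBender's ElboException (assumed to be a plain Exception)."""


def check_final_elbo(test_elbo: List[float],
                     final_elbo_fail_fraction: Optional[float]) -> None:
    """Raise ElboException if the final test ELBO falls too far below the best."""
    # Extra guard for an empty history; not part of the diff.
    if final_elbo_fail_fraction is None or not test_elbo:
        return
    best = max(test_elbo)
    if test_elbo[-1] >= best:
        return  # the final value is the best value, so there is nothing to check
    final_best_diff = best - test_elbo[-1]
    initial_best_diff = best - test_elbo[0]
    if initial_best_diff == 0:
        # The initial test ELBO was never beaten, so the ratio below would
        # divide by zero. Fail with an explicit message instead.
        raise ElboException('no improvement from the initial test ELBO')
    if final_best_diff / initial_best_diff > final_elbo_fail_fraction:
        raise ElboException('final test ELBO is not close enough to the best')
```

A history of `[-50.0, -60.0, -70.0]` exercises the new branch: the best value is the initial one, so the check raises the "no improvement" error where the earlier code would have divided by zero.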
@@ -284,14 +287,14 @@ def run_training(model: RemoveBackgroundPyroModel,
logger.info(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))

# Check final ELBO meets conditions.
if (final_elbo_fail_fraction is not None) and (len(test_elbo) > 1):
best_test_elbo = max(test_elbo)
if -test_elbo[-1] >= -best_test_elbo * (1 + final_elbo_fail_fraction):
raise ElboException(f'Training failed because final test loss ({-test_elbo[-1]:.4f}) '
if (final_elbo_fail_fraction is not None) and (len(model.loss['test']['elbo']) > 1):
best_test_elbo = max(model.loss['test']['elbo'])
if -model.loss['test']['elbo'][-1] >= -best_test_elbo * (1 + final_elbo_fail_fraction):
raise ElboException(f"Training failed because final test loss ({-model.loss['test']['elbo'][-1]:.4f}) "
f'exceeds best test loss ({-best_test_elbo:.4f}) by >= '
f'{100 * final_elbo_fail_fraction:.1f}%')

# Free up all the GPU memory we can once training is complete.
torch.cuda.empty_cache()

return train_elbo, test_elbo
return train_elbo, model.loss['test']['elbo']
82 changes: 64 additions & 18 deletions docs/source/changelog/index.rst
@@ -11,33 +11,37 @@ edge case bug fixes, speedups, and small new features might bump up the last
digit of the version number. For example, the difference between 0.2.1 and 0.2.0
represents this kind of small change.

Version 0.1.0

Version 0.3.2
-------------

This was the initial release. The output count matrix was constructed via
imputation, so that there were no explicit guarantees that CellBender would
only subtract counts and never add.
Small improvements aimed at reducing memory footprint, along with bug fixes.

This version has been deprecated, and we do not recommend using it any longer.
Improvements:

- Imputes the "denoised" count matrix using a variational autoencoder
- Make posterior generation more memory efficient

Version 0.2.0
-------------
New features:

A significant overhaul of the model and the output generation procedure was
undertaken to explicitly guarantee that CellBender only subtracts counts and
never adds. The output is not constructed by imputation or smoothing, and
CellBender intentionally tries to modify the raw data as little as possible in
order to achieve denoising. A nominal false positive rate is approximately
controlled at the level of the entire dataset, to prevent removal of too much
signal.
- WDL workflow updates to facilitate automatic retries on failure
- Expanded the list of allowed feature types to match CellRanger definitions as of 2024.04

- Uses a variational autoencoder as a prior
Bug fixes:

- Computes the "denoised" count matrix using a MAP estimate and posterior regularization
- Fix bug with MTX inputs for WDL
- Fix Windows bug during posterior generation
- Fix report generation bugs on Mac and Windows


(Version 0.3.1 -- redacted)
---------------------------

WARNING: redacted

If you managed to obtain a copy of v0.3.1 before it was redacted, do not use it. An integer
overflow bug caused outputs to be incorrect in nearly all cases. For more information, see
`GitHub pull request 347 <https://github.com/broadinstitute/CellBender/pull/347>`_.

- CellBender never adds counts

Version 0.3.0
-------------
@@ -84,6 +88,37 @@ a workflow using Google Colab on a GPU for free.
hundreds of samples in automated pipelines. This file can be parsed to look for
indications that a sample may need to be re-run.


Version 0.2.0
-------------

A significant overhaul of the model and the output generation procedure was
undertaken to explicitly guarantee that CellBender only subtracts counts and
never adds. The output is not constructed by imputation or smoothing, and
CellBender intentionally tries to modify the raw data as little as possible in
order to achieve denoising. A nominal false positive rate is approximately
controlled at the level of the entire dataset, to prevent removal of too much
signal.

- Uses a variational autoencoder as a prior

- Computes the "denoised" count matrix using a MAP estimate and posterior regularization

- CellBender never adds counts


Version 0.1.0
-------------

This was the initial release. The output count matrix was constructed via
imputation, so that there were no explicit guarantees that CellBender would
only subtract counts and never add.

This version has been deprecated, and we do not recommend using it any longer.

- Imputes the "denoised" count matrix using a variational autoencoder


Human-mouse mixture benchmark
-----------------------------

@@ -137,3 +172,14 @@ v0.3.0

.. image:: /_static/remove_background/v0.3.0_hgmm.png
:width: 750 px

This represents a real improvement over the results published in the paper.

v0.3.2
~~~~~~

.. image:: /_static/remove_background/v0.3.2_hgmm.png
:width: 750 px

This appears identical to v0.3.0, as the changes were intended to fix bugs and
reduce memory footprint.
3 changes: 2 additions & 1 deletion docs/source/reference/index.rst
@@ -185,4 +185,5 @@ The information contained in the posterior can be used to
quantitatively answer questions such as "What is the probability that the
number of viral gene counts in this cell is nonzero?" For help with these kinds
of computations, please open a
`github issue <https://github.com/broadinstitute/CellBender/issues>`_.
`github issue <https://github.com/broadinstitute/CellBender/issues>`_, or see
the `semi-worked example in GitHub issue 299 <https://github.com/broadinstitute/CellBender/issues/299>`_.
1 change: 1 addition & 0 deletions wdl/cellbender_remove_background.wdl
@@ -90,6 +90,7 @@ task run_cellbender_remove_background_gpu {
git clone -q https://github.com/broadinstitute/CellBender.git /cromwell_root/CellBender
cd /cromwell_root/CellBender
git checkout -q ~{dev_git_hash__}
yes | pip install -U pip setuptools
yes | pip install --no-cache-dir -U -e /cromwell_root/CellBender
pip list
cd /cromwell_root