-
Notifications
You must be signed in to change notification settings - Fork 607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seurat v3 HVG method #1182
Seurat v3 HVG method #1182
Conversation
Please let me know if you have any recommendations @ivirshup ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I've made some comments on the code about correctness. I'm still working on making sure I completely understand the code. I'd like for @gokceneraslan and @LuckyMD to take a look at this too, since they're more familiar with this.
Thanks for including that link to the overleaf page, that will be useful in understanding this. Do you think some of that could just go directly in the doc-string?
Could you add:
- Tests
- An example for us, showing that it works/ is useful
- An short example of usage in the doc-string
Co-Authored-By: Isaac Virshup <ivirshup@gmail.com>
Co-Authored-By: Isaac Virshup <ivirshup@gmail.com>
Thanks for the review. I will add the other requested changes/additions soon. So also the way I interpreted the way they clip values based on the Stuart '19 publication is different than what it appears they do in their code, which I linked in the overleaf doc in the docstring. My original interpretation is the first code in the linked issue. |
@ivirshup I added a custom implementation of loess, which uses a degree 2 polynomial (closer to seurat implementation). It's a bit slow, I'm sure it can be sped up with numba, but it's having a problem with this line: Any idea why? Edit: I got it to work by not using clip, but it's still a bit slow. Please let me know if you have any ideas! Edit 2: I found a stable python implementation of degree 2 loess |
I'm also not sure why the test is failing -- it works interactively locally for me. |
@ivirshup is it possible that Travis has cached pbmc3k and that's what's causing the error? I really don't have it running pytest locally either. Also as far as the code review -- I understand code is duplicated, but this code does not really fit in the existing implementation because it works a bit differently and requires raw data. Let me know how you'd like to address this. Thanks! |
This got a bit messy -- travis is defeating me because the tests work locally. I'm going to close this and figure it out on my fork and make a new PR with a clean history. |
Sorry for the lack of feedback! I got caught up in my own work.
|
No worries - I really can't tell what the issue is since it works perfectly fine locally. My sense now is that
is not working properly on travis because based on this method, if the regularized std is 1, which occurs when it's constant, the new variance would just be the same as in the variances column of the df, which is what's happening on travis. |
Hey @adamgayoso , I really love this HVG method, but sometimes I get the following error from the loess fit: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-244-764977f87ce6> in <module>
----> 1 sc.pp.highly_variable_genes(ad_sub, n_top_genes=1500, flavor='seurat_v3', layer='counts')
2 sc.pp.pca(ad_sub)
3 sc.pp.neighbors(ad_sub)
4 sc.tl.umap(ad_sub)
5 sc.tl.leiden(ad_sub, resolution=2.0)
~/.miniconda3/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key)
413
414 if flavor == 'seurat_v3':
--> 415 return _highly_variable_genes_seurat_v3(
416 adata,
417 layer=layer,
~/.miniconda3/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py in _highly_variable_genes_seurat_v3(adata, layer, n_top_genes, batch_key, span, subset, inplace)
82 x = np.log10(mean[not_const])
83 model = loess(x, y, span=span, degree=2)
---> 84 model.fit()
85 estimat_var[not_const] = model.outputs.fitted_values
86 reg_std = np.sqrt(10 ** estimat_var)
_loess.pyx in _loess.loess.fit()
ValueError: b'reciprocal condition number 7.4971e-16\n' This is due to the very low but non-zero variance genes, I think. It goes away when I run Maybe we can make the variance check more stringent, or we can print a better error message for the users? |
Fixes #993.
This is an approximate implementation of the Seurat v3 hvg method. The only difference should be the use of lowess instead of loess (which is not available in python as far as I know).
This method takes the UMI counts as input. The way HVGs from batches are merged is also different from the other flavors. As such, I didn't see a straightforward way to integrate this in the existing HVG code.