Estimating Usage of GEP in another data set? #84

Open
DiegoSafian opened this issue May 18, 2024 · 4 comments

DiegoSafian commented May 18, 2024

Hi,

Is there an appropriate way to estimate the usage of GEPs in another dataset, so that one can compare changes in usage across conditions? For example, I estimate GEP usage per cell class in dataset A and want to know the usage of these same GEPs in dataset B.

My best,
Diego

Owner

dylkot commented May 21, 2024

Hi, this is actually the topic of our recent preprint https://t.co/OexYxSnc3D

The code we use for doing this is here:

https://github.com/immunogenomics/starCAT

The step that packages cNMF output for starCAT is still a bit of a work in progress, but it is handled by the build_reference() function on the development branch. You can optionally have it run automatically in the consensus step by setting build_ref=True.

Let me know if this makes sense or if you have questions!
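
For intuition, estimating the usage of a fixed set of GEPs in a new dataset amounts to a non-negative least-squares fit per cell with the spectra held fixed. The following is a minimal conceptual sketch in Python, not starCAT's actual API: the function name `project_usage`, the toy matrices, and the choice of `scipy.optimize.nnls` are all assumptions for illustration. Real use would also require the same gene set and normalization in both datasets.

```python
import numpy as np
from scipy.optimize import nnls


def project_usage(counts, spectra):
    """Fit non-negative usages of fixed GEP spectra to each cell.

    counts  : (n_cells, n_genes) non-negative expression matrix (dataset B)
    spectra : (n_geps, n_genes) non-negative GEP spectra learned on dataset A
    Returns an (n_cells, n_geps) usage matrix with rows normalized to sum to 1.
    """
    counts = np.asarray(counts, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    usage = np.zeros((counts.shape[0], spectra.shape[0]))
    for i, cell in enumerate(counts):
        # Solve min ||spectra.T @ u - cell||_2 subject to u >= 0.
        usage[i], _ = nnls(spectra.T, cell)
    totals = usage.sum(axis=1, keepdims=True)
    # Leave all-zero rows at zero instead of dividing by zero.
    return np.divide(usage, totals, out=np.zeros_like(usage), where=totals > 0)


# Toy example (hypothetical data): 2 GEPs over 4 genes, 3 cells in dataset B.
spectra_a = np.array([[1.0, 0.0, 2.0, 0.0],
                      [0.0, 3.0, 0.0, 1.0]])
true_usage_b = np.array([[1.0, 0.0],
                         [0.5, 0.5],
                         [0.2, 0.8]])
counts_b = true_usage_b @ spectra_a
usage_b = project_usage(counts_b, spectra_a)
```

Because the spectra are held fixed, the usages in dataset B stay directly comparable to those in dataset A, which is what makes a cross-condition comparison meaningful.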

Author

DiegoSafian commented May 24, 2024

Hi,
Thanks for your response. I am actually running cnmf from the command line (installed with pip install cnmf), but I cannot find a way to enable build_ref=True. Do I need to work in a Python environment to do it?

This is how I run it:

conda activate cnmf

echo "### Step 1: prepare" 
cnmf prepare --output-dir ./data --name 15_26_cNMF_5000 -c data_matrix.txt -k 15 16 17 18 19 20 20 22 24 26 --n-iter 250 --seed 14 --numgenes 5000 --total-workers 10

echo "### Step 2: factorize" 
cnmf factorize --output-dir ./data --name 15_26_cNMF_5000 --worker-index 0

echo "### Step 3: combine"
cnmf combine --output-dir ./data --name 15_26_cNMF_5000

echo "### Step 4: plot"
cnmf k_selection_plot --output-dir ./data --name 15_26_cNMF_5000

echo "### Step 5: consensus"
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 17 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 18 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 19 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 20 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 22 --local-density-threshold 0.025 --show-clustering
cnmf consensus --output-dir ./data --name 15_26_cNMF_5000 --components 24 --local-density-threshold 0.025 --show-clustering
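
The six consensus calls above differ only in the value passed to --components, so they can be generated in a loop. A minimal Python sketch follows; it reuses only the flags shown in the commands above, while the helper names `consensus_commands` and `run_all` and the `run` switch are hypothetical.

```python
import shlex
import subprocess


def consensus_commands(ks, output_dir="./data", name="15_26_cNMF_5000",
                       density_threshold=0.025):
    """Build one `cnmf consensus` argument list per value of k."""
    return [
        ["cnmf", "consensus",
         "--output-dir", output_dir,
         "--name", name,
         "--components", str(k),
         "--local-density-threshold", str(density_threshold),
         "--show-clustering"]
        for k in ks
    ]


def run_all(commands, run=False):
    """Print each command; execute it only when run=True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)


commands = consensus_commands([17, 18, 19, 20, 22, 24])
run_all(commands)  # dry run: prints the commands without executing them
```

Calling `run_all(commands, run=True)` would execute them in order and stop at the first failure.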

Owner

dylkot commented May 24, 2024

Currently it is only on the development branch of the GitHub repository (hopefully it will be moved to the main branch in the next few weeks). You can install it with pip like so:

pip install git+https://github.com/dylkot/cNMF.git@development
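
With the development branch installed, the consensus step could also be run from Python rather than from the command line. The sketch below is an assumption based on the earlier comment: it presumes that `build_ref` is accepted as a keyword argument of `cNMF.consensus()` on that branch, which should be checked against the installed version. The helper names are hypothetical, and the output directory, run name, and threshold are taken from the commands earlier in the thread.

```python
def consensus_kwargs(k, density_threshold=0.025, build_ref=True):
    """Collect the arguments for one consensus run in a plain dict."""
    return {
        "k": k,
        "density_threshold": density_threshold,
        "show_clustering": True,
        # Assumed keyword, per the maintainer's comment; verify that it
        # exists in the installed development version.
        "build_ref": build_ref,
    }


def run_consensus(k, output_dir="./data", name="15_26_cNMF_5000"):
    """Run consensus for one k on an already factorized and combined run."""
    from cnmf import cNMF  # imported lazily so the sketch loads without cnmf

    cnmf_obj = cNMF(output_dir=output_dir, name=name)
    cnmf_obj.consensus(**consensus_kwargs(k))


# Example call (needs cnmf installed and the earlier steps completed):
# run_consensus(20)
```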

If you don't mind, let me know how it goes since this is something we are actively working on supporting.

@DiegoSafian
Author

Hi again,
I tried it and it works perfectly fine and extremely fast!
The results are good; however, the usage % in dataset B decreased quite a bit. That said, I am probably asking too much, because I am comparing single-nucleus data from two different species, which can be more challenging due to differences in cell composition and gene expression capture. Still, it produces very coherent results, and I will definitely keep using it. I am attaching a figure so you can get an idea of the results:
example.pdf

Many thanks!
