Training & test curves are strange shape #199

Closed
twoneu opened this issue Apr 15, 2023 · 2 comments
Labels
user question User question about a specific dataset

Comments

@twoneu

twoneu commented Apr 15, 2023

Hello CellBender team,

Thank you so much for your incredible tool! I have encountered a strange problem with one of my samples: the training and test curves converge, but they have a strange shape:

[Image: CellBender output plots for this sample, including the training and test curves]

I have tried several different parameter settings, with no real change in any of these curves:

  1. Default settings (failed)
  2. Total_droplets_included = 50K
  3. Total_droplets_included = 50K, epochs = 300
  4. Total_droplets_included = 50K, epochs = 150, empty_training_frac = 0.75
  5. Total_droplets_included = 50K, learning rate = 1e-6
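For reference, the five attempts above roughly map onto the `cellbender remove-background` invocations below. This is a minimal sketch, assuming the CLI flag names `--total-droplets-included`, `--epochs`, `--empty-drop-training-fraction` (taken here to be what "empty_training_frac" refers to) and `--learning-rate`; check `cellbender remove-background --help` for your installed version. The file names are hypothetical, any other flags you used (for example `--expected-cells`) would be added the same way, and the script only builds and prints the commands rather than running them.

```python
from __future__ import annotations

import shlex

# Hypothetical input file; substitute your own raw count matrix.
INPUT = "raw_feature_bc_matrix.h5"

# Each attempt is a dict of extra flags layered on top of the defaults.
# Flag names are assumed from the CellBender CLI; verify with --help.
ATTEMPTS = [
    {},  # 1. default settings
    {"--total-droplets-included": 50000},
    {"--total-droplets-included": 50000, "--epochs": 300},
    {"--total-droplets-included": 50000, "--epochs": 150,
     "--empty-drop-training-fraction": 0.75},
    {"--total-droplets-included": 50000, "--learning-rate": 1e-6},
]


def build_command(run_id: int, extra: dict) -> list[str]:
    """Return the argv list for one remove-background run."""
    cmd = [
        "cellbender", "remove-background",
        "--input", INPUT,
        "--output", f"output_run{run_id}.h5",  # hypothetical output name
    ]
    for flag, value in extra.items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Print each command in a shell-safe form so it can be copied and run.
    for i, extra in enumerate(ATTEMPTS, start=1):
        print(shlex.join(build_command(i, extra)))
```

Writing each run to its own output file keeps the learning-curve plots from different settings side by side for comparison.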

Do you have any suggestions for other parameters to alter?

@sjfleming
Member

Hi @twoneu! I agree that this run looks strange. This is definitely not how the learning curve is supposed to look. Something like this is much closer to ideal:
[Image: example of a well-behaved learning curve]

So, it sounds like you've already tried quite a few different things, including a reduced learning rate, which is one change I usually see help.

I'm wondering about the dataset itself. I see that the PCA plot of CellBender's learned representation of gene expression is just kind of one big blob. I also see that the UMI curve looks particularly challenging.

How many cells do you expect to see in this experiment? Can you paste a plot of the full log-log UMI curve here? I'm interested to see whether the droplets in that middle plot (< 50k) look like cells or empty droplets. They have only several hundred UMI counts each, and I typically expect cells to have 1000 UMI counts or more.
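The log-log UMI curve requested above is each droplet's total UMI count, sorted in descending order and plotted against its rank, with both axes on a log scale. A minimal stdlib-only sketch, assuming you already have per-droplet totals (for example, row sums of the raw droplet-by-gene count matrix); the toy numbers are made up, and the 1000-UMI cutoff is just the rule of thumb from the comment above, not a CellBender parameter:

```python
import math


def umi_curve(totals):
    """Return (rank, umi) pairs sorted by descending UMI count.

    Ranks start at 1 so that both axes can be log-transformed.
    """
    ordered = sorted(totals, reverse=True)
    return [(rank, umi) for rank, umi in enumerate(ordered, start=1)]


def count_likely_cells(totals, min_umi=1000):
    """Count droplets at or above a rough UMI threshold for cells."""
    return sum(1 for t in totals if t >= min_umi)


def log_log_points(curve):
    """Convert the curve to (log10 rank, log10 UMI), skipping zero counts."""
    return [(math.log10(r), math.log10(u)) for r, u in curve if u > 0]


if __name__ == "__main__":
    # Toy per-droplet totals: a few cell-like droplets and many low-count ones.
    totals = [5200, 3100, 2400, 1800, 900, 650, 400, 300, 120, 80, 30, 0]
    curve = umi_curve(totals)
    print(curve[:3])                   # highest-count droplets first
    print(count_likely_cells(totals))  # droplets with >= 1000 UMIs
    print(log_log_points(curve)[:2])   # first points of the log-log plot
```

Plotting the `log_log_points` output (or plotting `curve` on log axes) gives the full UMI curve; if only a handful of droplets clear the cutoff, that is consistent with the concern raised here.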

To me, it looks like there is a distinct possibility that this sample was a QC failure and the experiment just didn't work as intended. I often see this kind of UMI curve and PCA plot when the experiment has inadvertently broken open all the cells (or nuclei), so that the whole experiment is basically one big sampling of ambient RNA. Is that a possibility here? From the UMI curve, there does not seem to be any way to distinguish by eye what might be a cell from what might be empty. It's just kind of a smooth decline.
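One way to put a number on "no way by eye to distinguish" is to look for the steepest drop in the log-log UMI curve: a sample with intact cells usually shows a sharp cliff between cell-containing and empty droplets, whereas a sample that is mostly ambient RNA declines smoothly. The sketch below is a simple finite-difference heuristic, not CellBender's own cell-calling method; the `window` parameter and the toy data in the usage note are hypothetical choices for illustration.

```python
import math


def steepest_drop(totals, window=5):
    """Find the steepest descent of the log-log UMI curve.

    Returns (rank, slope): the 1-based rank where the steepest segment
    starts, and the slope d(log10 UMI) / d(log10 rank) over `window` ranks.
    A large negative slope suggests a cliff between cells and empties;
    a shallow minimum suggests a smooth decline with no clear boundary.
    """
    # Drop zero-count droplets, since log10(0) is undefined.
    ordered = [t for t in sorted(totals, reverse=True) if t > 0]
    if len(ordered) <= window:
        raise ValueError("need more droplets than the window size")
    best_rank, best_slope = None, math.inf
    for i in range(len(ordered) - window):
        r0, r1 = i + 1, i + 1 + window  # 1-based ranks of the segment ends
        dy = math.log10(ordered[i + window]) - math.log10(ordered[i])
        dx = math.log10(r1) - math.log10(r0)
        slope = dy / dx
        if slope < best_slope:
            best_rank, best_slope = r0, slope
    return best_rank, best_slope
```

For example, `steepest_drop([3000] * 20 + [50] * 200)` finds a very steep segment starting at rank 20, right at the boundary between the two groups, while a smooth power-law curve such as `[10000 // r for r in range(1, 221)]` never gets much steeper than a slope of about -1 to -2.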

@sjfleming sjfleming added the user question User question about a specific dataset label Apr 18, 2023
@sjfleming
Member

This kind of thing will hopefully be solved in v0.3.0.

Closed by #238

@sjfleming sjfleming self-assigned this Aug 8, 2023