Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including covariates with nullResiduals() #13

Open
johnchamberlin opened this issue Aug 13, 2020 · 4 comments
Open

Including covariates with nullResiduals() #13

johnchamberlin opened this issue Aug 13, 2020 · 4 comments

Comments

@johnchamberlin
Copy link

Is it possible to include gene or cell covariates when using the nullResiduals approximation of glc-pca (i.e. X and Z)? I am working with a large dataset and am trying to speed up glm-pca by using nullResiduals instead. By cell covariate I mean a computed value, not a batch identity.

Thanks,
John

@willtownes
Copy link
Collaborator

Hi thanks for your interest. So far this is not supported in scry; we only allow adjusting for cell covariates that are single categorical variables. The reason is because that is the scenario where the residuals can be computed in closed-form. In the future we might provide this feature but it will be much slower because it will require an iterative optimization procedure (basically equivalent to a Poisson regression) for each gene. Here are some possible alternative approaches in the meantime (in decreasing order of favorability):

  1. Try using the sctransform package, which uses negative binomial residuals instead of Poisson or binomial. I believe it supports arbitrary cell covariates. There is a slight difference in that it uses log-total count as a covariate instead of an offset by default and doesn't support deviance residuals (only Pearson), but neither of those should matter much. The main advantage scry has over sctransform is speed and simplicity, but once we start having non-categorical covariates that would go away.
  2. If your goal is to eventually apply PCA to the residuals, you could ignore the covariates and compute the ordinary scry residuals, apply PCA to them, then regress off your covariates from the resulting principal component values. It may be of interest to compare this to the harmony method.
  3. You could implement the above Poisson regression procedure yourself using the glm function from regular R (or some faster equivalent). Compute the log of total counts for each cell as an offset. Then for each gene (independently) regress the vector of gene counts y_j against your covariates matrix X with the offsets and the Poisson glm family with log link. Then take the deviance residuals from the fitted model. That should be equivalent to what scry provides in the case that X encodes a single categorical covariate and/or just an intercept term.

@willtownes
Copy link
Collaborator

oh one more thing, if you have tried glmpca in the past and been disappointed with speed, please make sure you have tried the newer v0.2 release (currently on CRAN), where we have increased the speed with a new optimization algorithm. Of course, it is still slower than computing residuals with scry.

@johnchamberlin
Copy link
Author

johnchamberlin commented Aug 13, 2020

I just started using glmpca yesterday. It runs in [edit: 23 minutes] with 40K cells, 2K features, L=10, which I think is perfectly adequate. But I wanted to produce the deviance matrix as well, which is when I switched to scry.

Can I use deviances to perform differential expression, similar to sctransform, or is the idea simply to use glm-pca (or PCA on deviances) ahead of clustering and then revert to raw counts?

Thanks

@willtownes
Copy link
Collaborator

I suppose you could use the deviance residuals for differential expression, but we would recommend using the raw counts instead along with a tool like DEseq2 or edgeR. Soneson & Robinson 2018 may be relevant.

By the way, I'm not sure of your use case for deviance residuals other than differential expression, but if you have already run GLM-PCA, you can use the predict.glmpca method to get a matrix of the imputed mean values for original data matrix. This is sort of like a denoised version of the data. If you were to normalize and log-transform it (there won't be any exact zeros), it should have similar properties as the deviance residuals matrix you would get from scry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants