Revised genotyping

@rhysnewell rhysnewell released this 12 Nov 05:06
· 1246 commits to master since this release

So, in keeping with tradition, this release brings a bunch of changes to Lorikeet that make it pretty distant from where it was a month ago. I know only a few people are trying to keep track of all the changes being made here, and I'm sorry things are so stochastic. I think my supervisor put it best when I told him about one of the changes I had made: "Ah, so freebayes is out this week, huh?"

Yeah, freebayes is out. Cancelled. For throwing illegal instruction errors and segmentation faults on GPU nodes. I ain't fixing that, I'll just make my own variant caller.

Lorikeet's new best friends are UMAP and HDBSCAN. The curse of dimensionality hexed me pretty good during benchmarking, so UMAP is now used for dimensionality reduction. I chose it over PCA since it seems to separate groups of variants way better. Also, since we now have to use a Python library for UMAP, might as well upgrade fuzzy DBSCAN to its better version: HDBSCAN.

Changes:

  • Freebayes. OUT.
  • Fuzzy DBSCAN. OUT.
  • UMAP. IN.
  • HDBSCAN. IN.
  • Evolve now reports per-sample dN/dS and coverage values for each ORF.

Current workflow:

[Figure: lorikeet_revised workflow diagram]