Use usher-sampled if installed #499
…t out a message recommending that the user update usher. Include the usher(-sampled) command in the log output to help with debugging just in case.
Hi Angie, looks great! I've just checked through the code and ran it on my local instance (with usher 0.5.6 installed). On the first run-through, I just installed this version without changing anything else about my environment. pangolin still runs and assigns, so all good there. With this install it runs and produces this output:
In default mode it'll print all these temp dir paths to the screen when it's echoing the usher command. It's up to you, as it's the usher pipeline, but I think this looks a bit messy, so I might suggest either an abridged command printout or just reporting that usher is running. I then ran the suggested command:
which ran fine. I'd suggest also including an alternative mamba command for those with mamba installed, e.g.:
I'm on OSX though, so even though I ran the command, the latest usher for OSX is still 0.5.6, the version I was already running (like you said). I'd modify the printout to reflect that only Linux users will be able to use usher-sampled right now; otherwise Mac users might be confused trying to install an update that isn't available to them yet. Otherwise the code looks great and passes checks, so it isn't breaking on the Linux tests (which should be running the sampled mode) or on macOS (which, like me, will still be running the older mode). So impressed with the reported speed-ups in usher-sampled! Fantastic work from the usher team!
Thanks for the testing and feedback!
What? I think it looks great! With all the output from snakemake that is... less specific... 😆 Just kidding. You're right, I'll take it back out; the number of people running pangolin who will want to know what options are passed to usher is probably not much greater than one, and usher.smk is there for anyone to read. 🤓
👍 will do
Yeah, good point. I'll work on that message, and meanwhile try to figure out why bioconda hasn't updated usher for Mac since 0.5.6. Thanks again!
This is what the info message for lack of usher-sampled looks like now:
It looks like the bioconda folks are skipping usher for Mac builds because it was failing to build.
I made a Bioconda PR (bioconda/bioconda-recipes#38446) that omits the ripples-fast program from usher when building on Mac for now, until we can figure out why the Bioconda Mac x86_64 build fails for ripples-fast. Hopefully that will go through soon, and then usher-sampled will be available for Mac. 🤞
usher 0.6.1 with usher-sampled is now available from bioconda for Mac! I'm going to merge this and see if I can find some tip-of-tree beta testers. :) If that goes well, then I hope we can tag a pangolin release soon.
* Update pangolin to 4.2
* Bump usher requirement. See cov-lineages/pangolin#499
* Adjust test for usher 0.6.1

Co-authored-by: Wolfgang Maier <maierw@informatik.uni-freiburg.de>
UShER release v0.6.0 includes the new usher-sampled, a much faster version of usher developed by @yceh. Using usher-sampled instead of usher speeds up pangolin's usher mode dramatically when running on large numbers of input sequences.
Ironically, in order to run with pangolin's default --threads=1 and small numbers of sequences, a couple of small changes were required, so UShER release v0.6.1 or later should be installed with pangolin. It is now available from bioconda for Linux. (It looks like 0.5.6 is the most recent version of usher available on Mac.)
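As an illustration of the version requirement, here is a minimal Python sketch of a check for UShER v0.6.1 or later. It assumes the installed version is already available as a string such as "0.5.6" or "v0.6.1"; how that string is obtained, and the helper names `parse_version` and `usher_is_recent_enough`, are hypothetical and not part of pangolin or UShER.

```python
# Minimum UShER release named in the PR text; everything else here is a
# hypothetical illustration, not pangolin's or UShER's own code.
MIN_USHER_VERSION = (0, 6, 1)


def parse_version(text):
    """Turn a version string such as 'v0.6.1' or '0.5.6' into a tuple of ints."""
    cleaned = text.strip().lstrip("vV")
    parts = []
    for piece in cleaned.split("."):
        # Keep only the leading digits of each component, e.g. "1rc2" -> 1.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "0.6" to three components so they compare correctly.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def usher_is_recent_enough(version_text):
    """Return True if the given version string is 0.6.1 or later."""
    return parse_version(version_text) >= MIN_USHER_VERSION


# Assumed input: version strings that were already obtained somehow.
for v in ["0.5.6", "0.6.0", "v0.6.1"]:
    print(v, usher_is_recent_enough(v))
```

Comparing tuples of integers avoids the classic string-comparison trap where "0.10.0" would sort before "0.6.1"; here 0.5.6 and 0.6.0 are rejected and v0.6.1 is accepted.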
This PR changes usher.smk's usher_inference rule to use usher-sampled if it is installed; otherwise, it prints out a message recommending that the user update the usher package, and uses usher as usual.
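The fallback behaviour described above can be sketched in plain Python with `shutil.which`. This is a hypothetical helper (`choose_usher_program`), not the code in usher.smk, which may implement the check differently (for example inside the rule's shell command), and the warning text is made up for illustration. The `which` parameter exists only so the PATH lookup can be swapped out in tests.

```python
import shutil
import sys


def choose_usher_program(which=shutil.which):
    """Hypothetical helper: pick the UShER placement program to run.

    Prefers usher-sampled; falls back to plain usher with a warning.
    """
    if which("usher-sampled") is not None:
        return "usher-sampled"
    if which("usher") is not None:
        # Illustrative message only; pangolin's real wording may differ.
        print(
            "usher-sampled not found; falling back to usher. "
            "Consider updating the usher package to v0.6.1 or later.",
            file=sys.stderr,
        )
        return "usher"
    raise RuntimeError("Neither usher-sampled nor usher was found on PATH.")
```

Keeping the lookup in one place means the rest of the pipeline only needs the returned program name, and older installs keep working with plain usher.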
Not only is usher-sampled faster than usher, it also avoids some overcounting of multiple equally parsimonious placements, or EPPs (e.g. near both parent & siblings but on the same node), and doesn't count an N-match at the end of a placement path as an EPP when there are non-N matches at the ends of other paths. This helps reduce occasional over-specific assignments that depended on Ns or ambiguous bases at the ends of placement paths.
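To make the two counting changes concrete, here is a toy Python model. The `Placement` type, its fields, and the rule for merging several paths that reach the same node are assumptions made for illustration; this is not usher-sampled's actual implementation.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical toy record for one candidate placement path."""
    node_id: str       # tree node the sample would attach to
    ends_with_n: bool  # True if the match at the end of the path is only on an N/ambiguous base


def count_epps(placements):
    """Count equally parsimonious placements under two toy rules."""
    # Rule 1: several paths that end on the same node count as one placement.
    # Assumption: a node is an "N-only" ending only if every path to it ends in an N-match.
    n_only = {}
    for p in placements:
        n_only[p.node_id] = n_only.get(p.node_id, True) and p.ends_with_n
    # Rule 2: if any node is reached with a non-N match, N-only endings are not counted.
    if any(not flag for flag in n_only.values()):
        return sum(1 for flag in n_only.values() if not flag)
    return len(n_only)


example = [Placement("A", False), Placement("A", False), Placement("B", True)]
print(len(example), count_epps(example))
```

For this example a naive count gives 3 EPPs, while the toy rules give 1: the two paths to node A collapse into one, and B's N-match ending is ignored because A has a non-N match.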