
Use usher-sampled if installed #499

Merged: 3 commits, Jan 3, 2023


@AngieHinrichs (Member) commented Dec 9, 2022

UShER release v0.6.0 includes the new usher-sampled, a much faster version of usher developed by @yceh. Using usher-sampled instead of usher speeds up pangolin's usher mode dramatically when running on large numbers of input sequences.

Ironically, running with pangolin's default --threads=1 on small numbers of sequences required a couple of small changes, so UShER release v0.6.1 or later should be installed with pangolin. It is now available from bioconda for Linux. (It looks like 0.5.6 is the most recent version of usher available for Mac.)

This PR changes usher.smk's usher_inference rule to use usher-sampled if it is installed; otherwise, it prints out a message recommending that the user update the usher package, and uses usher as usual.
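The selection logic described above can be sketched in Python, since Snakemake rule files are Python-based. This is a minimal illustration, not the code in usher.smk: `choose_usher_program` is a hypothetical helper name, and the sketch assumes that "installed" means the `usher-sampled` executable can be found on `PATH` with the standard-library `shutil.which`. The `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
import sys
from typing import Callable, Optional


# Hypothetical helper (not pangolin's actual code): prefer usher-sampled
# when it is on PATH, otherwise fall back to classic usher.
def choose_usher_program(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    if which("usher-sampled") is not None:
        return "usher-sampled"
    # Fallback path: recommend an upgrade on stderr, then keep using usher.
    print(
        "*** usher-sampled is not installed -- please upgrade usher to at least v0.6.1 ***",
        file=sys.stderr,
    )
    return "usher"
```

The returned name would then take the place of `usher` at the start of the command line. The sketch deliberately does not assume that usher-sampled accepts exactly the same options as `usher`.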

Not only is usher-sampled faster than usher, it also avoids some overcounting of multiple equally parsimonious placements (e.g. near both parent & siblings but on the same node) and doesn't count an N-match at the end of a placement path as an EPP when there are non-N matches at the ends of other paths. This helps to reduce some occasional over-specific assignments that depended on Ns or ambiguous bases at the ends of placement paths.

…t out a message recommending that the user update usher. Include the usher(-sampled) command in the log output to help with debugging just in case.
@aineniamh (Member)
Hi Angie, looks great! I've just checked through the code and ran it on my local instance (with Usher 0.5.6 installed).

First run through, I just installed this version without changing anything else about my environment. pangolin still runs and assigns, so all good there.

With this install it runs and produces this output:

Using UShER as inference engine.

*** usher-sampled is not installed -- please upgrade usher to at least v0.6.1 ***
*** If you used conda to install usher, run 'conda update --no-pin usher' ***

usher command: usher -n -D -i /Users/s1680070/opt/miniconda3/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb -v /var/folders/6c/8t0c48s536q0x65wps6jp_d80000gr/T/tmp2h7afvw3/sequences.aln.vcf -T 1 -d /var/folders/6c/8t0c48s536q0x65wps6jp_d80000gr/T/tmp2h7afvw3 &> /var/folders/6c/8t0c48s536q0x65wps6jp_d80000gr/T/tmp2h7afvw3/logs/usher.log
****
Output file written to: /Users/s1680070/repositories/pangolin/lineage_report.csv

In default mode it prints all of these temp dir paths to the screen when echoing the usher command. It's up to you, since it's the usher pipeline, but I think this looks a bit messy, so I'd suggest either an abridged command print or just reporting that usher is running.

I then ran the suggested:

conda update --no-pin usher

which ran fine. I'd suggest including an alternative mamba command for those with mamba installed, e.g.:

Run: 
conda update --no-pin usher
Alternatively users with mamba installed can update by running
mamba update --no-pin usher

I'm on OS X, though, so even after running the command, the latest usher available for OS X is still 0.5.6, the version I was already running (as you said). I'd modify the print out to reflect that only linux users will be able to use usher sampled right now, otherwise mac users might be confused trying to install an update that's not available to them yet.

Otherwise the code looks great and passes checks, so it isn't breaking on the Linux tests (which should be running the sampled mode) or on macOS (which, like me, will still be running the older mode).

So impressed with the reported speed-ups in usher sampled! Fantastic work from the usher team!

@AngieHinrichs (Member, Author)
Thanks for the testing and feedback!

usher command: usher -n -D -i ...

What, I think it looks great! With all the output from snakemake that is... less specific... 😆 Just kidding. You're right, I'll take it back out, the number of people running pangolin who will want to know what options are passed to usher is probably not much greater than one, and usher.smk is there for anyone to read. 🤓

Alternatively users with mamba installed can update by running
mamba update --no-pin usher

👍 will do

I'd modify the print out to reflect that only linux users will be able to use usher sampled right now, otherwise mac users might be confused trying to install an update that's not available to them yet.

Yeah, good point, I'll work on that message, and meanwhile try to figure out why bioconda hasn't updated usher for Mac since 0.5.6.

Thanks again!

@AngieHinrichs (Member, Author) commented Dec 15, 2022

This is what the info message for lack of usher-sampled looks like now:

*** usher-sampled is not installed -- please upgrade usher to at least v0.6.1 ***
*** If you used conda to install usher, run 'conda update --no-pin usher'     ***
*** Alternatively if mamba is installed, run 'mamba update --no-pin usher'    ***
*** If you use Mac OS X and usher 0.6.1 or later is not yet available, then   ***
*** please pardon the inconvenience but watch for updates.                    ***
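The right-hand `***` markers in the message above line up because every line is padded to the width of the longest one. A minimal Python sketch of that formatting, using a hypothetical `box_message` helper (not necessarily how the PR builds the message):

```python
from typing import List


# Hypothetical helper: wrap each line in "*** ... ***", left-justifying the
# text to the longest line so the closing markers form a straight column.
def box_message(lines: List[str]) -> List[str]:
    width = max(len(line) for line in lines)
    return ["*** " + line.ljust(width) + " ***" for line in lines]


MESSAGE = [
    "usher-sampled is not installed -- please upgrade usher to at least v0.6.1",
    "If you used conda to install usher, run 'conda update --no-pin usher'",
    "Alternatively if mamba is installed, run 'mamba update --no-pin usher'",
    "If you use Mac OS X and usher 0.6.1 or later is not yet available, then",
    "please pardon the inconvenience but watch for updates.",
]
```

Printing `"\n".join(box_message(MESSAGE))` gives the boxed layout; computing the padding this way means the box stays aligned if a line of the message is later reworded.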

It looks like bioconda folks are skipping usher for Mac builds because it was failing to build for ARM/M1 -- one of the new programs (ripples-fast) is x86-only. [Edit: building ripples-fast fails on Mac x86_64 too. We can skip ripples-fast for the Mac build for now.] We're looking into how to get around that.

@AngieHinrichs (Member, Author)
I made a BioConda PR (bioconda/bioconda-recipes#38446) that omits the ripples-fast program from usher when building on Mac for now, until we can figure out why the bioconda Mac x86_64 build fails for ripples-fast. Hopefully that will go through soon and then usher-sampled will be available for Mac. 🤞

@AngieHinrichs (Member, Author)
usher 0.6.1 with usher-sampled is now available from bioconda for Mac! I'm going to merge this and see if I can find some tip-of-tree beta testers. :) If that goes well then I hope we can tag a pangolin release soon.

@AngieHinrichs AngieHinrichs merged commit 888f4d1 into cov-lineages:master Jan 3, 2023
wm75 added a commit to bioconda/bioconda-recipes that referenced this pull request Jan 13, 2023
bgruening pushed a commit to bioconda/bioconda-recipes that referenced this pull request Jan 13, 2023
* Update pangolin to 4.2

* Bump usher requirement

See cov-lineages/pangolin#499

* Adjust test for usher 0.6.1

Co-authored-by: Wolfgang Maier <maierw@informatik.uni-freiburg.de>
cokelaer pushed a commit to cokelaer/bioconda-recipes that referenced this pull request Apr 28, 2023