Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for selective sweeps #1341

Merged
merged 2 commits into from
Sep 20, 2022

Conversation

nspope
Copy link
Collaborator

@nspope nspope commented Aug 27, 2022

@nspope nspope marked this pull request as draft August 27, 2022 19:16
@nspope nspope added this to the Version 0.2.0-beta milestone Aug 27, 2022
@codecov
Copy link

codecov bot commented Aug 27, 2022

Codecov Report

Base: 99.67% // Head: 99.78% // Increases project coverage by +0.11% 🎉

Coverage data is based on head (10cfd5f) compared to base (2bfa88b).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 10cfd5f differs from pull request most recent head d4f8126. Consider uploading reports for the commit d4f8126 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1341      +/-   ##
==========================================
+ Coverage   99.67%   99.78%   +0.11%     
==========================================
  Files         113      113              
  Lines        3699     3798      +99     
  Branches      475      505      +30     
==========================================
+ Hits         3687     3790     +103     
+ Misses          8        4       -4     
  Partials        4        4              
Impacted Files Coverage Δ
stdpopsim/ext/selection.py 100.00% <100.00%> (ø)
stdpopsim/genomes.py 100.00% <100.00%> (ø)
stdpopsim/slim_engine.py 100.00% <100.00%> (ø)
stdpopsim/utils.py 100.00% <0.00%> (ø)
stdpopsim/species.py 97.89% <0.00%> (ø)
stdpopsim/genetic_maps.py 100.00% <0.00%> (ø)
stdpopsim/warning_categories.py 100.00% <0.00%> (ø)
stdpopsim/catalog/PapAnu/__init__.py 100.00% <0.00%> (ø)
stdpopsim/cli.py 99.77% <0.00%> (+<0.01%) ⬆️
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@nspope
Copy link
Collaborator Author

nspope commented Aug 29, 2022

I think we need to store information about the sweep in the metadata, but I'm not sure yet what that should look like (as mutation fitness can change through time). Currently, single sites will show up in the metadata as strictly neutral DFEs that span a length-1 interval on the contig.

@nspope nspope marked this pull request as ready for review August 29, 2022 22:31
@petrelharp
Copy link
Contributor

What do you mean "we need to"? We have not yet formalized how we want to store anything in metadata; currently I think the only thing we're always storing there is Q. And, getting into the weeds of trying to store enough information there to completely recreate everything is a giant can of worms - there's some kind of happy balance to be struck that we haven't gotten at yet.

However, storing stuff in metadata for testing purposes is very useful; maybe this is what you meant? (i.e., in the "if debug > x" block)

@nspope
Copy link
Collaborator Author

nspope commented Aug 30, 2022

What I mean is that _recap_and_rescale walks over the DFEs in the metadata here

for i, dfe in enumerate(ts.metadata["stdpopsim"]["DFEs"]):
assert len(dfe["proportions"]) == len(dfe["mutation_types"])
for prop, mt in zip(dfe["proportions"], dfe["mutation_types"]):
if mt["is_neutral"]:

to simulate mutations over strictly neutral DFEs. Single sites are currently stored as strictly neutral DFEs, so there's a very small chance they could end up with neutral mutations. So, I think we should store something about the sweep in the metadata for its parent DFE, like the selection coefficient. Or maybe I'm overthinking things?

@petrelharp
Copy link
Contributor

Oh, good catch - I forgot about that. Yes, i agree - should put something distinguishing in the metadata. I've not got a good suggestion at the moment, though.

stdpopsim/genomes.py Outdated Show resolved Hide resolved
@petrelharp
Copy link
Contributor

A thought: since msprime actually has (less sophisticated) ability to simluate sweeps; maybe we don't want to call these slim_events, as maybe in the future we'll do 'em with msprime also? (what do you think, @andrewkern?)

@petrelharp
Copy link
Contributor

This is great! I have minor suggestions, above. @andrewkern should probably look at the proposed API at least to see if it's the right amount of flexible?

@andrewkern andrewkern self-requested a review September 7, 2022 21:56
Copy link
Member

@andrewkern andrewkern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me @nspope. some minor suggestions here.

docs/tutorial.rst Outdated Show resolved Hide resolved
docs/tutorial.rst Outdated Show resolved Hide resolved
docs/tutorial.rst Outdated Show resolved Hide resolved
stdpopsim/ext/selection.py Outdated Show resolved Hide resolved
stdpopsim/genomes.py Outdated Show resolved Hide resolved
Copy link
Member

@mufernando mufernando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a few comments, but LGTM after you finish responding @petrelharp and @andrewkern points!

@nspope
Copy link
Collaborator Author

nspope commented Sep 9, 2022

This should be good to go. The only thing I'm unsure of is how far to go in modifying the metadata for single sites. What I've done is to find those single sites that are referenced by extended events, and mark the mutation type within the associated DFE as non-neutral. This ensures that neutral mutations won't be accidentally simulated on top of any single sites with drawn mutations.

But, there's no other information copied over (selection coefficients, etc). In general, there could be multiple ChangeMutationFitness events for a single mutation -- and in that case it's not clear to me which one should "donate" parameters to the metadata. Alternatively, information from all associated extended events could be copied over into long description. I think the decision on what to do here could be postponed until either msprime sweeps are supported, or an API for more complicated "sweep-like" scenarios is introduced.

stdpopsim/genomes.py Outdated Show resolved Hide resolved
@petrelharp
Copy link
Contributor

Sorry to bring this up so late, but I'm just noticing that sweeps created by selective_sweep are only beneficial in population_id. I would have expected that a "sweep" unless further qualified would be globally beneficial. Sure, many (local) sweeps are only locally adaptive, but the generic "selective sweep" is globally adaptive, I would have thought? Maybe this is OK, but here are some other options:

  1. call this local_sweep( ) instead of selective_sweep( )
  2. just have it globally beneficial
  3. have a flag that switches it from "global" to "local"
  4. have separate arguments for "originating population" and "populations where it's beneficial

I like option (3), currently - it seems to strike a good balance between simplicity and allowing the original behavior. I'd say that it should default to 'global'.

@andrewkern
Copy link
Member

ah this is a good point @petrelharp! ultimately we'd like to be able to have a bit of flexibility about the selection coefficient in space and time

Copy link
Contributor

@petrelharp petrelharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the API comment, looks great to me!

@nspope
Copy link
Collaborator Author

nspope commented Sep 14, 2022

  • Extended events now use string population instead of integer population_id.
  • Added flag for global sweep to ext.selective_sweep that's on by default. The ChangeMutationFitness event is modified so that if population is omitted, it applies to all populations.
  • Allele frequency conditioning only applies to the source population, regardless of this flag -- after talking it over with @petrelharp we decided that global frequency conditioning was a bad idea (eg would make the rejection sampling really inefficient in many cases). I think this behavior is made sufficiently clear in the documentation but see what you think.
  • Do we want to migrate this stuff out of ext?

@andrewkern
Copy link
Member

this looks all good to me @nspope. wrt ext I think this can probably come out of there, but I vote we merge this first

@petrelharp
Copy link
Contributor

I didn't see it - do you have a test for what happens if you pass in an integer for 'population' when it's now supposed to be a string? (ie the old behavior)

@petrelharp
Copy link
Contributor

I'm happy to merge this when you say it's ready.

@nspope
Copy link
Collaborator Author

nspope commented Sep 20, 2022

OK, this is good to merge.

The population argument gets checked against a dict containing the population names. So, if it's an integer, an error will get raised along the lines of "Population XXX was not found for model YYY which has defined populations ZZZ". There's an explicit test for this now.

@andrewkern
Copy link
Member

okay i'm going to squash and merge. thanks @nspope!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants