Add API for selective sweeps #1341

nspope · 2022-08-27T19:16:04Z

Fixes Convenience functions for single site selection models #1329 and Align selective sweeps API with DFEs #1082.
Fixes a bug in the SLiM engine at

stdpopsim/stdpopsim/slim_engine.py

Line 217 in ec4fedd

community.allScriptBlocks.active = F;

that caused fitness calculations to be disabled for one generation after a restore. This probably had very little impact on simulation correctness, but did reduce efficiency of the rejection sampler.
Fixes a bug where fitness logging would be disabled if the logging interval was set to every generation.
Removes selection_coeff and dominance_coeff as arguments from Contig.add_single_site, as these are now specified in ext.selective_sweep. For the lower-level API this means that these parameters have to be set by a ChangeMutationFitness event.
Fixes Docstrings needed for stdpopsim.ext extended event classes #1336.
Fixes Allele frequency conditioning with multiple DrawMutation events at the same locus #1337 and fix allele frequency calculation in slim code, or disallow multiple drawn mutations per MutationType #608.
"Fixes" rename stdpopsim.ext.ExtendedEvents to ...SLiMEvents #656. As these classes may ultimately also be used for sweep specification in msprime, we'll keep calling them "extended events".

codecov · 2022-08-27T19:22:54Z

Codecov Report

Base: 99.67% // Head: 99.78% // Increases project coverage by +0.11% 🎉

Coverage data is based on head (10cfd5f) compared to base (2bfa88b).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 10cfd5f differs from pull request most recent head d4f8126. Consider uploading reports for the commit d4f8126 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1341      +/-   ##
==========================================
+ Coverage   99.67%   99.78%   +0.11%     
==========================================
  Files         113      113              
  Lines        3699     3798      +99     
  Branches      475      505      +30     
==========================================
+ Hits         3687     3790     +103     
+ Misses          8        4       -4     
  Partials        4        4

Impacted Files	Coverage Δ
stdpopsim/ext/selection.py	`100.00% <100.00%> (ø)`
stdpopsim/genomes.py	`100.00% <100.00%> (ø)`
stdpopsim/slim_engine.py	`100.00% <100.00%> (ø)`
stdpopsim/utils.py	`100.00% <0.00%> (ø)`
stdpopsim/species.py	`97.89% <0.00%> (ø)`
stdpopsim/genetic_maps.py	`100.00% <0.00%> (ø)`
stdpopsim/warning_categories.py	`100.00% <0.00%> (ø)`
stdpopsim/catalog/PapAnu/__init__.py	`100.00% <0.00%> (ø)`
stdpopsim/cli.py	`99.77% <0.00%> (+<0.01%)`	⬆️
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

nspope · 2022-08-29T22:31:05Z

I think we need to store information about the sweep in the metadata, but I'm not sure yet what that should look like (as mutation fitness can change through time). Currently, single sites will show up in the metadata as strictly neutral DFEs that span a length-1 interval on the contig.

petrelharp · 2022-08-29T22:41:28Z

What do you mean "we need to"? We have not yet formalized how we want to store anything in metadata; currently I think the only thing we're always storing there is Q. And, getting into the weeds of trying to store enough information there to completely recreate everything is a giant can of worms - there's some kind of happy balance to be struck that we haven't gotten at yet.

However, storing stuff in metadata for testing purposes is very useful; maybe this is what you meant? (i.e., in the "if debug > x" block)

nspope · 2022-08-30T03:42:05Z

What I mean is that _recap_and_rescale walks over the DFEs in the metadata here

stdpopsim/stdpopsim/slim_engine.py

Lines 1668 to 1671 in 3166ead

    
           for i, dfe in enumerate(ts.metadata["stdpopsim"]["DFEs"]): 
        
               assert len(dfe["proportions"]) == len(dfe["mutation_types"]) 
        
               for prop, mt in zip(dfe["proportions"], dfe["mutation_types"]): 
        
                   if mt["is_neutral"]:

to simulate mutations over strictly neutral DFEs. Single sites are currently stored as strictly neutral DFEs, so there's a very small chance they could end up with neutral mutations. So, I think we should store something about the sweep in the metadata for its parent DFE, like the selection coefficient. Or maybe I'm overthinking things?

petrelharp · 2022-08-30T04:55:05Z

Oh, good catch - I forgot about that. Yes, i agree - should put something distinguishing in the metadata. I've not got a good suggestion at the moment, though.

stdpopsim/ext/selection.py

stdpopsim/genomes.py

petrelharp · 2022-08-31T23:41:14Z

A thought: since msprime actually has (less sophisticated) ability to simluate sweeps; maybe we don't want to call these slim_events, as maybe in the future we'll do 'em with msprime also? (what do you think, @andrewkern?)

petrelharp · 2022-08-31T23:47:59Z

This is great! I have minor suggestions, above. @andrewkern should probably look at the proposed API at least to see if it's the right amount of flexible?

andrewkern

looks good to me @nspope. some minor suggestions here.

docs/tutorial.rst

stdpopsim/ext/selection.py

stdpopsim/genomes.py

stdpopsim/ext/selection.py

mufernando

made a few comments, but LGTM after you finish responding @petrelharp and @andrewkern points!

nspope · 2022-09-09T21:27:30Z

This should be good to go. The only thing I'm unsure of is how far to go in modifying the metadata for single sites. What I've done is to find those single sites that are referenced by extended events, and mark the mutation type within the associated DFE as non-neutral. This ensures that neutral mutations won't be accidentally simulated on top of any single sites with drawn mutations.

But, there's no other information copied over (selection coefficients, etc). In general, there could be multiple ChangeMutationFitness events for a single mutation -- and in that case it's not clear to me which one should "donate" parameters to the metadata. Alternatively, information from all associated extended events could be copied over into long description. I think the decision on what to do here could be postponed until either msprime sweeps are supported, or an API for more complicated "sweep-like" scenarios is introduced.

stdpopsim/genomes.py

stdpopsim/ext/selection.py

petrelharp · 2022-09-13T17:43:47Z

Sorry to bring this up so late, but I'm just noticing that sweeps created by selective_sweep are only beneficial in population_id. I would have expected that a "sweep" unless further qualified would be globally beneficial. Sure, many (local) sweeps are only locally adaptive, but the generic "selective sweep" is globally adaptive, I would have thought? Maybe this is OK, but here are some other options:

call this local_sweep( ) instead of selective_sweep( )
just have it globally beneficial
have a flag that switches it from "global" to "local"
have separate arguments for "originating population" and "populations where it's beneficial

I like option (3), currently - it seems to strike a good balance between simplicity and allowing the original behavior. I'd say that it should default to 'global'.

andrewkern · 2022-09-13T17:53:11Z

ah this is a good point @petrelharp! ultimately we'd like to be able to have a bit of flexibility about the selection coefficient in space and time

petrelharp

Other than the API comment, looks great to me!

nspope · 2022-09-14T20:38:26Z

Extended events now use string population instead of integer population_id.
Added flag for global sweep to ext.selective_sweep that's on by default. The ChangeMutationFitness event is modified so that if population is omitted, it applies to all populations.
Allele frequency conditioning only applies to the source population, regardless of this flag -- after talking it over with @petrelharp we decided that global frequency conditioning was a bad idea (eg would make the rejection sampling really inefficient in many cases). I think this behavior is made sufficiently clear in the documentation but see what you think.
Do we want to migrate this stuff out of ext?

andrewkern · 2022-09-19T19:54:46Z

this looks all good to me @nspope. wrt ext I think this can probably come out of there, but I vote we merge this first

stdpopsim/slim_engine.py

petrelharp · 2022-09-19T20:46:56Z

I didn't see it - do you have a test for what happens if you pass in an integer for 'population' when it's now supposed to be a string? (ie the old behavior)

petrelharp · 2022-09-19T20:47:34Z

I'm happy to merge this when you say it's ready.

…tium#1354)

nspope · 2022-09-20T18:25:57Z

OK, this is good to merge.

The population argument gets checked against a dict containing the population names. So, if it's an integer, an error will get raised along the lines of "Population XXX was not found for model YYY which has defined populations ZZZ". There's an explicit test for this now.

andrewkern · 2022-09-20T21:31:08Z

okay i'm going to squash and merge. thanks @nspope!!

nspope marked this pull request as draft August 27, 2022 19:16

nspope added this to the Version 0.2.0-beta milestone Aug 27, 2022

nspope marked this pull request as ready for review August 29, 2022 22:31

nspope requested review from petrelharp and mufernando August 29, 2022 22:31