Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align selective sweeps API with DFEs #1082

Closed
petrelharp opened this issue Nov 2, 2021 · 7 comments
Closed

Align selective sweeps API with DFEs #1082

petrelharp opened this issue Nov 2, 2021 · 7 comments
Labels
enhancement New feature or request
Milestone

Comments

@petrelharp
Copy link
Contributor

petrelharp commented Nov 2, 2021

The API for incorporating selective sweeps works directly with mutation type IDs (i.e., they need to know what the SLiM ID of the mutation type will be once it is simulated). Here's the rewrite of that example using the add_DFE way of doing things from #1078:

species = stdpopsim.get_species("DroMel")
model = stdpopsim.PiecewiseConstantSize(100000)
samples = model.get_samples(100)
contig = species.get_contig("2L", length_multiplier=0.01)

### start modified stuff
mt = stdpopsim.MutationType(
        distribution_type="f",
        dominance_coeff=1.0,
        distribution_args=[0.5],
        convert_to_substitution=False,
)
dfe = stdpopsim.DFE(
    id="new_mutation",
    mutation_types=[mt],
    proportions=[1.0],
    description="added mutation",
    long_description="mutation type to be added",
)
contig.add_DFE(
    intervals=np.empty((0,2), dtype='int'),
    DFE=dfe,
)
for mt_info in contig.mutation_types():
    if mt_info["dfe_id"] == dfe.id:
        break
                                                  
mut_id = mt_info["id"]
### end modified stuff

coordinate = int(contig.recombination_map.sequence_length / 2)
T_mut = 1000
extended_events = [
    stdpopsim.ext.DrawMutation(
        time=T_mut,
        mutation_type_id=mut_id,
        population_id=0,
        coordinate=coordinate,
        save=True,
    )
]

This works, but to make it easier, I propose making a new Contig method that does the (a) make a mutation type (b) make a DFE with that mutation type (c) add it with no intervals to the Contig steps, and (d) return the mutation type ID. (i.e., everything in the ### above) We could call this Contig.add_single_mutation_type? Or, something more evocative? What do you think, @grahamgower?

@petrelharp petrelharp added the enhancement New feature or request label Nov 2, 2021
@grahamgower
Copy link
Member

Yeah, we definitely need a wrapper function for this to avoid confusion. It seems a bit awkward that the mutation type needs to have a dummy DFE, but if this simplifies the implementation then I guess it's a win. Perhaps the name of the function could be more explicit, e.g. Contig.add_single_site_mutation_type?

@petrelharp
Copy link
Contributor Author

I think having DFEs and intervals and mutation types is a lot, and kinda redundant, when we can do with just the first two. I'd really like to remove the .mutation_types property altogether, since that's mostly reflecting how it happens to be implemented in SLiM. But we need some way of communicating which mutation types came from which DFE.

I don't think it's too bad to have a 'dummy' DFE, actually, since that's a place that we can attach names and descriptions, which we can't do for a MutationType.

I think we could get rid of mutation_types if we put information about the mutation types in top-level metadata - e.g., something that mapped mutation IDs to DFE IDs, so one wouldn't have to refer back to the Contig to interpret the tree sequence.

@andrewkern
Copy link
Member

so the next step after Contig.add_single_site_mutation_type() would be to define the parameters of the event(s) associated with the mutation type to be injected, e.g. a sweep.

So perhaps we want a function like stdpopsim.ext.sample_conditional_on_allele_frequency(time, final_frequency, selection_coeff, ... ) that would call Contig.add_single_site_mutation_type() and then go ahead and set up the extended events as in the tutorial, returning that events array to be passed to engine.simulate(). How's that sound?

@petrelharp
Copy link
Contributor Author

To summarize:

  • the Events need to know which mutation type to refer to
  • this is too bad because we'd like to hide the implementation detail of which numeric ID goes with which mutation type

@andrewkern's proposal would get around this by (I think) adding a method that takes a contig and a bunch of other stuff and then both (a) sets up the mutation type/DFE in the contig and (b) sets up and returns the Events. But, this seems kinda hard, @andrewkern, since consider the example in the docs where there's several conditionings. To do as you propose, we'd have to have make up some way of passing this series of conditionings into the sample_conditional_on_allele_frequency() method; and this series of conditionings is a sequence of frequency thresholds, operators, populations, and time intervals. We could do that, but it seems a bit more awkward than the current proposal? It would bar the way to future "events" as we might want to add, however.

Here's another proposal: instead of mutation_type, have the Events take a DFE ID (like, the name of the DFE); then we'd just need to (a) find the DFE and check that the name is unique; (b) check the DFE has no associated intervals; (c) find its mutation type. So, we'd do something like:

contig.add_single_site_mutation_type(
    selection_coeff=0.5,
    id="new_mutation"
)
extended_events = [
    stdpopsim.ext.DrawMutation(
        time=T_mut,
        mutation_id="new_mutation",
        population_id=0,
        coordinate=coordinate,
        save=True,
    ),
    stdpopsim.ext.ConditionOnAlleleFrequency(
        start_time=stdpopsim.ext.GenerationAfter(T_mut),
        end_time=0,
        mutation_id="new_mutation",
        population_id=0,
        op=">",
        allele_frequency=0.0,
    )
]
engine = stdpopsim.get_engine("slim")
ts_sweep = engine.simulate( model, contig, samples, seed=123,
    extended_events=extended_events)

@grahamgower
Copy link
Member

A string name for the mutation id is a great idea! We should probably transition to string names for the population id too.

@andrewkern
Copy link
Member

I think this is a case of "why not both?"

i.e. i think we could provide convenience functions to the user to set up these commonly used models. for instance what if the code for sweeps were something like:

contig.add_single_site_mutation_type(
    selection_coeff=0.5,
    id="new_mutation"
)
sweep_events = sample_conditional_on_allele_freq(
                                    mutation_id="new_mutation",   
                                    population_id=0,
                                    time_mut=T_mut,
                                    conditional_freq=1.0
                                )
engine = stdpopsim.get_engine("slim")
ts_sweep = engine.simulate( model, contig, samples, seed=123,
    extended_events=sweep_events)

@nspope
Copy link
Collaborator

nspope commented Sep 26, 2022

Closed by #1341

@nspope nspope closed this as completed Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants