Add API for selective sweeps #1341

Merged
merged 2 commits into from
Sep 20, 2022
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -17,7 +17,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-18.04, macos-10.15, windows-latest]
os: [ubuntu-20.04, macos-10.15, windows-latest]
python: [3.7, "3.10"]
env:
CONDA_ENV_NAME: stdpopsim
40 changes: 18 additions & 22 deletions docs/selection_example.py
@@ -4,6 +4,10 @@

logger = logging.getLogger(__name__)

# TODO: This example has been updated to reflect changes in the extended events
# API (see PRs #1306 and #1341) but it should be run and checked for
# correctness at some point


def adaptive_introgression(seed):
"""
@@ -21,13 +25,6 @@ def adaptive_introgression(seed):
100, 0, 0, 100, 2, 2 # YRI, CEU, CHB, Papuan, DenA, NeaA
)

# One mutation type, which we'll use for the positively selected mutation.
# Neutral mutations will be added by the SLiM engine as usual, after the
# SLiM phase of the simulation has completed.
positive = stdpopsim.MutationType(convert_to_substitution=False)
mutation_types = [positive]
mut_id = len(mutation_types) - 1

# We need some demographic model parameters to set bounds on the timing
# of random variables and extended_events (below).
# These values were copied from the PapuansOutOfAfrica_10J19 model
@@ -55,9 +52,9 @@ def adaptive_introgression(seed):
logger.info(f"Parameters: T_mut={T_mut:.3f}, T_sel={T_sel:.3f}, s={s:.3g}")

# Place the drawn mutation in the middle of the contig.
locus_id = "introgressed_locus"
coordinate = round(contig.recombination_map.sequence_length / 2)

pop = {p.name: i for i, p in enumerate(model.populations)}
contig.add_single_site(id=locus_id, coordinate=coordinate)

# Thinking forwards in time, we define a number of extended events that
# correspond to drawing the mutation, conditioning on the new allele not
@@ -69,9 +66,8 @@ def adaptive_introgression(seed):
# Draw mutation in DenA.
stdpopsim.ext.DrawMutation(
time=T_mut,
mutation_type_id=mut_id,
population_id=pop["DenA"],
coordinate=coordinate,
single_site_id=locus_id,
population="DenA",
),
# Because the drawn mutation is neutral at the time of introduction,
# it's likely to be lost due to drift. To avoid this, we condition on
@@ -86,8 +82,8 @@ def adaptive_introgression(seed):
# which will give an error due to "start_time < end_time".
start_time=stdpopsim.ext.GenerationAfter(T_mut),
end_time=T_Den_split,
mutation_type_id=mut_id,
population_id=pop["DenA"],
single_site_id=locus_id,
population="DenA",
op=">",
allele_frequency=0,
),
@@ -96,8 +92,8 @@ def adaptive_introgression(seed):
stdpopsim.ext.ConditionOnAlleleFrequency(
start_time=stdpopsim.ext.GenerationAfter(T_Den_split),
end_time=T_mig,
mutation_type_id=mut_id,
population_id=pop["Den1"],
single_site_id=locus_id,
population="Den1",
op=">",
allele_frequency=0,
),
@@ -106,8 +102,8 @@ def adaptive_introgression(seed):
stdpopsim.ext.ConditionOnAlleleFrequency(
start_time=stdpopsim.ext.GenerationAfter(T_mig),
end_time=0,
mutation_type_id=mut_id,
population_id=pop["Papuan"],
single_site_id=locus_id,
population="Papuan",
op=">",
allele_frequency=0,
),
@@ -117,17 +113,17 @@ def adaptive_introgression(seed):
stdpopsim.ext.ChangeMutationFitness(
start_time=T_sel,
end_time=0,
mutation_type_id=mut_id,
population_id=pop["Papuan"],
single_site_id=locus_id,
population="Papuan",
selection_coeff=s,
dominance_coeff=0.5,
),
# Condition on AF > 0.05 in Papuans at the end of the simulation.
stdpopsim.ext.ConditionOnAlleleFrequency(
start_time=0,
end_time=0,
mutation_type_id=mut_id,
population_id=pop["Papuan"],
single_site_id=locus_id,
population="Papuan",
op=">",
allele_frequency=0.05,
),
103 changes: 31 additions & 72 deletions docs/tutorial.rst
@@ -1189,7 +1189,7 @@ but nonetheless there is lower diversity in exons than outside of them:
is preliminary, and subject to change!

You may be interested in simulating and tracking a single mutation. To illustrate
this scenario, let's simulate a selective sweep until it reaches an abitrary
this scenario, let's simulate a selective sweep until it reaches an arbitrary
allele frequency.

First, let's define a contig and a demographic model; here, we are simulating a
@@ -1212,34 +1212,21 @@ We must also decide the time the mutation will be added, when selection will
start and at what frequency we want our selected mutation to be at the end of
the simulation.

Let's assume the mutation appeared 1000 generations ago, it has a positive
effect on fitness (s=0.5). Also, we want the mutation to have reached a frequency
of at least 0.8 by the end. Next, we'll walk through the steps required to do this:

.. note::

Note that because we are doing a forward-in-time simulation, you should be
careful with your conditioning. For example, even a strongly selected mutation
would not be able to reach 80% frequency in just a few generations. Since
this conditioning works by re-running the simulation until the condition is
achieved, a nearly impossible condition will result in very long run times.

To set things up, we need to first add the site at which the selected mutation
will occur. This is like adding a DFE, except to a single site -- we're saying
that there is a potential mutation at a particular site with defined fitness
consequences. So that we can refer to the single site later, we give it a
unique string ID. Here, we'll add the site in the middle of the contig, with ID
"hard sweep".
First, we need to add the site at which the selected mutation will occur. This
is like adding a DFE, except to a single site -- we're saying that there is a
potential mutation at a particular site with defined fitness consequences. So
that we can refer to the single site later, we give it a unique string ID.
Here, we'll add the site in the middle of the contig with ID "hard sweep",
so named because we will imagine this beneficial mutation originates at
frequency :math:`1 / 2N`.

.. code-block:: python

mut_id = "hard sweep"
locus_id = "hard sweep"
coordinate = round(contig.recombination_map.sequence_length / 2)
contig.add_single_site(
id=mut_id,
id=locus_id,
coordinate=coordinate,
fitness_coeff=0.1,
dominance_coeff=0.5,
)

.. note::
@@ -1251,62 +1238,34 @@ unique string ID. Here, we'll add the site in the middle of the contig, with ID
"overwritten" and an error will be raised in simulation.

Next, we will set up the "extended events" which will modify the demography.
The first extended event is the origination of the selected mutation, which
will occur in a random individual from the first population (id 0), 1000
generations ago.
This is done through :func:`stdpopsim.ext.selective_sweep`, which represents a
general model for a mutation that is beneficial within a single population. We
specify that the mutation should originate 1000 generations ago in a random
individual from the first population (named "pop_0" by default); that the
selection coefficient for the mutation should be 0.5; and that the frequency of
the mutation in the present day (e.g. at the end of the sweep) should be
greater than 0.8.

.. code-block:: python

T_mut = 1000
extended_events = [
stdpopsim.ext.DrawMutation(
time=T_mut,
single_site_id=mut_id,
population_id=0,
)
]

Next, we condition on the mutation not being lost. Since in the next step we
condition on the mutation being at 80% frequency at the end, this is redundant,
but it allows the simulation to immediately restart from any generation in
which the mutation is lost, rather than waiting until the end. Note that this
conditioning must start one generation after the mutation is placed, for which
we use ``stdpopsim.ext.GenerationAfter(T_mut)``. We cannot simply specify
``T_mut - 1`` if rescaling is present, otherwise the conditioning would start
at the same generation when the mutation is placed.
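
The rounding problem described above can be sketched in plain Python. This is not stdpopsim code: it assumes, purely for illustration, that rescaling divides a time by a factor ``Q`` and rounds to a whole generation, and both helper functions are hypothetical. The rule the SLiM engine actually applies may differ.

```python
def to_scaled_generation(time_ago, scaling_factor):
    """Hypothetical rescaling rule: divide by Q, then round to a whole generation."""
    return round(time_ago / scaling_factor)


def generation_after(time_ago, scaling_factor):
    """One whole rescaled generation later, i.e. one step closer to the present."""
    return to_scaled_generation(time_ago, scaling_factor) - 1


T_mut = 1000
for Q in (1, 10):
    placed = to_scaled_generation(T_mut, Q)     # generation in which the mutation is drawn
    naive = to_scaled_generation(T_mut - 1, Q)  # "T_mut - 1" after rescaling
    safe = generation_after(T_mut, Q)           # always strictly after `placed`
    print(f"Q={Q}: placed={placed}, naive start={naive}, safe start={safe}")
```

Under this assumed rule, ``T_mut`` and ``T_mut - 1`` collapse onto the same rescaled generation once ``Q`` is larger than 1, whereas stepping back one generation after rescaling never does.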

.. code-block:: python

extended_events.append(
stdpopsim.ext.ConditionOnAlleleFrequency(
start_time=stdpopsim.ext.GenerationAfter(T_mut),
end_time=0,
single_site_id=mut_id,
population_id=0,
op=">",
allele_frequency=0.0,
)
extended_events = stdpopsim.ext.selective_sweep(
single_site_id=locus_id,
population="pop_0",
selection_coeff=0.5,
mutation_generation_ago=1000,
min_freq_at_end=0.8,
)

Finally, we condition on the mutation being above 80% at the end of the simulation.
(The "end" is at time 0, since "time" is in generations before the end of the simulation.)

.. code-block:: python
.. note::

extended_events.append(
stdpopsim.ext.ConditionOnAlleleFrequency(
start_time=0,
end_time=0,
single_site_id=mut_id,
population_id=0,
op=">=",
allele_frequency=0.8,
)
)
Note that because we are doing a forward-in-time simulation, you should be
careful with your conditioning. For example, even a strongly selected mutation
would not be able to reach 80% frequency in just a few generations. Since
this conditioning works by re-running the simulation until the condition is
achieved, a nearly impossible condition will result in very long run times.
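
As a rough illustration of why this matters, the sketch below mimics conditioning by rejection in plain Python. It is not stdpopsim or SLiM code: ``sweep_trajectory`` and ``simulate_conditioned`` are hypothetical helpers, and the model is a simplified haploid Wright-Fisher population rather than the tutorial's demographic model. Each attempt is discarded and rerun until the final frequency exceeds the threshold.

```python
import random


def sweep_trajectory(pop_size, sel_coeff, generations, rng):
    """Frequencies of a beneficial allele that starts as a single copy."""
    count = 1
    freqs = [count / pop_size]
    for _ in range(generations):
        p = count / pop_size
        # Selection shifts the expected frequency; drift is binomial sampling.
        p_sel = p * (1 + sel_coeff) / (1 + p * sel_coeff)
        count = sum(rng.random() < p_sel for _ in range(pop_size))
        freqs.append(count / pop_size)
        if count == 0:
            break  # allele lost: abandon this attempt early
    return freqs


def simulate_conditioned(pop_size, sel_coeff, generations, min_freq, seed,
                         max_attempts=10_000):
    """Rerun the trajectory until the final frequency exceeds min_freq."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        freqs = sweep_trajectory(pop_size, sel_coeff, generations, rng)
        if freqs[-1] > min_freq:
            return freqs, attempt
    raise RuntimeError("condition not met within max_attempts")


freqs, attempts = simulate_conditioned(
    pop_size=200, sel_coeff=0.5, generations=100, min_freq=0.8, seed=1
)
print(f"final frequency {freqs[-1]:.2f} after {attempts} attempt(s)")
```

With only a handful of generations, the same call exhausts ``max_attempts`` and raises, which is the toy analogue of a very long run time.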

Now we can simulate, using SLiM of course.
For comparison, we will run the same simulation
without selection - i.e., without the "extended events":
Now we can simulate, using SLiM of course. For comparison, we will run the
same simulation without selection - i.e., without the "extended events":

.. code-block:: python
