
Simulating complex pollen dispersal and mating to Simcoal2 simulations


HobanLab/Pollen_dispersal_sims


Pollen_dispersal_sims

Project repository for the manuscript titled: Simulating pollen dispersal and realistic field sampling constraints helps revise seed sampling recommendations for conserving genetic diversity (submitted)

A collaboration between Kaylee Rosenberger and Sean Hoban

Overview

Genetic simulations have been used to develop seed sampling strategies that inform botanic gardens and arboreta on how to conserve species' genetic diversity effectively and efficiently in ex situ collections. These informed sampling strategies are one way of ensuring that a genetically diverse and representative sample has been collected and conserved for future use in restoration efforts. However, previous simulations model a simplified version of mating and pollen dispersal. For example, previous simulations assume panmictic mating. In reality, for many species, the individuals closest to a given plant donate the majority of its pollen, so mating is more spatially restricted than panmixia assumes. Furthermore, previous simulated seed sampling strategies assumed that one seed is sampled from each maternal plant, whereas collectors often sample many seeds from a given plant. While it has been shown that sampling as many unique maternal trees as possible results in more genetically diverse collections, this is often not feasible due to logistical constraints. We hypothesized that these two factors (unrealistic mating and sampling assumptions) may influence the genetic diversity conserved by a sample of a given size.

Here, we created more biologically realistic simulations of pollination systems to determine the impact of different pollen dispersal types on the diversity conserved by a given sampling strategy. Additionally, we modeled sampling strategies that are closer to what conservation seed collectors face in the field, and determined both the relative impact of sampling more than one seed per plant and how sampling an unequal number of seeds per plant affects the diversity captured at a given sample size.

Approach

We followed this general approach: (1) simulate the genetics of a generic plant species; (2) apply sampling functions to this dataset to create seeds under different pollination scenarios and seed collector decisions; (3) each time we sample, compare the genetic diversity in the seed set to the total genetic diversity available; (4) compare the efficiency of genetic diversity conserved among the different situations, e.g. 5 seeds and 10 mothers vs. 10 seeds and 5 mothers, or skewed vs. non-skewed sampling or pollination.
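Step (3) can be illustrated with a minimal sketch. It is written in Python purely for illustration (the repository's scripts are in R), and the function name `proportion_alleles_captured` and the genotype layout (a list of loci, each a pair of allele labels) are assumptions, not the repository's data structures.

```python
def proportion_alleles_captured(population, seeds):
    """Fraction of the population's distinct alleles that also appear in the seeds.

    Hypothetical layout: each individual is a list of loci, and each locus is
    a tuple holding the two allele labels of a diploid genotype.
    """
    n_loci = len(population[0])
    total = 0      # distinct alleles in the whole population, summed over loci
    captured = 0   # how many of those are present in the seed set
    for locus in range(n_loci):
        pop_alleles = {a for ind in population for a in ind[locus]}
        seed_alleles = {a for ind in seeds for a in ind[locus]}
        total += len(pop_alleles)
        captured += len(pop_alleles & seed_alleles)
    return captured / total


# Toy data: three individuals, two loci.
population = [
    [(1, 2), (5, 5)],
    [(1, 3), (5, 6)],
    [(2, 4), (7, 6)],
]
seeds = [
    [(1, 2), (5, 6)],
    [(2, 2), (5, 5)],
]
result = proportion_alleles_captured(population, seeds)
```

In this toy example the population holds 7 distinct alleles across the two loci and the seed set contains 4 of them, so `result` is 4/7.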

Here, we simulated two hypothetical species using the software Simcoal 2: one species with a single population of 2500 individuals, and one species with two populations of 2500 individuals each. These hypothetical species are simple models that we use to answer our broad question and to keep the results applicable to a wide range of taxa. Both hypothetical species have medium population sizes and are moderately rare. We use two versions to show that the results hold when there are multiple populations.

To add complexity in mating, pollen dispersal, and seed creation beyond what base Simcoal 2 represents, we pass the simulation output files to a custom function that randomly selects maternal trees and pollen donors to create a "seed." To create a seed, the function randomly selects alleles from each parent for the seed to inherit. In other words, the function creates seed sets from selected trees and samples them, so that different seed sampling strategies can be tested more realistically. The function accepts many different combinations of parameters, including the number of maternal trees, the number of pollen donors, and the pollen donation probability. We define a scenario as one combination of parameters passed to the function, which allows many different scenarios to be tested.
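The allele inheritance step described above can be sketched as follows. This is an illustrative Python sketch, not the repository's R function `sample_seed`; the name `make_seed` and the genotype layout are hypothetical, and it assumes diploid parents with one allele drawn uniformly at random from each parent at every locus.

```python
import random


def make_seed(mother, father, rng):
    """Build one diploid seed genotype from two parents.

    Assumption: at each locus the seed inherits one allele chosen at random
    from the mother and one chosen at random from the pollen donor.
    """
    return [
        (rng.choice(m_locus), rng.choice(f_locus))
        for m_locus, f_locus in zip(mother, father)
    ]


rng = random.Random(1)                     # seeded for reproducibility
mother = [("A1", "A2"), ("B1", "B1")]      # two loci, two alleles each
father = [("A3", "A4"), ("B2", "B3")]
seed = make_seed(mother, father, rng)
```

Passing the random number generator in explicitly keeps each simulation replicate reproducible.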

Function parameters and assumptions

We created different scenarios to pass to the seed creation function. Each scenario is a different combination of parameters: the number of maternal trees collectors sample from, the number of seeds sampled from each tree, the pollen donor type, and the total number of seeds collected. In this project, we defined three different 'types' of pollen donors. First, we simulate random, population-wide mating, similar to the assumptions of previous studies. In this scenario (referred to from here on as ‘all eligible’), all individuals in the population have some probability of pollinating a given maternal plant. Next, we simulate a scenario that may better represent pollen dispersal in reality, in which the plants spatially closest to a given maternal plant are most likely to pollinate it. Here, the number of potential donors for a given maternal plant is restricted to a maximum of 10: one with a 60% chance of donating pollen, one with a 20% chance, three with a 5% chance each, and five with a 1% chance each (hence, we refer to this scenario as ‘skewed’). Lastly, we model an extreme case in which there is only one potential pollen donor for a given maternal plant (referred to as ‘single’).

We also incorporate two types of seed collecting scenarios: ideal and realistic. In the ideal scenarios, we implement a simple approach where an equal number of seeds is sampled from every tree. Because this is a simplified version of reality, we also implement a more realistic approach where the number of seeds sampled from each tree varies. These scenarios represent a seed collector taking many seeds from one tree and fewer from other trees. In reality, some trees in a population have very high reproductive output, while others make only a few seeds.
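The three pollen donor types can be sketched as weighted random draws. The following Python sketch is illustrative only and is not the repository's R code. The function name `draw_fathers`, the labels `"all_eligible"`, `"skewed"`, and `"single"`, and the exclusion of self-pollination are all assumptions; the skewed donation probabilities are the ones stated above.

```python
import random

# Donation probabilities for the 'skewed' type, as stated in the text:
# one donor at 60%, one at 20%, three at 5%, five at 1% (10 donors in total).
SKEWED_WEIGHTS = [0.60, 0.20] + [0.05] * 3 + [0.01] * 5


def draw_fathers(pop_size, mother, n_seeds, donor_type, rng):
    """Return the index of the pollen donor for each seed of one mother."""
    # Assumption: a tree does not pollinate itself.
    others = [i for i in range(pop_size) if i != mother]
    if donor_type == "all_eligible":
        # Every other individual is equally likely to father each seed.
        return [rng.choice(others) for _ in range(n_seeds)]
    if donor_type == "skewed":
        # Pick 10 potential donors, then weight them unequally.
        donors = rng.sample(others, len(SKEWED_WEIGHTS))
        return rng.choices(donors, weights=SKEWED_WEIGHTS, k=n_seeds)
    if donor_type == "single":
        # One donor fathers every seed of this mother.
        donor = rng.choice(others)
        return [donor] * n_seeds
    raise ValueError(f"unknown donor_type: {donor_type!r}")


rng = random.Random(42)
fathers = draw_fathers(pop_size=2500, mother=0, n_seeds=20,
                       donor_type="skewed", rng=rng)
```

Because the simulated populations are not spatially explicit in this sketch, the 10 'nearest' donors are simply drawn at random; only the unequal donation probabilities carry the spatial restriction.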
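The difference between ideal and realistic collecting can be expressed as the number of seeds taken from each tree. The Python sketch below is a hypothetical illustration: the function names and the `top_share` parameter are not taken from the repository, and the actual uneven allocations used in the scenarios may differ.

```python
def ideal_allocation(n_trees, seeds_per_tree):
    """Ideal collecting: the same number of seeds from every tree."""
    return [seeds_per_tree] * n_trees


def skewed_allocation(total_seeds, n_trees, top_share=0.5):
    """Hypothetical uneven split of seeds among trees.

    One tree supplies roughly `top_share` of the seeds; the remainder is
    spread as evenly as possible over the other trees, each of which
    contributes at least one seed.
    """
    if n_trees < 2 or total_seeds < n_trees:
        raise ValueError("need at least 2 trees and one seed per tree")
    top = max(1, round(total_seeds * top_share))
    top = min(top, total_seeds - (n_trees - 1))  # leave a seed for each other tree
    base, extra = divmod(total_seeds - top, n_trees - 1)
    others = [base + 1 if i < extra else base for i in range(n_trees - 1)]
    return [top] + others


ideal = ideal_allocation(n_trees=5, seeds_per_tree=4)
realistic = skewed_allocation(total_seeds=20, n_trees=5)
```

Both allocations sample 20 seeds from 5 trees, so any difference in diversity captured reflects how the seeds are distributed among mothers rather than the total sample size.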

Directory contents

R project: this file keeps all directory contents contained and easy to access. It standardizes most file paths as relative paths from the root directory (though some external functions have issues with this), which makes the code easier for others to use.
Simulations: contains simulation parameter and output files representing a hypothetical species
    one_pop_2500: files represent a hypothetical species with one population of size 2500
    two_pop_2500: files represent a hypothetical species with two populations each of size 2500
R-scripts: contains R scripts used for data importing and processing, and sampling scripts
    0_arp2gen_edit.R: edited version of the function arp2gen
    0_defining_function_parameters.R: this script makes lists that contain sets of function parameters to be passed to the sample_seed function.
    0_import_seed_functions.R: this script defines the functions used in the script 1_generate_data.R. There are multiple functions; some import and convert data into more usable file types (genalex). The main function (sample_seed) imports genetic data and creates new seed sets. The number of pollen donors, number of seeds sampled per tree, and number of trees to sample from are taken as function parameters, making the inputs highly customizable.
    1_generate_data.R: this script loops over simulation replicates and calls the functions defined in 0_import_seed_functions.R with the varying inputs defined in 0_defining_function_parameters.R, generating the proportion of alleles conserved in seed sets under each sampling scenario.
    2_data_prep.Rmd: this script prepares the data into tidy format for the linear model in later scripts
    3_linear_model_ideal.Rmd: this script runs the linear model for ideal sampling scenarios and plots the model fit
    3_linear_model_realistic.Rmd: this script runs the linear model for realistic sampling scenarios and plots the model fit. The same model is run for both, but the parsing of the model data frames differs because the number of ideal and realistic scenarios differs
    4_figures.Rmd: this script generates figures used in the manuscript, including some additional visualizations
    4_figures_twopop: this script generates figures for simulations with two populations
    5_t_tests.Rmd: this script runs t tests to determine significant differences between ideal and realistic sampling scenarios
Conceptual-figures: contains conceptual figures made in the software BioRender
