### Point 1 -- Expected Outcome

With this project we aim to accelerate breeding by enhancing in-silico productivity and computational interoperability with the key analytical tools that inform the breeding process. The end result will be an automated variant calling pipeline that allows users to integrate research data and visualise the associated phenotype distributions within an established interpretive context. Achieving this requires a process that validates and scrutinises current variant calling pipelines in an automated and easy-to-use manner, employing the latest technology in variant calling, automation and cloud computing. We intend to deliver this acceleration by producing an automated framework for evaluating existing variant calling pipelines and the analytic outputs they produce, given both input data generated by researchers and exemplar datasets \cite{Torkamaneh_16}. We intend to construct an interpretive context so that genetic inferences can be made with statistical power. Moreover, we intend to identify the strengths and weaknesses of the most commonly used tools so that researchers can make informed choices when analysing their data. Providing scientists and breeders with accurate and contextualised informatics will contribute to identifying more robust genetic markers and will facilitate faster and more efficient selection of new cultivars using DNA-informed breeding.

The above describes the 'thing' that we'll be making. The expected outcomes should be more than that, e.g. improved accuracy of genetic analyses based on better marker data; extrapolate from here. --Rob

I had a go at rewording the last sentence to address this. David

Another attempt to polish this point. Charles

We have three use cases that provide the volume and variety of data needed to trial and stress-test the framework and to ensure its practical utility.

Case#1: variant call optimisation for genomic selection in apple and snapper. PFR is using reduced-representation genotyping by sequencing (GBS) for genomic selection in both species. Large datasets have been generated using GBS, alongside whole genome re-sequencing of the parents and founders of the breeding populations. (I also have some populations with both 8k SNP array and GBS data.)

Case#2: pooled whole genome sequencing in manuka. As part of a joint MBIE programme with LandCare Research, we are re-sequencing the whole genomes of manuka accessions collected across the country. DNA from 30 accessions at 40 sites will be pooled and sequenced to estimate allele frequencies within and among populations.
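
As a rough illustration of the allele frequency estimate in Case#2, the sketch below computes a pooled frequency from read counts at a single site, together with a standard error under simple two-stage sampling (chromosomes into the pool, then reads from the pool). It is only a sketch under stated assumptions: the function names are hypothetical, the accessions are assumed diploid and to contribute equal amounts of DNA, and sequencing error is ignored. None of these assumptions comes from the programme itself.

```python
import math


def pooled_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Estimate the alternate allele frequency in a pool from read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def pooled_standard_error(p: float, n_individuals: int, depth: int,
                          ploidy: int = 2) -> float:
    """Standard error of a pooled frequency estimate.

    Assumes (hypothetically) equal DNA contribution per individual and no
    sequencing error, so that
        Var = p(1-p) * (1/n + 1/depth - 1/(n*depth)),
    where n = ploidy * n_individuals is the number of pooled chromosomes.
    """
    if n_individuals <= 0 or depth <= 0:
        raise ValueError("n_individuals and depth must be positive")
    n = ploidy * n_individuals
    variance = p * (1 - p) * (1 / n + 1 / depth - 1 / (n * depth))
    return math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative numbers only: 30 accessions per pool, 100 reads at the site.
    p_hat = pooled_allele_frequency(ref_reads=75, alt_reads=25)
    se = pooled_standard_error(p_hat, n_individuals=30, depth=100)
    print(f"estimated frequency {p_hat:.3f} (SE {se:.3f})")
```

The two terms in the variance show why pool size and read depth both matter: with 30 diploid accessions the chromosome-sampling term is fixed at 1/60, so depth beyond a few hundred reads per site adds little precision under these assumptions.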

Case#3: skim sequencing: how shallow can we go? We have whole genome re-sequencing data for apple and kiwifruit parents from the breeding programme at >30X read depth. We can use this dataset to examine variant call accuracy at lower depths, with a view to replacing GBS with skim sequencing in the future.
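
One way the depth experiment in Case#3 could be framed is sketched below: work out what fraction of reads to keep to reach a target depth, then score the calls made at that depth against the calls made from the full-depth data. This is a minimal sketch, not the project's method. The function names, the use of the full-depth call set as the reference, and the representation of a variant as a `(chromosome, position, ref, alt)` tuple are all assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


class Accuracy(NamedTuple):
    precision: float
    recall: float
    f1: float


def subsample_fraction(full_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so the mean depth is roughly target_depth."""
    if full_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    return min(1.0, target_depth / full_depth)


def compare_calls(reference: Set[Variant], calls: Set[Variant]) -> Accuracy:
    """Score a low-depth call set against the full-depth reference call set."""
    true_positives = len(reference & calls)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Accuracy(precision, recall, f1)


if __name__ == "__main__":
    # Illustrative values only: 30X is used as a stand-in for the full depth.
    for target in (1, 2, 5, 10):
        print(target, round(subsample_fraction(30, target), 3))

    full_depth_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                        ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")}
    skim_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                  ("chr3", 7, "A", "T")}
    print(compare_calls(full_depth_calls, skim_calls))
```

Repeating the comparison across a range of target depths would give an accuracy-versus-depth curve, which is the quantity the "how shallow can we go?" question asks about.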