This repository contains the datasets, scripts, and plots used in my Bachelor’s thesis, titled “Fine-Scale Recombination Maps of The Cattle Genome Inferred by Linkage Disequilibrium”.
The goal of the project is to produce and compare fine-scale recombination maps for two Swiss cattle breeds, Braunvieh and Fleckvieh.
We use the R package LDJump to infer high-resolution, breed-specific recombination maps and ask how the recombination landscapes of the two breeds differ from each other.
The process of generating the datasets is explained in detail in the thesis document, which is stored on GitHub as thesis.pdf.
-
Recombination Rate Estimates:
- Description: Recombination rates calculated using LDJump, stored as RDS files.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- License: Creative Commons Attribution License (CC-BY)
- Access Restrictions: There are no access restrictions at the moment.
- Data Format: RDS (R Data Serialization)
- Data Size: ~1 GB in total
- Data Source: The dataset is generated by applying LDJump to the genotype data of Braunvieh and Fleckvieh available on Zenodo.
- Keywords: recombination rate, recombination map
- Additional Information:
- The recombination rates generated by LDJump are stored on GitHub in the RMPs folder. File names are descriptive and encode the breed, the inbreeding coefficient, the genomic region, and the model setting (neutrality or demography). The breed is given by the prefix: "bv_" for Braunvieh and "fv_" for Fleckvieh. The inbreeding coefficient part takes one of three values: “all”, “125”, or “625”. “all” refers to the subset that includes all individuals of a given population, while “125” and “625” refer to inbreeding coefficients of 0.125 and 0.0625, respectively. The genomic region part is either “low” or “high”, referring to the regions with the lowest and highest SNP density, respectively. The last part reflects the demography setting: “demoT” denotes recombination rates estimated under demography, while “demoF” denotes recombination rates estimated under neutrality.
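
The naming convention can be decoded mechanically. The following is a minimal sketch in POSIX shell, not part of the repository's R scripts. It assumes that the parts of a file name are separated by underscores, dots, or hyphens; the function name and the file name in the usage comment are hypothetical.

```shell
# Decode the tokens of the naming convention from a file name.
# Assumption: tokens are separated by "_", "." or "-"; their order does not matter.
describe_rmp_file() (
  breed=unknown; inbreeding=unknown; region=unknown; model=unknown
  name=$(basename "$1")
  set -f            # no glob expansion while splitting the name
  IFS='_.-'         # split the name on these separators
  for token in $name; do
    case "$token" in
      bv)    breed=Braunvieh ;;
      fv)    breed=Fleckvieh ;;
      all)   inbreeding=all ;;
      125)   inbreeding=0.125 ;;
      625)   inbreeding=0.0625 ;;
      low)   region=low ;;
      high)  region=high ;;
      demoT) model=demography ;;
      demoF) model=neutrality ;;
    esac
  done
  echo "breed=$breed inbreeding=$inbreeding region=$region model=$model"
)

# Hypothetical file name, for illustration only:
#   describe_rmp_file RMPs/bv_125_low_demoT.rds
```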
-
GC Content Data:
- Description: GC content for a subset of chromosome 25, calculated using custom R scripts.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- License: Creative Commons Attribution License (CC-BY)
- Access Restrictions: There are no access restrictions at the moment.
- Data Format: txt
- Data Size: 16K
- Data Source: The Bos taurus reference genome from NCBI was used to compute the GC content for a subset of chromosome 25.
- Keywords: GC content density
- Additional Information:
- The GC content is stored on GitHub and is available under Correlation-To-GC-Content/gc_content_reference_genome.txt
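
As an illustration of what a windowed GC content calculation involves, here is a minimal sketch in shell and awk. It is not the custom R script used for the thesis. The FASTA input, the window size, and the function name are assumptions, and ambiguous bases such as N are counted in the window length. Because the whole sequence is held in memory, the sketch suits a small region rather than a full chromosome.

```shell
# Print start, end, and GC fraction for consecutive windows of a FASTA sequence.
# usage: gc_windows WINDOW_SIZE < region.fa   (region.fa is a placeholder)
gc_windows() {
  awk -v w="$1" '
    /^>/ { next }                          # skip FASTA header lines
    { seq = seq toupper($0) }              # join sequence lines, upper-case
    END {
      for (start = 1; start <= length(seq); start += w) {
        win = substr(seq, start, w)
        n = length(win)                    # the last window may be shorter
        gc = gsub(/[GC]/, "", win)         # gsub returns the number of G/C bases
        printf "%d\t%d\t%.4f\n", start, start + n - 1, gc / n
      }
    }'
}
```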
-
SNP Density Data:
- Description: SNP densities for chromosome 25 in Braunvieh and Fleckvieh breeds, computed using custom R scripts.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- License: Creative Commons Attribution License (CC-BY)
- Access Restrictions: There are no access restrictions at the moment.
- Data Format: txt
- Data Size: ~ 400K
- Data Source: The dataset is generated by applying bioinformatics tools to the genotype data of Braunvieh and Fleckvieh available on Zenodo.
- Keywords: SNP density
- Additional Information:
- The SNP density data for Braunvieh is stored on GitHub and is available under RMPs-and-SNP-density/bv_all_SNP_freqs_4k.snpden
- The SNP density data for Fleckvieh is stored on GitHub and is available under RMPs-and-SNP-density/fv_all_SNP_freqs_4k.snpden
-
Calculating SNP distributions:
- Description: This file describes the workflow for obtaining the SNP distribution in Chapter 6.1, "SNP-distribution analysis along chromosome 25 of two cattle populations", which includes running VCFtools on the original VCF file, loading the data, and transforming and visualizing it in R.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: SNP distribution
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R, bash
- Dependencies: ggplot2, gridExtra, evobiR, devtools, vcftools
- Additional Information:
- The script is stored on GitHub under SNP-Distribution/git_snp_distribution.R
-
Calculating SNP densities:
- Description: This file describes the workflow for obtaining the SNP density in Chapter 6.2, "Identification of the highest and lowest SNP-density regions of chromosome 25 for two cattle populations". The script includes running VCFtools on the original VCF file, loading the data, and transforming and visualizing it in R.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: SNP densities
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R, bash
- Dependencies: ggplot2, gridExtra, evobiR, devtools, vcftools
- Additional Information:
- The script is stored on GitHub under SNP-Density/git_snp_density.R
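
To illustrate the identification step, the window with the highest or lowest SNP count can be read directly from a VCFtools density table. This is a minimal sketch in shell and awk rather than the R code in the script. It assumes the table has the column layout that VCFtools writes for `--SNPdensity` (CHROM, BIN_START, SNP_COUNT, VARIANTS/KB, with a header line). The function name, the input file name, and the window size in the comment are assumptions.

```shell
# A density table like this could be produced with, for example:
#   vcftools --vcf input.vcf --chr 25 --SNPdensity 4000 --out bv_all
# (input.vcf, the chromosome label, and the 4000 bp window are assumptions).

# Print the bin with the highest ("max") or lowest ("min") SNP count.
# On ties, the first bin encountered is reported.
# usage: snpden_extreme max|min FILE.snpden
snpden_extreme() {
  awk -v mode="$1" '
    NR == 1 { next }                       # skip the header line
    best == "" ||
    (mode == "max" && $3 + 0 > count) ||
    (mode == "min" && $3 + 0 < count) { best = $1 "\t" $2; count = $3 + 0 }
    END { if (best != "") print best "\t" count }
  ' "$2"
}
```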
-
Calculating inbreeding coefficients:
- Description: This file describes the workflow for subsetting the cattle populations as described in Chapter 6.3, "Estimation of recombination rates under neutrality and demography using genotyped cattle data", which includes running PLINK on the original VCF file, loading the data, and subsetting the groups.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: Calculating inbreeding coefficients
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R, bash
- Dependencies: PLINK
- Additional Information:
- The script is stored on GitHub under Cattle-Subsets/git_cattle_subsets.R
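
To illustrate the subsetting step, the sketch below filters individuals by inbreeding coefficient using shell and awk. It rests on two assumptions that this README does not confirm: that the coefficients come from PLINK's `--het` report (columns FID, IID, O(HOM), E(HOM), N(NM), F), and that a group keeps the individuals with F at or below a threshold. The thesis and the R script define the actual procedure; the function name and file names are placeholders.

```shell
# A report like this could be produced with PLINK 1.9, for example:
#   plink --vcf input.vcf --cow --het --out bv
# (input.vcf and the output prefix are placeholders).

# Print FID and IID of individuals whose F is at or below a threshold.
# usage: keep_by_inbreeding THRESHOLD FILE.het
keep_by_inbreeding() {
  awk -v max_f="$1" '
    NR == 1 { next }                       # skip the header line
    ($6 + 0) <= (max_f + 0) { print $1, $2 }
  ' "$2"
}

# The two-column output has the FID/IID layout that PLINK's --keep option reads:
#   keep_by_inbreeding 0.125 bv.het > bv_125.keep
```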
-
Comparison of recombination maps between two breeds:
- Description: This file describes the work process for Chapter 6.4 "Comparison of recombination patterns between two cattle breeds". Here, we compare the recombination maps of both populations with each other for different inbreeding coefficients.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: recombination map
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R
- Dependencies: ggplot2, cowplot
- Additional Information:
- The script is stored on GitHub under RMPs-Comparison/git_RMPs_comparison.R
-
Finding shared recombination patterns between two breeds:
- Description: This file describes the workflow for Chapter 6.5, "Shared recombination patterns between breeds". Here, we plot shared recombination patterns using the combined dataset. The combined recombination map is plotted along with the individual recombination maps.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: recombination map
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R
- Dependencies: ggplot2, cowplot
- Additional Information:
- The script is stored on GitHub under RMPs-Shared/git_RMPs_shared.R
-
GC Content Visualization:
- Description: This R script describes the work process for Chapter 6.8 "Correlation between recombination rate and GC content". In this script, we plot and test the correlation between GC content and recombination rates.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: GC Content Visualization
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R
- Dependencies: ggplot2, cowplot
- Additional Information:
- The script is stored on GitHub under Correlation-To-GC-Content/git_Correlation_to_gc_content.R
-
SNP Density Visualization:
- Description: This R script describes the work process for Chapter 6.7 "Correlation between recombination rate and SNP density".
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: SNP Density Visualization
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R
- Dependencies: ggplot2, cowplot
- Additional Information:
- The script is stored on GitHub under Correlation-To-SNP-Density/git_correlation_to_SNP_density.R
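
The correlation analyses are carried out in R. As a language-neutral illustration of what a correlation coefficient between recombination rate and a covariate such as SNP density measures, the sketch below computes Pearson's r in shell and awk. This README does not state which correlation test the thesis uses, so Pearson's r is an assumption, as are the function name and the two-column input without a header line.

```shell
# Pearson correlation of two whitespace-separated numeric columns (x y) on stdin.
# Prints "NA" when fewer than two rows are given or one column is constant.
pearson_r() {
  awk '
    NF >= 2 {
      n++; sx += $1; sy += $2
      sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (n < 2 || den == 0) { print "NA"; exit }
      printf "%.4f\n", (n * sxy - sx * sy) / den
    }'
}

# Example with placeholder files holding one value per window:
#   paste recombination_rate.txt snp_density.txt | pearson_r
```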
-
Recombination rate Visualization:
- Description: This R script describes the workflow for Chapter 6.3, "Estimation of recombination rates under neutrality and demography using genotyped cattle data". It overlays the recombination maps of all three groups for both cattle populations in order to compare the estimates obtained with the demography setting set to TRUE and to FALSE.
- Creator: Fardokht Sadat Mohammadi
- Affiliation: Single Cell Genetics Lab at Johannes Kepler University
- Contact Information: fardokht.fm@gmail.com
- Keywords: Recombination rate Visualization
- License: MIT license
- Access Restrictions: There are no access restrictions at the moment.
- Language: R
- Dependencies: ggplot2, cowplot
- Additional Information:
- The script is stored on GitHub under RMPs/git_RMPs.R