Evolutionary trajectory of SARS-CoV-2 genome shifts during widespread vaccination and emergence of Omicron variant
Understanding the adaptation of SARS-CoV-2 is critical for the development of effective treatments against this exceptionally successful human pathogen. To predict the emergence of new variants that may escape host immunity or increase virulence, it is important to characterize the biological forces driving its evolution. We conducted a comprehensive population genetic study of over thirteen million SARS-CoV-2 genome sequences, collected over approximately three years, to investigate these forces. Our analysis revealed that during the first year of the pandemic (2020 to 2021), the SARS-CoV-2 genome was subject to strong conservation. However, we observed a sharp increase in the diversification of the receptor-binding domain (RBD) during 2021, indicating selective pressures that promote the accumulation of mutations. This period coincided with widespread viral infection and adoption of vaccination worldwide. During this period, we observed independent SARS-CoV-2 strains acquiring mutations that later defined the Omicron lineages, suggesting that diversifying selection at these sites could have led to their fixation in Omicron lineages by convergent evolution. Since the emergence of Omicron, we have observed a further decrease in the conservation of structural genes, including those encoding the M, N, and spike proteins, and have identified new sites that could define potential future emerging strains. Our results show that rapid antigenic evolution continues to produce new high-frequency functional variants. Sites under selection are critical for virus fitness, and currently known T cell epitope sequences are highly conserved. Altogether, our study provides a comprehensive dynamic map of all sites under selection and conservation across the entirety of the SARS-CoV-2 genome.
We have provided the notebooks and data necessary to replicate the manuscript figures.
Gayvert et al. "Viral population genomics reveals host and infectivity impact on SARS-CoV-2 adaptive landscape". bioRxiv 2021.
The license can be found in the LICENSE file.