SARS-CoV-2 genomic surveillance in Germany

This repository contains a join of the metadata and Pango lineage tables for all German SARS-CoV-2 sequences published by the Robert Koch-Institut (RKI) on GitHub.

The resulting dataset can be downloaded here (note that it is currently around 50 MB in size): https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv

The analysis uses the genomicsurveillance Python package. The main notebook is genomicsurveillance.ipynb.

Current share (nowcast)

Share by state over time and extrapolation

Estimated growth advantage

This shows the growth advantage of each lineage over BA.5. It is estimated solely from the relative shares of the variants and is assumed to be constant over time. Variation between states (dots) is typically low.
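As a rough illustration of why relative shares alone are enough, here is a minimal sketch. It is not the model used by the genomicsurveillance package. It assumes just two lineages (a variant and a reference such as BA.5), each growing exponentially at its own fixed rate. Under that assumption the log-odds of the variant's share is linear in time, and its slope is the daily growth advantage. The function name `growth_advantage` and the synthetic shares are hypothetical and only serve the example.

```python
import numpy as np

def growth_advantage(days, variant_share):
    """Estimate a fixed daily growth advantage over a reference lineage.

    Hypothetical helper: variant_share is the variant's share among
    (variant + reference) sequences on each day, strictly between 0 and 1.
    """
    days = np.asarray(days, dtype=float)
    share = np.asarray(variant_share, dtype=float)
    # Log-odds of the share equals log(variant count / reference count)
    log_odds = np.log(share / (1.0 - share))
    # Slope of a straight-line fit is the difference in growth rates
    slope, _intercept = np.polyfit(days, log_odds, 1)
    return slope

# Synthetic data: variant starts at 1% share with a true advantage of 0.1 per day
days = np.arange(0, 28)
true_advantage = 0.1
odds = (0.01 / 0.99) * np.exp(true_advantage * days)
shares = odds / (1.0 + odds)

estimate = growth_advantage(days, shares)
print(f"estimated daily growth advantage: {estimate:.3f}")
```

With real, noisy counts a binomial or multinomial fit would be more appropriate than a straight-line fit on log-odds; the sketch only shows that the absolute case numbers cancel out.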

Estimated growth rate

This shows the growth rate of observed cases by lineage and by state. It varies over time as the overall growth rate changes in each state.

Absolute cases by state

Description of data

Column description:

IMS_ID: Unique identifier of the sequence
DATE_DRAW: Date the sample was taken from the patient
SEQ_REASON: Reason for sequencing, one of:
- X: Unknown
- N: Random sampling
- Y: Targeted sequencing (exact reason unknown)
- A[<reason>]: Targeted sequencing because variant PCR indicated VOC
PROCESSING_DATE: Date the sample was processed by the RKI and added to the GitHub repository
SENDING_LAB_PC: Postcode (PLZ) of lab that did the initial PCR
SEQUENCING_LAB_PC: Postcode (PLZ) of lab that did the sequencing
lineage: Pango lineage as reported by pangolin
scorpio_call: Alternative, coarser variant call as determined by scorpio (part of pangolin). It is less precise but somewhat more robust than the pangolin lineage.
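The SEQ_REASON codes listed above can be turned into readable labels with a small lookup. This is only a sketch: the helper name `describe_seq_reason` is hypothetical, the labels are copied from the column description, and the reason text inside the brackets of the `A[...]` example is made up for illustration.

```python
import re

# Labels copied from the column description
SEQ_REASON_LABELS = {
    "X": "Unknown",
    "N": "Random sampling",
    "Y": "Targeted sequencing (exact reason unknown)",
}

# Matches codes of the form A[<reason>] and captures the bracket content
A_PATTERN = re.compile(r"^A\[(?P<reason>.*)\]$")

def describe_seq_reason(code):
    """Return a human-readable description for a SEQ_REASON code."""
    code = code.strip()
    if code in SEQ_REASON_LABELS:
        return SEQ_REASON_LABELS[code]
    match = A_PATTERN.match(code)
    if match:
        reason = match.group("reason")
        return f"Targeted sequencing because variant PCR indicated VOC ({reason})"
    # Anything else is reported rather than guessed
    return f"Unrecognized code: {code}"

# "A[B.1.1.7]" is an illustrative value, not taken from the dataset
for example in ["N", "X", "A[B.1.1.7]"]:
    print(example, "->", describe_seq_reason(example))
```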

Excerpt

Here are the header line and the first 12 rows of the dataset.

IMS_ID,DATE_DRAW,SEQ_REASON,PROCESSING_DATE,SENDING_LAB_PC,SEQUENCING_LAB_PC,lineage,scorpio_call
IMS-10294-CVDP-00001,2021-01-14,X,2021-01-25,40225,40225,B.1.1.297,
IMS-10025-CVDP-00001,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00002,2021-01-17,N,2021-01-26,10409,10409,B.1.258,
IMS-10025-CVDP-00003,2021-01-17,N,2021-01-26,10409,10409,B.1.177.86,
IMS-10025-CVDP-00004,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00005,2021-01-18,N,2021-01-26,10409,10409,B.1.160,
IMS-10025-CVDP-00006,2021-01-17,N,2021-01-26,10409,10409,B.1.1.297,
IMS-10025-CVDP-00007,2021-01-18,N,2021-01-26,10409,10409,B.1.177.81,
IMS-10025-CVDP-00008,2021-01-18,N,2021-01-26,10409,10409,B.1.177,
IMS-10025-CVDP-00009,2021-01-18,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00010,2021-01-17,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00011,2021-01-17,N,2021-01-26,10409,10409,B.1.389,

Suggested import into pandas

You can import the data into pandas as follows:

#%%
import pandas as pd

#%%
# Read the joined table: IMS_ID becomes the index and both date columns are parsed.
df = pd.read_csv(
    'https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv',
    index_col='IMS_ID',
    parse_dates=['DATE_DRAW', 'PROCESSING_DATE'],
    # Categorical dtypes save memory for these highly repetitive string columns;
    # postcodes stay strings, so leading zeros are preserved.
    dtype={'SEQ_REASON': 'category',
           'SENDING_LAB_PC': 'category',
           'SEQUENCING_LAB_PC': 'category',
           'lineage': 'category',
           'scorpio_call': 'category'
           }
)
#%%
# Shorten the column names (lineage keeps its name).
df = df.rename(columns={
    'DATE_DRAW': 'date',
    'PROCESSING_DATE': 'processing_date',
    'SEQ_REASON': 'reason',
    'SENDING_LAB_PC': 'sending_pc',
    'SEQUENCING_LAB_PC': 'sequencing_pc',
    'scorpio_call': 'scorpio'
})
df
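Once the data is loaded and the columns are renamed as above, a common first step is to look at lineage shares per week among randomly sampled sequences (reason N). The following is a minimal sketch of one way to do that, not the analysis from the notebook. The helper name `weekly_lineage_shares` is hypothetical, and the small frame uses a handful of rows from the excerpt so that the example runs without downloading the full file.

```python
import pandas as pd

def weekly_lineage_shares(df):
    """Share of each lineage per calendar week among random samples.

    Expects the renamed columns 'date', 'reason' and 'lineage'.
    """
    # Keep only randomly sampled sequences with a known date and lineage
    random_samples = df[df["reason"] == "N"].dropna(subset=["date", "lineage"])
    week = random_samples["date"].dt.to_period("W").rename("week")
    # Cast to str so unused categories do not show up as empty columns
    lineage = random_samples["lineage"].astype(str)
    # normalize="index" makes every row (week) sum to 1
    return pd.crosstab(week, lineage, normalize="index")

# A few rows from the excerpt, already using the renamed columns
sample = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-14", "2021-01-17", "2021-01-17", "2021-01-18", "2021-01-18"]
        ),
        "reason": ["X", "N", "N", "N", "N"],
        "lineage": ["B.1.1.297", "B.1.389", "B.1.258", "B.1.160", "B.1.1.7"],
    },
    index=pd.Index(
        [
            "IMS-10294-CVDP-00001",
            "IMS-10025-CVDP-00001",
            "IMS-10025-CVDP-00002",
            "IMS-10025-CVDP-00005",
            "IMS-10025-CVDP-00009",
        ],
        name="IMS_ID",
    ),
)

shares = weekly_lineage_shares(sample)
print(shares)
```

Restricting to random samples avoids the bias that targeted sequencing introduces towards suspected variants; whether that filter suits a given analysis is a judgment call.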

License

The underlying files that I use as input are licensed by the RKI under CC-BY 4.0. For more details, see: https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland#lizenz.

The software here is licensed under the "Unlicense". You can do with it whatever you want.

For the data, just cite the original source; there is no need to cite this repo since it's just a trivial join.
