This readme provides the information required to reproduce the results. Please contact support@pacb.com with any questions.
- HG002 was sequenced on the Revio system with SPRQ chemistry, yielding 146 Gbp. The reads were aligned with pbmm2 v1.13.1 and downsampled to aligned depths ranging from 8-fold to 40-fold for variant calling and benchmarking.
- Small variants were called with DeepVariant 1.6.1.
- Structural variants were called with Sawfish 0.12.4.
- Link to root directory
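
As a rough illustration of the downsampling step, the fraction of reads to keep for a given target depth is the target depth divided by the full aligned depth. The sketch below assumes uniform random subsampling. The function name `keep_fraction` and the `FULL_DEPTH` value of 40.0 are placeholders for illustration, not values taken from this analysis, and this readme does not state which tool performed the downsampling.

```python
def keep_fraction(target_depth: float, full_depth: float) -> float:
    """Fraction of reads to retain so that mean depth drops to target_depth."""
    if full_depth <= 0:
        raise ValueError("full_depth must be positive")
    if not 0 < target_depth <= full_depth:
        raise ValueError("target_depth must be in (0, full_depth]")
    return target_depth / full_depth


# Placeholder full depth (assumption); replace with the measured aligned depth.
FULL_DEPTH = 40.0

for target in (10.0, 20.0, 30.0):
    fraction = keep_fraction(target, FULL_DEPTH)
    print(f"{target:.0f}-fold: keep {fraction:.3f} of reads")
```

The resulting fraction could then be passed to whichever subsampling tool is in use.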
Depth | Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
20-fold | SNP | 3365127 | 3356805 | 8322 | 4143790 | 2255 | 779837 | 1225 | 0.997527 | 0.99933 | 0.188194 | 0.998428 |
30-fold | SNP | 3365127 | 3362495 | 2632 | 4177463 | 1152 | 808868 | 492 | 0.999218 | 0.999658 | 0.193627 | 0.999438 |
20-fold | INDEL | 525469 | 513484 | 11985 | 956105 | 9346 | 414296 | 5286 | 0.977192 | 0.98275 | 0.433316 | 0.979963 |
30-fold | INDEL | 525469 | 520114 | 5355 | 971125 | 4625 | 426619 | 2506 | 0.989809 | 0.991506 | 0.439304 | 0.990657 |
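
The metric columns in the table above can be recomputed from the count columns. The sketch below assumes the usual hap.py-style definitions: recall is TRUTH.TP over TRUTH.TP plus TRUTH.FN, precision excludes QUERY.UNK calls, and Frac_NA is QUERY.UNK over QUERY.TOTAL. The function name `happy_metrics` is hypothetical. These definitions reproduce the 20-fold SNP row to the displayed precision.

```python
def happy_metrics(truth_tp, truth_fn, query_total, query_fp, query_unk):
    """Recompute recall, precision, Frac_NA and F1 from hap.py-style counts."""
    recall = truth_tp / (truth_tp + truth_fn)
    # Query calls outside the confident regions (UNK) do not count toward precision.
    query_tp = query_total - query_unk - query_fp
    precision = query_tp / (query_tp + query_fp)
    frac_na = query_unk / query_total
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, frac_na, f1


# Counts from the 20-fold SNP row of the table above.
recall, precision, frac_na, f1 = happy_metrics(
    truth_tp=3356805,
    truth_fn=8322,
    query_total=4143790,
    query_fp=2255,
    query_unk=779837,
)
print(f"recall={recall:.6f} precision={precision:.6f} "
      f"frac_na={frac_na:.6f} f1={f1:.6f}")
```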

Structural variant benchmarking by depth:

Depth (fold) | Recall | Precision | F1-score |
---|---|---|---|
9.72 | 0.8804 | 0.9900 | 0.9320 |
11.67 | 0.9072 | 0.9895 | 0.9466 |
13.61 | 0.9229 | 0.9894 | 0.9550 |
15.56 | 0.9356 | 0.9894 | 0.9618 |
17.50 | 0.9411 | 0.9891 | 0.9645 |
19.45 | 0.9463 | 0.9891 | 0.9672 |
24.31 | 0.9525 | 0.9882 | 0.9701 |
29.17 | 0.9560 | 0.9885 | 0.9720 |
34.04 | 0.9586 | 0.9883 | 0.9733 |
38.90 | 0.9608 | 0.9882 | 0.9743 |

Structural variant benchmarking (TP/FP/FN counts):

TP_base | TP_comp | FP | FN | Recall | Precision | F1-score |
---|---|---|---|---|---|---|
22557 | 21062 | 232 | 1281 | 0.9463 | 0.9891 | 0.9672 |
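
The recall, precision, and F1 values in the table above follow from the count columns. The sketch below assumes recall is computed on the baseline side (TP_base and FN) and precision on the comparison side (TP_comp and FP). The function name `sv_metrics` is hypothetical. These definitions reproduce the row above to four decimal places.

```python
def sv_metrics(tp_base, tp_comp, fp, fn):
    """Recompute recall, precision and F1 from structural variant benchmark counts."""
    # Recall is measured against the truth set: matched truth calls over all truth calls.
    recall = tp_base / (tp_base + fn)
    # Precision is measured against the query set: matched query calls over all query calls.
    precision = tp_comp / (tp_comp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Counts from the table above.
recall, precision, f1 = sv_metrics(tp_base=22557, tp_comp=21062, fp=232, fn=1281)
print(f"recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}")
```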
- HG002 DRAGEN variant call sets were obtained from 10.5281/zenodo.8350255 (DRAGEN 4.2.1/4.2.4) and an S3 bucket (`s3://human-pangenomics/publications/PANGENOME_2022/DeepVariant/samples/HG002`) (DRAGEN 3.7.5).
  - Small variant calls:
    - DRAGEN 4.2.1, 35-fold depth: HG002_35x.hard-filtered.vcf.gz (md5sum `388f58faa52a8811fe19b06533d2c3d5`)
    - DRAGEN 3.7.5, 30-fold depth: HG002.30x_novaseq_pcrfree.dragen.vcf.gz (md5sum `cf2c302a99b96e1e4806cb644524357c`)
  - Structural variant calls:
    - DRAGEN 4.2.4: HG002_35x.sv.vcf.gz (md5sum `760b9c5c295fc82b045f83ed15e524a9`)
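
To confirm that the downloaded call sets match the checksums listed above, something like the following sketch could be used. It assumes the files sit in a single local directory under their original names. The helper names `md5_of` and `check_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed above.
EXPECTED_MD5 = {
    "HG002_35x.hard-filtered.vcf.gz": "388f58faa52a8811fe19b06533d2c3d5",
    "HG002.30x_novaseq_pcrfree.dragen.vcf.gz": "cf2c302a99b96e1e4806cb644524357c",
    "HG002_35x.sv.vcf.gz": "760b9c5c295fc82b045f83ed15e524a9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each expected filename to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.exists():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the files are in the current working directory.
    for name, status in check_downloads().items():
        print(f"{status:8s} {name}")
```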
DRAGEN version | Depth | Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3.7.5 | 30-fold | SNP | PASS | 3365127 | 3353531 | 11596 | 4042134 | 14752 | 672953 | 3869 | 0.996554 | 0.995621 | 0.166485 | 0.996088 |
4.2.1 | 35-fold | SNP | PASS | 3365127 | 3357852 | 7275 | 3849974 | 1860 | 489362 | 985 | 0.997838 | 0.999447 | 0.127108 | 0.998642 |
3.7.5 | 30-fold | INDEL | PASS | 525469 | 521874 | 3595 | 995996 | 3500 | 448346 | 1869 | 0.993158 | 0.993609 | 0.450148 | 0.993384 |
4.2.1 | 35-fold | INDEL | PASS | 525469 | 524141 | 1328 | 980875 | 721 | 433435 | 474 | 0.997473 | 0.998683 | 0.441886 | 0.998077 |

Structural variant benchmarking (DRAGEN 4.2.4):

TP_base | TP_comp | FP | FN | Recall | Precision | F1-score |
---|---|---|---|---|---|---|
9243 | 8454 | 247 | 13549 | 0.4055 | 0.9716 | 0.5722 |
- HG002 variant call sets from 60-fold aligned sup-basecall reads were downloaded from the S3 bucket associated with this EPI2ME post.
  - Small variants were called by Clair3 1.0.0: hg002.wf_snp.vcf.gz (`s3://ont-open-data/giab_2023.05/analysis/variant_calling/hg002_sup_60x/hg002.wf_snp.vcf.gz`, md5sum `fa2111cdeb4959e1ed1cfe402d128c39`)
  - Structural variants were called by Sniffles2 2.0.7: hg002.wf_sv.vcf.gz (`s3://ont-open-data/giab_2023.05/analysis/variant_calling/hg002_sup_60x/hg002.wf_sv.vcf.gz`, md5sum `cd185fb011345e702f7eb2ba7a19213b`)
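
For readers without an S3 client, an `s3://` URI can often be rewritten as an HTTPS URL. The sketch below uses the virtual-hosted-style pattern `https://<bucket>.s3.amazonaws.com/<key>`. It assumes the bucket allows anonymous reads and resolves without an explicit region, neither of which this readme confirms. The function name `s3_to_https` is hypothetical.

```python
from urllib.parse import urlparse


def s3_to_https(s3_uri: str) -> str:
    """Rewrite s3://bucket/key as a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Assumption: the bucket is publicly readable at this endpoint.
    return f"https://{bucket}.s3.amazonaws.com/{key}"


print(s3_to_https(
    "s3://ont-open-data/giab_2023.05/analysis/variant_calling/"
    "hg002_sup_60x/hg002.wf_snp.vcf.gz"
))
```

After downloading, the file's MD5 digest can be compared against the checksum given above.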
Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SNP | PASS | 3365127 | 3357580 | 7547 | 4418637 | 4925 | 1054410 | 1166 | 0.997757 | 0.998536 | 0.238628 | 0.998147 |
INDEL | PASS | 525469 | 453173 | 72296 | 776968 | 24911 | 283861 | 8400 | 0.862416 | 0.949482 | 0.365345 | 0.903857 |

Structural variant benchmarking:

TP_base | TP_comp | FP | FN | Recall | Precision | F1-score |
---|---|---|---|---|---|---|
21087 | 18649 | 243 | 1743 | 0.9237 | 0.9871 | 0.9543 |
- See the SV readme.
- For benchmarking steps using Truvari, please also see Saunders et al., bioRxiv, 2024.