ACTIN-46: Make health checker component for WTS (#497)

Adds a new stand-alone component, CREST, that checks that WTS samples are correctly matched to the same patient as the WGS sample.

1 parent 7213faa, commit 2cf8af0

Showing 11 changed files with 519 additions and 0 deletions.

@@ -0,0 +1,37 @@

# Crest - Check Reference Equality to Sample Transcriptome

To ensure that WTS samples are correctly matched to the same patient as
the WGS sample, Crest performs a simple test on a multi-sample VCF: it checks
that at least 90% (the default acceptance ratio) of the counted germline SNPs
have support in the specified RNA sample.

The input is assumed to be a germline VCF annotated with RNA calls. The
thresholds applied are described below and can be adjusted by the user.
Only SNPs that impact a gene and have FILTER=PASS are counted.
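
The counting rule can be sketched in Python as follows. This is a minimal illustration that mirrors the logic of `CrestAlgo.computeRnaSupportRatio` and `crestCheck` with the default thresholds; the `Snp` record and the function names are hypothetical and not part of Crest, and the sketch assumes the variant fields have already been parsed from the VCF.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical, already-parsed view of one germline variant (not a Crest type)."""
    is_pass: bool          # FILTER == PASS
    is_snp: bool           # single-base REF and ALT
    gene: str              # gene from the IMPACT annotation, "" if none
    rna_total_reads: int   # read depth at the site in the RNA sample
    rna_allele_reads: int  # RNA reads supporting the ALT allele


def rna_support_ratio(variants, min_total_reads=10, min_rna_reads=1):
    supported = 0
    total = 0
    for v in variants:
        # Only PASS SNPs that impact a gene are considered at all
        if not (v.is_pass and v.is_snp and v.gene):
            continue
        # SNPs with too little RNA coverage do not count towards the total
        if v.rna_total_reads < min_total_reads:
            continue
        total += 1
        if v.rna_allele_reads >= min_rna_reads:
            supported += 1
    # With no counted SNPs the ratio is 0.0, so the check fails
    return supported / total if total > 0 else 0.0


def crest_check(variants, acceptance_ratio=0.90, **thresholds):
    # Inclusive comparison: a ratio equal to the acceptance ratio passes
    return rna_support_ratio(variants, **thresholds) >= acceptance_ratio
```

For example, nine supported SNPs out of ten counted gives a ratio of exactly 0.9, which passes because the comparison is inclusive, matching `supportRatio >= acceptanceRatio` in the Java code.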

The computed ratio of RNA-supported SNPs to counted SNPs is written to the log output,
and an empty flag file, "{sample}.CrestCheckSucceeded" or "{sample}.CrestCheckFailed",
is written to the output directory for use in multi-step pipelines.
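
A downstream pipeline step might consume the flag file along these lines. This is a hypothetical helper, not part of Crest; it relies only on the file naming described above and treats a missing or ambiguous flag as an error.

```python
import os


def read_crest_flag(output_dir, sample):
    """Return True/False from Crest's flag file for `sample` (hypothetical helper)."""
    succeeded = os.path.join(output_dir, f"{sample}.CrestCheckSucceeded")
    failed = os.path.join(output_dir, f"{sample}.CrestCheckFailed")
    has_succeeded = os.path.exists(succeeded)
    has_failed = os.path.exists(failed)
    if has_succeeded and has_failed:
        # Both present, e.g. left over from an earlier run: refuse to guess
        raise RuntimeError(f"both Crest flag files found for {sample} in {output_dir}")
    if not (has_succeeded or has_failed):
        # Crest did not run, or was run with -do_not_write_file
        raise FileNotFoundError(f"no Crest flag file for {sample} in {output_dir}")
    return has_succeeded
```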

## Example usage

```bash
$ java -jar crest.jar -purple_dir /path/to/purple -sample COLO829v003T -rna_sample COLO829v003T_RNA
```

This assumes the standard PURPLE directory layout, with the WGS sample's germline VCF having been
overwritten by the SAGE-annotated version that includes the RNA sample. The VCF file
/path/to/purple/COLO829v003T.purple.germline.vcf.gz must exist and is the file examined.
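
The VCF path that Crest examines is derived from `-purple_dir` and `-sample`. The sketch below reproduces that naming as described above and assembles the example invocation as an argument list, for instance to check that the input exists before launching Crest. The helper names are hypothetical, and the default jar name `crest.jar` is taken from the example command.

```python
import os


def crest_input_vcf(purple_dir, sample):
    # Standard PURPLE layout: <purple_dir>/<sample>.purple.germline.vcf.gz
    return os.path.join(purple_dir, f"{sample}.purple.germline.vcf.gz")


def crest_command(purple_dir, sample, rna_sample, jar="crest.jar"):
    # Same arguments as the example invocation above
    return ["java", "-jar", jar,
            "-purple_dir", purple_dir,
            "-sample", sample,
            "-rna_sample", rna_sample]
```

A caller could then run `subprocess.run(crest_command(...), check=True)` once `os.path.exists(crest_input_vcf(...))` holds.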

## Parameters

| Parameter         | Description                                                                                       | Default |
|-------------------|---------------------------------------------------------------------------------------------------|---------|
| purple_dir        | Location of the RNA-annotated germline VCF                                                        |         |
| sample            | Name of the WGS sample, used to construct the VCF filename                                        |         |
| rna_sample        | Name of the RNA sample in the VCF to be examined                                                  |         |
| do_not_write_file | If given, the .CrestCheck flag file is not written                                                | false   |
| min_total_reads   | Minimum read depth at a SNP in the RNA sample for the SNP to count towards the total              | 10      |
| min_rna_reads     | Minimum number of RNA reads supporting the variant allele for a counted SNP to count as supported | 1       |
| acceptance_ratio  | Minimum ratio of supported SNPs to counted SNPs for the check to pass                             | 0.90    |
| output_dir        | Directory in which to write the .CrestCheck flag file                                             |         |
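
The parameter table maps directly onto a command-line parser. The sketch below uses Python's `argparse` purely to illustrate the defaults and the presence-only semantics of `-do_not_write_file`; Crest itself is a Java application and does not use this code. Treating the parameters without a default as required is an assumption.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="crest", description="Illustrative mirror of Crest's parameters")
    # Parameters with no default in the table; assumed required here
    p.add_argument("-purple_dir", required=True)
    p.add_argument("-sample", required=True)
    p.add_argument("-rna_sample", required=True)
    # Presence-only flag: false unless given
    p.add_argument("-do_not_write_file", action="store_true")
    p.add_argument("-min_total_reads", type=int, default=10)
    p.add_argument("-min_rna_reads", type=int, default=1)
    p.add_argument("-acceptance_ratio", type=float, default=0.90)
    p.add_argument("-output_dir", default=None)
    return p
```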

@@ -0,0 +1,80 @@

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <artifactId>hmftools</artifactId>
        <groupId>com.hartwig</groupId>
        <version>local-SNAPSHOT</version>
    </parent>

    <artifactId>crest</artifactId>
    <packaging>jar</packaging>
    <version>${crest.version}</version>
    <name>HMF Tools - Crest</name>

    <dependencies>
        <dependency>
            <groupId>com.hartwig</groupId>
            <artifactId>hmf-common</artifactId>
        </dependency>
        <dependency>
            <groupId>org.immutables</groupId>
            <artifactId>value</artifactId>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>true</filtering>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <mainClass>com.hartwig.hmftools.crest.CrestApplication</mainClass>
                            <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
                            <addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
                        </manifest>
                    </archive>

                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${maven.compiler.source}</source>
                    <target>${maven.compiler.target}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>
```

@@ -0,0 +1,90 @@

```python
# Create a minimal vcf for testing

from dataclasses import dataclass

# Columns on the #CHROM line must be tab-separated, hence the explicit \t escapes
header = '''##fileformat=VCFv4.2
##FILTER=<ID=LOW_TUMOR_VCN,Description="Germline variant has very low tumor variant copy number">
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allelic frequency calculated from read context counts as (Full + Partial + Core + Realigned + Alt) / Coverage">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=IMPACT,Number=10,Type=String,Description="Variant Impact [Gene, Transcript, CanonicalEffect, CanonicalCodingEffect, SpliceRegion, HgvsCodingImpact, HgvsProteinImpact, OtherReportableEffects, WorstCodingEffect, GenesAffected]">
##INFO=<ID=PURPLE_VCN,Number=1,Type=Float,Description="Purity adjusted variant copy number">
##INFO=<ID=DEVELOPER_COMMENT,Number=1,Type=String,Description="Developer Comment">
##contig=<ID=17,length=81195210>
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttumor_sample\tref_sample\trna_sample
'''


@dataclass
class Variant:
    chr: int
    pos: int
    ref: str
    alt: str
    filter: str  # e.g. "PASS", "LOW_TUMOR_VCN"
    gene: str
    ref_reads: int
    allele_reads: int
    total_reads: int
    comment: str

    def to_row(self) -> str:
        fields = (
            self.chr, self.pos, ".", self.ref, self.alt,
            500,  # qual
            self.filter,
            self.info(),
            "GT:AD:AF:DP",  # format
            self.tumor_sample(),
            self.ref_sample(),
            self.rna_sample(),
        )
        return '\t'.join(str(f) for f in fields) + '\n'

    def info(self) -> str:
        if self.filter == "LOW_TUMOR_VCN":
            vcn = 0
        else:
            vcn = 1
        coding_effect = "NONE"
        worst_coding_effect = "NONE"
        # INFO values must not contain spaces
        comment = self.comment.replace(' ', '_')
        # IMPACT has 10 comma-separated fields; the gene is the first
        return f"IMPACT={self.gene},,,{coding_effect},,,,,{worst_coding_effect},1;PURPLE_VCN={vcn};DEVELOPER_COMMENT={comment}"

    def tumor_sample(self) -> str:
        return "./.:0,100:1.0:100"

    def ref_sample(self) -> str:
        return "1/1:0,30:1.0:30"

    def rna_sample(self) -> str:
        AD = f"{self.ref_reads},{self.allele_reads}"
        DP = f"{self.total_reads}"

        if self.total_reads > 0:
            AF = self.allele_reads / self.total_reads
        else:
            AF = 0.0
        return f"./.:{AD}:{AF}:{DP}"


def write_vcf(filename, data):
    with open(filename, "w") as f:
        f.write(header)

        for record in data:
            f.write(record.to_row())


if __name__ == '__main__':

    data = [
        Variant(17, 7579472, 'G', 'C', 'PASS', 'TP53', 48, 32, 80, "counted"),
        Variant(17, 7579473, 'G', 'C', 'LOW_TUMOR_VCN', 'TP53', 48, 0, 80, "not counted filter fail"),
        Variant(17, 7579474, 'G', 'C', 'PASS', '', 48, 32, 80, "not counted no gene impact"),
        Variant(17, 7579475, 'G', 'CC', 'PASS', 'TP53', 48, 32, 80, "not counted not a SNP"),
        Variant(17, 7579476, 'G', 'C', 'PASS', 'TP53', 48, 0, 80, "counted for total but not allele"),
        Variant(17, 7579477, 'G', 'C', 'PASS', 'TP53', 4, 1, 5, "not counted not enough total reads"),
    ]

    write_vcf("minimal.vcf", data)
```
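
To see which of the generated records should count, the sketch below re-reads a VCF in this format and applies the README's counting rule. It is a hypothetical cross-check, not part of the commit, and it makes simplifying assumptions: the RNA sample's total depth is taken from `DP`, allele support from the second `AD` value, and a variant is a SNP when REF and ALT are both single bases.

```python
def rna_support_counts_from_vcf(lines, rna_sample, min_total_reads=10, min_rna_reads=1):
    """Return (supported, total) for the given RNA sample column (illustrative only)."""
    sample_col = None
    supported = total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the RNA sample column; list.index raises ValueError if absent
            sample_col = fields.index(rna_sample)
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line before the first record")
        ref, alt, filt, info = fields[3], fields[4], fields[6], fields[7]
        # The gene is the first element of the IMPACT INFO field
        gene = ""
        for entry in info.split(";"):
            if entry.startswith("IMPACT="):
                gene = entry[len("IMPACT="):].split(",")[0]
        is_snp = len(ref) == 1 and len(alt) == 1
        if filt != "PASS" or not is_snp or not gene:
            continue
        # Pair FORMAT keys with the RNA sample's values
        fmt = dict(zip(fields[8].split(":"), fields[sample_col].split(":")))
        depth = int(fmt["DP"])
        allele_reads = int(fmt["AD"].split(",")[1])
        if depth >= min_total_reads:
            total += 1
            if allele_reads >= min_rna_reads:
                supported += 1
    return supported, total
```

For example, `with open("minimal.vcf") as f: rna_support_counts_from_vcf(f, "rna_sample")` on the script's output should give 1 supported SNP out of 2 counted (ratio 0.5), consistent with each record's DEVELOPER_COMMENT, so the default 0.90 check would fail.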

crest/src/main/java/com/hartwig/hmftools/crest/CrestAlgo.java (158 additions, 0 deletions)

@@ -0,0 +1,158 @@

```java
package com.hartwig.hmftools.crest;

import static com.hartwig.hmftools.common.utils.file.FileWriterUtils.checkAddDirSeparator;

import static htsjdk.tribble.AbstractFeatureReader.getFeatureReader;

import java.io.FileOutputStream;
import java.io.IOException;

import com.hartwig.hmftools.common.purple.PurpleCommon;
import com.hartwig.hmftools.common.utils.version.VersionInfo;
import com.hartwig.hmftools.common.variant.AllelicDepth;
import com.hartwig.hmftools.common.variant.VariantContextDecorator;
import com.hartwig.hmftools.common.variant.VariantType;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

import htsjdk.tribble.AbstractFeatureReader;
import htsjdk.tribble.readers.LineIterator;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.vcf.VCFCodec;
import htsjdk.variant.vcf.VCFHeader;

public class CrestAlgo
{
    private static final Logger LOGGER = LogManager.getLogger(CrestAlgo.class);

    @NotNull
    private final String purpleDir;
    @Nullable
    private final String outputDir;
    @NotNull
    private final String sampleId;
    @NotNull
    private final String sampleToCheck;

    private final int minTotalReads;
    private final int minRnaReads;
    private final double acceptanceRatio;
    private final boolean doNotWriteFile;

    public CrestAlgo(@NotNull final String purpleDir, @Nullable final String outputDir,
            @NotNull final String sampleId, @NotNull final String sampleToCheck,
            final int minTotalReads, final int minRnaReads, final double acceptanceRatio,
            final boolean doNotWriteFile)
    {
        this.purpleDir = purpleDir;
        this.outputDir = outputDir;
        this.sampleId = sampleId;
        this.sampleToCheck = sampleToCheck;
        this.minTotalReads = minTotalReads;
        this.minRnaReads = minRnaReads;
        this.acceptanceRatio = acceptanceRatio;
        this.doNotWriteFile = doNotWriteFile;
    }

    void run() throws IOException
    {
        logVersion();
        logParams();

        String rnaAnnotatedGermlineVcf = PurpleCommon.purpleGermlineVcfFile(purpleDir, sampleId);
        LOGGER.info("Checking file: {}", rnaAnnotatedGermlineVcf);

        boolean success = crestCheck(rnaAnnotatedGermlineVcf);

        if(success)
        {
            LOGGER.info("Check succeeded");
        }
        else
        {
            LOGGER.error("Check failed, ratio of RNA-supported SNPs is below threshold");
        }

        if(!doNotWriteFile)
        {
            String outputFilename = getOutputFilename(success);
            LOGGER.info("Writing file: {}", outputFilename);
            // Create (or truncate) an empty flag file
            new FileOutputStream(outputFilename).close();
        }
    }

    public boolean crestCheck(@NotNull String vcfFile) throws IOException
    {
        double supportRatio = computeRnaSupportRatio(vcfFile);
        return supportRatio >= acceptanceRatio;
    }

    public double computeRnaSupportRatio(@NotNull String vcfFile) throws IOException
    {
        int supported = 0;
        int total = 0;

        try(AbstractFeatureReader<VariantContext, LineIterator> reader = getFeatureReader(vcfFile, new VCFCodec(), false))
        {
            final VCFHeader header = (VCFHeader) reader.getHeader();
            if(!sampleInFile(sampleToCheck, header))
            {
                throw new RuntimeException("Sample " + sampleToCheck + " not found in file " + vcfFile);
            }

            for(VariantContext context : reader.iterator())
            {
                VariantContextDecorator decorator = new VariantContextDecorator(context);

                // Only PASS SNPs that impact a gene are considered
                if(decorator.isPass() && decorator.type() == VariantType.SNP && !decorator.gene().isEmpty())
                {
                    AllelicDepth rnaDepth = decorator.allelicDepth(sampleToCheck);
                    if(rnaDepth.totalReadCount() >= minTotalReads)
                    {
                        total += 1;
                        if(rnaDepth.alleleReadCount() >= minRnaReads)
                        {
                            supported += 1;
                        }
                    }
                }
            }
        }

        // With no counted SNPs the ratio is 0, so the check fails
        double ratio = total > 0 ? (double) supported / total : 0D;
        LOGGER.info("Supported: {} Total: {} Fraction: {}", supported, total, ratio);
        return ratio;
    }

    private void logParams()
    {
        LOGGER.info("purpleDir: {}", purpleDir);
        LOGGER.info("outputDir: {}", outputDir);
        LOGGER.info("sampleId: {}", sampleId);
        LOGGER.info("sampleToCheck: {}", sampleToCheck);
        LOGGER.info("minTotalReads: {}", minTotalReads);
        LOGGER.info("minRnaReads: {}", minRnaReads);
        LOGGER.info("acceptanceRatio: {}", acceptanceRatio);
        LOGGER.info("doNotWriteFile: {}", doNotWriteFile);
    }

    private static void logVersion()
    {
        final VersionInfo version = new VersionInfo("crest.version");
        LOGGER.info("Crest version: {}", version.version());
    }

    private String getOutputFilename(boolean success)
    {
        String extension = success ? ".CrestCheckSucceeded" : ".CrestCheckFailed";
        // Without an output directory the flag file goes to the current working directory
        return (outputDir == null ? "" : checkAddDirSeparator(outputDir)) + sampleId + extension;
    }

    private static boolean sampleInFile(@NotNull final String sample, @NotNull final VCFHeader header)
    {
        return header.getSampleNamesInOrder().stream().anyMatch(x -> x.equals(sample));
    }
}
```
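
The side effects of `run()` can be summarised with a short Python sketch: decide pass or fail from the ratio, then create an empty flag file unless writing is disabled. The function name and signature are hypothetical; the sketch mirrors `run()` and `getOutputFilename()` above, including writing to the current working directory when no output directory is set, and assumes the support ratio has already been computed.

```python
import os


def write_crest_flag(sample, support_ratio, acceptance_ratio=0.90,
                     output_dir=None, do_not_write_file=False):
    """Illustrative mirror of CrestAlgo.run() once the ratio is known.

    Returns (success, path), where path is None when no file is written.
    """
    # Inclusive threshold, as in crestCheck()
    success = support_ratio >= acceptance_ratio
    if do_not_write_file:
        return success, None
    extension = ".CrestCheckSucceeded" if success else ".CrestCheckFailed"
    # No output_dir means a path relative to the current working directory
    path = os.path.join(output_dir or "", sample + extension)
    # Create or truncate an empty file, like new FileOutputStream(path).close()
    open(path, "w").close()
    return success, path
```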