Add VAT clinvar reverse complement #9070

Open
wants to merge 7 commits into
base: ah_var_store
Changes from 6 commits
2 changes: 1 addition & 1 deletion .dockstore.yml
@@ -211,7 +211,7 @@ workflows:
branches:
- master
- ah_var_store
- rc-1528-n-rounds-update
- rc-vs-1457-vat-clinvar
tags:
- /.*/
- name: GvsCreateVATFilesFromBigQuery
@@ -6,6 +6,7 @@
import argparse
import logging
import sys
import Bio.Seq
RoriCremer marked this conversation as resolved.

vat_nirvana_positions_dictionary = {
"position": "position", # required
@@ -233,14 +234,19 @@ def make_annotated_json_row(row_position, row_ref, row_alt, variant_line, transc
updated_dates = [] # grab the most recent
phenotypes = [] # ordered alphabetically
clinvar_ids = [] # For easy validation downstream
# Note that inside the clinvar array, are multiple objects that may or may not be the one we are looking for. We check by making sure the ref and alt are the same
# Note that inside the clinvar array, are multiple objects that may or may not be the one we are looking for.
# We check by making sure the ref and alt are the same including any reverse complements
## TODO add clinvar star rating!!!
Collaborator

Are you going to do this work in this ticket or is this another ticket?

Contributor Author

another ticket

for clinvar_obj in clinvar_objs:
# get only the clinvar objs with right variant and the id that starts with RCV
if (clinvar_obj.get("refAllele") == var_ref) and (clinvar_obj.get("altAllele") == var_alt) and (clinvar_obj.get("id")[:3] == "RCV"):
if (((clinvar_obj.get("refAllele") == var_ref) and (clinvar_obj.get("altAllele") == var_alt)) or
((clinvar_obj.get("refAllele") == Bio.Seq.reverse_complement(variant_line["refAllele"])) and (clinvar_obj.get("altAllele") == Bio.Seq.reverse_complement(variant_line["altAllele"])))) and (clinvar_obj.get("id")[:3] == "RCV"):
RoriCremer marked this conversation as resolved.
clinvar_ids.append(clinvar_obj.get("id"))
significance_values.extend([x.lower() for x in clinvar_obj.get("significance")])
updated_dates.append(clinvar_obj.get("lastUpdatedDate"))
phenotypes.extend(clinvar_obj.get("phenotypes"))
## TODO add the ("variationId") and the ("reviewStatus")--note that the reviewStatus will need to maintain the ordering of the significance arrays
## we need to do this with a tuple so that the reviewStatus lines up with the significance (since significance seems to be an array, while star is a single value)
Comment on lines +249 to +250
Collaborator

Also wondering if this work is for this ticket / PR or another

Contributor Author

Yes, another ticket, but since I'm in here, it seemed like a waste not to note where it needs to live.
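For the follow-up ticket, the tuple idea in the TODO could look roughly like the sketch below. It is only an illustration: `paired_significance` and `SIGNIFICANCE_ORDERING` are hypothetical names, the ordering values are placeholders (the real `significance_ordering` list is not shown in this diff), and it assumes `reviewStatus` is a single string per ClinVar object, as the TODO suggests.

```python
# Placeholder ordering; the script's real significance_ordering list is not shown in this diff.
SIGNIFICANCE_ORDERING = [
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
]


def paired_significance(clinvar_objs):
    """Pair each significance value with the reviewStatus of the object it came from.

    Keeping (significance, reviewStatus) tuples together means that sorting by
    significance cannot separate a value from its review status.
    """
    pairs = []
    for obj in clinvar_objs:
        review_status = obj.get("reviewStatus")  # assumed to be a single value per object
        for significance in obj.get("significance") or []:
            pairs.append((significance.lower(), review_status))

    rank = {name: index for index, name in enumerate(SIGNIFICANCE_ORDERING)}
    # Values missing from the ordering sort last; sorted() is stable, so ties keep input order.
    return sorted(pairs, key=lambda pair: rank.get(pair[0], len(rank)))


example = [
    {"significance": ["Pathogenic", "Benign"], "reviewStatus": "criteria provided, single submitter"},
    {"significance": ["Likely benign"], "reviewStatus": "no assertion criteria provided"},
]
ordered = paired_significance(example)
```

Splitting `ordered` back into two parallel lists afterwards would keep the significance and review-status columns aligned.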

if len(clinvar_ids) > 0:
ordered_significance_values = []
# We want to collect all the significance values and order them by the significance_ordering list
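The new match condition in the loop above is long enough that it may be easier to review as a small helper. The following is a minimal sketch, not the PR's code: `reverse_complement` here is a standard-library stand-in for `Bio.Seq.reverse_complement` that only handles plain A/C/G/T/N strings, and `alleles_match` and `is_matching_rcv` are hypothetical names. It also guards against a missing `id`, which the slice `clinvar_obj.get("id")[:3]` would not tolerate.

```python
# Stand-in for Bio.Seq.reverse_complement, limited to A/C/G/T/N (either case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a plain nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def alleles_match(clinvar_obj, var_ref, var_alt):
    """True if the ClinVar ref/alt equal the variant's alleles on either strand."""
    cv_ref = clinvar_obj.get("refAllele")
    cv_alt = clinvar_obj.get("altAllele")
    same_strand = cv_ref == var_ref and cv_alt == var_alt
    opposite_strand = (
        cv_ref == reverse_complement(var_ref)
        and cv_alt == reverse_complement(var_alt)
    )
    return same_strand or opposite_strand


def is_matching_rcv(clinvar_obj, var_ref, var_alt):
    """True for RCV records whose alleles match the variant on either strand."""
    clinvar_id = clinvar_obj.get("id") or ""  # tolerate a missing id
    return clinvar_id.startswith("RCV") and alleles_match(clinvar_obj, var_ref, var_alt)


# A C>A variant matches a ClinVar record reported as G>T on the opposite strand.
record = {"id": "RCV000000001", "refAllele": "G", "altAllele": "T"}
matched = is_matching_rcv(record, "C", "A")
```

Note that this sketch takes the variant's alleles as one pair of arguments, whereas the diff compares `var_ref`/`var_alt` for the same-strand case and `variant_line["refAllele"]`/`variant_line["altAllele"]` for the reverse-complement case; whether those always agree is not visible from this hunk.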
1 change: 1 addition & 0 deletions scripts/variantstore/scripts/requirements.txt
@@ -5,3 +5,4 @@ google-cloud-storage
firecloud
terra-notebook-utils
pybedtools
biopython
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsUtils.wdl
@@ -72,7 +72,7 @@ task GetToolVersions {
# GVS generally uses the smallest `alpine` version of the Google Cloud SDK as it suffices for most tasks, but
# there are a handful of tasks that require the larger GNU libc-based `slim`.
String cloud_sdk_slim_docker = "gcr.io/google.com/cloudsdktool/cloud-sdk:435.0.0-slim"
String variants_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/variants:2024-11-25-alpine-913039adf8f4"
String variants_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/variants:2025-01-17-alpine-2b4c2ca0187c"
String variants_nirvana_docker = "us.gcr.io/broad-dsde-methods/variantstore:nirvana_2022_10_19"
String gatk_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/gatk:2024-11-24-gatkbase-1807487d5912"
String real_time_genomics_docker = "docker.io/realtimegenomics/rtg-tools:latest"