VS-361 Add GvsWithdrawSamples wdl #7765

Merged on Apr 7, 2022 (10 commits)
Diff shown: changes from 9 commits
.dockstore.yml: 8 changes (8 additions, 0 deletions)
@@ -169,6 +169,14 @@ workflows:
      branches:
        - master
        - ah_var_store
  - name: GvsWithdrawSamples
    subclass: WDL
    primaryDescriptorPath: /scripts/variantstore/wdl/GvsWithdrawSamples.wdl
    filters:
      branches:
        - master
        - ah_var_store
        - gg_VS-361_AddGvsWithdrawSamples
  - name: MitochondriaPipeline
    subclass: WDL
    primaryDescriptorPath: /scripts/mitochondria_m2_wdl/MitochondriaPipeline.wdl
scripts/variantstore/wdl/GvsWithdrawSamples.wdl: 85 changes (85 additions, 0 deletions)
@@ -0,0 +1,85 @@
version 1.0

workflow GvsWithdrawSamples {

  input {
    String dataset_name
    String project_id

    Array[String] sample_names

    String? service_account_json_path
  }

  call WithdrawSamples {
    input:
      project_id = project_id,
      dataset_name = dataset_name,
      sample_names = sample_names,
      service_account_json_path = service_account_json_path
  }

  output {
    Int num_rows_updated = WithdrawSamples.num_rows_updated
  }
}

task WithdrawSamples {
  input {
    String project_id
    String dataset_name

    Array[String] sample_names

    String? service_account_json_path
  }

  String has_service_account_file = if (defined(service_account_json_path)) then 'true' else 'false'

  meta {
    description: "Withdraw Samples from GVS by marking them as 'withdrawn' in the sample_info table"
    volatile: true
  }

  command <<<
    set -e
    set -x

    # make sure that sample names were actually passed, warn and exit if empty
    num_samples=~{length(sample_names)}
    if [ $num_samples -eq 0 ]; then
      echo "No sample names passed. Exiting"
      # write 0 so that the read_int("rows_updated.txt") output can still be evaluated
      echo 0 > rows_updated.txt
      exit 0
    fi

    if [ ~{has_service_account_file} = 'true' ]; then
      gsutil cp ~{service_account_json_path} local.service_account.json
      gcloud auth activate-service-account --key-file=local.service_account.json
    fi

    echo "project_id = ~{project_id}" > ~/.bigqueryrc

    # perform actual update
    bq --project_id=~{project_id} query --format=csv --use_legacy_sql=false \
      'UPDATE `~{dataset_name}.sample_info` SET withdrawn = CURRENT_TIMESTAMP() WHERE sample_name IN ("~{sep='\", \"' sample_names}")' > log_message.txt;

    # strip the "Number of affected rows: " prefix from bq's output, leaving the count
    sed -e 's/Number of affected rows: //' log_message.txt > rows_updated.txt
    typeset -i rows_updated=$(cat rows_updated.txt)

    if [ $num_samples -ne $rows_updated ]; then

Review comment (Collaborator), on the line above:
I really like having this check... I wish there was a cleaner way of doing this with the bq command line tool, but if there is, I haven't found it. 😞

      echo "Error: Expected to update $num_samples rows, but updated $rows_updated."
      exit 1
    fi

  >>>
  runtime {
    docker: "us.gcr.io/broad-gatk/gatk:4.2.5.0"
    memory: "3.75 GB"
    disks: "local-disk 10 HDD"
    cpu: 1
  }
  output {
    Int num_rows_updated = read_int("rows_updated.txt")
  }
}
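The review comment asks for a cleaner way to check the updated-row count with the bq command line tool. One possible approach, sketched below as a standalone POSIX sh file rather than the PR's code, re-creates two steps of the `WithdrawSamples` command so they can be tried without BigQuery: `build_in_list` reproduces the `IN` list text that the `sep` interpolation builds from `sample_names`, and `check_rows_updated` compares the expected count against the first `Number of affected rows: N` line in a log. The function names are hypothetical, and the sketch assumes bq reports DML results with the same wording the task's `sed` relies on.

```shell
#!/bin/sh
# Sketch only (not from the PR): re-creates two steps of the WithdrawSamples
# command block so they can be tried locally without BigQuery.
# build_in_list, parse_affected_rows and check_rows_updated are hypothetical names.

# Joins the arguments as "a", "b", "c": the same text that
# ("~{sep='\", \"' sample_names}") puts between the parentheses of the SQL IN list.
# No escaping is done, so a sample name containing a double quote would
# still produce invalid SQL, just as with the WDL interpolation.
build_in_list() {
  out=""
  for name in "$@"; do
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out\"$name\""
  done
  printf '%s\n' "$out"
}

# Prints N from the first line of the form "Number of affected rows: N"
# (the wording the task's sed expects); prints nothing if there is none.
# Other log lines are dropped instead of being passed through as the task's sed does.
parse_affected_rows() {
  sed -n 's/^Number of affected rows: \([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Fails unless the log reports exactly the expected number of updated rows;
# on success prints the count (the value the task writes to rows_updated.txt).
check_rows_updated() {
  expected="$1"
  log_file="$2"
  actual=$(parse_affected_rows "$log_file")
  if [ -z "$actual" ]; then
    echo "Error: no 'Number of affected rows' line in $log_file" >&2
    return 1
  fi
  if [ "$expected" -ne "$actual" ]; then
    echo "Error: Expected to update $expected rows, but updated $actual." >&2
    return 1
  fi
  printf '%s\n' "$actual"
}

# Usage with a fake log standing in for real bq output.
log=$(mktemp)
printf 'Number of affected rows: 2\n' > "$log"
echo "IN ($(build_in_list sample_a sample_b))"   # prints: IN ("sample_a", "sample_b")
check_rows_updated 2 "$log"                       # prints: 2
rm -f "$log"
```

Dropping unmatched lines is the main design choice here: if anything besides the count line reaches log_message.txt, the task's `sed` passes it through, so rows_updated.txt would no longer hold the single integer that `read_int` expects. This version fails only when the count line is missing or the numbers disagree.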