Welcome to the GENXT TES server demo repository! This project showcases the GA4GH-standard Task Execution Service (TES) API, which lets you define batch execution tasks: input files, Docker containers, commands, output files, and additional logging and metadata.
Test TES server URL: http://tesktest.genxt.cloud/ga4gh/tes/v1/
For detailed information about the TES API, refer to the GA4GH Task Execution Service API documentation.
This repository includes a demonstration script that runs GRAPE (Genomic RelAtedness detection PipelinE) on the GENXT TES-enabled computing service. GRAPE is an open-source tool for detecting genomic relatedness; it takes a multisample VCF file as input and also includes workflows for downloading and verifying reference datasets.
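The Python snippets below use the requests library for HTTP calls; install it with pip install requests if it is not already available.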
To use the script in this repository, follow these steps:
Configure your AWS S3 credentials and region for downloading input files and saving results. The Docker registry variables are only needed when pulling images from a private registry and can be left empty for public Docker Hub images:
AWS_ACCESS_KEY_ID = "<Your_Access_Key_ID>"
AWS_SECRET_ACCESS_KEY = "<Your_Secret_Access_Key>"
AWS_REGION = "us-east-1"
DOCKER_REGISTRY = ""
DOCKER_LOGIN = ""
DOCKER_PASS = ""
Configure your GENXT credentials:
username = ''
password = ''
Create a JSON object for task data:
task_data = {
    "name": "GRAPE run",
    "resources": {"disk_gb": 200},  # disk space for inputs, reference data, and results
    "volumes": ["/vol/a/"],         # persistent volume shared by all executors
    "executors": []                 # workflow steps, appended below
}
Define the steps of your workflow by adding executors:
Use the amazon/aws-cli Docker image from Docker Hub to download the input files from AWS S3 to the persistent volume, using the provided S3 credentials:
task_data["executors"].append({
"image": "amazon/aws-cli",
"command": ["aws", "s3", "cp", "s3://grapetestbucket/input.vcf.gz", "/vol/a/input.vcf.gz"],
"env": {
"AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
"AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
"AWS_REGION": AWS_REGION
}
})
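If your input spans several files under a common prefix, the same image can mirror the whole prefix in one step with aws s3 sync. A minimal sketch, assuming a hypothetical inputs/ prefix in the same bucket:
task_data["executors"].append({
    "image": "amazon/aws-cli",
    # sync copies every object under the prefix, preserving the directory layout
    "command": ["aws", "s3", "sync", "s3://grapetestbucket/inputs/", "/vol/a/inputs/"],
    "env": {
        "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
        "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
        "AWS_REGION": AWS_REGION
    }
})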
If your input files live on another storage service, you can use the BusyBox Docker image and download them with wget:
task_data["executors"].append({
"image": "busybox",
"command": ["/bin/wget", "ftp://https://ftp.ebi.ac.uk/robots.txt", "/vol/a/robots.txt"]
})
Execute the GRAPE software using the genxnetwork/grape Docker Hub container. This takes two steps: downloading the reference panel to the persistent volume and preprocessing the input file.
Download the reference panel to the persistent volume /vol/a/, into the media/ref folder:
task_data["executors"].append({
"docker_login": DOCKER_LOGIN,
"docker_pass": DOCKER_PASS,
"docker_registry": DOCKER_REGISTRY,
"image": "genxnetwork/grape",
"command": [
"python",
"launcher.py",
"reference",
"--use-bundle",
"--ref-directory",
"/vol/a/media/ref",
"--real-run"
]
})
Preprocess the input file with the following command:
task_data["executors"].append({
"docker_login": DOCKER_LOGIN,
"docker_pass": DOCKER_PASS,
"docker_registry": DOCKER_REGISTRY,
"image": "genxnetwork/grape",
"command": [
"python",
"launcher.py",
"preprocess",
"--ref-directory",
"/vol/a/media/ref",
"--vcf-file",
"/vol/a/input.vcf.gz",
"--directory",
"/vol/a/media/data",
"--assembly",
"hg37",
"--real-run"
]
})
Run the relatives search part of the GRAPE software using the IBIS flow:
task_data["executors"].append({
"docker_login": DOCKER_LOGIN,
"docker_pass": DOCKER_PASS,
"docker_registry": DOCKER_REGISTRY,
"image": "genxnetwork/grape",
"command": [
"python",
"launcher.py",
"find",
"--flow",
"ibis",
"--ref-directory",
"/vol/a/media/ref",
"--directory",
"/vol/a/media/data",
"--real-run"
]
})
Upload the resulting relatives.tsv file from the persistent volume to the AWS S3 bucket:
task_data["executors"].append({
"image": "amazon/aws-cli",
"command": ["aws", "s3", "cp", "/vol/a/media/data/results/relatives.tsv", "s3://grapetestbucket/relatives_output.tsv"],
"env": {
"AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
"AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
"AWS_REGION": AWS_REGION
}
})
Create a task using the POST /tasks endpoint with the following code:
url = "http://tesktest.genxt.cloud/ga4gh/tes/v1/tasks"
headers = {
"accept": "application/json",
"Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=json.dumps(task_data), auth=HTTPBasicAuth(username, password))
print("POST request response:", response.json())
task_id = response.json().get("id")
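If task creation fails (bad credentials, malformed JSON), response.json().get("id") quietly yields None. To fail fast instead, requests provides raise_for_status(); call it right after the POST:
response.raise_for_status()  # raises requests.exceptions.HTTPError on a 4xx/5xx response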
Monitor the status of your task using the GET /tasks/{task_id} endpoint:
url = f"http://tesktest.genxt.cloud/ga4gh/tes/v1/tasks/{task_id}"
headers = {
    "accept": "application/json",
    "Content-Type": "application/json"
}
response = requests.get(url, headers=headers, auth=HTTPBasicAuth(username, password))
print("GET request response:", response.json())
Retrieve a list of all tasks using the GET /tasks endpoint:
url = f"http://tesktest.genxt.cloud/ga4gh/tes/v1/tasks"
headers = {
"accept": "application/json",
"Content-Type": "application/json"
}
response = requests.get(url, headers=headers, auth=HTTPBasicAuth(username, password))
#print("GET request response:", response.json())
all_tasks = response.json().get("tasks")
for task in all_tasks:
print(task)
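The GA4GH TES spec also defines a cancellation endpoint, POST /tasks/{id}:cancel. A minimal sketch for canceling a submitted task, assuming the demo server implements this part of the spec:
url = f"http://tesktest.genxt.cloud/ga4gh/tes/v1/tasks/{task_id}:cancel"
response = requests.post(url, headers=headers, auth=HTTPBasicAuth(username, password))
print("Cancel request response:", response.json())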