Prototype for batch Python interface #4937
Conversation
This is really great. I have some thoughts below, mostly brainstorming -- don't take any of it too seriously.
I don't have a no-brainer suggestion. Happy to brainstorm ideas offline. I think ultimately this is a minor syntactic choice.
Then the question becomes: how do we associate …
I wonder, will we ever want arrays to be formatted other than joined with spaces? I worry the user will want more flexibility in formatting, and we'll want that in Python. What if the argument could be a function that takes a dictionary from resource names to their string representations, so you can format however you want? Then you could write the last command as: …
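A minimal sketch of that formatter-function idea (every name here is hypothetical, chosen only to illustrate the dispatch -- none of it is from the PR):

```python
# Hypothetical sketch of the "argument can be a function" idea: command
# arguments that are callables receive a dict mapping resource names to
# their string representations and return the formatted fragment.
# run_command, merge_cmd, and the resource names are illustrative only.

def run_command(arg, resources):
    """Plain strings pass through; callables format the resources themselves."""
    if callable(arg):
        return arg(resources)
    return str(arg)

# A user-supplied formatter that joins files with commas instead of spaces:
merge_cmd = lambda r: 'cat ' + ','.join(r['shapeit_outputs']) + ' >> ' + r['ofile']

resources = {'shapeit_outputs': ['a.haps', 'b.haps'], 'ofile': 'merged.haps'}
print(run_command(merge_cmd, resources))  # cat a.haps,b.haps >> merged.haps
```

This would keep the default space-joined behavior while giving an escape hatch for custom formatting.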
What's the left hand side? Why isn't this just …? This suggests another issue: what if you want to use …
All of these things I completely agree with!
@cseed I don't really like how this looks with lists as the inputs to the commands. To me, it's much harder to read and modify. I like the commands looking as much like writing a shell script as possible. Maybe others feel differently though... Also, you can't do something like this:

from pyapi import Pipeline
p = Pipeline()
bfile_root = 'gs://jigold/input'
bed = bfile_root + '.bed'
bim = bfile_root + '.bim'
fam = bfile_root + '.fam'
p.write_input(bed=bed, bim=bim, fam=fam)
subset = p.new_task()
subset = (subset
.label('subset')
.command(['plink', '--bed', p.bed, '--bim', p.bim, '--fam', p.fam, '--make-bed', '--out', subset.tmp1])
.command(['awk', "'{ print $1, $2}'", subset.tmp1 + '.fam', "| sort | uniq -c | awk '{ if ($1 != 1) print $2, $3 }'",
'>', subset.tmp2])
.command(['plink', '--bed', p.bed, '--bim', p.bim, '--fam', p.fam, '--remove', subset.tmp2,
'--make-bed', '--out', subset.ofile]))
shapeit_tasks = []
for contig in [str(x) for x in range(1, 4)]:
shapeit = p.new_task()
shapeit = (shapeit
.label('shapeit')
.command(['shapeit', '--bed-file', subset.ofile, '--chr', contig, '--out', shapeit.ofile]))
shapeit_tasks.append(shapeit)
merger = p.new_task()
merger = (merger
.label('merge')
.command(['cat', ' '.join([task.ofile for task in shapeit_tasks]), '>>', merger.ofile]))
p.write_output(merger.ofile + ".haps", "gs://jigold/final_output.txt")
p.run()
should this have an assigned reviewer?
@cseed Sorry if this doesn't make sense -- we can discuss in person. Here's my attempt to fix the problems outlined above. The tradeoff made is that we have to refer to the object (ex: …). I tried hacking the Python AST to not have to refer to the object, but I think it's going to be difficult to get the AST parsing exactly right and not have too many implicit rules within our language. I also considered writing a DSL, but found that it's hard to specify the part with the …

from pyapi import Pipeline, resource_group
p = Pipeline()
input_bfile = p.new_resource_group(bed="gs://jigold/input_root.bed",
bim="gs://jigold/input_root.bim",
fam="gs://jigold/input_root.fam")
def bfile(root):
return resource_group(root, lambda x: {"bed": x + ".bed", "bim": x + ".bim", "fam": x + ".fam"})
subset = (p.new_task()
.label('subset'))
subset = (subset
.command(f'plink --bfile {input_bfile} --make-bed {bfile(subset.tmp1)}')
.command("awk '{print $1, $2}'" +
subset.tmp1.fam +
" | sort | uniq -c | awk '{ if ($1 != 1) print $2, $3 }' > " +
subset.tmp2)
.command(f"plink --bfile {input_bfile} --remove {subset.tmp2} --make-bed {bfile(subset.ofile)}"))
def shapeit_output(root):
return resource_group(root, lambda x: {"haps": x + ".haps", "log": x + ".log"})
for contig in [str(x) for x in range(1, 4)]:
shapeit = (p.new_task()
.label('shapeit'))
shapeit = (shapeit
.command(f'shapeit --bed-file {subset.ofile} --chr {contig} --out {shapeit_output(shapeit.ofile)}'))
merger = (p.new_task()
.label('merge'))
merger = (merger
.command('cat {files} >> {ofile}'.format(files=" ".join([task.ofile.haps for task in p.select_tasks("shapeit")]), ofile=merger.ofile)))
p.write_output(merger.ofile, "gs://jigold/final_output.txt")
p.run()
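One way `resource_group(root, mapper)` could work under the hood is attribute lookup over the expanded mapping, with `str()` returning the bare root. A sketch with hypothetical internals (only the `resource_group(root, mapper)` shape comes from the comment above):

```python
# Sketch of a resource group: a file root plus a mapper that expands it
# into named member files (bed/bim/fam, haps/log, ...). The class and its
# internals are illustrative, not the PR's actual implementation.

class ResourceGroup:
    def __init__(self, root, mapper):
        self._root = root
        self._files = mapper(root)  # e.g. {'bed': root + '.bed', ...}

    def __getattr__(self, name):
        # bf.fam -> '<root>.fam'
        try:
            return object.__getattribute__(self, '_files')[name]
        except KeyError:
            raise AttributeError(name)

    def __str__(self):
        # Interpolating the group itself yields the bare root, which is
        # what tools like plink expect for --bfile.
        return self._root

def resource_group(root, mapper):
    return ResourceGroup(root, mapper)

bf = resource_group('/tmp/abc',
                    lambda x: {'bed': x + '.bed', 'bim': x + '.bim', 'fam': x + '.fam'})
print(f'plink --bfile {bf} --keep {bf.fam}')  # plink --bfile /tmp/abc --keep /tmp/abc.fam
```

The key design point is that the same object can stand in for either the whole group (f-string interpolation) or a single member (attribute access).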
@cseed I'm really happy with the interface now! Could you please look over this again and let me know if there are any suggestions you have before I write some tests and give this to someone to code review. I also called this …

from pyapi import Pipeline, resource_group_builder
p = Pipeline() # initialize a pipeline
# Define resource group builders (used with `declare_resource_group`)
rgb_bfile = resource_group_builder(bed="{root}.bed",
bim="{root}.bim",
fam="{root}.fam")
rgb_shapeit = resource_group_builder(haps="{root}.haps",
log="{root}.log")
# Import a file as a resource
file = p.write_input('gs://hail-jigold/random_file.txt')
# Import a set of input files as a resource group
input_bfile = p.write_input_group(bed='gs://hail-jigold/input.bed',
bim='gs://hail-jigold/input.bim',
fam='gs://hail-jigold/input.fam')
# Remove duplicate samples from a PLINK dataset
subset = p.new_task()
subset = (subset
.label('subset')
.declare_resource_group(tmp1=rgb_bfile, ofile=rgb_bfile)
.command(f'plink --bfile {input_bfile} --make-bed {subset.tmp1}')
.command(f"awk '{{ print $1, $2}}' {subset.tmp1.fam} | sort | uniq -c | awk '{{ if ($1 != 1) print $2, $3 }}' > {subset.tmp2}")
.command(f"plink --bed {input_bfile.bed} --bim {input_bfile.bim} --fam {input_bfile.fam} --remove {subset.tmp2} --make-bed {subset.ofile}"
))
# Run shapeit for each contig from 1-3 with the output from subset
for contig in [str(x) for x in range(1, 4)]:
shapeit = p.new_task()
shapeit = (shapeit
.label('shapeit')
.declare_resource_group(ofile=rgb_shapeit)
.command(f'shapeit --bed-file {subset.ofile} --chr {contig} --out {shapeit.ofile}'))
# Merge the shapeit output files together
merger = p.new_task()
merger = (merger
.label('merge')
.command('cat {files} >> {ofile}'.format(files=" ".join([t.ofile.haps for t in p.select_tasks('shapeit')]),
ofile=merger.ofile)))
# Write the result of the merger to a permanent location
p.write_output(merger.ofile, "gs://jigold/final_output.txt")
# Execute the pipeline
p.run()

#!/bin/bash
set -ex
# define tmp directory
__TMP_DIR__=/tmp//pipeline.yG41vqpS/
# __TASK__0 write_input
cp gs://hail-jigold/random_file.txt ${__TMP_DIR__}/rsfKylng
# __TASK__1 write_input
cp gs://hail-jigold/input.bed ${__TMP_DIR__}/xJONBVn7.bed
# __TASK__2 write_input
cp gs://hail-jigold/input.bim ${__TMP_DIR__}/xJONBVn7.bim
# __TASK__3 write_input
cp gs://hail-jigold/input.fam ${__TMP_DIR__}/xJONBVn7.fam
# __TASK__4 subset
__RESOURCE_GROUP__0=${__TMP_DIR__}/xJONBVn7
__RESOURCE_GROUP__1=${__TMP_DIR__}/TB7ZUbj8
__RESOURCE__6=${__TMP_DIR__}/TB7ZUbj8.fam
__RESOURCE__10=${__TMP_DIR__}/EVeRHf7V
__RESOURCE__1=${__TMP_DIR__}/xJONBVn7.bed
__RESOURCE__2=${__TMP_DIR__}/xJONBVn7.bim
__RESOURCE__3=${__TMP_DIR__}/xJONBVn7.fam
__RESOURCE_GROUP__2=${__TMP_DIR__}/MXBQugBx
plink --bfile ${__RESOURCE_GROUP__0} --make-bed ${__RESOURCE_GROUP__1}
awk '{ print $1, $2}' ${__RESOURCE__6} | sort | uniq -c | awk '{ if ($1 != 1) print $2, $3 }' > ${__RESOURCE__10}
plink --bed ${__RESOURCE__1} --bim ${__RESOURCE__2} --fam ${__RESOURCE__3} --remove ${__RESOURCE__10} --make-bed ${__RESOURCE_GROUP__2}
# __TASK__5 shapeit
__RESOURCE_GROUP__2=${__TMP_DIR__}/MXBQugBx
__RESOURCE_GROUP__3=${__TMP_DIR__}/YSm1XkKf
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 1 --out ${__RESOURCE_GROUP__3}
# __TASK__6 shapeit
__RESOURCE_GROUP__2=${__TMP_DIR__}/MXBQugBx
__RESOURCE_GROUP__4=${__TMP_DIR__}/1HyBvsdN
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 2 --out ${__RESOURCE_GROUP__4}
# __TASK__7 shapeit
__RESOURCE_GROUP__2=${__TMP_DIR__}/MXBQugBx
__RESOURCE_GROUP__5=${__TMP_DIR__}/jtM69Ahm
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 3 --out ${__RESOURCE_GROUP__5}
# __TASK__8 merge
__RESOURCE__11=${__TMP_DIR__}/YSm1XkKf.haps
__RESOURCE__13=${__TMP_DIR__}/1HyBvsdN.haps
__RESOURCE__15=${__TMP_DIR__}/jtM69Ahm.haps
__RESOURCE__17=${__TMP_DIR__}/z6ccazmC
cat ${__RESOURCE__11} ${__RESOURCE__13} ${__RESOURCE__15} >> ${__RESOURCE__17}
# __TASK__9 write_output
__RESOURCE__17=${__TMP_DIR__}/z6ccazmC
cp ${__RESOURCE__17} gs://jigold/final_output.txt
# remove tmp directory
rm -r ${__TMP_DIR__}
@danking suggested we move this to a separate project from …
This really does look great! I have two small suggestions:
This is now ready to be reviewed. @danking Could you please help me set up the tests to run on the CI? @catoverdrive This is an example of the interface and the output generated. There's also a tests file in there. I'm happy to explain the design to you if you'd like.

from pipeline import Pipeline
p = Pipeline() # initialize a pipeline
# Define mapping for taking a file root to a set of output files
bfile = {'bed': '{root}.bed', 'bim': '{root}.bim', 'fam': '{root}.fam'}
# Import a file as a resource
file = p.read_input('gs://hail-jigold/random_file.txt')
# Import a set of input files as a resource group
input_bfile = p.read_input_group(bed='gs://hail-jigold/input.bed',
bim='gs://hail-jigold/input.bim',
fam='gs://hail-jigold/input.fam')
# Remove duplicate samples from a PLINK dataset
subset = p.new_task()
subset = (subset
.label('subset')
.docker('ubuntu')
.declare_resource_group(tmp1=bfile, ofile=bfile)
.command(f'plink --bfile {input_bfile} --make-bed {subset.tmp1}')
.command(f"awk '{{ print $1, $2}}' {subset.tmp1.fam} | sort | uniq -c | awk '{{ if ($1 != 1) print $2, $3 }}' > {subset.tmp2}")
.command(f"plink --bed {input_bfile.bed} --bim {input_bfile.bim} --fam {input_bfile.fam} --remove {subset.tmp2} --make-bed {subset.ofile}"
))
# Run shapeit for each contig from 1-3 with the output from subset
for contig in [str(x) for x in range(1, 4)]:
shapeit = p.new_task()
shapeit = (shapeit
.label('shapeit')
.declare_resource_group(ofile={'haps': "{root}.haps", 'log': "{root}.log"})
.command(f'shapeit --bed-file {subset.ofile} --chr {contig} --out {shapeit.ofile}'))
# Merge the shapeit output files together
merger = p.new_task()
merger = (merger
.label('merge')
.command('cat {files} >> {ofile}'.format(files=" ".join([t.ofile.haps for t in p.select_tasks('shapeit')]),
ofile=merger.ofile)))
# Write the result of the merger to a permanent location
p.write_output(merger.ofile, "gs://jigold/final_output.txt")
# Execute the pipeline
p.run(dry_run=True)

#!/bin/bash
set -ex
# change cd to tmp directory
cd /tmp//pipeline.jlQrNJZW/
# __TASK__0 read_input
cp gs://hail-jigold/random_file.txt nfVpMp4n
# __TASK__1 read_input
cp gs://hail-jigold/input.bed 33qZtfwg.bed
# __TASK__2 read_input
cp gs://hail-jigold/input.bim 33qZtfwg.bim
# __TASK__3 read_input
cp gs://hail-jigold/input.fam 33qZtfwg.fam
# __TASK__4 subset
__RESOURCE_GROUP__0=33qZtfwg
__RESOURCE_GROUP__1=yibUlBkL
__RESOURCE__6=yibUlBkL.fam
__RESOURCE__10=29aBQihd
__RESOURCE__1=33qZtfwg.bed
__RESOURCE__2=33qZtfwg.bim
__RESOURCE__3=33qZtfwg.fam
__RESOURCE_GROUP__2=YXS0tQKi
plink --bfile ${__RESOURCE_GROUP__0} --make-bed ${__RESOURCE_GROUP__1}
awk '{ print $1, $2}' ${__RESOURCE__6} | sort | uniq -c | awk '{ if ($1 != 1) print $2, $3 }' > ${__RESOURCE__10}
plink --bed ${__RESOURCE__1} --bim ${__RESOURCE__2} --fam ${__RESOURCE__3} --remove ${__RESOURCE__10} --make-bed ${__RESOURCE_GROUP__2}
# __TASK__5 shapeit
__RESOURCE_GROUP__2=YXS0tQKi
__RESOURCE_GROUP__3=gidGmbcC
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 1 --out ${__RESOURCE_GROUP__3}
# __TASK__6 shapeit
__RESOURCE_GROUP__2=YXS0tQKi
__RESOURCE_GROUP__4=W5hjCmPK
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 2 --out ${__RESOURCE_GROUP__4}
# __TASK__7 shapeit
__RESOURCE_GROUP__2=YXS0tQKi
__RESOURCE_GROUP__5=ySM8T0lZ
shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 3 --out ${__RESOURCE_GROUP__5}
# __TASK__8 merge
__RESOURCE__11=gidGmbcC.haps
__RESOURCE__13=W5hjCmPK.haps
__RESOURCE__15=ySM8T0lZ.haps
__RESOURCE__17=Z5OLJG6Y
cat ${__RESOURCE__11} ${__RESOURCE__13} ${__RESOURCE__15} >> ${__RESOURCE__17}
# __TASK__9 write_output
__RESOURCE__17=Z5OLJG6Y
cp ${__RESOURCE__17} gs://jigold/final_output.txt
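The `__RESOURCE__*` variables in the dry-run output above come from binding each placeholder in a command template to a generated name before the script is emitted. A deterministic sketch of that substitution step (illustrative names, not the actual Pipeline internals, which use random 8-character names):

```python
# Illustrative sketch of binding resource placeholders in a command template
# to generated shell variable names, as in the dry-run output above.
import itertools

_counter = itertools.count()

def temp_name(prefix='__RESOURCE__'):
    # The real tool emits random names; a counter keeps this sketch deterministic.
    return f'{prefix}{next(_counter)}'

def bind_resources(template, resource_keys):
    """Assign each resource a shell variable name and render the command."""
    names = {key: temp_name() for key in resource_keys}
    command = template.format(**{k: '${' + v + '}' for k, v in names.items()})
    return command, names

cmd, names = bind_resources('plink --bfile {input} --make-bed {out}', ['input', 'out'])
print(cmd)  # plink --bfile ${__RESOURCE__0} --make-bed ${__RESOURCE__1}
```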
Other things to add in separate PRs:
Looks great!
Let me know when you get the tests working and I'll approve it.
pipeline/setup.py
Outdated
setup(
name = 'pipeline',
version = '0.0.1',
url = 'https://github.com/hail-is/pipeline.git',
I don't think this URL works
@catoverdrive Here's the output with docker commands:

#!/bin/bash
# change cd to tmp directory
cd /tmp//pipeline.S9YTZap5/
# __TASK__0 read_input
cp gs://hail-jigold/random_file.txt DWRmR1Lh
# __TASK__1 read_input
cp gs://hail-jigold/input.bed Aw2arWP9.bed
# __TASK__2 read_input
cp gs://hail-jigold/input.bim Aw2arWP9.bim
# __TASK__3 read_input
cp gs://hail-jigold/input.fam Aw2arWP9.fam
# __TASK__4 subset
docker run -v /tmp//pipeline.S9YTZap5/:/tmp//pipeline.S9YTZap5/ -w /tmp//pipeline.S9YTZap5/ ubuntu /bin/bash -c '__RESOURCE_GROUP__0=Aw2arWP9; __RESOURCE_GROUP__1=srXTmGQE; __RESOURCE__6=srXTmGQE.fam; __RESOURCE__10=8ueGZQqn; __RESOURCE__1=Aw2arWP9.bed; __RESOURCE__2=Aw2arWP9.bim; __RESOURCE__3=Aw2arWP9.fam; __RESOURCE_GROUP__2=ESEFn8Tm; plink --bfile ${__RESOURCE_GROUP__0} --make-bed ${__RESOURCE_GROUP__1}&& awk '"'"'{ print $1, $2}'"'"' ${__RESOURCE__6} | sort | uniq -c | awk '"'"'{ if ($1 != 1) print $2, $3 }'"'"' > ${__RESOURCE__10}&& plink --bed ${__RESOURCE__1} --bim ${__RESOURCE__2} --fam ${__RESOURCE__3} --remove ${__RESOURCE__10} --make-bed ${__RESOURCE_GROUP__2}'
# __TASK__5 shapeit
docker run -v /tmp//pipeline.S9YTZap5/:/tmp//pipeline.S9YTZap5/ -w /tmp//pipeline.S9YTZap5/ gcr.io/shapeit /bin/bash -c '__RESOURCE_GROUP__2=ESEFn8Tm; __RESOURCE_GROUP__3=K1TfWX3n; shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 1 --out ${__RESOURCE_GROUP__3}'
# __TASK__6 shapeit
docker run -v /tmp//pipeline.S9YTZap5/:/tmp//pipeline.S9YTZap5/ -w /tmp//pipeline.S9YTZap5/ gcr.io/shapeit /bin/bash -c '__RESOURCE_GROUP__2=ESEFn8Tm; __RESOURCE_GROUP__4=8dRi0LwZ; shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 2 --out ${__RESOURCE_GROUP__4}'
# __TASK__7 shapeit
docker run -v /tmp//pipeline.S9YTZap5/:/tmp//pipeline.S9YTZap5/ -w /tmp//pipeline.S9YTZap5/ gcr.io/shapeit /bin/bash -c '__RESOURCE_GROUP__2=ESEFn8Tm; __RESOURCE_GROUP__5=NIqfevqS; shapeit --bed-file ${__RESOURCE_GROUP__2} --chr 3 --out ${__RESOURCE_GROUP__5}'
# __TASK__8 merge
docker run -v /tmp//pipeline.S9YTZap5/:/tmp//pipeline.S9YTZap5/ -w /tmp//pipeline.S9YTZap5/ ubuntu /bin/bash -c '__RESOURCE__11=K1TfWX3n.haps; __RESOURCE__13=8dRi0LwZ.haps; __RESOURCE__15=NIqfevqS.haps; __RESOURCE__17=GLxOwBss; cat ${__RESOURCE__11} ${__RESOURCE__13} ${__RESOURCE__15} >> ${__RESOURCE__17}'
# __TASK__9 write_output
__RESOURCE__17=GLxOwBss
cp ${__RESOURCE__17} gs://jigold/final_output.txt
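The `'"'"'` runs in the awk commands above are the standard trick for embedding a literal single quote inside the single-quoted `bash -c` string: close the quote, emit a double-quoted `'`, and reopen. A sketch of that escaping in Python (the helper name is hypothetical; the stdlib's `shlex.quote` does essentially the same thing):

```python
# Sketch of the single-quote escaping seen in the generated docker commands:
# inside a single-quoted bash string, a literal ' is written as '"'"'.

def quote_for_bash_c(command):
    """Wrap a command for `bash -c '...'`, escaping embedded single quotes."""
    return "'" + command.replace("'", "'\"'\"'") + "'"

inner = "awk '{ print $1, $2}' input.fam"
print('/bin/bash -c ' + quote_for_bash_c(inner))
# /bin/bash -c 'awk '"'"'{ print $1, $2}'"'"' input.fam'
```

In practice `shlex.quote` is the safer choice, since it also handles other shell metacharacters.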
Requesting changes so this doesn't keep getting tested; Batch testing is broken.
@cseed I got a version working!!!! I'd like to test on other potential pipelines and make a real example that will run for demo purposes.