Step 2: Submit jobs


Overview

Distributed-Fiji works by breaking your analysis into a series of smaller jobs based on the metadata and groupings you've specified in your pipeline. The choice of how to group your images depends largely on the details of your experiment. For example, pipelines that act on a single site at a time (such as cropping to an ROI, or other particularly memory-intensive pipelines) may be grouped by plate, well, and site so that each node only processes one site at a time, while pipelines that require a larger group of images, such as stitching together a well, would need to be grouped by plate and well.

Once you've decided on a grouping, you're ready to start configuring your job file (a minimal example is sketched at the end of this page). When your job file is configured, use python run.py submitJob files/{YourJobFile}.json to send all the jobs to the SQS queue specified in your config file.
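For example, if your job file were named exampleJob.json (a hypothetical name used only for illustration), the submission command would be:

    python run.py submitJob files/exampleJob.json

Each group listed in the job file becomes a separate job in the queue.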


Configuring your job file

  • output_file_location: The location where your output files will be written. This path is relative to the root of your S3 bucket.
  • shared_metadata: Metadata that is shared among all of your input images. If your input data lives on your bucket and its location is passed to the script, make sure you pass a file location starting with /home/ubuntu/bucket.
  • groups: The list of all the groups of images you'd like to process. Each group is a task and will be processed separately from the other groups. For large numbers of groups, it may be helpful to create this list separately as a text file that you can then append into the job JSON file. You may create this yourself in your favorite scripting language, but we've provided an additional tool to help you create and format this list:
    • batches.sh allows you to provide a list of all the individual metadata components (plates, columns, rows, etc.); it then uses GNU parallel to create a formatted text file with all the possible combinations. This approach is best when you have a large number of groups and the group structure is uniform.

      Example: for a 3-plate, 96-well experiment where one is grouping by Plate and Well, one would edit batches.sh to read: parallel echo '{\"Metadata\": \"Metadata_Plate={1},Metadata_Well={2}{3}\"},' ::: Plate1 Plate2 Plate3 ::: A B C D E F G H ::: 01 02 03 04 05 06 07 08 09 10 11 12 | sort > batches.txt
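      Running the edited batches.sh writes one group entry per line into batches.txt, for example:

          {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01"},
          {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A02"},

Pasting those entries into the groups list gives you a complete job file. The sketch below pulls together the fields described above; the bucket paths and the contents of shared_metadata (shown here as a key-value mapping with a hypothetical input_file_location entry) are illustrative assumptions, and the metadata your script actually needs will determine what you include.

    {
        "output_file_location": "projects/ExampleProject/output",
        "shared_metadata": {
            "input_file_location": "/home/ubuntu/bucket/projects/ExampleProject/images"
        },
        "groups": [
            {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01"},
            {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A02"}
        ]
    }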