Home
Distributed-CellProfiler is a series of scripts designed to help you run CellProfiler on Amazon Web Services using AWS's file storage and computing systems.
- Images and pipelines are stored in S3 buckets.
- CellProfiler is run on "SpotFleets" of computers (or "instances") in the cloud.
Using AWS allows you to create a flexible, on-demand computing infrastructure where you only have to pay for the resources you use. This can give you access to far more computing power than you may have available at your home institution, which is great when you have large image sets to run.
However, each piece of the infrastructure has to be added and configured separately, which can be time-consuming and confusing.
Distributed-CellProfiler tries to keep that flexibility and computing power while minimizing the setup burden.
- To run it, you essentially need just an AWS account and a terminal program; see our page on getting set up for all the specific steps you'll need to take.
To some degree, choosing the right settings is trial and error.
- Looking at the resources CellProfiler uses on your local computer when it runs your images can give you a sense of roughly how much hard drive and memory space each image requires, which can help you determine your group size and what machines to use.
- Prices of different machine sizes fluctuate, so the choice of which type of machines to use in your spot fleet is best determined at the time you run it. How long a job takes to run and how quickly you need the data may also affect how much you're willing to bid for any given machine.
- Running a few large Docker containers (as opposed to many small ones) increases the amount of memory that all the copies of CellProfiler share, which decreases the likelihood that you'll run out of memory if you stagger your job starts. It also means you're at greater risk of running out of hard disk space.
Keep an eye on all of the logs the first few times you run any pipeline, and you'll get a sense of whether your resources are being utilized well or if you need to do more tweaking.
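As a rough starting point for that tweaking, the sizing reasoning above can be sketched as simple arithmetic. This is only an illustrative sketch: the function `copies_that_fit`, its parameters, and the numbers below are hypothetical and are not part of Distributed-CellProfiler. Measure the memory and disk space your own pipeline uses locally and substitute those values.

```python
def copies_that_fit(machine_memory_mb, machine_disk_gb,
                    memory_per_copy_mb, disk_per_copy_gb):
    """Estimate how many CellProfiler copies one machine can host.

    Hypothetical helper for illustration only.
    Returns (copies, limiting_resource).
    """
    if memory_per_copy_mb <= 0 or disk_per_copy_gb <= 0:
        raise ValueError("per-copy requirements must be positive")
    # Each resource independently caps the number of copies.
    by_memory = int(machine_memory_mb // memory_per_copy_mb)
    by_disk = int(machine_disk_gb // disk_per_copy_gb)
    # The scarcer resource determines how many copies actually fit.
    if by_memory <= by_disk:
        return by_memory, "memory"
    return by_disk, "disk"


# Illustrative numbers only; measure your own pipeline locally first.
copies, limit = copies_that_fit(
    machine_memory_mb=15000, machine_disk_gb=60,
    memory_per_copy_mb=3500, disk_per_copy_gb=20,
)
print(f"{copies} copies fit; limited by {limit}")
```

An estimate like this only narrows the starting range; the logs from your first few runs remain the better guide.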
- Feel free to contribute! We're always looking for ways to improve.
- Distributed-CellProfiler is a project from the Carpenter Lab at the Broad Institute in Cambridge, MA, USA.