-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This PR builds on #5, with two main objectives: - add SLURM, config, and supporting services to the stack - integrate async job processing into the backend via SLURM In addition to the main objectives, this PR includes a few tweaks to API response handling and the testing framework. A new field, `reason`, has been added to the `analyses` table to capture why a status was set. Currently it gets set to an exception traceback if the analysis throws an error. The PR includes a skeleton of how analyses would be processed in `backend/api/dispatch/submit.R` (called via `dispatchAnalysis()` in `backend/api/endpoints/analyses.R`), but it doesn't actually do any real work yet. Things to try: - bringing up the full stack and seeing if it builds and runs without issue - submitting analyses and seeing if the status eventually gets updated - trying the tests: after launching the stack, execute `./run_stack.sh shell` in a separate window, then run `./run_tests.sh`, which will perform a few integration tests
- Loading branch information
1 parent
7af1f3c
commit 185f1e8
Showing
47 changed files
with
2,127 additions
and
141 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,29 +1,119 @@ | ||
# MolEvolvR Backend | ||
|
||
The backend is implemented as a RESTful API over the following entities: | ||
|
||
- `User`: Represents a user of the system. At the moment logins aren't | ||
required, so all regular users are the special "Anonymous" user. Admins | ||
have individual accounts. | ||
- `Analysis`: Represents an analysis submitted by a user. Each analysis has a unique ID | ||
and is associated with a user. analyses contain the following sub-entities: | ||
- `Submission`: Represents the submission of a Analysis, e.g. the data | ||
itself as well the submission's parameters (both selected by the | ||
The backend is implemented as a RESTful API. It currently provides endpoints for | ||
just the `analysis` entity, but will be expanded to include other entities as | ||
well. | ||
|
||
## Usage | ||
|
||
Run the `launch_api.sh` script to start API server in a hot-reloading development mode. | ||
The server will run on port 9050, unless the env var `API_PORT` is set to another | ||
value. Once it's running, you can access it at http://localhost:9050. | ||
|
||
If the env var `USE_SLURM` is equal to 1, the script will create a basic SLURM | ||
configuration and then launch `munge`, a client used to authenticate to the | ||
SLURM cluster. The template that configures the backend's connection to SLURM | ||
can be found at `./cluster_config/slurm.conf.template`. | ||
|
||
The script then applies any outstanding database migrations via | ||
[atlas](https://github.com/ariga/atlas). Finally the API server is started by | ||
executing the `entrypoint.R` script via | ||
[drip](https://github.com/siegerts/drip), which restarts the server whenever | ||
there are changes to the code. | ||
|
||
*(Side note: the entrypoint contains a bit of custom logic to | ||
defer actually launching the server until the port it listens on is free, since | ||
drip doesn't cleanly shut down the old instance of the server.)* | ||
|
||
## Implementation | ||
|
||
The backend is implemented in [Plumber](https://www.rplumber.io/index.html), a | ||
package for R that allows for the creation of RESTful APIs. The API is defined | ||
in the `api/plumber.R` file, which defines the router and some shared metadata | ||
routes. The rest of the routes are brought in from the `endpoints/` directory. | ||
|
||
Currently implemented endpoints: | ||
- `POST /analyses`: Create a new analysis | ||
- `GET /analyses`: Get all analyses | ||
- `GET /analyses/:id`: Get a specific analysis by its ID | ||
- `GET /analyses/:id/status`: Get just the status field for an analysis by its ID | ||
|
||
*(TBC: more comprehensive docs; see the [Swagger docs](http://localhost:9050/__docs__/) for now)* | ||
|
||
## Database Schema | ||
|
||
The backend uses a PostgreSQL database to store analyses. The database's schema | ||
is managed by [atlas](https://github.com/ariga/atlas); you can find the current | ||
schema definition at `./schema/schema.pg.hcl`. After changing the schema, you | ||
can create a "migration", i.e. a set of SQL statements that will bring the | ||
database up to date with the new schema, by running `./schema/makemigration.sh | ||
<reason>`; if all is well with the schema, the new migration will be put in | ||
`./schema/migrations/`. | ||
|
||
Any pending migrations are applied automatically when the backend starts up, but | ||
you can manually apply new migrations by running `./schema/apply.sh`. | ||
|
||
## Testing | ||
|
||
You can run the tests for the backend by running the `run_tests.sh` script. The | ||
script will recursively search for all files with the pattern `test_*.R` in the | ||
`tests/` directory and run them. Tests are written using the | ||
[testthat](https://testthat.r-lib.org/) package. | ||
|
||
Note that the tests currently depend on the stack's services being available, so | ||
you should run the tests from within the backend container after having started | ||
the stack normally. An easy way to do that is to execute `./run_stack.sh shell` | ||
in the repo root, which will give you an interactive shell in the backend | ||
container. Eventually, we'll have them run in their own environment, which the | ||
`run_tests.sh` script will likely orchestrate. | ||
|
||
## Implementation Details | ||
|
||
### Domain Entities | ||
|
||
*NOTE: the backend is as of now a work in progress, so expect this to change.* | ||
|
||
The backend includes, or will include, the following entities: | ||
|
||
- `User`: Represents a user of the system. At the moment logins aren't required, | ||
so all regular users are the special "Anonymous" user. Admins have individual | ||
accounts. | ||
- `Analysis`: Represents an analysis submitted by a user. Each analysis has a | ||
unique ID and is associated with a user. analyses contain the following | ||
sub-entities: | ||
- `AnalysisSubmission`: Represents the submission of a Analysis, e.g. the | ||
data itself as well the submission's parameters (both selected by the | ||
user and supplied by the system). | ||
- `AnalysisStatus`: Represents the status of a Analysis. Each Analysis has a status | ||
associated with it, which is updated as the Analysis proceeds through its | ||
processing stages. | ||
- `AnalysisStatus`: Represents the status of a Analysis. Each Analysis has a | ||
status associated with it, which is updated as the Analysis proceeds through | ||
its processing stages. | ||
- `AnalysisResult`: Represents the result of a Analysis. | ||
- `Cluster`: Represents the status of the overall cluster, including | ||
how many analyses have been completed, how many are in the queue, | ||
and other statistics related to the processing of analyses. | ||
- `Queue`: Represents the status of processing analyses, including how many | ||
analyses have been completed, how many are in the queue, and other statistics. | ||
- `System`: Represents the system as a whole, including the version of the | ||
backend, the version of the frontend, and other metadata about the system. | ||
Includes runtime statistics about the execution environment as well, such as RAM | ||
and CPU usage. Includes cluster information, too, such as node uptime and | ||
health. | ||
|
||
## Implementation | ||
### Job Processing | ||
|
||
*NOTE: we use the term "job" here to indicate any asynchronous task that the | ||
backend needs to perform outside of the request-response cycle. It's not related | ||
to the app domain's terminology of a "job" (i.e. an analysis).* | ||
|
||
The backend is implemented in Plumber, a package for R that allows for the | ||
creation of RESTful APIs. The API is defined in the `api/router.R` file, which | ||
contains the endpoints for the API. Supporting files are found in | ||
`api/resources/`. | ||
The backend makes use of | ||
[future.batchtools](https://future.batchtools.futureverse.org/), an extension | ||
that adds [futures](https://future.futureverse.org/) support to | ||
[batchtools](https://mllg.github.io/batchtools/index.html), a package for | ||
processing asynchronous jobs. The package provides support for many | ||
job-processing systems, including | ||
[SLURM](https://slurm.schedmd.com/documentation.html); more details on | ||
alternative systems can be found in the [`batchtools` package | ||
documentation](https://mllg.github.io/batchtools/articles/batchtools.html). | ||
|
||
The API is then run using the `launch_api.R` file, which starts the Plumber | ||
server. | ||
In our case, we use SLURM; `batchtools` basically wraps SLURM's `sbatch` command | ||
and handles producing a job script for an R callable, submitting the script to | ||
the cluster for execution, and collecting the results to be returned to R. The | ||
template for the job submission script can be found at | ||
`./cluster_config/slurm.tmpl`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# contains shared state for interacting with the job dispatch system | ||
|
||
box::use( | ||
batchtools[makeRegistry], | ||
future.batchtools[...], | ||
future[plan, future, value] | ||
) | ||
|
||
.on_load <- function(ns) { | ||
options(future.cache.path = "/opt/shared-jobs/.future", future.delete = TRUE) | ||
|
||
# create a registry | ||
dir.create("/opt/shared-jobs/jobs-scratch", recursive = TRUE, showWarnings = FALSE) | ||
# reg <- makeRegistry(file.dir = NA, work.dir = "/opt/shared-jobs/jobs-scratch") | ||
# call plan() | ||
plan( | ||
batchtools_slurm, | ||
template = "/app/cluster_config/slurm.tmpl", | ||
resources = list(nodes = 1, cpus = 1, walltime=2700, ncpus=1, memory=1000) | ||
) | ||
} | ||
|
||
#' Takes in a block of code and runs it asynchronously, returning the future | ||
#' @param callable a function that will be run asynchronously in a slurm job | ||
#' @param work.dir the directory to run the code in, which should be visible to worker nodes | ||
#' @return a future object representing the asynchronous job | ||
dispatch <- function(callable, work.dir="/opt/shared-jobs/jobs-scratch") { | ||
# ensure we run jobs in a place where slurm nodes can access them, too | ||
setwd(work.dir) | ||
future(callable()) | ||
} | ||
|
||
box::export(dispatch) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
box::use( | ||
analyses = api/models/analyses, | ||
api/cluster[dispatch] | ||
) | ||
|
||
#' Dispatch an analysis, i.e. create a record for it in the database and submit | ||
#' it to the cluster for processing | ||
#' @param name the name of the analysis | ||
#' @param type the type of the analysis | ||
#' @return the id of the new analysis | ||
dispatchAnalysis <- function(name, type) { | ||
# create the analysis record | ||
analysis_id <- analyses$db_submit_analysis(name, type) | ||
|
||
# print to the error log that we're dispatching this | ||
cat("Dispatching analysis", analysis_id, "\n") | ||
|
||
# dispatch the analysis async (to slurm, or wherever) | ||
promise <- dispatch(function() { | ||
|
||
tryCatch({ | ||
# do the analysis | ||
analyses$db_update_analysis_status(analysis_id, "analyzing") | ||
|
||
# FIXME: implement calls to the molevolvr package to perform the | ||
# analysis. we may fire off additional 'dispatch()' calls if we | ||
# need to parallelize things. | ||
|
||
# --- begin testing section which should be removed --- | ||
|
||
# for now, just do a "task" | ||
Sys.sleep(1) # pretend we're doing something | ||
|
||
# if type is "break", raise an error to test the handler | ||
if (type == "break") { | ||
stop("test error") | ||
} | ||
|
||
# --- end testing section --- | ||
|
||
# finalize when we're done | ||
analyses$db_update_analysis_status(analysis_id, "complete") | ||
}, error = function(e) { | ||
# on error, log the error and update the status | ||
analyses$db_update_analysis_status(analysis_id, "error", reason=e$message) | ||
cat("Error in analysis ", analysis_id, ": ", e$message, "\n") | ||
flush() | ||
}) | ||
|
||
cat("Analysis", analysis_id, " completed\n") | ||
}) | ||
|
||
return(analysis_id) | ||
} | ||
|
||
box::export(dispatchAnalysis) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
#' Helpers for returning error responses from the API | ||
|
||
api_404_if_empty <- function(result, res, error_message="Not found") { | ||
if (isTRUE(nrow(result) == 0 || is.null(result) || length(result) == 0)) { | ||
cat("Returning 404\n") | ||
res$status <- 404 | ||
return(error_message) | ||
} | ||
|
||
return(result) | ||
} | ||
|
||
box::export(api_404_if_empty) |
File renamed without changes.
Oops, something went wrong.