Adds draft backend API, db integration, db schema migration #5

Merged · 7 commits · Sep 24, 2024
Changes from 6 commits
13 changes: 13 additions & 0 deletions .env.TEMPLATE
@@ -1,8 +1,21 @@
# INSTRUCTIONS:
# --------------
# this file is a template for the .env file, which contains secrets
# and should never be committed to the repository.
# you should make a copy of this file called `.env` and fill in the values
# that are missing, e.g. passwords and other options.
# for the passwords, you're free to choose whatever you like; the passwords
# are automatically shared with the services that need them.

# sets a default environment to use when launching run_stack.sh without a
# specified environment
DEFAULT_ENV=dev

# if 0, doesn't open a browser to the frontend webapp on a normal stack launch
DO_OPEN_BROWSER=1

# app database (postgres)
POSTGRES_USER=molevolvr
POSTGRES_PASSWORD=
POSTGRES_DB=molevolvr
POSTGRES_HOST=db-${DEFAULT_ENV}
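
As an aside, the copy step described in the instructions above can also be done from an R session; a trivial convenience sketch (not part of this PR):

```r
# copy the template to a local .env (run from the repository root), then edit
# the new file to fill in the missing values, e.g. POSTGRES_PASSWORD
file.copy(".env.TEMPLATE", ".env", overwrite = FALSE)
```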
29 changes: 29 additions & 0 deletions backend/README.md
@@ -0,0 +1,29 @@
# MolEvolvR Backend

The backend is implemented as a RESTful API over the following entities:

- `User`: Represents a user of the system. At the moment logins aren't
  required, so all regular users are the special "Anonymous" user. Admins
  have individual accounts.
- `Analysis`: Represents an analysis submitted by a user. Each analysis has a
  unique ID and is associated with a user. Analyses contain the following
  sub-entities:
  - `Submission`: Represents the submission of an Analysis, e.g. the data
    itself as well as the submission's parameters (both selected by the
    user and supplied by the system).
  - `AnalysisStatus`: Represents the status of an Analysis. Each Analysis has a
    status associated with it, which is updated as the Analysis proceeds
    through its processing stages.
  - `AnalysisResult`: Represents the result of an Analysis.
- `Cluster`: Represents the status of the overall cluster, including
  how many analyses have been completed, how many are in the queue,
  and other statistics related to the processing of analyses.

## Implementation

The backend is implemented in Plumber, a package for R that allows for the
creation of RESTful APIs. The API's router is defined in `api/plumber.R`, which
mounts the endpoint definitions found in `api/endpoints/`; supporting code is
found in `api/models/` and `api/support/`.

The API is then run using the `launch_api.R` file, which starts the Plumber
server.
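
The `launch_api.R` file referenced above isn't part of this diff. As a rough sketch of what it might contain (the relative path, host, and port are assumptions, not taken from this PR):

```r
# launch_api.R (sketch): start the Plumber server from the router defined in
# api/plumber.R. the host/port here are assumptions; adjust to match the stack.
plumber::pr("api/plumber.R") |>
  plumber::pr_run(host = "0.0.0.0", port = 8000)
```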
67 changes: 67 additions & 0 deletions backend/api/db.R
@@ -0,0 +1,67 @@
#' Provides a connection to the database

#' Gets a connection to the postgres database
#' @export
getCon <- function() {
  POSTGRES_HOST <- Sys.getenv("POSTGRES_HOST", "db")
  POSTGRES_DB <- Sys.getenv("POSTGRES_DB", "molevolvr")
  POSTGRES_USER <- Sys.getenv("POSTGRES_USER")
  POSTGRES_PASSWORD <- Sys.getenv("POSTGRES_PASSWORD")

  # raise an exception if user or password is unset
  if (POSTGRES_USER == "" || POSTGRES_PASSWORD == "") {
    stop("POSTGRES_USER and POSTGRES_PASSWORD must be set")
  }

  con <- DBI::dbConnect(
    RPostgres::Postgres(),
    dbname = POSTGRES_DB,
    host = POSTGRES_HOST,
    user = POSTGRES_USER,
    password = POSTGRES_PASSWORD
  )

  return(con)
}

# TODO: implement connection pooling?

# ----------------
# --- helpers
# ----------------

#' Insert a record into a table and return <id_col>
#'
#' Note that this uses a postgres-specific feature, "INSERT ... RETURNING <col>",
#' to retrieve the generated UUID without having to make a separate query.
#'
#' @param target_table The table into which to insert
#' @param new_record A named list of values to insert
#' @param id_col The name of the column to return (default: "id")
#' @param con An existing database connection to use; if NULL, creates a new one (default: NULL)
#' @return The value of the <id_col> column for the inserted record
#' @export
insert_get_id <- function(target_table, new_record, id_col = "id", con = NULL) {
  if (is.null(con)) {
    con <- getCon()
    on.exit(DBI::dbDisconnect(con))
  }

  # Generate the INSERT statement using DBI
  # and append a postgres-specific feature, "RETURNING <col>",
  # so that we can retrieve the generated UUID without
  # having to make a separate query
  sql <- paste(
    DBI::sqlAppendTableTemplate(
      con, target_table, new_record,
      prefix = "$", pattern = "1", row.names = FALSE
    ), "RETURNING ", DBI::dbQuoteIdentifier(con, id_col)
  )

  # Execute the query and retrieve the generated UUID
  # from the first (hopefully only) resulting record
  result <- DBI::dbGetQuery(con, sql, params = unname(new_record))
  generated_uuid <- result[[id_col]][1]

  return(generated_uuid)
}
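
For illustration, a hypothetical call to `insert_get_id()`; the `analyses` table and its `name`/`type` columns come from `backend/api/models/analyses.R` later in this diff, while the values themselves are made up:

```r
# hypothetical usage: insert one row into "analyses" and capture the UUID
# that postgres generates for it; the values are placeholders
new_id <- insert_get_id(
  "analyses",
  data.frame(name = "example analysis", type = "fasta")
)
print(new_id)
```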
61 changes: 61 additions & 0 deletions backend/api/endpoints/analyses.R
@@ -0,0 +1,61 @@
# endpoints for submitting and checking information about analyses.
# included by the router aggregator in ./plumber.R; all these endpoints are
# prefixed with /analyses/ by the aggregator.

box::use(
  analyses = api/models/analyses,
  tibble[tibble],
  dplyr[select, any_of, mutate],
  dbplyr[`%>%`]
)

#* @apiTitle Analysis Management

#* Query for all analyses
#* @tag Analyses
#* @serializer jsonExt list(verbose_checks=TRUE)
#* @get /
analysis_list <- function() {
  result <- analyses$db_get_analyses()

  # postprocess types in the result
  # result <- result %>%
  #   mutate(
  #     status = as.character(status),
  #     info = as.character(info)
  #   )

  result
}

#* Query the database for an analysis's status
#* @tag Analyses
#* @serializer jsonExt
#* @get /<id:str>/status
analysis_status <- function(id) {
  result <- analyses$db_get_analysis_by_id(id)
  result$status
}


#* Query the database for an analysis's complete information.
#* @tag Analyses
#* @serializer jsonExt
#* @get /<id:str>
analysis_by_id <- function(id) {
  result <- analyses$db_get_analysis_by_id(id)
  # result is a tibble with one row, so just
  # return that row rather than the entire tibble
  result
}

#* Submit a new MolEvolvR analysis, returning the analysis ID
#* @tag Analyses
#* @serializer jsonExt
#* @post /
analysis_submit <- function(name, type) {
  # submit the analysis
  result <- analyses$db_submit_analysis(name, type)
  # the result is a scalar in a vector, so just return the scalar
  result[[1]]
}
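
As a quick smoke test of these endpoints from an R session, a sketch using `httr`; the base URL is an assumption about where the API is served, and the `/analyses/` prefix comes from the mount in `plumber.R` later in this diff:

```r
# sketch: exercise the analyses endpoints from an R session.
# the host/port are assumptions; adjust to wherever the API is running.
library(httr)

base <- "http://localhost:8000/analyses/"

# submit a new analysis; the endpoint returns the generated id
resp <- POST(base, query = list(name = "example analysis", type = "fasta"))
analysis_id <- content(resp)[[1]]

# check its status, then fetch the full record
content(GET(paste0(base, analysis_id, "/status")))
content(GET(paste0(base, analysis_id)))
```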
13 changes: 13 additions & 0 deletions backend/api/endpoints/stats.R
@@ -0,0 +1,13 @@
# endpoints for checking on the status of the cluster as a whole.
# included by the router aggregator in ./plumber.R; all these endpoints are
# prefixed with /stats/ by the aggregator.

#* @apiTitle Cluster Management

#* Query for all jobs; since this is currently not allowed, returns an error
#* @tag Statistics
#* @get /
status <- function(res) {
  res$status <- 405
  list(error = "Remains to be implemented")
}
58 changes: 58 additions & 0 deletions backend/api/models/analyses.R
@@ -0,0 +1,58 @@
box::use(
  dplyr[select, any_of, mutate, filter, collect, tbl],
  dbplyr[`%>%`],
  api/db[getCon, insert_get_id]
)

#' submit a new analysis, which starts in the "submitted" state
#' @param name the name of the analysis
#' @param type the type of the analysis
#' @return the id of the new analysis
#' @export
db_submit_analysis <- function(name, type, con = NULL) {
  if (is.null(con)) {
    con <- getCon()
    on.exit(DBI::dbDisconnect(con))
  }

  # construct our new entry
  new_entry <- data.frame(
    name = name,
    type = type
  )

  return(insert_get_id("analyses", new_entry, con = con))
}

#' query the 'analyses' table using dbplyr for all analyses
#' @return a data frame containing all analyses
#' @export
db_get_analyses <- function(con = NULL) {
  if (is.null(con)) {
    con <- getCon()
    on.exit(DBI::dbDisconnect(con))
  }

  analyses <- tbl(con, "analyses")
  result <- collect(analyses)

  return(result)
}

#' query the 'analyses' table using dbplyr for a single analysis by its id
#' @param id the id of the analysis to retrieve
#' @return a data frame containing the matching analysis
#' @export
db_get_analysis_by_id <- function(id, con = NULL) {
  if (is.null(con)) {
    con <- getCon()
    on.exit(DBI::dbDisconnect(con))
  }

  analyses <- tbl(con, "analyses")
  analyses %>%
    filter(id == !!id) %>%
    collect()

  # FIXME: perform a join against analysis_event
  # and then somehow tuck it into the 'events'
  # field. man, i wish i had a real ORM...
}
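
One possible shape for the FIXME above, written in the same dbplyr style. This is a sketch only: the `analysis_event` table's columns, in particular an `analysis_id` foreign key, are assumptions that nothing in this diff defines:

```r
# sketch: fetch an analysis plus its events as a nested list-column.
# relies on the same box::use imports as this file; the analysis_event
# column names (analysis_id) are assumed, not defined in this PR.
db_get_analysis_with_events <- function(id, con = NULL) {
  if (is.null(con)) {
    con <- getCon()
    on.exit(DBI::dbDisconnect(con))
  }

  analysis <- tbl(con, "analyses") %>%
    filter(id == !!id) %>%
    collect()

  events <- tbl(con, "analysis_event") %>%
    filter(analysis_id == !!id) %>%  # assumed foreign-key column
    collect()

  # tuck the events tibble into an 'events' list-column on the one-row result
  if (nrow(analysis) == 1) {
    analysis$events <- list(events)
  }
  analysis
}
```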
56 changes: 56 additions & 0 deletions backend/api/plumber.R
@@ -0,0 +1,56 @@
# plumber.R

box::use(
  plumber[...],
  api/support/custom_serializers[setup_custom_serializers]
)

# bring in custom serializers
setup_custom_serializers()

#* @apiTitle MolEvolvR 2.0 API
#* @apiTag Meta - Metadata about the API
#* @apiTag Analyses - Operations on analyses
#* @apiTag Statistics - Operations on the cluster as a whole

# allows cross-origin requests from anywhere
#* @filter cors
cors <- function(res) {
  res$setHeader("Access-Control-Allow-Origin", "*")
  plumber::forward()
}

#* An index of top-level endpoints in the API + metadata
#* @tag Meta
#* @get /
index <- function() {
  # return a list of endpoints
  list(
    analysis = "/analyses/",
    docs = "/__docs__/",
    stats = "/stats/",
    version = "2.0.0"
  )
}

# Define a custom error handler that includes a traceback
custom_error_handler <- function(req, res, err) {
  # Capture the traceback
  traceback <- paste(capture.output(traceback()), collapse = "\n")

  # Set the response status code and body
  res$status <- 500
  list(
    error = err$message,
    traceback = traceback
  )
}

#' @plumber
function(pr) {
  pr %>%
    pr_set_debug(TRUE) %>%
    pr_set_error(custom_error_handler) %>%
    pr_mount("/analyses", pr("./endpoints/analyses.R")) %>%
    pr_mount("/stats", pr("./endpoints/stats.R"))
}
74 changes: 74 additions & 0 deletions backend/api/support/custom_serializers.R
@@ -0,0 +1,74 @@
#' Custom JSON serialization functions
#' Implements handlers for types returned by DBI/RPostgres that can't
#' be serialized by jsonlite by default

box::use(
  plumber[register_serializer, serializer_content_type],
  api/support/string_helpers[inline_str_list]
)

#' Register custom serializers, e.g. for JSON with specific defaults
setup_custom_serializers <- function() {
  # ------------------------------------------------------
  # --- jsonExt: json + default options
  # ------------------------------------------------------

  # Register a custom serializer, 'jsonExt', for JSON that supplies a lot of
  # the defaults that we'd otherwise be supplying to every endpoint in the API.
  register_serializer(
    "jsonExt",
    function(
      verbose_checks = FALSE,
      force = TRUE,
      simplifyVector = TRUE,
      auto_unbox = TRUE,
      na = "null",
      pretty = TRUE,
      ..., type = "application/json"
    ) {
      serializer_content_type(type, function(val) {
        # convert other args to list, if specified
        other_args <- list(...)

        # show additional args passed on to toJSON if verbose_checks is TRUE
        if (verbose_checks && length(other_args) > 0) {
          message(paste("jsonExt extra toJSON opts: ", inline_str_list(other_args)))
        }

        # wrap toJSON so we can use it both in the verbose and non-verbose cases
        encodeJSON <- function(val) {
          jsonlite::toJSON(
            val,
            force = force, simplifyVector = simplifyVector,
            auto_unbox = auto_unbox, na = na, pretty = pretty,
            ...
          )
        }

        # if verbose_checks is TRUE, wrap the toJSON call in a tryCatch block.
        # this shows serialization errors early, instead of causing plumber to
        # throw a cryptic message about [index='status'] not being available,
        # presumably because the response object is malformed and '$status' is
        # thus not available on it.
        if (verbose_checks) {
          result <- tryCatch(
            { encodeJSON(val) },
            error = function(e) {
              # FIXME: perhaps we should just stop() rather than displaying a message?
              message(toString(e))
              return(NULL)
            }
          )
        } else {
          result <- encodeJSON(val)
        }

        return(result)
      })
    }
  )

  # ------------------------------------------------------
  # --- define any extra custom serializers below
  # ------------------------------------------------------
}