Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom model tutorial+notebook #366

Merged
merged 4 commits into from
Jul 7, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
153 changes: 152 additions & 1 deletion docs/src/advanced/2_custom_model.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,153 @@

# Custom models
# Working with custom models

It may happen that the intuitive representation of your data does not really
match what is supported by a given COBRA package. COBREXA.jl attempts to avoid
this problem by providing a flexible framework for containing any data
structure that can, somehow, represent the constraint-based model.

The task of having such a polymorphic model definition can be split into 2
separate concerns:

- How to allow the analysis functions to gather the required information from
any user-specified model data structure?
- How to make the reconstruction functions (i.e., reaction or gene deletions)
work properly on any data structure?

To solve the first concern, COBREXA.jl specifies a set of generic accessors
that work over the abstract type [`MetabolicModel`](@ref). To use your data
structure in a model, you just make it a subtype of [`MetabolicModel`](@ref)
and overload the required accessors. The accessors are functions that extract
some relevant information, such as [`stoichiometry`](@ref) and
[`bounds`](@ref), returning a fixed simple data type that can be further used
by COBREXA. You may see a complete list of accessors
[here](../functions#Base-Types).

A good solution to the second concern is a slightly more involved, as writing
generic data modifiers is notoriously hard. Still, there is support for easily
making small changes to the model using the modifications system, with
functions such as [`with_added_reactions`](@ref) and [`with_set_bound`](@ref).

!!! tip "Notebook available"
A better example of using a custom model structure is available
[in a separate notebook](../notebooks/8_custom_model.md).

## Writing the generic accessors

Let's write a data structure that represents a very small model that contains N
metabolites that are converted in a circle through N linear, coupled reactions.
(E.g., for N=3, we would have a conversion of metabolites A, B and C ordered as
A → B → C → A.) This may be useful for testing purposes; we will use it for a
simple demonstration.

The whole model can thus be specified with a single integer N that represents
the length of the reaction cycle:

```julia
struct CircularModel <: MetabolicModel
size::Int
end
```

First, define the reactions and metabolites:

```julia
COBREXA.n_reactions(m::CircularModel) = m.size
COBREXA.n_metabolites(m::CircularModel) = m.size

COBREXA.reactions(m::CircularModel) = ["rxn$i" for i in 1:n_reactions(m)]
COBREXA.metabolites(m::CircularModel) = ["met$i" for i in 1:n_metabolites(m)]
```

It is useful to re-use the already defined functions, as that improves the code
maintainability.

We can continue with the actual linear model properties:

```julia
function COBREXA.objective(m::CircularModel)
c = spzeros(n_reactions(m))
c[1] = 1 #optimize the first reaction
return c
end

COBREXA.bounds(m::CircularModel) = (
spzeros(n_reactions(m)), # lower bounds
sparse(ones(n_reactions(m))), # upper bounds
)

function COBREXA.stoichiometry(m::CircularModel)
nr = n_reactions(m)
stoi(i,j) =
i == j ? 1.0 :
(i % nr + 1) == j ? -1.0 :
0.0

sparse([stoi(i,j) for i in 1:nr, j in 1:nr])
end
```

You may check that the result now works just as with [`CoreModel`](@ref) and
[`StandardModel`](@ref):

```julia
julia> m = CircularModel(5)
Metabolic model of type CircularModel

1.0 -1.0 ⋅ ⋅ ⋅
⋅ 1.0 -1.0 ⋅ ⋅
⋅ ⋅ 1.0 -1.0 ⋅
⋅ ⋅ ⋅ 1.0 -1.0
-1.0 ⋅ ⋅ ⋅ 1.0
Number of reactions: 5
Number of metabolites: 5

```

This interface is sufficient to run most of the basic analyses, especially the flux balance finding ones:

```julia
julia> flux_balance_analysis_dict(m, Tulip.Optimizer)
Dict{String, Float64} with 5 entries:
"rxn5" => 1.0
"rxn2" => 1.0
"rxn1" => 1.0
"rxn3" => 1.0
"rxn4" => 1.0

```

## Writing generic model modifications

The custom model structure can also be made compatible with many of the
existing variant-generating functions and analysis modifiers.

The functions prepared for use as "variants" in [`screen`](@ref), usually
prefixed by `with_`, have their generic variants that only call simpler,
overloadable functions for each specific model. This choice is based on the
overloading dispatch system of Julia. For
example,[`with_removed_metabolites`](@ref) is implemented very generically by
reducing the problem to some specific [`remove_metabolites`](@ref) functions
selected by the dispatch, as follows:

```julia
with_removed_metabolites(args...; kwargs...) =
m -> remove_metabolites(m, args...; kwargs...)
```

To be able to use [`with_removed_metabolites`](@ref) in your model, we can just
overload the actual inner function. For the simple circular model, the
modification might as well look like this:

```julia
COBREXA.remove_metabolites(m::CircularModel, n::Int) =
return CircularModel(m.size - n)
```

!!! danger "Functions that generate model variants must be pure"
Notice that the function is "pure", i.e., does not make any in-place
modifications to the original model structure. That property is required
for [`screen`](@ref) and other functions to properly and predictably apply
the modifications to the model. To expose potential in-place modifications
to your codebase, you should instead overload the "bang" counterpart of
remove metabolites, called [`remove_metabolites!`](@ref).
2 changes: 0 additions & 2 deletions docs/src/advanced/3_custom_models_on_hpc.md

This file was deleted.

4 changes: 0 additions & 4 deletions docs/src/advanced/4_advanced_screening.md

This file was deleted.

2 changes: 0 additions & 2 deletions docs/src/advanced/5_debugging.md

This file was deleted.

135 changes: 135 additions & 0 deletions docs/src/notebooks/8_custom_model.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@

# # Using a custom model data structure

#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)

# This notebooks shows how to utilize the generic accessors and modification
# functions in COBREXA.jl to run the analysis on any custom model type. We will
# create a simple dictionary-style structure that describes the model, allow
# COBREXA to run a FVA on it, and create a simple reaction-removing
# modification.

# First, let's define a very simple stoichiometry-only structure for the model:

using COBREXA

mutable struct MyReaction
max_rate::Float64 # maximum absolute conversion rate
stoi::Dict{String,Float64} # stoichimetry of the reaction

MyReaction() = new(0.0, Dict{String,Float64}())
end

mutable struct MyModel <: MetabolicModel
optimization_target::String # the "objective" reaction name
reactions::Dict{String,MyReaction} # dictionary of reactions

MyModel() = new("", Dict{String,MyReaction}())
MyModel(o, r) = new(o, r)
end

# With this, we can start defining the accessors:

COBREXA.n_reactions(m::MyModel) = length(m.reactions)
COBREXA.reactions(m::MyModel) = sort(collect(keys(m.reactions)))

# Metabolites are defined only very implicitly, so let's just make a function
# that collects all names. [`n_metabolites`](@ref) can be left at the default
# definition that just measures the output of [`metabolites`](@ref).

function COBREXA.metabolites(m::MyModel)
mets = Set{String}()
for (_, r) in m.reactions
for (m, _) in r.stoi
push!(mets, m)
end
end
return sort(collect(mets))
end

# Now, the extraction of the linear model. Remember the order of element in the
# vectors must match the order in the output of [`reactions`](@ref) and
# [`metabolites`](@ref).

using SparseArrays

function COBREXA.bounds(m::MyModel)
max_rates = [m.reactions[r].max_rate for r in reactions(m)]
(sparse(-max_rates), sparse(max_rates))
end

function COBREXA.objective(m::MyModel)
if m.optimization_target in keys(m.reactions)
c = spzeros(n_reactions(m))
c[first(indexin([m.optimization_target], reactions(m)))] = 1.0
c
else
throw(
DomainError(
m.optimization_target,
"The target reaction for flux optimization not found",
),
)
end
end

function COBREXA.stoichiometry(m::MyModel)
sparse([
get(m.reactions[rxn].stoi, met, 0.0) for met in metabolites(m), rxn in reactions(m)
])
end

# Now the model is complete! We can generate a random one and see how it
# performs
import Random
Random.seed!(123)

rxn_names = ["Reaction $i" for i = 'A':'Z'];
metabolite_names = ["Metabolite $i" for i = 1:20];

m = MyModel();
for i in rxn_names
m.reactions[i] = MyReaction()
end

for i = 1:50
rxn = rand(rxn_names)
met = rand(metabolite_names)
m.reactions[rxn].stoi[met] = rand([-3, -2, -1, 1, 2, 3])
m.reactions[rxn].max_rate = rand()
end

# Let's see what the model looks like now:
m

# We can run most of the standard function on the model data right away:
using Tulip
m.optimization_target = "Reaction A"
flux_balance_analysis_dict(m, Tulip.Optimizer)


# To be able to use the model conveniently in functions such as
# [`screen`](@ref), you usually want to be able to easily specify the
# modifications. In this example, we enable use of
# [`with_removed_reactions`](@ref) by overloading the internal
# [`remove_reactions`](@ref) for this specific model type:
#
# We need to make an as-shallow-as-possible copy of the model that allows us to
# remove the reactions without breaking the original model.

function COBREXA.remove_reactions(m::MyModel, rxns::AbstractVector{String})
m = MyModel(m.optimization_target, copy(m.reactions))
delete!.(Ref(m.reactions), rxns)
return m
end

# The screening is ready now!

reactions_to_remove = ("Reaction $i" for i = 'B':'Z')

reactions_to_remove .=> screen_variants(
m,
[[with_removed_reactions([r])] for r in reactions_to_remove],
m -> flux_balance_analysis_dict(m, Tulip.Optimizer)["Reaction A"],
)