LCSB-BioCore · exaexa · Jul 7, 2021 · Jul 6, 2021 · Jul 6, 2021 · Jul 7, 2021
diff --git a/docs/src/advanced/2_custom_model.md b/docs/src/advanced/2_custom_model.md
@@ -1,2 +1,153 @@
 
-# Custom models
+# Working with custom models
+
+It may happen that the intuitive representation of your data does not really
+match what is supported by a given COBRA package. COBREXA.jl attempts to avoid
+this problem by providing a flexible framework for containing any data
+structure that can, somehow, represent the constraint-based model.
+
+The task of having such a polymorphic model definition can be split into 2
+separate concerns:
+
+- How to allow the analysis functions to gather the required information from
+  any user-specified model data structure?
+- How to make the reconstruction functions (i.e., reaction or gene deletions)
+  work properly on any data structure?
+
+To solve the first concern, COBREXA.jl specifies a set of generic accessors
+that work over the abstract type [`MetabolicModel`](@ref). To use your data
+structure in a model, you just make it a subtype of [`MetabolicModel`](@ref)
+and overload the required accessors. The accessors are functions that extract
+some relevant information, such as [`stoichiometry`](@ref) and
+[`bounds`](@ref), returning a fixed simple data type that can be further used
+by COBREXA.  You may see a complete list of accessors
+[here](../functions#Base-Types).
+
+A good solution to the second concern is a slightly more involved, as writing
+generic data modifiers is notoriously hard. Still, there is support for easily
+making small changes to the model using the modifications system, with
+functions such as [`with_added_reactions`](@ref) and [`with_set_bound`](@ref).
+
+!!! tip "Notebook available"
+    A better example of using a custom model structure is available
+    [in a separate notebook](../notebooks/8_custom_model.md).
+
+## Writing the generic accessors
+
+Let's write a data structure that represents a very small model that contains N
+metabolites that are converted in a circle through N linear, coupled reactions.
+(E.g., for N=3, we would have a conversion of metabolites A, B and C ordered as
+A → B → C → A.) This may be useful for testing purposes; we will use it for a
+simple demonstration.
+
+The whole model can thus be specified with a single integer N that represents
+the length of the reaction cycle:
+
+```julia
+struct CircularModel <: MetabolicModel
+    size::Int
+end
+```
+
+First, define the reactions and metabolites:
+
+```julia
+COBREXA.n_reactions(m::CircularModel) = m.size
+COBREXA.n_metabolites(m::CircularModel) = m.size
+
+COBREXA.reactions(m::CircularModel) = ["rxn$i" for i in 1:n_reactions(m)]
+COBREXA.metabolites(m::CircularModel) = ["met$i" for i in 1:n_metabolites(m)]
+```
+
+It is useful to re-use the already defined functions, as that improves the code
+maintainability.
+
+We can continue with the actual linear model properties:
+
+```julia
+function COBREXA.objective(m::CircularModel)
+    c = spzeros(n_reactions(m))
+    c[1] = 1 #optimize the first reaction
+    return c
+end
+
+COBREXA.bounds(m::CircularModel) = (
+    spzeros(n_reactions(m)), # lower bounds
+    sparse(ones(n_reactions(m))), # upper bounds
+)
+
+function COBREXA.stoichiometry(m::CircularModel)
+    nr = n_reactions(m)
+    stoi(i,j) =
+        i == j ? 1.0 :
+        (i % nr + 1) == j  ? -1.0 :
+        0.0 
+
+    sparse([stoi(i,j) for i in 1:nr, j in 1:nr])
+end
+```
+
+You may check that the result now works just as with [`CoreModel`](@ref) and
+[`StandardModel`](@ref):
+
+```julia
+julia> m = CircularModel(5)
+Metabolic model of type CircularModel
+
+  1.0  -1.0    ⋅     ⋅     ⋅ 
+   ⋅    1.0  -1.0    ⋅     ⋅ 
+   ⋅     ⋅    1.0  -1.0    ⋅ 
+   ⋅     ⋅     ⋅    1.0  -1.0
+ -1.0    ⋅     ⋅     ⋅    1.0
+Number of reactions: 5
+Number of metabolites: 5
+
+```
+
+This interface is sufficient to run most of the basic analyses, especially the flux balance finding ones:
+
+```julia
+julia> flux_balance_analysis_dict(m, Tulip.Optimizer)
+Dict{String, Float64} with 5 entries:
+  "rxn5" => 1.0
+  "rxn2" => 1.0
+  "rxn1" => 1.0
+  "rxn3" => 1.0
+  "rxn4" => 1.0
+
+```
+
+## Writing generic model modifications
+
+The custom model structure can also be made compatible with many of the
+existing variant-generating functions and analysis modifiers.
+
+The functions prepared for use as "variants" in [`screen`](@ref), usually
+prefixed by `with_`, have their generic variants that only call simpler,
+overloadable functions for each specific model. This choice is based on the
+overloading dispatch system of Julia. For
+example,[`with_removed_metabolites`](@ref) is implemented very generically by
+reducing the problem to some specific [`remove_metabolites`](@ref) functions
+selected by the dispatch, as follows:
+
+```julia
+with_removed_metabolites(args...; kwargs...) =
+    m -> remove_metabolites(m, args...; kwargs...)
+```
+
+To be able to use [`with_removed_metabolites`](@ref) in your model, we can just
+overload the actual inner function. For the simple circular model, the
+modification might as well look like this:
+
+```julia
+COBREXA.remove_metabolites(m::CircularModel, n::Int) =
+    return CircularModel(m.size - n)
+```
+
+!!! danger "Functions that generate model variants must be pure"
+    Notice that the function is "pure", i.e., does not make any in-place
+    modifications to the original model structure. That property is required
+    for [`screen`](@ref) and other functions to properly and predictably apply
+    the modifications to the model. To expose potential in-place modifications
+    to your codebase, you should instead overload the "bang" counterpart of
+    remove metabolites, called [`remove_metabolites!`](@ref).
diff --git a/docs/src/advanced/3_custom_models_on_hpc.md b/docs/src/advanced/3_custom_models_on_hpc.md
diff --git a/docs/src/advanced/4_advanced_screening.md b/docs/src/advanced/4_advanced_screening.md
diff --git a/docs/src/advanced/5_debugging.md b/docs/src/advanced/5_debugging.md
diff --git a/docs/src/notebooks/8_custom_model.jl b/docs/src/notebooks/8_custom_model.jl
@@ -0,0 +1,135 @@
+
+# # Using a custom model data structure
+
+#md # [![](https://mybinder.org/badge_logo.svg)](@__BINDER_ROOT_URL__/notebooks/@__NAME__.ipynb)
+#md # [![](https://img.shields.io/badge/show-nbviewer-579ACA.svg)](@__NBVIEWER_ROOT_URL__/notebooks/@__NAME__.ipynb)
+
+# This notebooks shows how to utilize the generic accessors and modification
+# functions in COBREXA.jl to run the analysis on any custom model type. We will
+# create a simple dictionary-style structure that describes the model, allow
+# COBREXA to run a FVA on it, and create a simple reaction-removing
+# modification.
+
+# First, let's define a very simple stoichiometry-only structure for the model:
+
+using COBREXA
+
+mutable struct MyReaction
+    max_rate::Float64 # maximum absolute conversion rate
+    stoi::Dict{String,Float64} # stoichimetry of the reaction
+
+    MyReaction() = new(0.0, Dict{String,Float64}())
+end
+
+mutable struct MyModel <: MetabolicModel
+    optimization_target::String # the "objective" reaction name
+    reactions::Dict{String,MyReaction} # dictionary of reactions
+
+    MyModel() = new("", Dict{String,MyReaction}())
+    MyModel(o, r) = new(o, r)
+end
+
+# With this, we can start defining the accessors:
+
+COBREXA.n_reactions(m::MyModel) = length(m.reactions)
+COBREXA.reactions(m::MyModel) = sort(collect(keys(m.reactions)))
+
+# Metabolites are defined only very implicitly, so let's just make a function
+# that collects all names. [`n_metabolites`](@ref) can be left at the default
+# definition that just measures the output of [`metabolites`](@ref).
+
+function COBREXA.metabolites(m::MyModel)
+    mets = Set{String}()
+    for (_, r) in m.reactions
+        for (m, _) in r.stoi
+            push!(mets, m)
+        end
+    end
+    return sort(collect(mets))
+end
+
+# Now, the extraction of the linear model. Remember the order of element in the
+# vectors must match the order in the output of [`reactions`](@ref) and
+# [`metabolites`](@ref).
+
+using SparseArrays
+
+function COBREXA.bounds(m::MyModel)
+    max_rates = [m.reactions[r].max_rate for r in reactions(m)]
+    (sparse(-max_rates), sparse(max_rates))
+end
+
+function COBREXA.objective(m::MyModel)
+    if m.optimization_target in keys(m.reactions)
+        c = spzeros(n_reactions(m))
+        c[first(indexin([m.optimization_target], reactions(m)))] = 1.0
+        c
+    else
+        throw(
+            DomainError(
+                m.optimization_target,
+                "The target reaction for flux optimization not found",
+            ),
+        )
+    end
+end
+
+function COBREXA.stoichiometry(m::MyModel)
+    sparse([
+        get(m.reactions[rxn].stoi, met, 0.0) for met in metabolites(m), rxn in reactions(m)
+    ])
+end
+
+# Now the model is complete! We can generate a random one and see how it
+# performs
+import Random
+Random.seed!(123)
+
+rxn_names = ["Reaction $i" for i = 'A':'Z'];
+metabolite_names = ["Metabolite $i" for i = 1:20];
+
+m = MyModel();
+for i in rxn_names
+    m.reactions[i] = MyReaction()
+end
+
+for i = 1:50
+    rxn = rand(rxn_names)
+    met = rand(metabolite_names)
+    m.reactions[rxn].stoi[met] = rand([-3, -2, -1, 1, 2, 3])
+    m.reactions[rxn].max_rate = rand()
+end
+
+# Let's see what the model looks like now:
+m
+
+# We can run most of the standard function on the model data right away:
+using Tulip
+m.optimization_target = "Reaction A"
+flux_balance_analysis_dict(m, Tulip.Optimizer)
+
+
+# To be able to use the model conveniently in functions such as
+# [`screen`](@ref), you usually want to be able to easily specify the
+# modifications. In this example, we enable use of
+# [`with_removed_reactions`](@ref) by overloading the internal
+# [`remove_reactions`](@ref) for this specific model type:
+#
+# We need to make an as-shallow-as-possible copy of the model that allows us to
+# remove the reactions without breaking the original model.
+
+function COBREXA.remove_reactions(m::MyModel, rxns::AbstractVector{String})
+    m = MyModel(m.optimization_target, copy(m.reactions))
+    delete!.(Ref(m.reactions), rxns)
+    return m
+end
+
+# The screening is ready now!
+
+reactions_to_remove = ("Reaction $i" for i = 'B':'Z')
+
+reactions_to_remove .=> screen_variants(
+    m,
+    [[with_removed_reactions([r])] for r in reactions_to_remove],
+    m -> flux_balance_analysis_dict(m, Tulip.Optimizer)["Reaction A"],
+)