The DFNET R package can be installed using devtools:

```r
install.packages("devtools")
devtools::install_github("pievos101/DFNET")
```
See our examples using synthetic data sets or real-world cancer data.
Generally speaking, DFNET follows a four-step process:

- Preparing the input data (graph and features).
- Training the forest.
- Finding useful decision trees.
- Using these trees for evaluation.
DFNET expects an `igraph::igraph` and a 2D or 3D feature array, as well as a target vector with the same number of rows as the array. The vertex names of the graph should match the column names of the array. When in doubt, use `launder` or related functions to prepare the input data.
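As a minimal sketch of the expected input shapes, here is a toy three-gene graph with matching random feature data. All names and dimensions below are illustrative assumptions, not part of the DFNET API:

```r
library(igraph)

# Toy graph over three features; the vertex names must match the
# feature column names below.
graph <- graph_from_literal(geneA - geneB - geneC)

# 2D feature array: 10 samples (rows) x 3 features (columns).
features <- matrix(
    rnorm(30), nrow = 10, ncol = 3,
    dimnames = list(NULL, c("geneA", "geneB", "geneC"))
)

# Binary target with one entry per sample (row).
target <- rbinom(10, 1, 0.5)

# The alignment DFNET relies on.
stopifnot(
    setequal(V(graph)$name, colnames(features)),
    length(target) == nrow(features)
)
```

In the 3D case mentioned above, the additional dimension would hold the different data modalities; the row (sample) and column (feature) conventions stay the same.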
Once you have your graph and features, you can train your forest like so:

```r
forest <- train(,
    graph, features, target,
    ...
)
```
If you have a pre-trained forest, you can use that for training as well:

```r
forest <- train(forest,
    graph, features, target,
    ...
)
```
Since DFNET performs greedy optimization, the last generation of trees is the best according to the provided test metric. DFNET provides overrides for the standard R methods `head` and `tail`, which return generations of trees.
```r
# Get the selected modules
last_gen <- tail(forest, 1)
tree_imp <- attr(last_gen, "last.performance")
```
Note that performance metrics for earlier generations are not kept. Several importance scores can be derived from these metrics.
```r
e_imp <- edge_importance(graph, last_gen$trees, tree_imp)
f_imp <- feature_importance(last_gen, features)
m_imp <- module_importance(
    graph,
    last_gen$modules,
    e_imp,
    tree_imp
)
```
The module importance is particularly useful for feature selection, as it combines the importance of edges within a module with the overall accuracy of the decision tree. You can use it to order decision trees or simply extract the best one.
```r
best <- which.max(as.numeric(m_imp[, "total"]))
best.tree <- last_gen$trees[[best]]

by_importance <- order(m_imp[, "total"], decreasing = TRUE)
last_gen$trees[by_importance]
```
DFNET provides an override for the `predict` method, which functions much like ranger's.
```r
# Predict using the best decision tree
pred_best <- predict(best.tree, test_data)$predictions

# Predict using all detected modules
pred_all <- predict(last_gen, test_data)$predictions
```
You can use ModelMetrics to evaluate accuracy, precision, recall, or other performance metrics.
```r
ModelMetrics::auc(pred_best, test_target)
ModelMetrics::auc(pred_all, test_target)
```
Now, let's check the performance of that module on the independent test data set. We compare the results with the performance of all selected trees.

```r
# Prepare the test data: tag each feature name with its omics type
colnames(mRNA_test) <- paste(colnames(mRNA_test), "$", "mRNA", sep = "")
colnames(Methy_test) <- paste(colnames(Methy_test), "$", "Methy", sep = "")
DATA_test <- as.data.frame(cbind(mRNA_test, Methy_test))
```
```r
# Predict using the best decision tree
pred_best <- predict(best_DT, DATA_test)$predictions

# Predict using all detected modules
pred_all <- predict(last_gen, DATA_test)$predictions

pred_best
pred_all

# Check the performance of the predictions
ModelMetrics::auc(pred_best, target[test_ids])
ModelMetrics::auc(pred_all, target[test_ids])
```
Finally, we provide an extension to compute tree-based SHAP values via treeshap.

```r
library(treeshap)

forest_unified <- dfnet.unify(last_gen$trees, test_data)
forest_shap <- treeshap(forest_unified, test_data)
```
If you use DFNET in your research, please cite:

```bibtex
@article{pfeifer2022multi,
  title={Multi-omics disease module detection with an explainable Greedy Decision Forest},
  author={Pfeifer, Bastian and Baniecki, Hubert and Saranti, Anna and Biecek, Przemyslaw and Holzinger, Andreas},
  journal={Scientific Reports},
  volume={12},
  number={1},
  pages={1--15},
  year={2022},
  publisher={Nature Publishing Group}
}
```