diff --git a/joss.06321/10.21105.joss.06321.crossref.xml b/joss.06321/10.21105.joss.06321.crossref.xml new file mode 100644 index 0000000000..cb026a492f --- /dev/null +++ b/joss.06321/10.21105.joss.06321.crossref.xml @@ -0,0 +1,283 @@ + + + + 20240327T234716-53783c8ea0ec973b06e30e6b878e3b11d4af281c + 20240327234716 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 03 + 2024 + + + 9 + + 95 + + + + TDApplied: An R package for machine learning and +inference with persistence diagrams + + + + Shael + Brown + https://orcid.org/0000-0001-8868-2867 + + + Reza + Farivar-Mohseni + https://orcid.org/0000-0002-3123-2627 + + + + 03 + 27 + 2024 + + + 6321 + + + 10.21105/joss.06321 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.10814141 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6321 + + + + 10.21105/joss.06321 + https://joss.theoj.org/papers/10.21105/joss.06321 + + + https://joss.theoj.org/papers/10.21105/joss.06321.pdf + + + + + + TDA: Statistical tools for topological data +analysis + Fasy + 2021 + Fasy, B. T., Kim, J., Lecci, F., +Maria, C., Millman, D. L., & Rouvreau., V. (2021). TDA: Statistical +tools for topological data analysis. +https://CRAN.R-project.org/package=TDA + + + TDAstats: Pipeline for topological data +analysis + Wadhwa + 2019 + Wadhwa, R., Dhawan, A., Williamson, +D., & Scott, J. (2019). TDAstats: Pipeline for topological data +analysis. https://github.com/rrrlw/TDAstats + + + TDAstats: R pipeline for computing persistent +homology in topological data analysis + Wadhwa + Journal of Open Source +Software + 28 + 3 + 10.21105/joss.00860 + 2018 + Wadhwa, R. R., Williamson, D. F. K., +Dhawan, A., & Scott, J. G. (2018). 
TDAstats: R pipeline for +computing persistent homology in topological data analysis. Journal of +Open Source Software, 3(28), 860. +https://doi.org/10.21105/joss.00860 + + + Topological persistence and +simplification + Edelsbrunner + Discrete & Computational +Geometry + 28 + 10.1007/s00454-002-2885-2 + 2000 + Edelsbrunner, H., Letscher, D., & +Zomorodian, A. (2000). Topological persistence and simplification. +Discrete & Computational Geometry, 28, 511–533. +https://doi.org/10.1007/s00454-002-2885-2 + + + Computing persistent homology + Zomorodian + Discrete and Computational +Geometry + 33 + 10.1007/s00454-004-1146-y + 2005 + Zomorodian, A., & Carlsson, G. +(2005). Computing persistent homology. Discrete and Computational +Geometry, 33, 249–274. +https://doi.org/10.1007/s00454-004-1146-y + + + devtools: Tools to make developing R packages +easier + Wickham + 2021 + Wickham, H., Hester, J., Chang, W., +& Bryan, J. (2021). devtools: Tools to make developing R packages +easier. +https://CRAN.R-project.org/package=devtools + + + Hypothesis testing for topological data +analysis + Robinson + Journal of Applied and Computational +Topology + 1 + 10.1007/s41468-017-0008-7 + 2017 + Robinson, A., & Turner, K. +(2017). Hypothesis testing for topological data analysis. Journal of +Applied and Computational Topology, 1. +https://doi.org/10.1007/s41468-017-0008-7 + + + Persistence Fisher kernel: A Riemannian +manifold kernel for persistence diagrams + Le + Advances in neural information processing +systems + 31 + 10.48550/arXiv.1802.03569 + 2018 + Le, T., & Yamada, M. (2018). +Persistence Fisher kernel: A Riemannian manifold kernel for persistence +diagrams. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. +Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information +processing systems (Vol. 31). Curran Associates, Inc. 
+https://doi.org/10.48550/arXiv.1802.03569 + + + Topological data analysis reveals robust +alterations in the whole-brain and frontal lobe functional connectomes +in attention-deficit/hyperactivity disorder + Gracia-Tabuenca + eneuro + 10.1523/eneuro.0543-19.2020 + 2020 + Gracia-Tabuenca, Z., Diaz-Patino, J. +C., Arelio, I., & Alcauter, S. (2020). Topological data analysis +reveals robust alterations in the whole-brain and frontal lobe +functional connectomes in attention-deficit/hyperactivity disorder. +Eneuro. +https://doi.org/10.1523/eneuro.0543-19.2020 + + + testthat: Get started with +testing + Wickham + The R Journal + 3 + 10.32614/rj-2011-002 + 2011 + Wickham, H. (2011). testthat: Get +started with testing. The R Journal, 3, 5–10. +https://doi.org/10.32614/rj-2011-002 + + + Machine learning with persistent homology and +chemical word embeddings improves prediction accuracy and +interpretability in metal-organic frameworks + Krishnapriyan + Nature Scientific Report + 11 + 10.1038/s41598-021-88027-8 + 2021 + Krishnapriyan, A. S. et al. (2021). +Machine learning with persistent homology and chemical word embeddings +improves prediction accuracy and interpretability in metal-organic +frameworks. Nature Scientific Report, 11. +https://doi.org/10.1038/s41598-021-88027-8 + + + Unsupervised geometric and topological +approaches for cross-lingual sentence representation and +comparison + Haim Meirom + Proceedings of the 7th workshop on +representation learning for NLP + 10.18653/v1/2022.repl4nlp-1.18 + 2022 + Haim Meirom, S., & Bobrowski, O. +(2022). Unsupervised geometric and topological approaches for +cross-lingual sentence representation and comparison. Proceedings of the +7th Workshop on Representation Learning for NLP, 173–183. +https://doi.org/10.18653/v1/2022.repl4nlp-1.18 + + + Topological data analysis in medical imaging: +Current state of the art + Singh + Insights into Imaging + 1 + 14 + 10.1186/s13244-023-01413-w + 2023 + Singh, Y., Farrelly, C. 
M., Hathaway, +Q. A., Leiner, T., Jagtap, J., Carlsson, G. E., & Erickson, B. J. +(2023). Topological data analysis in medical imaging: Current state of +the art. Insights into Imaging, 14(1), 58. +https://doi.org/10.1186/s13244-023-01413-w + + + Multidimensional scaling + Cox + Handbook of data +visualization + 10.1007/978-3-540-33037-0_14 + 978-3-540-33037-0 + 2008 + Cox, M. A. A., & Cox, T. F. +(2008). Multidimensional scaling. In Handbook of data visualization (pp. +315–347). Springer Berlin Heidelberg. +https://doi.org/10.1007/978-3-540-33037-0_14 + + + + + + diff --git a/joss.06321/10.21105.joss.06321.jats b/joss.06321/10.21105.joss.06321.jats new file mode 100644 index 0000000000..759c241021 --- /dev/null +++ b/joss.06321/10.21105.joss.06321.jats @@ -0,0 +1,469 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6321 +10.21105/joss.06321 + +TDApplied: An R package for machine learning and +inference with persistence diagrams + + + +https://orcid.org/0000-0001-8868-2867 + +Brown +Shael + + + + +https://orcid.org/0000-0002-3123-2627 + +Farivar-Mohseni +Reza + + + + + +Department of Quantitative Life Sciences, McGill +University, Montreal, Canada + + + + +McGill Vision Research, Department of Ophthalmology, McGill +University, Montreal, Canada + + + + +24 +1 +2024 + +9 +95 +6321 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +R +topological data analysis +persistent homology + + + + + + Summary +

Topological data analysis is a collection of tools, based on the + mathematical fields of topology and geometry, for finding structure in + whole datasets. Its main tool, persistent homology + (Edelsbrunner et al., 2000; + Zomorodian & Carlsson, 2005), computes a shape descriptor of a dataset + called a persistence diagram, which encodes information about holes + that exist in the dataset. Applications span a variety of + areas; see, for example, Gracia-Tabuenca et al. (2020), + Haim Meirom & Bobrowski (2022), + and Krishnapriyan et al. (2021). + These types of features cannot be identified by other methods, making + persistence diagrams a unique and valuable data science object for + studying and comparing datasets. The two most popular data science + tools for analyzing multiple objects are machine learning and + inference, but to date there has been no open source implementation of + published methods for machine learning and inference of persistence + diagrams.

+
+ + Statement of need +

TDApplied is the first R package for machine + learning and inference of persistence diagrams. It builds on the main R + packages for the calculation of persistence diagrams, + TDA + (Fasy et al., 2021) and TDAstats + (R. Wadhwa et al., 2019; + R. R. Wadhwa et al., 2018), and on publications of applied analysis + methods for persistence diagrams + (Le & Yamada, 2018; + Robinson & Turner, 2017). TDApplied is + intended for academic researchers and industry professionals + who want to integrate persistence diagrams into their analysis + workflows. An example TDApplied workflow, in + which the topological differences between three datasets are + visualized in 2D using multidimensional scaling (MDS) + (Cox & Cox, 2008), is shown in + [fig:software]:

+ +

An example TDApplied workflow. A + dataset (D1, left) contains one loop (yellow) and two clusters (the + loop forms one cluster and the three points on the bottom are + another cluster, and clusters are denoted by the color red). These + topological features are captured with persistent homology in a + persistence diagram PD1 (middle top), and two other datasets, D2 + and D3 (not shown), have their persistence diagrams, PD2 and PD3, + computed (middle center and middle bottom). PD1 and PD2 are + topologically similar in terms of their loops, with both + containing a loop with similar birth and death values, and this is + represented by a dashed-line relationship. On the other hand, PD2 + and PD3 are topologically different in terms of their loops because + PD3 does not contain a loop, and this is represented by a + dotted-line relationship. TDApplied can + quantify these topological differences and use MDS to project the + persistence diagrams into three points in a 2D embedding space + (right) where interpoint distances reflect the topological + differences between the persistence diagrams. +

+ +
+
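The final step of the workflow in the figure, projecting three persistence diagrams into a 2D embedding from their pairwise distances, can be sketched with classical MDS. This is a minimal illustration in Python with NumPy, not TDApplied's own code: the function name `classical_mds` and the distance values are hypothetical, chosen only so that PD1 and PD2 are close while PD3 is far from both, and the package's implementation may differ.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n objects in k dimensions from an n x n distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by floating-point error.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical pairwise distances between PD1, PD2 and PD3 (illustrative values only).
distances = np.array([
    [0.00, 0.10, 0.95],
    [0.10, 0.00, 1.00],
    [0.95, 1.00, 0.00],
])

# One row of 2D coordinates per persistence diagram.
embedding = classical_mds(distances, k=2)
print(embedding)
```

Because any three points whose distances satisfy the triangle inequality can be placed in a plane, the interpoint distances of this embedding match the input distances up to floating-point error. With more diagrams, a 2D embedding is in general only an approximation.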

The TDApplied package is built on three main + pillars:

+ + +

User-friendly – internal preprocessing of persistence diagrams + that would normally be left to R users to figure out ad hoc, and + functions designed to easily flow from input diagrams to output + metrics.

+
+ +

Efficient – parallelization, C code, computational tricks and + storage of reusable and cumbersome calculations significantly + increase the feasibility of topological analyses (compared to + existing R packages).

+
+ +

Flexible – ability to interface with other data science + packages to create personalized analyses.

+
+
+

TDApplied has already been featured in a + conference + workshop and a + conference + tutorial, utilized in a journal publication + (Singh + et al., 2023) and downloaded over 4400 times. Therefore, we + propose TDApplied as a user-friendly, efficient + and flexible R package for the analysis of multiple datasets using + machine learning and inference via topological data analysis.

+
+ + Project Management +

Installation and availability: + TDApplied can be installed directly from CRAN + using the command + install.packages("TDApplied"), or + from GitHub using the devtools package + (Wickham + et al., 2021). TDApplied is distributed + under the GPL-3 license.

+

Code quality: Code has been tested using the + testthat package + (Wickham, + 2011), with 91.45% coverage of R code when not skipping tests + involving Python code (or 88.44% coverage when skipping the Python + tests).

+

Documentation: TDApplied + contains five main vignettes:

+ + +

“TDApplied Theory and Practice” provides example function usage + on simulated data as well as mathematical background and + intuition,

+
+ +

“Human Connectome Project Analysis” demonstrates an applied + example analysis of neurological data,

+
+ +

“Benchmarking and Speedups” outlines the package’s optimization + strategies and highlights performance gains compared to other + packages,

+
+ +

“Personalized Analyses with TDApplied” demonstrates how to + interface TDApplied with other data science + packages, and

+
+ +

“Comparing Distance Calculations” accounts for differences in + computed distance values between persistence diagrams across + comparable packages.

+
+
+
+ + Acknowledgements +

We acknowledge funding from the CIHR 2016 grant for cortical + mechanisms of 3-D scene and object recognition in the primate + brain.

+
+ Fasy, Brittany T.; Kim, Jisu; Lecci, Fabrizio; Maria, Clement; Millman, David L.; Rouvreau, Vincent (2021). TDA: Statistical tools for topological data analysis. https://CRAN.R-project.org/package=TDA
+ Wadhwa, Raoul; Dhawan, Andrew; Williamson, Drew; Scott, Jacob (2019). TDAstats: Pipeline for topological data analysis. https://github.com/rrrlw/TDAstats
+ Wadhwa, Raoul R.; Williamson, Drew F. K.; Dhawan, Andrew; Scott, Jacob G. (2018). TDAstats: R pipeline for computing persistent homology in topological data analysis. Journal of Open Source Software, 3(28), 860. https://doi.org/10.21105/joss.00860
+ Edelsbrunner, Herbert; Letscher, David; Zomorodian, Afra (2000). Topological persistence and simplification. Discrete & Computational Geometry, 28, 511–533. https://doi.org/10.1007/s00454-002-2885-2
+ Zomorodian, Afra; Carlsson, Gunnar (2005, February). Computing persistent homology. Discrete and Computational Geometry, 33, 249–274. https://doi.org/10.1007/s00454-004-1146-y
+ Wickham, Hadley; Hester, Jim; Chang, Winston; Bryan, Jennifer (2021). devtools: Tools to make developing R packages easier. https://CRAN.R-project.org/package=devtools
+ Robinson, Andrew; Turner, Katharine (2017). Hypothesis testing for topological data analysis. Journal of Applied and Computational Topology, 1. https://doi.org/10.1007/s41468-017-0008-7
+ Le, Tam; Yamada, Makoto (2018). Persistence Fisher kernel: A Riemannian manifold kernel for persistence diagrams. In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; Garnett, R. (Eds.), Advances in neural information processing systems (Vol. 31). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/959ab9a0695c467e7caf75431a872e5c-Paper.pdf https://doi.org/10.48550/arXiv.1802.03569
+ Gracia-Tabuenca, Zeus; Diaz-Patino, Juan Carlos; Arelio, Isaac; Alcauter, Sarael (2020). Topological data analysis reveals robust alterations in the whole-brain and frontal lobe functional connectomes in attention-deficit/hyperactivity disorder. eneuro. https://doi.org/10.1523/eneuro.0543-19.2020
+ Wickham, Hadley (2011). testthat: Get started with testing. The R Journal, 3, 5–10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf https://doi.org/10.32614/rj-2011-002
+ Krishnapriyan, Aditi S., et al. (2021, April). Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks. Nature Scientific Report, 11. https://doi.org/10.1038/s41598-021-88027-8
+ Haim Meirom, Shaked; Bobrowski, Omer (2022, May). Unsupervised geometric and topological approaches for cross-lingual sentence representation and comparison. Proceedings of the 7th workshop on representation learning for NLP, 173–183. Association for Computational Linguistics, Dublin, Ireland. https://aclanthology.org/2022.repl4nlp-1.18 https://doi.org/10.18653/v1/2022.repl4nlp-1.18
+ Singh, Yashbir; Farrelly, Colleen M.; Hathaway, Quincy A.; Leiner, Tim; Jagtap, Jaidip; Carlsson, Gunnar E.; Erickson, Bradley J. (2023). Topological data analysis in medical imaging: Current state of the art. Insights into Imaging, 14(1), 58. https://doi.org/10.1186/s13244-023-01413-w
+ Cox, Michael A. A.; Cox, Trevor F. (2008). Multidimensional scaling. In Handbook of data visualization (pp. 315–347). Springer Berlin Heidelberg, Berlin, Heidelberg. ISBN 978-3-540-33037-0. https://doi.org/10.1007/978-3-540-33037-0_14
diff --git a/joss.06321/10.21105.joss.06321.pdf b/joss.06321/10.21105.joss.06321.pdf new file mode 100644 index 0000000000..5b8dfe891a Binary files /dev/null and b/joss.06321/10.21105.joss.06321.pdf differ diff --git a/joss.06321/media/software.pdf b/joss.06321/media/software.pdf new file mode 100644 index 0000000000..b3758b2964 Binary files /dev/null and b/joss.06321/media/software.pdf differ