Internal TDA pipeline used in Peter Bubenik's group at UF. Currently use's modified code from the following packages:
- The Ripser package by Ulrich Bauer: https://github.com/Ripser/ripser
- The fork of the Ripser package which adds some easier interfacing by Chris Tralie: https://github.com/ctralie/ripser
- The Persistence Landscape toolbox by Paweł Dłotko: https://www.math.upenn.edu/~dlotko/persistenceLandscape.html
-
Since the package is not on CRAN, you will need to install devtools on your system. To do this just open an R session and run: install.packages(‘devtools’)
-
Next clone the github repo either using git clone https://github.com/jjbouza/tda-tools.git or just download from Github using the green button.
-
Run the install.sh bash script in the cloned repo using ./install.sh or bash install.sh, this script will compile the C++ code and install a package “tdatools” to your R package directory along with all of the dependencies.
Now you should be able to load the package normally from an R session using library(‘tdatools’)
The tdatools
package supports two major persistent homology operations: Persistence Diagram computation and Persistence
Landscape computation.
The primary function to be aware of is
diagram <- function(input, type, dim_max=3, threshold=Inf, modulus=2, do_cocycles=0)
The parameters are mostly self explanatory. Say we want to compute first degree homology of points sampled from the unit circle. We can start by defining a function to uniformly sample from the circle.
CircleUnif <- function(n,rad=1,centre=c(0,0)){
x0 <- centre[1] ; y0 <- centre[2]
u <- 2*pi*runif(n)
rad*cbind(x=cos(u)+x0, y=sin(u)+y0)
}
X <- CircleUnif(100)
This function generated an N by 2 matrix corresponding to the point cloud sampled from the circle. Now the persistence diagram calculation is simple:
pd <- diagram(X, 'point-cloud', dim_max=1, threshold=0.5)
Since we are satisfied with the other default parameters, we only change the dim_max
parameter, which tells Ripser
the top dimension to compute homology in, and the threshold
parameter, which tells Ripser when to stop the filtration.
The pd
object is a class with three members, which can be accessed by
pd$member
. The first class member is pd$pairs
, which contains the persistence pairs calculated from X,
indexed in a list by homology degree. So, for example, to view the degree 0 homology of X we can call
print(pd$pairs[[1]])
You should see that all persitence pairs in degree 0 are born at 0 and die shortly after, which is what we expect. The
last pair in degree 0 homology should persist to Inf
.
Note that R indexing causes 0 degree homology to be stored at index 1. In general n degree homology is stored in index n+1.
Now lets look at homology in degree 1, which is what we were originally interested in.
print(pd$pairs[[2]])
Now you should see a persistence pair born early and never dying. For example, I get
[,1] [,2]
[1,] 0.3216245 Inf
Meaning that a degree 1 persistence cycle was born at 0.3216 and did not die before the threshold value.
The pd
object also contains an pd$edges
member which returns the number of edges used in the calculation by ripser. This can be useful for debuggining long calculation times. Finally pd$reps
usually stores cocycle representitives, although in our case it will be empty since do_cocycles
was not set to 1. A section on this feature will be added in the future.
Here we will mainly use the function
landscape <- function(PersistenceDiagram, exact=FALSE, dx=0.1, min_x= 0, max_x=10, threshold=-1)
which is the primary constructor for landscape objects. This function will not return an R array, but rather an object
that encapsulates the landscape operations and data. This allows us to perform operations on different types of
landscapes without worrying about preprocessing the landscape representations. At any point the underlying landscape can
be extracted using landscape$getInternal()
, which returns a three dimensional array of shape [levels, values, 2]
. Lets see an example of this by continuing from the last example. First we will construct a discrete persistence
landscape object with dx=0.05
on the interval [0,1]
, with homology in degree 1 from the previous example as
the input diagram.
pl <- landscape(pd$pairs[[2]], exact=FALSE, dx=0.1, min_x=0, max_x=1)
Now lets see what the underlying landscape looks like:
print(dim(pl$getInternal()))
I get 1 11 2
which means that our landscape has 1 level with 11 x, y
pairs. Concretely,
print(pl$getInternal()[1,,])
gives us the values of the first (and only) level. I get
[,1] [,2]
[1,] 0.0 0.00000000
[2,] 0.1 0.00000000
[3,] 0.2 0.00000000
[4,] 0.3 0.00000000
[5,] 0.4 0.07286244
[6,] 0.5 0.17286244
[7,] 0.6 0.27286244
[8,] 0.7 0.37286244
[9,] 0.8 0.47286244
[10,] 0.9 0.57286244
[11,] 1.0 0.67286244
So we can see that the grid is spaced by dx=0.1
and the lanscape is increasing along the input interval, as
expected. We can also use the built in plotting function to visualize the landscape.
PLplot(pl)
Now let us operate on landscapes. We can start by sampling from a circle again.
X2 <- CircleUnif(100)
pd2 <- diagram(X2, 'point-cloud', dim_max=1, threshold=0.5)
pl2 <- landscape(pd2$pairs[[2]], exact=FALSE, dx=0.1, min_x=0, max_x=1)
Lets calculate the sum of this landscape with the previous one.
pl_sum <- PLsum(pl,pl2)
This function checks that pl
and pl2
are compatible, i.e. are defined on the same grid and then adds the
underlying landscapes. Now lets scale by 0.5
to compute the average of the two landscapes and plot.
pl_average <- PLscale(0.5, pl_sum)
PLplot(pl_average)
We should expect the average to be close to the originals. To test this we can compute the norm of the difference using
the built in PLinner
function.
diff <- PLsum(pl,PLscale(-1,pl_average))
inner <- PLinner(diff,diff)
print(inner)
As expected, inner
is fairly small.
Note that all operations can also be done in-place as methods on landscapes. For example, PLsum(pl,pl2)
is equivalent to pl$sum(pl2)
.