Features
- The method `branch_partition` has been added to the `LevelSetTree` class. This method assigns each point a label corresponding to the highest density node to which the point belongs in the level set tree.
- The tree table now prints when the tree is called by itself. Both `print(tree)` and `tree` now print the tree's summary table to the console.
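
One common way to get this behavior in Python is to have `__repr__` return the same string as `__str__`. The sketch below is not DeBaCl's code; the `TinyTree` class and its table layout are hypothetical and only illustrate the pattern.

```python
class TinyTree:
    """Hypothetical stand-in for a tree object with a summary table."""

    def __init__(self, rows):
        # Each row is (node index, start_level, end_level), an assumed layout.
        self.rows = rows

    def __str__(self):
        header = "{:>4} {:>11} {:>9}".format("node", "start_level", "end_level")
        body = ["{:>4} {:>11.3f} {:>9.3f}".format(*row) for row in self.rows]
        return "\n".join([header] + body)

    # The interactive prompt displays repr(obj), so reuse the table here.
    __repr__ = __str__


tree = TinyTree([(0, 0.0, 0.2), (1, 0.2, 0.5)])
print(tree)
```

With this in place, typing `tree` at an interactive prompt shows the same table as `print(tree)`.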
Bugfixes
- Tree node colors should now match between the figure and the `color_nodes` output of the `LevelSetTree` plot method.
- The `knn_density` function in the `utils` module has more informative error messages and warnings for numerical issues, typically resulting from high-dimensional data.
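
To see where such numerical issues can come from: the standard k-nearest-neighbor density estimate is k / (n * v_p * r^p), where r is the distance to the k-th neighbor, p is the data dimension, and v_p is the volume of the unit ball. The term r^p underflows or overflows quickly as p grows. The sketch below works in log space to illustrate the formula; it is not DeBaCl's `knn_density`, and the function name `knn_log_density` and its error message are assumptions.

```python
import math


def knn_log_density(k_radius, n, p, k):
    """Log of the kNN density estimate k / (n * v_p * r**p).

    Hypothetical helper: v_p is the volume of the unit ball in p dimensions.
    """
    if k_radius <= 0:
        raise ValueError(
            "k-th neighbor radius must be positive; got {}. Duplicate points "
            "can produce a zero radius.".format(k_radius))

    # log(v_p) = (p / 2) * log(pi) - log(Gamma(p / 2 + 1))
    log_unit_volume = 0.5 * p * math.log(math.pi) - math.lgamma(0.5 * p + 1)
    return math.log(k) - math.log(n) - log_unit_volume - p * math.log(k_radius)


# In 1 dimension the unit "ball" is the interval [-1, 1] with volume 2.
print(math.exp(knn_log_density(k_radius=0.5, n=1, p=1, k=1)))

# In 1000 dimensions 0.01**1000 underflows to 0.0, but the log stays finite.
print(0.01 ** 1000, knn_log_density(k_radius=0.01, n=500, p=1000, k=10))
```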
This release is a major overhaul of DeBaCl. The primary goal is to make the level set trees (LSTs) easier to use, by removing much of the experimental and quasi-analysis code from my dissertation work, adding unit tests to improve code robustness, and simplifying the level set tree API. The experimental code will not vanish; I will move it to separate branches or a new repository. - Brian (papayawarrior)
Logistics
- No more dependency on igraph. Graph computation is now done with Networkx.
- No more dependency on Pandas. Level set tree printing is now done with Prettytable.
- The dependencies on Scipy and Matplotlib are now recommended but optional. Scipy is now used only for constructing similarity graphs by brute force.
- Saving and loading now use cPickle instead of scipy.io’s `loadmat` function.
- The level set tree constructor functions and the `LevelSetTree` class are now accessible directly from the `debacl` namespace.
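
As a rough illustration of pickle-based persistence (not DeBaCl's own save and load code), the sketch below round-trips a plain dictionary standing in for a tree. It uses Python 3's `pickle` module, which replaced Python 2's `cPickle`; the dictionary fields and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a tree: per-instance densities and a node table.
tree_state = {
    "density": [0.12, 0.40, 0.33],
    "nodes": {0: {"children": [1, 2]}, 1: {"children": []}, 2: {"children": []}},
}

path = os.path.join(tempfile.mkdtemp(), "tree.pkl")

# Write the object to disk in binary mode.
with open(path, "wb") as f:
    pickle.dump(tree_state, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; the result is an equal but distinct object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == tree_state)
```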
Level set tree construction
- The main level set tree class `GeomTree` has been renamed to `LevelSetTree`.
- The similarity graph LST constructor `constructTree` has been renamed to `construct_tree_from_graph`.
- `construct_tree_from_graph` now takes the similarity graph in the form of an adjacency list, rather than an adjacency matrix.
- `construct_tree_from_graph` no longer requires the user to pre-compute density levels and "background sets" of instances. The function now requires only an adjacency list (to represent a similarity graph) and a density estimate for each data instance.
- `LevelSetTree` objects contain the density estimate for each input instance, rather than a collection of background sets.
- The similarity graph utilities `knn_graph` and `epsilon_graph` now return adjacency lists rather than adjacency matrices.
- The `constructDensityGrid` utility has been split into two functions: `define_density_mass_grid` and `define_density_level_grid`. The LST constructors use the mass option, but the density level option is left for legacy purposes.
- The `gaussianGraph` utility has been removed.
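
For code that still builds its similarity graph as an adjacency matrix, a conversion along these lines may help. This is a minimal sketch, assuming the adjacency list is a list whose i-th entry holds the neighbor indices of instance i; check the exact format expected by `construct_tree_from_graph` before relying on it. The helper name `matrix_to_adjacency_list` is hypothetical.

```python
def matrix_to_adjacency_list(matrix):
    """Convert a square adjacency matrix (nested lists) to neighbor lists."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("adjacency matrix must be square")

    # Entry i lists every j with a nonzero weight in row i.
    return [[j for j, weight in enumerate(row) if weight] for row in matrix]


adjacency_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
adjacency_list = matrix_to_adjacency_list(adjacency_matrix)
print(adjacency_list)  # [[1, 2], [0], [0, 3], [2]]
```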
Level set tree printing and plotting
- Changed tree table column names from 'lambda1', 'lambda2', 'alpha1', and 'alpha2' to 'start_level', 'end_level', 'start_mass', and 'end_mass'.
- The level set tree plot forms have been renamed from 'lambda', 'alpha', and 'kappa' to 'density', 'mass', and 'branch-mass'.
- The 'width' parameter in the `LevelSetTree.plot` method has been renamed to 'horizontal_spacing', and the 'mass' option for this parameter has been renamed to 'proportional'.
- Added a tree method ‘get_leaf_nodes’, which returns the indices of the leaf nodes.
- Tree plotting now returns the color assigned to each node.
- Tree plotting no longer returns the ‘segmap’ and ‘splitmap’ objects.
- Tree plot objects ‘segments’ and ‘splits’ have been renamed to ‘node_coords’ and ‘split_coords’.
- The interactive plotting tools `ComponentGUI` and `ClusterGUI` have been removed.
- Plotting utilities (`Palette`, `plot_foreground`, `make_color_matrix`, and `setPlotParams`) have been removed.
- The `clusterHistogram` utility for illustrating the level set tree method on 1D data has been removed.
- The `plot` method of `LevelSetTree` objects no longer accepts the 'gap' parameter for adding extra whitespace at the bottom of the plot.
- The 'old' form of level set tree plots has been removed.
- The `plot` method of `LevelSetTree` objects no longer accepts the 'sort' parameter; branches are now always sorted from highest to lowest mass.
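
Conceptually, a leaf is a node with no children, which is what a method like ‘get_leaf_nodes’ reports. The sketch below illustrates the idea on a hypothetical dictionary-based tree; it does not reflect the `LevelSetTree` internals, and the `children` mapping is an assumed representation.

```python
def leaf_nodes(children):
    """Return the sorted indices of nodes that have no children."""
    return sorted(node for node, kids in children.items() if not kids)


# Hypothetical tree: node 0 splits into 1 and 2, and node 2 splits into 3 and 4.
children = {0: [1, 2], 1: [], 2: [3, 4], 3: [], 4: []}
print(leaf_nodes(children))  # [1, 3, 4]
```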
Level set tree pruning
- Level set tree pruning can now be done directly in the tree constructors. There’s no need to call the `prune` method separately (although it's still a valid pattern).
- The `prune` method returns a new, pruned `LevelSetTree` object. This means pruning at various thresholds can be done from the same level set tree, without re-building the tree each time.
- The `prune` method no longer takes a 'method' parameter. It assumes the 'merge-by-size' method.
- `LevelSetTree` objects now have a `prune_threshold` attribute.
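
The practical benefit of a non-mutating `prune` is that one unpruned tree can be reused for several thresholds. The sketch below shows that pattern on a toy class. `ToyTree` and its `node_sizes` field are hypothetical, and the pruning rule is a deliberate simplification; DeBaCl's 'merge-by-size' rule is more involved.

```python
import copy


class ToyTree:
    """Hypothetical tree: maps node index -> number of member points."""

    def __init__(self, node_sizes, prune_threshold=None):
        self.node_sizes = dict(node_sizes)
        self.prune_threshold = prune_threshold

    def prune(self, threshold):
        """Return a pruned copy; the original tree is left untouched."""
        pruned = copy.deepcopy(self)
        pruned.prune_threshold = threshold
        # Simplification for illustration: drop nodes below the threshold.
        pruned.node_sizes = {
            node: size for node, size in self.node_sizes.items()
            if size >= threshold
        }
        return pruned


full = ToyTree({0: 100, 1: 60, 2: 40, 3: 3, 4: 37})

# Several thresholds can be tried against the same unpruned tree.
light = full.prune(threshold=5)
heavy = full.prune(threshold=50)
print(sorted(light.node_sizes), sorted(heavy.node_sizes), len(full.node_sizes))
```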
Level set tree clustering
- Changed the name of `get_cluster_labels` to `get_clusters`.
- Renamed 'all-mode' clustering to 'leaf' clustering.
- Added the ‘fill_background’ flag to `get_clusters` to fill the background points with -1.
- Changed all clustering methods to return only cluster labels, not the list of active nodes.
- An instance's cluster label is now the index of the level set tree node that is "activated" by a given clustering method and to which the instance belongs. Previously cluster labels were consecutive integers.
- Added a utility function `reindex_cluster_labels` to re-index cluster labels to be consecutive integers.
- The `assignBackgroundPoints` utility function for assigning low-density points to clusters has been removed. Any classifier (in scikit-learn, for example) can be used for this task.
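
Because labels are now tree node indices, they can be sparse (for example 4, 7, and 12). Re-indexing them to consecutive integers can be sketched as follows. This is an illustration of the idea on a flat list of labels, not the library's `reindex_cluster_labels`, whose input format may differ; the function name `reindex_labels` and the choice to leave -1 background labels untouched are assumptions.

```python
def reindex_labels(labels, background=-1):
    """Map arbitrary node-index labels to 0, 1, 2, ... and keep background."""
    foreground = sorted(set(labels) - {background})
    new_index = {old: new for new, old in enumerate(foreground)}
    return [background if lab == background else new_index[lab] for lab in labels]


# Hypothetical labels: tree node indices 4, 7, and 12, with -1 for background.
node_labels = [7, 7, -1, 12, 4, 12, -1]
print(reindex_labels(node_labels))  # [1, 1, -1, 2, 0, 2, -1]
```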
Bugfixes
- External library imports are now hidden to avoid namespace pollution.
- The `num_levels` attribute is now correctly populated.
Miscellaneous
- Status messages now use Python's built-in logging module instead of print statements.
- The `subgraphs` attribute of a `LevelSetTree` is now hidden from the user.
- Helper `LevelSetTree` methods are now hidden from the user.
- The cd_tree.py module containing the original level set tree algorithm (Chaudhuri & Dasgupta, 2010) has been removed.
- The `drawSample` utility has been removed. This can be done now with Numpy.
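
The switch from print statements to logging means the caller controls whether progress messages appear. The general pattern can be sketched as below; the logger name `lst_demo` and the `build_levels` function are hypothetical and are not taken from DeBaCl.

```python
import logging

# Library code logs to a named logger and leaves configuration to the caller.
logger = logging.getLogger("lst_demo")  # hypothetical logger name


def build_levels(num_levels):
    """Toy loop that reports progress through logging rather than print."""
    for i in range(num_levels):
        logger.info("processing level %d of %d", i + 1, num_levels)
    return num_levels


# The caller decides whether and how messages are shown.
logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
build_levels(3)
```

Raising the level to `logging.WARNING` in the caller silences the progress messages without touching the library code.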
Initial release