## How to create conda environment for benchmarking

`conda create -n bench -c rapidsai -c conda-forge python=3.7 scikit-learn cuml pandas cudf tqdm`
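
Once the environment is created, activate it before launching any benchmarks (standard conda usage; `bench` is the environment name chosen in the command above):

```bash
# Activate the benchmarking environment created above.
conda activate bench
```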

## Algorithms parameters

You can launch benchmarks for each algorithm separately. The tables below list all supported parameters for each algorithm; an illustrative launch command is sketched after each table.

### General

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| num-threads | int | -1 | The number of threads to use |
| arch | str | ? | Machine architecture, for bookkeeping |
| batch | str | ? | Batch ID, for bookkeeping |
| prefix | str | sklearn | Prefix string, for bookkeeping |
| header | action | False | Output CSV header |
| verbose | action | False | Output extra debug messages |
| data-format | str | numpy | Data format: numpy, pandas, or cudf |
| data-order | str | C | Data order: C (row-major, default) or F (column-major) |
| dtype | np.dtype | np.float64 | Data type: float64 (default) or float32 |
| check-finiteness | action | False | Check finiteness during sklearn input checks (disabled by default) |
| output-format | str | csv | Output format: csv (default) or json |
| time-method | str | mean_min | Method used for time measurements |
| box-filter-measurements | int | 100 | Maximum number of measurements in box filter |
| inner-loops | int | 100 | Maximum inner loop iterations (we take the mean over inner iterations) |
| outer-loops | int | 100 | Maximum outer loop iterations (we take the min over outer iterations) |
| time-limit | float | 10 | Target time to spend on the benchmark |
| goal-outer-loops | int | 10 | The number of outer loops to aim for when automatically picking the number of inner loops. If zero, do not automatically decide the number of inner loops |
| seed | int | 12345 | Seed to pass as random_state |
| dataset-name | str | None | Dataset name |
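
A minimal sketch of how the general options are passed on the command line, assuming each benchmark is a standalone Python script invoked with these flags (the script names are assumptions, not confirmed by this document):

```bash
# BENCH is a placeholder for whichever benchmark script you want to run (the name is an assumption).
BENCH=dbscan.py
# General flags from the table above: data format, precision, output format, and RNG seed.
python $BENCH --data-format cudf --dtype float32 --output-format json --seed 12345
```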

### DBSCAN

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| epsilon | float | 10 | Radius of neighborhood of a point |
| min_samples | int | 5 | The minimum number of samples required in a neighborhood to consider a point a core point |
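
For example, a DBSCAN run could look like the sketch below, assuming the benchmark script is named `dbscan.py` (the script name is an assumption; the flags mirror the table above):

```bash
# dbscan.py is an assumed script name; --epsilon and --min_samples come from the DBSCAN table.
python dbscan.py --epsilon 10 --min_samples 5 --data-format cudf
```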

### RandomForestClassifier

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| criterion | str | gini | gini or entropy. The function to measure the quality of a split |
| split-algorithm | str | hist | hist or global_quantile. The algorithm to determine how nodes are split in the tree |
| num-trees | int | 100 | The number of trees in the forest |
| max-features | float_or_int | None | Upper bound on features used at each split |
| max-depth | int | None | Upper bound on depth of constructed trees |
| min-samples-split | float_or_int | 2 | Minimum samples number for node splitting |
| max-leaf-nodes | int | None | Maximum leaf nodes per tree |
| min-impurity-decrease | float | 0 | Needed impurity decrease for node splitting |
| no-bootstrap | store_false | True | Disable bootstrapping |
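
An illustrative classification-forest run, assuming a script named `df_clsf.py` (the name is an assumption); the flags are those documented above:

```bash
# df_clsf.py is an assumed script name; flags come from the table above.
python df_clsf.py --criterion gini --split-algorithm hist --num-trees 100 --max-depth 16
```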

### RandomForestRegressor

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| criterion | str | gini | gini or entropy. The function to measure the quality of a split |
| split-algorithm | str | hist | hist or global_quantile. The algorithm to determine how nodes are split in the tree |
| num-trees | int | 100 | The number of trees in the forest |
| max-features | float_or_int | None | Upper bound on features used at each split |
| max-depth | int | None | Upper bound on depth of constructed trees |
| min-samples-split | float_or_int | 2 | Minimum samples number for node splitting |
| max-leaf-nodes | int | None | Maximum leaf nodes per tree |
| min-impurity-decrease | float | 0 | Needed impurity decrease for node splitting |
| no-bootstrap | action | True | Disable bootstrapping |
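
The regression forest accepts the same flags; a sketch assuming a script named `df_regr.py` (assumed name):

```bash
# df_regr.py is an assumed script name; flags come from the table above.
python df_regr.py --num-trees 100 --max-features 0.5 --min-samples-split 2
```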

### KMeans

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| init | str | | Initial clusters |
| tol | float | 0 | Absolute threshold |
| maxiter | int | 100 | Maximum number of iterations |
| samples-per-batch | int | 32768 | The number of samples per batch |
| n-clusters | int | | The number of clusters |
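
A sketch of a KMeans launch, assuming a script named `kmeans.py` (assumed name); n-clusters has no listed default, so it is set explicitly:

```bash
# kmeans.py is an assumed script name; flags come from the KMeans table above.
python kmeans.py --n-clusters 10 --maxiter 100 --samples-per-batch 32768
```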

### KNeighborsClassifier

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| n-neighbors | int | 5 | The number of neighbors to use |
| weights | str | uniform | Weight function used in prediction |
| method | str | brute | Algorithm used to compute the nearest neighbors |
| metric | str | euclidean | Distance metric to use |
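
An illustrative nearest-neighbors run, assuming a script named `knn_clsf.py` (assumed name):

```bash
# knn_clsf.py is an assumed script name; flags mirror the table above.
python knn_clsf.py --n-neighbors 5 --weights uniform --method brute --metric euclidean
```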

### LinearRegression

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| no-fit-intercept | action | True | Don't fit intercept (assume data already centered) |
| solver | str | eig | eig or svd. Solver used for training |
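
A sketch of a linear regression run, assuming a script named `linear.py` (assumed name):

```bash
# linear.py is an assumed script name; --solver accepts eig or svd per the table above.
python linear.py --solver eig
```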

### LogisticRegression

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| no-fit-intercept | action | True | Don't fit intercept |
| solver | str | qn | qn or owl. Solver to use |
| maxiter | int | 100 | Maximum iterations for the iterative solver |
| C | float | 1.0 | Regularization parameter |
| tol | float | None | Tolerance for solver |
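
A sketch of a logistic regression run, assuming a script named `log_reg.py` (assumed name); the tolerance value is only an example, since the table lists no default:

```bash
# log_reg.py is an assumed script name; flags come from the table above.
python log_reg.py --solver qn --maxiter 100 --C 1.0 --tol 1e-3
```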

### PCA

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| svd-solver | str | full | auto, full or jacobi. SVD solver to use |
| n-components | int | None | The number of components to find |
| whiten | action | False | Perform whitening |
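
An illustrative PCA run, assuming a script named `pca.py` (assumed name); since whiten is an action flag, passing it enables whitening:

```bash
# pca.py is an assumed script name; --svd-solver accepts auto, full or jacobi.
python pca.py --svd-solver full --n-components 10 --whiten
```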

### Ridge Regression

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| no-fit-intercept | action | True | Don't fit intercept (assume data already centered) |
| solver | str | eig | eig, cd or svd. Solver used for training |
| alpha | float | 1.0 | Regularization strength |
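
A sketch of a ridge regression run, assuming a script named `ridge.py` (assumed name):

```bash
# ridge.py is an assumed script name; flags mirror the table above.
python ridge.py --solver eig --alpha 1.0
```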

### SVC

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| C | float | 0.01 | SVM slack parameter |
| kernel | str | linear | linear or rbf. SVM kernel function |
| gamma | float | None | Parameter for kernel="rbf" |
| max-cache-size | int | 64 | Maximum cache size for SVM |
| tol | float | 1e-16 | Tolerance passed to sklearn.svm.SVC |
| probability | action | True | Use probability for SVC |
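
An illustrative SVC run, assuming a script named `svm.py` (assumed name); gamma is only meaningful with the rbf kernel:

```bash
# svm.py is an assumed script name; the gamma value is an example, as the table lists no default.
python svm.py --C 0.01 --kernel rbf --gamma 0.1 --max-cache-size 64 --tol 1e-16
```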

### TSNE

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| n-components | int | 2 | Dimension of the embedded space |
| early-exaggeration | float | 12.0 | This factor increases the attractive forces between points and allows points to move around more freely, finding their nearest neighbors more easily |
| learning-rate | float | 200.0 | The learning rate for t-SNE is usually in the range [10.0, 1000.0] |
| angle | float | 0.5 | Angular size. This is the trade-off between speed and accuracy |
| min-grad-norm | float | 1e-7 | If the gradient norm is below this threshold, the optimization is stopped |
| random-state | int | 1234 | Determines the random number generator |
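
A sketch of a t-SNE run, assuming a script named `tsne.py` (assumed name):

```bash
# tsne.py is an assumed script name; flags mirror the table above.
python tsne.py --n-components 2 --learning-rate 200.0 --angle 0.5 --random-state 1234
```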

### train_test_split

| Parameter Name | Type | Default Value | Description |
| ----- | ---- | ---- | ----------- |
| train-size | float | 0.75 | Size of training subset |
| test-size | float | 0.25 | Size of testing subset |
| do-not-shuffle | action | False | Do not perform data shuffle before splitting |
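
An illustrative train_test_split run, assuming a script named `train_test_split.py` (assumed name); do-not-shuffle is an action flag, so passing it disables shuffling:

```bash
# train_test_split.py is an assumed script name; flags mirror the table above.
python train_test_split.py --train-size 0.75 --test-size 0.25 --do-not-shuffle
```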