Lazy, functional machine learning algorithms and data structures

jamesmcnamara/ml

Learn.py

By James McNamara

Learn.py is a general-purpose ETL and machine learning library written in Python 3, with a focus on a lazy, functional style. It currently includes various decision trees, regression tools, and text classifiers; work has begun on neural nets, support vector machines, and EM clustering.

The required libraries are listed in requirements.txt and can be installed with:

	pip install -r requirements.txt

Examples

Most classes share the same API and can be used as follows:

from ml.module import MLClass

clf = MLClass(data=my_training_data, results=my_training_results)
predictions = clf.predict(test_data)

Note that the output is a lazily evaluated iterable: values are computed on demand, and it can be consumed only once.
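To illustrate the single-use behavior, here is a minimal sketch. `MajorityClassifier` is a hypothetical stand-in written for this example, not a class from the library; it only mimics the `data=` / `results=` constructor and the `predict` call shown above, and assumes `predict` returns a generator.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in that mimics the constructor/predict shape above."""

    def __init__(self, data, results):
        # Remember the most common training label; `data` is unused in this toy.
        self.label = Counter(results).most_common(1)[0][0]

    def predict(self, test_data):
        # A generator expression: nothing is computed until it is iterated.
        return (self.label for _ in test_data)


clf = MajorityClassifier(data=[[0], [1], [2]], results=["a", "b", "a"])
predictions = clf.predict([[3], [4]])

first_pass = list(predictions)   # consumes the generator
second_pass = list(predictions)  # already exhausted, so this is empty
```

If the predictions are needed more than once, materialize them first (for example with `list(...)`) and reuse the resulting list.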

CLI

The project has a command line interface accessible through learn.py:

python learn.py [-h] [-r RANGE RANGE RANGE] [-m META] 
			    [-cv CROSS] [-t TREE] [-d] [-cf] [-b] 
			    infile

Positional Arguments

| Name | Usage |
| --- | --- |
| infile | CSV file with training data |

Optional Arguments

| Name | Usage |
| --- | --- |
| -h, --help | Show this help message and exit |
| -r RANGE RANGE RANGE, --range RANGE RANGE RANGE | Range of η values to use for cross validation. The first value is the start, the second is the end, and the last is the interval |
| -m META, --meta META | Meta file containing JSON-formatted descriptions of the data |
| -cv CROSS, --cross CROSS | Set the parameter k for k-fold cross validation. Default 10 |
| -t TREE, --tree TREE | The type of decision tree to build for the data. Options are 'entropy', 'regression', or 'categorical'. Default 'entropy' |
| -d, --debug | Use scikit-learn instead of learn.py, to test that the behavior is correct |
| -cf, --with-confusion | Include a confusion matrix in the output |
| -b, --binary-splits | Convert a multi-way categorical matrix to a binary matrix |
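For readers who want to script against the same options, the interface above can be approximated with `argparse`. This is a sketch based only on the usage text. The real learn.py may declare its arguments differently, and the `int` type for `-r` and `-cv` and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    # Mirrors the documented options; -h/--help is added by argparse itself.
    parser = argparse.ArgumentParser(prog="learn.py")
    parser.add_argument("infile", help="CSV file with training data")
    parser.add_argument("-r", "--range", nargs=3, type=int, metavar="RANGE",
                        help="start, end and interval of the η values")
    parser.add_argument("-m", "--meta",
                        help="meta file with JSON descriptions of the data")
    parser.add_argument("-cv", "--cross", type=int, default=10,
                        help="k for k-fold cross validation")
    parser.add_argument("-t", "--tree", default="entropy",
                        choices=["entropy", "regression", "categorical"],
                        help="type of decision tree to build")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="use scikit-learn to check behavior")
    parser.add_argument("-cf", "--with-confusion", action="store_true",
                        help="include a confusion matrix in the output")
    parser.add_argument("-b", "--binary-splits", action="store_true",
                        help="convert multi-way categorical data to binary")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-r", "5", "15", "5", "-t", "regression", "-cv", "15", "data/housing.csv"]
)
```

Dashes in long option names become underscores on the parsed namespace, so `--with-confusion` is read as `args.with_confusion` and `--binary-splits` as `args.binary_splits`.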

Examples

Perform 10-fold cross validation on the iris dataset over η mins of 5, 10, 15, 20 & 25:

    python learn.py -r 5 25 5 data/iris.csv

Generate confusion matrices for η mins of 5, 10 & 15 over the mushroom dataset using multiway splits:

    python learn.py -r 5 15 5 -t categorical -cf data/mushroom.csv

Convert the mushroom dataset to a binary dataset and perform cross validation over η mins of 1 through 10:

    python learn.py -r 1 10 1 -t categorical -b data/mushroom.csv

Regress the housing dataset using 15-fold cross validation over η mins of 5, 10 & 15:

    python learn.py -r 5 15 5 -t regression -cv 15 data/housing.csv
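The `-r` triples in the examples above expand into a list of η values. The first example pairs `-r 5 25 5` with 5, 10, 15, 20 & 25, so the end value appears to be inclusive. The sketch below reproduces that expansion lazily; `expand_range` is a hypothetical helper name, and how learn.py computes the values internally is not shown in this document.

```python
def expand_range(start, end, interval):
    """Lazily yield start, start + interval, ... up to and including end."""
    if interval <= 0:
        # Guard against an endless loop on a zero or negative step.
        raise ValueError("interval must be positive")
    value = start
    while value <= end:
        yield value
        value += interval


iris_etas = list(expand_range(5, 25, 5))      # from: -r 5 25 5
mushroom_etas = list(expand_range(1, 10, 1))  # from: -r 1 10 1
```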
