GitHub - jamesmcnamara/ml: Lazy, functional machine learning algorithms and data structures

Learn.py

By James McNamara

Learn.py is a general purpose ETL and machine learning library written in python3 with a focus on lazy, functional style. It currently includes various decision trees, regression tools, and text classifiers and work has already begun on neural nets, support vector machines, and EM clustering.

The required libraries are included in requirements.txt, and can be installed with:

	pip install -r requirements.txt

Examples

Most classes support the same API, and thus can be used through:

from ml.module import MLClass

clf = MLClass(data=my_training_data, results=Training_results)
predictions = clf.predict(test_data)

It should be noted that output is an iterable, and is thus single use, and calculated by need.

CLI

The project has a command line interface accessible through learn.py:

python learn.py [-h] [-r RANGE RANGE RANGE] [-m META] 
			    [-cv CROSS] [-t TREE] [-d] [-cf] [-b] 
			    infile

Positional Arguments

Name	Usage
infile	CSV file with training data

Optional Arguments

Name	Usage
-h, --help	Show this help message and exit
-r RANGE RANGE RANGE, --range RANGE RANGE RANGE	Range of η values to use for cross validation. The first value is start, the second is end, and the last is interval
-m META, --meta META	Meta file containing JSON formatted descriptions of the data
-cv CROSS, --cross CROSS	Set the parameter for k-fold cross validation. Default 10.
-t TREE, --tree TREE	What type of decision tree to build for the data. Options are 'entropy', 'regression', or 'categorical'. Default 'entropy'
-d, --debug	Use sci-kit learn instead of learn.py, to test that the behavior is correct
-cf, --with-confusion	Include a confusion matrix in the output
-b, --binary-splits	Convert a multi-way categorical matrix to a binary matrix

Examples

Perform 10-fold cross validation on the iris dataset over η mins of 5, 10, 15, 20 & 25:

    python learn.py -r 5 25 5 data/iris.csv

Generate confusion matricies for η mins of 5 10 15 over the mushroom dataset using multiway splits:

    python learn.py -r 5 15 5 -t categorical -cf data/mushroom.csv

Convert the mushroom dataset to a binary dataset and perform cross validation at 1-10:

    python learn.py -r 1 10 1 -t categorical -b data/mushroom.csv

Regress the housing dataset using 15-fold cross validation over η of 5, 10 & 15:

    python learn.py -r 5 15 5 -t regression -cv 15 data/housing.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Learn.py

Examples

CLI

Positional Arguments

Optional Arguments

Examples

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
classifier		classifier
clustering		clustering
data		data
regression		regression
text_classification		text_classification
utils		utils
README.md		README.md
__init__.py		__init__.py
learn.py		learn.py
requirements.txt		requirements.txt
tests.py		tests.py

jamesmcnamara/ml

Folders and files

Latest commit

History

Repository files navigation

Learn.py

Examples

CLI

Positional Arguments

Optional Arguments

Examples

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages