Datasets

This repository contains some commonly used datasets. The datasets available here are:

Datasets with only two classes

Adult (adult.csv)
Credit (credit.csv)
Diabetes (diabetes.csv)
Haberman (haberman.csv)
Indian Liver Patient (indianLiverPatient.csv)
Magic (magic.csv)
Mammographic (mammographic.csv)
Pulsar (pulsar.csv) - Only data
Heart Diseases (heart.csv)
Connectionist Bench (Sonar, Mines vs. Rocks) (sonar.csv)
SVM guide (svmguide3.csv)
Liver Disorder (liver_disorder.csv)
German credit data (german_numer.csv)
Yearbook dataset (Yearbook/portraits_1905_1954.mat, portraits_1955_1974.mat, portraits_1975_1994.mat, portraits_1995_2013.mat)

Datasets with multiple classes

Ecoli (ecoli.csv)
Forest Cover (forestcov.csv)
Glass (glass.csv)
Iris (iris.csv)
Letter Recognition (letter-recognition.csv)
Optical Digit Recognition (optdigits.csv)
Wine Quality (redwine.csv)
Satellite (satellite.csv)
Image Segmentation (segment.csv)
Vehicle Silhouettes (vehicle.csv)
Pulsar (pulsar.csv) - Only data
DomainNet (domain_4_clases.mat) - Contains six domains of decreasing realism; the goal is to predict whether an image shows an airplane, bus, ambulance, or police car. The sequence of tasks corresponds to the six domains: real, painting, infograph, clipart, sketch, and quickdraw.
Dry bean

The datasets are in the data folder, and their descriptions are in the descr folder.

Example

The repo also contains functions in load.py that load these datasets as numpy matrices. The file example.py shows how to use them. To load a dataset and see the output, run example.py with the name of the dataset file as the command-line argument:

python example.py datasetname

To load a dataset, call the corresponding function (load_<datasetname>) in load.py. For example, to load the adult dataset, call load_adult(True). Note: pass True as the parameter if you want the function to return the dataset and its labels as a numpy matrix and vector, respectively.
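The load_<datasetname> naming convention also makes it possible to pick a loader by name at run time. The following is a minimal sketch of that idea, not the code in example.py: the helper load_dataset and its module_name parameter are hypothetical names introduced here, and the usage comment assumes the loader returns the data and labels as a pair in that order.

```python
import importlib


def load_dataset(name, module_name="load"):
    """Call load_<name>(True) from the given module and return its result.

    `load_dataset` and `module_name` are hypothetical names for this sketch;
    only the load_<datasetname>(True) convention comes from the README.
    """
    module = importlib.import_module(module_name)
    loader = getattr(module, "load_" + name, None)
    if not callable(loader):
        raise ValueError("no loader named load_%s in module %s" % (name, module_name))
    # Passing True asks the loader to return the dataset together with its labels.
    return loader(True)


# Example usage, run from the repository root so that load.py is importable:
#     X, y = load_dataset("adult")  # assumes a (data, labels) pair is returned
#     print(X.shape, y.shape)
```

Looking the function up with getattr keeps the helper independent of the list of datasets, so a loader added to load.py later is picked up without further changes.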

Reference

These datasets are taken from the UCI Machine Learning Repository and the LIBSVM Data repository.
