The common order of processes in data preparation for machine learning is:
      (feature engineering)             (feature selection)
data -----------------------> features ---------------------> features
But we managed to swap the order of feature engineering and feature selection:
      (feature selection)             (feature engineering)
data ---------------------> features -----------------------> features
What is the gain? Based on the performed experiments, it is enough to engineer only the top decile of the candidate features to get accuracy comparable to the accuracy obtained with all the features. That means you can reduce the runtime of feature engineering ~10 times without sacrificing the accuracy of the model.
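To make the swapped order concrete, here is a minimal Python sketch. The callables `estimate_utility` (cheap scoring of a candidate from meta-features, without computing its values) and `engineer` (the expensive feature calculation) are hypothetical placeholders, not part of the project.

```python
# Minimal sketch of "select first, engineer second" (illustrative only).
def prepare_features(data, candidates, estimate_utility, engineer, budget=0.1):
    # Feature selection first: rank candidates without computing their values.
    ranked = sorted(candidates, key=lambda c: estimate_utility(data, c), reverse=True)
    # Feature engineering second: calculate only the top ~decile of candidates.
    top = ranked[: max(1, int(budget * len(ranked)))]
    return {c: engineer(data, c) for c in top}
```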
Applications: Large or complex databases where it is impossible or inconvenient to calculate all the features.
How? By applying meta-learning. The meta-learner utilizes two sources of knowledge that guide the feature selection (a sketch follows the list):
- External (acquired from other databases)
- Internal (acquired from the database at hand)
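One way to picture the two sources is a regressor fitted on meta-feature/utility pairs collected from other databases (external knowledge) and then applied to the meta-features measured on the database at hand (internal knowledge). The sketch below is only an illustration; the choice of regressor and the data layout are assumptions.

```python
# Illustrative sketch of the two knowledge sources, not the project's meta-learner.
from sklearn.ensemble import RandomForestRegressor

def train_meta_learner(external_meta_features, external_utilities):
    # External knowledge: meta-features and observed feature utilities
    # gathered on previously processed databases.
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(external_meta_features, external_utilities)
    return model

def predict_utility(model, internal_meta_features):
    # Internal knowledge: meta-features measured on the current database.
    return model.predict([internal_meta_features])[0]
```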
Utilized meta-features: There are three categories of meta-features from which the meta-learner estimates the utility of candidate features (a sketch follows the list):
- Landmark performance of a few selected features
- Properties of feature (generative) functions
- Properties of the data
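For illustration, a single meta-feature vector for one candidate could look like the sketch below; the concrete attribute names (`aggregation`, `n_joined_tables`, ...) are assumptions, not the actual schema.

```python
from dataclasses import dataclass

@dataclass
class Candidate:            # properties of the feature (generative) function
    aggregation: str        # e.g. "avg", "count", "min"
    n_joined_tables: int    # how many tables the feature query joins

@dataclass
class TableStats:           # properties of the data
    n_rows: int
    n_columns: int

def meta_features(candidate, stats, landmark_scores):
    # Assemble one meta-feature vector for the meta-learner (illustrative names).
    return {
        # 1) Landmark performance of a few cheap features that were computed.
        "landmark_mean": sum(landmark_scores) / len(landmark_scores),
        "landmark_max": max(landmark_scores),
        # 2) Properties of the candidate's generative function.
        "aggregation": candidate.aggregation,
        "n_joined_tables": candidate.n_joined_tables,
        # 3) Properties of the data.
        "n_rows": stats.n_rows,
        "n_columns": stats.n_columns,
    }
```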
Feature utility: We prefer features that are:
- Relevant to the task (evaluated with Chi2 in case of classification)
- Fast to calculate
- Non-redundant
The estimated feature utility is then an amalgam of these three estimates, as sketched below.
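As a rough illustration, the sketch below scores a feature that has already been computed (e.g. a landmark): Chi2 relevance against the class label, the wall-clock cost of computing it, and its strongest correlation with already selected features. The combination rule is an assumption, not the project's formula.

```python
import time
import numpy as np
from sklearn.feature_selection import chi2

def feature_utility(engineer, data, target, selected_values):
    # Fast to calculate: measure the wall-clock cost of engineering the feature.
    start = time.perf_counter()
    values = np.asarray(engineer(data), dtype=float)
    runtime = time.perf_counter() - start

    # Relevant to the task: Chi2 statistic against the class label
    # (chi2 expects non-negative values, e.g. counts or min-max scaled features).
    relevance = chi2(values.reshape(-1, 1), target)[0][0]

    # Non-redundant: strongest absolute correlation with already selected features.
    redundancy = max((abs(np.corrcoef(values, s)[0, 1]) for s in selected_values),
                     default=0.0)

    # Amalgam (illustrative): reward relevance, penalize redundancy and runtime.
    return relevance * (1.0 - redundancy) / (1.0 + runtime)
```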
Used tools:
- RapidMiner (to train the meta-learner)
- MATLAB (to run experiments)
- Excel (to see the measured values from the experiments)
- R (to interpret the experiment results)
Limitations: Implemented only for relational data (specifically, data stored in SQL databases).