Visualization of model hyperparameter optimization curves

Summary

The goal of this project is to give users of the mlr package a way of visualizing what happens during the tuning process that identifies the best hyperparameters for given data. This will enable users to assess the impact of different parameters, and will show authors of learning methods which parameters matter in practice and how their approaches might be improved.

Description

Many machine learning algorithms have numerous parameters that need to be set in order to achieve optimal performance on a given data set. Doing this manually is tedious and error-prone. The mlr package implements not only an interface to dozens of different learning algorithms in R, but also a set of generic hyperparameter optimisation methods: given a learner, its parameters and data, it will automatically identify the best parameter setting for the particular case.

While good parameter settings can be determined efficiently, mlr currently provides no means of visualizing this process. The user is given a result without much explanation of how it was arrived at. Seeing what happens during the process is not only interesting from the user's point of view, but also crucial for linking the result back to the behaviour of the machine learning algorithm on the data. Such understanding can inform improvements to the particular approach.

This project will create visualizations of hyperparameter tuning for mlr. It will allow the plotting of a hyperparameter against a scoring function, showing the effect of tuning the specified hyperparameter. It will furthermore include support for plotting multiple hyperparameters and scoring functions, along with ablation analysis (a method for identifying the most important parameters).
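The ablation analysis mentioned above can be illustrated with a small sketch. It is written in Python rather than R purely for illustration and does not use any mlr or ParamHelpers function; `ablation_path`, `toy_error`, and the parameter names are hypothetical. Starting from a default configuration, it greedily switches one parameter at a time to its tuned value, each time choosing the switch that lowers the error most, so the order of the switches gives a rough ranking of parameter importance.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def ablation_path(
    source: Config,
    target: Config,
    error: Callable[[Config], float],
) -> List[Tuple[str, float]]:
    """Greedy ablation from `source` to `target`.

    Returns (parameter, error after switching it) pairs in the order in
    which the parameters were switched to their target values.
    """
    current = dict(source)
    # Only parameters whose values differ can contribute to the improvement.
    remaining = [name for name in target if source[name] != target[name]]
    path: List[Tuple[str, float]] = []
    while remaining:
        # Try each remaining parameter and keep the one giving the lowest error.
        candidates = []
        for name in remaining:
            trial = dict(current)
            trial[name] = target[name]
            candidates.append((error(trial), name))
        best_error, best_name = min(candidates)
        current[best_name] = target[best_name]
        remaining.remove(best_name)
        path.append((best_name, best_error))
    return path


def toy_error(cfg: Config) -> float:
    """Hypothetical error surface in which `cost` matters far more than `gamma`."""
    return (cfg["cost"] - 10.0) ** 2 + 0.1 * (cfg["gamma"] - 0.5) ** 2


default = {"cost": 1.0, "gamma": 0.1, "tol": 0.001}
tuned = {"cost": 10.0, "gamma": 0.5, "tol": 0.001}

for name, err in ablation_path(default, tuned, toy_error):
    print(f"switch {name}: error = {err:.3f}")
```

In this toy setting `cost` is switched first because it accounts for almost all of the improvement, and `tol` never appears because its value is the same in both configurations. A bar or step plot of the resulting path is one possible way to present such an analysis.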

Technical Details

The path taken from the starting parameter configuration to the end result is stored in an optimization path data structure that is part of the ParamHelpers package. The data structure should contain all the necessary information, but may need to be extended to accommodate more detail.
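To make the idea of an optimization path concrete, the following sketch models one as a plain list of evaluations. It is written in Python for illustration only and is not the ParamHelpers data structure or its API; the class and field names (`OptPathSketch`, `Evaluation`, `iteration`) are assumptions. Each entry records the parameter values tried, the performance achieved, and the iteration at which the point was evaluated, which is roughly the minimum needed to plot a parameter against a score or to trace the best score over time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Evaluation:
    params: Dict[str, float]  # hyperparameter values that were tried
    score: float              # achieved performance (lower is better here)
    iteration: int            # position of this evaluation in the search


@dataclass
class OptPathSketch:
    entries: List[Evaluation] = field(default_factory=list)

    def add(self, params: Dict[str, float], score: float) -> None:
        """Append one evaluated configuration to the path."""
        self.entries.append(Evaluation(dict(params), score, len(self.entries) + 1))

    def series(self, name: str) -> List[Tuple[float, float]]:
        """(parameter value, score) pairs: the data behind a scatter plot."""
        return [(e.params[name], e.score) for e in self.entries]

    def best_so_far(self) -> List[float]:
        """Running minimum of the score: the data behind a tuning curve."""
        best, curve = float("inf"), []
        for e in self.entries:
            best = min(best, e.score)
            curve.append(best)
        return curve


# Made-up evaluations of a single hypothetical parameter `cost`.
path = OptPathSketch()
for cost, err in [(0.1, 0.30), (1.0, 0.12), (10.0, 0.15), (100.0, 0.08)]:
    path.add({"cost": cost}, err)

print(path.series("cost"))
print(path.best_so_far())
```

Keeping the raw evaluations and deriving plot data on demand means that additional detail, such as run time or error messages per evaluation, could be added as extra fields without changing the plotting code.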

The plotting should use ggplot2/ggvis, in line with the other visualizations in mlr. Providing interactive functionality, e.g. through shiny, would be desirable.

Skills Required

Applicants should have experience using or developing in R and with development tools such as git, as well as experience with visualization methods. A background in computer science or engineering will be beneficial.

Test

Implement a simple visualization that plots the points on an optimization path with respect to the achieved performance. The mlr tutorial gives details on how to get started.

Visualizing Hyperparameter Optimization by Mason

Mentors

Bernd Bischl (bernd_bischl@gmx.net) is one of the primary authors of mlr and ParamHelpers and has mentored for GSoC before.

Lars Kotthoff (larsko@cs.ubc.ca) is one of the primary authors of mlr and has mentored for GSoC before.
