This is my personal repository for the course Advanced Machine Learning (Napredno strojno učenje), taken at the Faculty of Mathematics and Physics, University of Ljubljana, in the 2nd semester of 2022/2023. It contains the code I wrote during the course, such as code from exercise classes and homework assignments.
- Class 1: Fundamentals of Machine Learning in Python
- Class 2: Meta-learning
- Class 3: Meta-learning and Hyperparameter Optimisation
- Class 4: Hyperparameter Optimisation with `hyperopt`
- Class 7: Machine Learning on Complex Data Structures, Part 1
- Class 10: Equation Discovery
- Class 11: Equation Discovery with `ProGED`
Below, a broad overview of the exercise classes is given. The exact instructions are not part of this repository. Corresponding code can be found in the appropriate subdirectories of `ex/`.

## Class 1: Fundamentals of Machine Learning in Python
- Exercise A: Data Processing
  - Load data
  - Extract basic statistics
  - Handle `NaN` values
  - Visualise data
  - Encode categorical features
    - `sklearn.preprocessing.OneHotEncoder`
    - `sklearn.compose.make_column_transformer`
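As a rough sketch of the encoding step, here is a minimal example on toy data (the column names are illustrative, not from the course dataset):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Toy data: one numeric and one categorical feature (illustrative names).
df = pd.DataFrame({"age": [25, 32, 47], "colour": ["red", "blue", "red"]})

# One-hot encode the categorical column, pass the numeric column through.
ct = make_column_transformer(
    (OneHotEncoder(), ["colour"]),
    remainder="passthrough",
)
X = ct.fit_transform(df)  # one column per colour category plus the numeric column
```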
- Exercise B: Binary Classification
  - Train model on entire dataset
    - `sklearn.neighbors.KNeighborsClassifier`
  - Evaluate accuracy of model
  - Split dataset into train data and test data
    - `sklearn.model_selection.train_test_split`
  - Scale features, analyse hyperparameters
    - `sklearn.preprocessing.StandardScaler`
    - `sklearn.model_selection.validation_curve`
  - Calculate alternative metrics
    - `sklearn.metrics.confusion_matrix`
    - `sklearn.metrics.precision_recall_curve`
    - `sklearn.metrics.roc_curve`
    - `sklearn.metrics.roc_auc_score`
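The steps above can be sketched end to end; this uses a synthetic dataset as a stand-in for the course data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the course dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hold out a test set, then scale features using training statistics only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = accuracy_score(y_te, knn.predict(X_te))
auc = roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1])
```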
- Exercise C: Linear Regression
  - `sklearn.linear_model.LinearRegression`
  - Calculate regression metrics
    - `sklearn.metrics.mean_squared_error`
    - `sklearn.metrics.r2_score`
  - Cross-validate and compare models
    - `sklearn.model_selection.cross_validate`
    - `sklearn.svm.SVR`
    - `sklearn.ensemble.RandomForestRegressor`
    - `sklearn.neighbors.KNeighborsRegressor`
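A minimal sketch of the model-comparison step, again on synthetic data rather than the course dataset:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Compare several regressors by mean cross-validated R^2.
models = {
    "svr": SVR(),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "knn": KNeighborsRegressor(),
}
scores = {
    name: cross_validate(m, X, y, cv=5, scoring="r2")["test_score"].mean()
    for name, m in models.items()
}
```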
## Class 2: Meta-learning

- Exercise A: Obtaining Data from OpenML
  - List dataset info
    - `openml.datasets.list_datasets`
  - Load datasets
    - `openml.datasets.get_datasets`
  - Filter datasets
    - `pandas.api.types.is_numeric_dtype`
- Exercise B: Preparing Target Variables
  - Compare model accuracies for datasets
    - `sklearn.tree.DecisionTreeClassifier`
    - `sklearn.naive_bayes.GaussianNB`
- Exercise C: Preparing Meta-features
  - Extract meta-features from dataset
    - `pymfe.mfe.MFE.fit`
    - `pymfe.mfe.MFE.extract`
  - Extract meta-features from fitted model
    - `pymfe.mfe.MFE.extract_from_model`
  - Extract meta-features for all datasets
## Class 3: Meta-learning and Hyperparameter Optimisation

- Exercise A: Meta-classification, Meta-regression
  - Preprocess data
  - Cross-validate and compare meta-models
    - `sklearn.ensemble.RandomForestClassifier`
    - `sklearn.dummy.DummyClassifier`
  - Compare features by importance
    - `sklearn.ensemble.RandomForestClassifier.feature_importances_`
    - `numpy.argsort`
  - Use a regression meta-model to predict accuracy
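The importance-ranking step can be sketched as follows, on synthetic data rather than the course's meta-dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank feature indices from most to least important.
order = np.argsort(rf.feature_importances_)[::-1]
```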
- Exercise B: Hyperparameter Optimisation
  - `sklearn.*.model.get_params`
  - Vary hyperparameters in a decision tree model
  - Perform a grid search
    - `sklearn.model_selection.GridSearchCV`
    - `sklearn.model_selection.GridSearchCV.best_params_`
    - `sklearn.model_selection.GridSearchCV.best_estimator_`
  - Visualise grid search results
    - `grid_search.param_grid`
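A minimal grid-search sketch (the parameter grid is illustrative, not the one from the exercise):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Exhaustively try every combination in the grid with 5-fold CV.
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid_search.fit(X, y)

best = grid_search.best_params_      # winning hyperparameter combination
model = grid_search.best_estimator_  # refit on the full training data
```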
## Class 4: Hyperparameter Optimisation with `hyperopt`
- Exercise A: Hyperparameter Optimisation with `hyperopt`
  - Minimise function of one variable, uniform distribution
    - `hyperopt.fmin`
    - `hyperopt.Trials`
    - `hyperopt.tpe.suggest`
    - `hyperopt.hp.uniform`
  - Minimise function of two variables, normal distribution
    - `hyperopt.hp.normal`
  - Find best algorithm
    - `hyperopt.hp.choice`
  - Define hyperparameter space
  - Compare algorithm performance with default and optimised hyperparameters
## Class 7: Machine Learning on Complex Data Structures, Part 1

- Exercise A: Analysing the Given Code Template
- Exercise B: Filling in Missing Code
  - Complete `razsiri_z_hours`
  - Complete `razsiri_z_attributes`
  - Complete the final for-loop
- Exercise C: Contemplating the Inclusion of Tables `REVIEWS` and `USERS`
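The helper names and the exact schema above are specific to the course template, but the underlying idea, extending a main table with features aggregated from a related table, can be sketched with hypothetical data:

```python
import pandas as pd

# Hypothetical stand-ins for two related tables.
users = pd.DataFrame({"user_id": [1, 2], "name": ["ana", "bor"]})
reviews = pd.DataFrame({"user_id": [1, 1, 2], "stars": [5, 3, 4]})

# Aggregate the related table per user, then attach the aggregates
# to the main table as new feature columns.
agg = reviews.groupby("user_id")["stars"].agg(["count", "mean"]).reset_index()
extended = users.merge(agg, on="user_id", how="left")
```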
## Class 10: Equation Discovery

- Exercise A: Equation Discovery with Linear Regression
  - Implement linear regression
    - `sklearn.preprocessing.PolynomialFeatures`
  - Test linear regression on given data
  - Handle noise with ridge regression
  - Handle noise with lasso regression
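The idea can be sketched as follows: generate candidate polynomial terms and let a sparse regression pick out the ones that actually appear in the (here artificially chosen) law y = 2x²:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=200)  # "unknown" law plus noise

# Candidate terms x, x^2, x^3; the lasso penalty should zero out all but x^2.
features = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
model = Lasso(alpha=0.1).fit(features, y)
coef = model.coef_  # sparse coefficient vector over [x, x^2, x^3]
```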
- Exercise B: Equation Discovery with the BACON Algorithm
  - Implement the BACON algorithm
  - Test the BACON algorithm on given data
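BACON searches for invariants by repeatedly combining variables. A minimal one-step sketch (my own simplification, not the course implementation): if two variables vary together, try their ratio; if they vary oppositely, try their product; accept the combination if it is near-constant:

```python
import numpy as np

def bacon_step(a, b, tol=1e-6):
    """One BACON-style step on two variables; returns (expression, constant)
    if a near-constant combination is found, else (None, None)."""
    corr = np.corrcoef(a, b)[0, 1]
    candidate, name = (a / b, "a/b") if corr > 0 else (a * b, "a*b")
    if np.std(candidate) / abs(np.mean(candidate)) < tol:
        return name, float(np.mean(candidate))
    return None, None

# Data obeying an inverse law b = 6/a, so the product a*b should be constant.
a = np.array([1.0, 2.0, 3.0, 4.0])
expr, value = bacon_step(a, 6.0 / a)
```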
## Class 11: Equation Discovery with `ProGED`
- Exercise A: Probabilistic Grammars and `ProGED`
  - Discover Newton's Second Law
    - `ProGED.EqDisco.generate_models`
    - `ProGED.EqDisco.fit_models`
    - `ProGED.EqDisco.get_results`
  - Discover a linear function
    - `ProGED.generators.GeneratorGrammar`
  - Discover the energy conservation law
Below, a broad overview of the homework assignments is given. The exact instructions are not part of this repository. Corresponding code can be found in the appropriate subdirectories of `hw/`.
- Problem 1: Method Selection and Hyperparameter Optimisation
  - Manual approach
  - Automated approach
- Problem 2: Meta-learning
  - Method selection with meta-learning