diff --git a/README.md b/README.md
index c5bef1a6..6c81d6b5 100644
--- a/README.md
+++ b/README.md
@@ -1,59 +1,51 @@
-
-
-# LightAutoML - automatic model creation framework
+
+[![GitHub License](https://img.shields.io/github/license/sb-ai-lab/LightAutoML)](https://github.com/sb-ai-lab/LightAutoML/blob/main/LICENSE)
+[![PyPI - Version](https://img.shields.io/pypi/v/lightautoml)](https://pypi.org/project/lightautoml)
+![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=green)
[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
-![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
-![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
-[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/sb-ai-lab/lightautoml/CI.yml)](https://github.com/sb-ai-lab/lightautoml/actions/workflows/CI.yml?query=branch%3Amain) ![Poetry-Lock](https://img.shields.io/github/workflow/status/sb-ai-lab/LightAutoML/Poetry%20run/master?label=Poetry-Lock) - +![Read the Docs](https://img.shields.io/readthedocs/lightautoml) +[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks: - binary classification -- multiclass classification +- multiclass classification +- multilabel classification - regression Current version of the package handles datasets that have independent samples in each row. I.e. **each row is an object with its specific features and target**. -Multitable datasets and sequences are a work in progress :) - -**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models. **Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets. -**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/), you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it. - -# (New features) GPU and Spark pipelines -Full GPU and Spark pipelines for LightAutoML currently available for developers testing (still in progress). 
The code and tutorials for:
-- GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
-- Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA)
# Table of Contents
-* [Installation LightAutoML from PyPI](#installation)
+* [Installation](#installation)
+* [Documentation](https://lightautoml.readthedocs.io/)
* [Quick tour](#quicktour)
* [Resources](#examples)
-* [Contributing to LightAutoML](#contributing)
-* [License](#apache)
-* [For developers](#developers)
+* [Advanced features](#advanced-features)
* [Support and feature requests](#support)
+* [Contributing to LightAutoML](#contributing)
+* [License](#license)
+
+**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it.
+
# Installation
-To install LAMA framework on your machine from PyPI, execute following commands:
+To install the LAMA framework from PyPI:
```bash
-
-# Install base functionality:
-
+# Base functionality:
pip install -U lightautoml
-# For partial installation use corresponding option.
-# Extra dependecies: [nlp, cv, report]
-# Or you can use 'all' to install everything
-
+# For a partial installation, use the corresponding option
+# Extra dependencies: [nlp, cv, report]; use 'all' to install everything
pip install -U lightautoml[nlp]
-
```
Additionally, run following commands to enable pdf report generation:
@@ -77,7 +69,7 @@ sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
# Quick tour
Let's solve the popular Kaggle Titanic competition below. 
There are two main ways to solve machine learning problems using LightAutoML:
-* Use ready preset for tabular data:
+### Use ready preset for tabular data
```python
import pandas as pd
from sklearn.metrics import f1_score
@@ -105,9 +97,87 @@ pd.DataFrame({
}).to_csv('submit.csv', index = False)
```
-LighAutoML framework has a lot of ready-to-use parts and extensive customization options, to learn more check out the [resources](#Resources) section.
+### LightAutoML as a framework: build your own custom pipeline
-[Back to top](#toc)
+```python
+import pandas as pd
+
+from lightautoml.automl.base import AutoML
+from lightautoml.ml_algo.boost_lgbm import BoostLGBM
+from lightautoml.ml_algo.tuning.optuna import OptunaTuner
+from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
+from lightautoml.pipelines.ml.base import MLPipeline
+from lightautoml.pipelines.selection.importance_based import ImportanceCutoffSelector, ModelBasedImportanceEstimator
+from lightautoml.reader.base import PandasToPandasReader
+from lightautoml.tasks import Task
+
+df_train = pd.read_csv('../input/titanic/train.csv')
+df_test = pd.read_csv('../input/titanic/test.csv')
+N_THREADS = 4
+
+reader = PandasToPandasReader(Task("binary"), cv=5, random_state=42)
+
+# create a feature selector
+selector = ImportanceCutoffSelector(
+    LGBSimpleFeatures(),
+    BoostLGBM(
+        default_params={'learning_rate': 0.05, 'num_leaves': 64,
+        'seed': 42, 'num_threads': N_THREADS}
+    ),
+    ModelBasedImportanceEstimator(),
+    cutoff=0
+)
+
+# build first level pipeline for AutoML
+pipeline_lvl1 = MLPipeline([
+    # first model with hyperparams tuning
+    (
+        BoostLGBM(
+            default_params={'learning_rate': 0.05, 'num_leaves': 128,
+            'seed': 1, 'num_threads': N_THREADS}
+        ),
+        OptunaTuner(n_trials=20, timeout=30)
+    ),
+    # second model without hyperparams tuning
+    BoostLGBM(
+        default_params={'learning_rate': 0.025, 'num_leaves': 64,
+        'seed': 2, 'num_threads': N_THREADS}
+    )
+], pre_selection=selector, features_pipeline=LGBSimpleFeatures(), post_selection=None)
+
+# build second level pipeline for AutoML
+pipeline_lvl2 = MLPipeline(
+    [
+        BoostLGBM(
+            default_params={'learning_rate': 0.05, 'num_leaves': 64,
+            'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
+            freeze_defaults=True
+        )
+    ],
+    pre_selection=None,
+    features_pipeline=LGBSimpleFeatures(),
+    
post_selection=None
+)
+
+# build AutoML pipeline
+automl = AutoML(reader, [
+    [pipeline_lvl1],
+    [pipeline_lvl2],
+    ],
+    skip_conn=False
+)
+
+# train AutoML and get predictions
+oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
+test_pred = automl.predict(df_test)
+
+pd.DataFrame({
+    'PassengerId': df_test.PassengerId,
+    'Survived': (test_pred.data[:, 0] > 0.5)*1
+}).to_csv('submit.csv', index = False)
+```
+
+LightAutoML has a lot of ready-to-use components and extensive customization options; to learn more, check out the [resources](#Resources) section.

# Resources
@@ -165,96 +230,25 @@
- (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
- (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)
-[Back to top](#toc)
+
+# Advanced features
+### GPU and Spark pipelines
+Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:
+- GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
+- Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA)
# Contributing to LightAutoML
If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.
-[Back to top](#toc)
-
-
-# License
-This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details. 
- -[Back to top](#toc) - - -# For developers - -## Build your own custom pipeline: - -```python -import pandas as pd -from sklearn.metrics import f1_score - -from lightautoml.automl.presets.tabular_presets import TabularAutoML -from lightautoml.tasks import Task - -df_train = pd.read_csv('../input/titanic/train.csv') -df_test = pd.read_csv('../input/titanic/test.csv') - -# define that machine learning problem is binary classification -task = Task("binary") - -reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE) - -# create a feature selector -model0 = BoostLGBM( - default_params={'learning_rate': 0.05, 'num_leaves': 64, - 'seed': 42, 'num_threads': N_THREADS} -) -pipe0 = LGBSimpleFeatures() -mbie = ModelBasedImportanceEstimator() -selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0) - -# build first level pipeline for AutoML -pipe = LGBSimpleFeatures() -# stop after 20 iterations or after 30 seconds -params_tuner1 = OptunaTuner(n_trials=20, timeout=30) -model1 = BoostLGBM( - default_params={'learning_rate': 0.05, 'num_leaves': 128, - 'seed': 1, 'num_threads': N_THREADS} -) -model2 = BoostLGBM( - default_params={'learning_rate': 0.025, 'num_leaves': 64, - 'seed': 2, 'num_threads': N_THREADS} -) -pipeline_lvl1 = MLPipeline([ - (model1, params_tuner1), - model2 -], pre_selection=selector, features_pipeline=pipe, post_selection=None) - -# build second level pipeline for AutoML -pipe1 = LGBSimpleFeatures() -model = BoostLGBM( - default_params={'learning_rate': 0.05, 'num_leaves': 64, - 'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS}, - freeze_defaults=True -) -pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1, - post_selection=None) - -# build AutoML pipeline -automl = AutoML(reader, [ - [pipeline_lvl1], - [pipeline_lvl2], -], skip_conn=False) - -# train AutoML and get predictions -oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']}) -test_pred = 
automl.predict(df_test) - -pd.DataFrame({ - 'PassengerId':df_test.PassengerId, - 'Survived': (test_pred.data[:, 0] > 0.5)*1 -}).to_csv('submit.csv', index = False) -``` - -[Back to top](#toc) - # Support and feature requests Seek prompt advice at [Telegram group](https://t.me/lightautoml). Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues). + + +# License +This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details. + +[Back to top](#toc) diff --git a/imgs/GENERALL2X2.jpg b/docs/imgs/GENERALL2X2.jpg similarity index 100% rename from imgs/GENERALL2X2.jpg rename to docs/imgs/GENERALL2X2.jpg diff --git a/imgs/lime.jpg b/docs/imgs/lime.jpg similarity index 100% rename from imgs/lime.jpg rename to docs/imgs/lime.jpg diff --git a/imgs/LightAutoML_logo_big.png b/imgs/LightAutoML_logo_big.png deleted file mode 100644 index 2e799956..00000000 Binary files a/imgs/LightAutoML_logo_big.png and /dev/null differ diff --git a/imgs/LightAutoML_logo_small.png b/imgs/LightAutoML_logo_small.png deleted file mode 100644 index 8d268e39..00000000 Binary files a/imgs/LightAutoML_logo_small.png and /dev/null differ diff --git a/imgs/Star_scheme_tables.png b/imgs/Star_scheme_tables.png deleted file mode 100644 index c275d3f5..00000000 Binary files a/imgs/Star_scheme_tables.png and /dev/null differ diff --git a/imgs/TabularAutoML_model_descr.png b/imgs/TabularAutoML_model_descr.png deleted file mode 100644 index 4c24cada..00000000 Binary files a/imgs/TabularAutoML_model_descr.png and /dev/null differ diff --git a/imgs/TabularUtilizedAutoML_model_descr.png b/imgs/TabularUtilizedAutoML_model_descr.png deleted file mode 100644 index c2330881..00000000 Binary files a/imgs/TabularUtilizedAutoML_model_descr.png and /dev/null differ diff --git a/imgs/autoint.png b/imgs/autoint.png deleted file mode 100644 index e898ee92..00000000 
Binary files a/imgs/autoint.png and /dev/null differ diff --git a/imgs/denselight.png b/imgs/denselight.png deleted file mode 100644 index 6e58464a..00000000 Binary files a/imgs/denselight.png and /dev/null differ diff --git a/imgs/densenet.png b/imgs/densenet.png deleted file mode 100644 index 86757951..00000000 Binary files a/imgs/densenet.png and /dev/null differ diff --git a/imgs/fttransformer.png b/imgs/fttransformer.png deleted file mode 100644 index 61e3712c..00000000 Binary files a/imgs/fttransformer.png and /dev/null differ diff --git a/imgs/node.png b/imgs/node.png deleted file mode 100644 index ca0a4805..00000000 Binary files a/imgs/node.png and /dev/null differ diff --git a/imgs/resnet.png b/imgs/resnet.png deleted file mode 100644 index 5d809448..00000000 Binary files a/imgs/resnet.png and /dev/null differ diff --git a/imgs/swa.png b/imgs/swa.png deleted file mode 100644 index 63d6df84..00000000 Binary files a/imgs/swa.png and /dev/null differ diff --git a/imgs/tutorial_11_case_problem_statement.png b/imgs/tutorial_11_case_problem_statement.png deleted file mode 100644 index 6b08b010..00000000 Binary files a/imgs/tutorial_11_case_problem_statement.png and /dev/null differ diff --git a/imgs/tutorial_11_general_problem_statement.png b/imgs/tutorial_11_general_problem_statement.png deleted file mode 100644 index c95b6e16..00000000 Binary files a/imgs/tutorial_11_general_problem_statement.png and /dev/null differ diff --git a/imgs/tutorial_11_history_step_params.png b/imgs/tutorial_11_history_step_params.png deleted file mode 100644 index 3fa11113..00000000 Binary files a/imgs/tutorial_11_history_step_params.png and /dev/null differ diff --git a/imgs/tutorial_11_transformers_params.png b/imgs/tutorial_11_transformers_params.png deleted file mode 100644 index 5212a24f..00000000 Binary files a/imgs/tutorial_11_transformers_params.png and /dev/null differ diff --git a/imgs/tutorial_1_initial_report.png b/imgs/tutorial_1_initial_report.png deleted file mode 
100644 index 2648e9dc..00000000 Binary files a/imgs/tutorial_1_initial_report.png and /dev/null differ diff --git a/imgs/tutorial_1_laml_big.png b/imgs/tutorial_1_laml_big.png deleted file mode 100644 index e4de6247..00000000 Binary files a/imgs/tutorial_1_laml_big.png and /dev/null differ diff --git a/imgs/tutorial_1_ml_pipeline.png b/imgs/tutorial_1_ml_pipeline.png deleted file mode 100644 index ffc24bf3..00000000 Binary files a/imgs/tutorial_1_ml_pipeline.png and /dev/null differ diff --git a/imgs/tutorial_1_pipeline.png b/imgs/tutorial_1_pipeline.png deleted file mode 100644 index ce0ce896..00000000 Binary files a/imgs/tutorial_1_pipeline.png and /dev/null differ diff --git a/imgs/tutorial_1_unfolded_report.png b/imgs/tutorial_1_unfolded_report.png deleted file mode 100644 index e7517033..00000000 Binary files a/imgs/tutorial_1_unfolded_report.png and /dev/null differ diff --git a/imgs/tutorial_2_initial_report.png b/imgs/tutorial_2_initial_report.png deleted file mode 100644 index 277f37ac..00000000 Binary files a/imgs/tutorial_2_initial_report.png and /dev/null differ diff --git a/imgs/tutorial_2_pipeline.png b/imgs/tutorial_2_pipeline.png deleted file mode 100644 index e50a29e6..00000000 Binary files a/imgs/tutorial_2_pipeline.png and /dev/null differ diff --git a/imgs/tutorial_2_unfolded_report.png b/imgs/tutorial_2_unfolded_report.png deleted file mode 100644 index fde6fce2..00000000 Binary files a/imgs/tutorial_2_unfolded_report.png and /dev/null differ diff --git a/imgs/tutorial_3_initial_report.png b/imgs/tutorial_3_initial_report.png deleted file mode 100644 index c6639742..00000000 Binary files a/imgs/tutorial_3_initial_report.png and /dev/null differ diff --git a/imgs/tutorial_3_unfolded_report.png b/imgs/tutorial_3_unfolded_report.png deleted file mode 100644 index 87a67d37..00000000 Binary files a/imgs/tutorial_3_unfolded_report.png and /dev/null differ diff --git a/imgs/tutorial_blackbox_pipeline.png b/imgs/tutorial_blackbox_pipeline.png deleted 
file mode 100644 index 6c55fc7f..00000000 Binary files a/imgs/tutorial_blackbox_pipeline.png and /dev/null differ diff --git a/imgs/tutorial_whitebox_report_1.png b/imgs/tutorial_whitebox_report_1.png deleted file mode 100644 index 17317f31..00000000 Binary files a/imgs/tutorial_whitebox_report_1.png and /dev/null differ diff --git a/imgs/tutorial_whitebox_report_2.png b/imgs/tutorial_whitebox_report_2.png deleted file mode 100644 index c92067d0..00000000 Binary files a/imgs/tutorial_whitebox_report_2.png and /dev/null differ diff --git a/imgs/tutorial_whitebox_report_3.png b/imgs/tutorial_whitebox_report_3.png deleted file mode 100644 index eaa094ce..00000000 Binary files a/imgs/tutorial_whitebox_report_3.png and /dev/null differ diff --git a/imgs/tutorial_whitebox_report_4.png b/imgs/tutorial_whitebox_report_4.png deleted file mode 100644 index 3b42350a..00000000 Binary files a/imgs/tutorial_whitebox_report_4.png and /dev/null differ