From 7bd8f979e616dbb94d856cc0511427dff9e503fc Mon Sep 17 00:00:00 2001 From: dev-rinchin Date: Fri, 9 Aug 2024 17:45:40 +0300 Subject: [PATCH] upd readme --- README.md | 210 ++++-------------- ..._10_relational_data_with_star_scheme.ipynb | 4 +- .../tutorials/Tutorial_11_time_series.ipynb | 8 +- examples/tutorials/Tutorial_1_basics.ipynb | 10 +- .../Tutorial_2_WhiteBox_AutoWoE.ipynb | 10 +- .../Tutorial_3_sql_data_source.ipynb | 4 +- examples/tutorials/Tutorial_5_uplift.ipynb | 2 +- .../Tutorial_6_custom_pipeline.ipynb | 4 +- ...utorial_7_ICE_and_PDP_interpretation.ipynb | 2 +- examples/tutorials/Tutorial_8_CV_preset.ipynb | 2 +- .../Tutorial_9_neural_networks.ipynb | 16 +- 11 files changed, 77 insertions(+), 195 deletions(-) diff --git a/README.md b/README.md index b3999479..7800bb33 100644 --- a/README.md +++ b/README.md @@ -1,180 +1,34 @@ -[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/lightautoml)](https://pypi.org/project/lightautoml/) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/lightautoml)](https://pypi.org/project/lightautoml) [![PyPI - Version](https://img.shields.io/pypi/v/lightautoml)](https://pypi.org/project/lightautoml) ![pypi - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=green) -[![GitHub License](https://img.shields.io/github/license/sb-ai-lab/LightAutoML)](https://github.com/sb-ai-lab/RePlay/blob/main/LICENSE) -[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml) -
[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/sb-ai-lab/lightautoml/CI.yml)](https://github.com/sb-ai-lab/lightautoml/actions/workflows/CI.yml?query=branch%3Amain)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml)
-[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
+### [Documentation](https://lightautoml.readthedocs.io/) | [Installation](#installation) | [Examples](#resources) | [Telegram chat](https://t.me/joinchat/sp8P7sdAqaU0YmRi) | [Telegram channel](https://t.me/lightautoml)
 
-LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
-- binary classification
-- multiclass classification
-- multilabel classification
-- regression
-
-Current version of the package handles datasets that have independent samples in each row. I.e. **each row is an object with its specific features and target**.
+LightAutoML (LAMA) allows you to create machine learning models using just a few lines of code, or build your own custom pipeline from ready-made blocks. It supports tabular, time series, image, and text data.
 
 **Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.
 
-
-
-# Table of Contents
-
-* [Installation](#installation)
-* [Documentation](https://lightautoml.readthedocs.io/)
-* [Quick tour](#quicktour)
-* [Resources](#examples)
-* [Advanced features](#advancedfeatures)
-* [Support and feature requests](#support)
-* [Contributing to LightAutoML](#contributing)
-* [License](#license)
-
-**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/), you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it.
-
-
-
-# Installation
-To install LAMA framework on your machine from PyPI:
-```bash
-# Base functionality:
-pip install -U lightautoml
-
-# For partial installation use corresponding option
-# Extra dependecies: [nlp, cv, report] or use 'all' to install all dependecies
-pip install -U lightautoml[nlp]
-```
-
-Additionally, run following commands to enable pdf report generation:
-
-```bash
-# MacOS
-brew install cairo pango gdk-pixbuf libffi
-
-# Debian / Ubuntu
-sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info
-
-# Fedora
-sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
-
-# Windows
-# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
-```
-[Back to top](#toc)
-
 # Quick tour
-Let's solve the popular Kaggle Titanic competition below. 
There are two main ways to solve machine learning problems using LightAutoML:
-### Use ready preset for tabular data
-```python
-import pandas as pd
-from sklearn.metrics import f1_score
-
-from lightautoml.automl.presets.tabular_presets import TabularAutoML
-from lightautoml.tasks import Task
-
-df_train = pd.read_csv('../input/titanic/train.csv')
-df_test = pd.read_csv('../input/titanic/test.csv')
-
-automl = TabularAutoML(
-    task = Task(
-        name = 'binary',
-        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
-)
-oof_pred = automl.fit_predict(
-    df_train,
-    roles = {'target': 'Survived', 'drop': ['PassengerId']}
-)
-test_pred = automl.predict(df_test)
-
-pd.DataFrame({
-    'PassengerId':df_test.PassengerId,
-    'Survived': (test_pred.data[:, 0] > 0.5)*1
-}).to_csv('submit.csv', index = False)
-```
+There are two main ways to solve machine learning problems using LightAutoML:
+* Ready-to-use preset
+  ```python
+  from lightautoml.automl.presets.tabular_presets import TabularAutoML
+  from lightautoml.tasks import Task
 
-### LightAutoML as a framework: build your own custom pipeline
-
-```python
-import pandas as pd
-from sklearn.metrics import f1_score
-
-from lightautoml.automl.presets.tabular_presets import TabularAutoML
-from lightautoml.tasks import Task
-
-df_train = pd.read_csv('../input/titanic/train.csv')
-df_test = pd.read_csv('../input/titanic/test.csv')
-N_THREADS = 4
-
-reader = PandasToPandasReader(Task("binary"), cv=5, random_state=42)
-
-# create a feature selector
-selector = ImportanceCutoffSelector(
-    LGBSimpleFeatures(),
-    BoostLGBM(
-        default_params={'learning_rate': 0.05, 'num_leaves': 64,
-        'seed': 42, 'num_threads': N_THREADS}
-    ),
-    ModelBasedImportanceEstimator(),
-    cutoff=0
-)
-
-# build first level pipeline for AutoML
-pipeline_lvl1 = MLPipeline([
-    # first model with hyperparams tuning
-    (
-        BoostLGBM(
-            default_params={'learning_rate': 0.05, 'num_leaves': 128,
-            'seed': 1, 'num_threads': N_THREADS}
-        ),
-        OptunaTuner(n_trials=20, timeout=30)
-    ),
-    # second model without hyperparams tuning
-    BoostLGBM(
-        default_params={'learning_rate': 0.025, 'num_leaves': 64,
-        'seed': 2, 'num_threads': N_THREADS}
-    )
-], pre_selection=selector, features_pipeline=LGBSimpleFeatures(), post_selection=None)
-
-# build second level pipeline for AutoML
-pipeline_lvl2 = MLPipeline(
-    [
-        BoostLGBM(
-            default_params={'learning_rate': 0.05, 'num_leaves': 64,
-            'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
-            freeze_defaults=True
-        )
-    ],
-    pre_selection=None,
-    features_pipeline=LGBSimpleFeatures(),
-    post_selection=None
-)
-
-# build AutoML pipeline
-automl = AutoML(reader, [
-    [pipeline_lvl1],
-    [pipeline_lvl2],
-    ],
-    skip_conn=False
-)
-
-# train AutoML and get predictions
-oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
-test_pred = automl.predict(df_test)
-
-pd.DataFrame({
-    'PassengerId':df_test.PassengerId,
-    'Survived': (test_pred.data[:, 0] > 0.5)*1
-}).to_csv('submit.csv', index = False)
-```
+  # train_df and test_df are pandas DataFrames; train_df must contain the target column
+  automl = TabularAutoML(task = Task(name = 'binary', metric = 'auc'))
+  oof_preds = automl.fit_predict(train_df, roles = {'target': 'my_target', 'drop': ['column_to_drop']}).data
+  test_preds = automl.predict(test_df).data
+  ```
 
-LighAutoML framework has a lot of ready-to-use parts and extensive customization options, to learn more check out the [resources](#Resources) section.
+* As a framework
+  The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#resources) section. A minimal sketch of a custom pipeline is shown below.
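+
+  The sketch below condenses the custom-pipeline example that this PR removes from the README. The `lightautoml` import paths are an assumption based on the library's documented module layout (verify them against your installed version), and `train_df` / `test_df` are the same placeholder DataFrames as in the preset example.
+  ```python
+  # NOTE: import paths assumed from the documented module layout
+  from lightautoml.automl.base import AutoML
+  from lightautoml.ml_algo.boost_lgbm import BoostLGBM
+  from lightautoml.ml_algo.tuning.optuna import OptunaTuner
+  from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
+  from lightautoml.pipelines.ml.base import MLPipeline
+  from lightautoml.pipelines.selection.importance_based import (
+      ImportanceCutoffSelector,
+      ModelBasedImportanceEstimator,
+  )
+  from lightautoml.reader.base import PandasToPandasReader
+  from lightautoml.tasks import Task
+
+  # Reader: parses the raw DataFrame, infers feature roles, sets up 5-fold CV
+  reader = PandasToPandasReader(Task('binary'), cv=5, random_state=42)
+
+  # Feature selector: drops features whose model-based importance falls below the cutoff
+  selector = ImportanceCutoffSelector(
+      LGBSimpleFeatures(),
+      BoostLGBM(default_params={'learning_rate': 0.05, 'num_leaves': 64}),
+      ModelBasedImportanceEstimator(),
+      cutoff=0,
+  )
+
+  # Level 1: two LightGBM models sharing one feature pipeline and validation
+  # scheme; the first is tuned with Optuna, the second keeps fixed params
+  pipeline_lvl1 = MLPipeline(
+      [
+          (
+              BoostLGBM(default_params={'learning_rate': 0.05, 'num_leaves': 128}),
+              OptunaTuner(n_trials=20, timeout=30),
+          ),
+          BoostLGBM(default_params={'learning_rate': 0.025, 'num_leaves': 64}),
+      ],
+      pre_selection=selector,
+      features_pipeline=LGBSimpleFeatures(),
+  )
+
+  # Level 2: a single model stacked on the out-of-fold predictions of level 1
+  pipeline_lvl2 = MLPipeline(
+      [BoostLGBM(default_params={'learning_rate': 0.05, 'num_leaves': 64}, freeze_defaults=True)],
+      features_pipeline=LGBSimpleFeatures(),
+  )
+
+  # skip_conn=True would also pass the original features on to level 2
+  automl = AutoML(reader, [[pipeline_lvl1], [pipeline_lvl2]], skip_conn=False)
+
+  oof_preds = automl.fit_predict(train_df, roles={'target': 'my_target'}).data
+  test_preds = automl.predict(test_df).data
+  ```
+  Each `MLPipeline` groups models that share one feature pipeline and validation scheme; pipelines are stacked level by level, with blending applied at the last level if it holds several pipelines.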
 
-
+
 # Resources
 
 ### Kaggle kernel examples of LightAutoML usage:
@@ -230,6 +84,35 @@ LighAutoML framework has a lot of ready-to-use parts and extensive customization
 - (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
 - (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytic Indian Mag)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)
 
+
+# Installation
+To install the LAMA framework on your machine from PyPI:
+```bash
+# Base functionality:
+pip install -U lightautoml
+
+# For a partial installation, use the corresponding option
+# Extra dependencies: [nlp, cv, report], or use 'all' to install all dependencies
+pip install -U lightautoml[nlp]
+```
+
+Additionally, run the following commands to enable PDF report generation:
+
+```bash
+# macOS
+brew install cairo pango gdk-pixbuf libffi
+
+# Debian / Ubuntu
+sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info
+
+# Fedora
+sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
+
+# Windows
+# follow this tutorial: https://weasyprint.readthedocs.io/en/stable/install.html#windows
+```
+
+
 # Advanced features
 ### GPU and Spark pipelines
@@ -243,9 +126,8 @@ If you are interested in contributing to LightAutoML, please read the [Contribut
 
 # Support and feature requests
-Seek prompt advice at [Telegram group](https://t.me/lightautoml).
-
-Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).
+- Seek prompt advice at [Telegram group](https://t.me/joinchat/sp8P7sdAqaU0YmRi).
+- Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).
 
 # License
 
diff --git a/examples/tutorials/Tutorial_10_relational_data_with_star_scheme.ipynb b/examples/tutorials/Tutorial_10_relational_data_with_star_scheme.ipynb
index 29948ae9..7f159aad 100644
--- a/examples/tutorials/Tutorial_10_relational_data_with_star_scheme.ipynb
+++ b/examples/tutorials/Tutorial_10_relational_data_with_star_scheme.ipynb
@@ -9,7 +9,7 @@
   "source": [
    "# Tutorial 10: Relational datasets (with star scheme)\n",
    "\n",
-    "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/39cb56feae6766464d39dd2349480b97099d2535/imgs/LightAutoML_logo_big.png)\n",
+    "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/39cb56feae6766464d39dd2349480b97099d2535/docs/imgs/lightautoml_logo_color.png)\n",
    "\n"
   ]
  },
@@ -110,7 +110,7 @@
   "source": [
    "Consider an example of data with a star scheme organization. The dataset contains data on the sale of meals in the restaurant chain, consists of three tables: the main one containing information about completed orders (`train` and `test` parts), and two auxiliary tables containing information about restaurants (`fulfilment_center_info`) and available dishes (`meal_info`). 
The tables and the scheme of their organization are shown in the image below.\n", "\n", - "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/Star_scheme_tables.png)" + "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/Star_scheme_tables.png)" ] }, { diff --git a/examples/tutorials/Tutorial_11_time_series.ipynb b/examples/tutorials/Tutorial_11_time_series.ipynb index 37a70974..948273c7 100644 --- a/examples/tutorials/Tutorial_11_time_series.ipynb +++ b/examples/tutorials/Tutorial_11_time_series.ipynb @@ -461,7 +461,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Time" + "\"Time" ] }, { @@ -475,7 +475,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Time" + "\"Time" ] }, { @@ -489,7 +489,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"History" + "\"History" ] }, { @@ -503,7 +503,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Transformers" + "\"Transformers" ] }, { diff --git a/examples/tutorials/Tutorial_1_basics.ipynb b/examples/tutorials/Tutorial_1_basics.ipynb index 69047fe6..1e464ef5 100644 --- a/examples/tutorials/Tutorial_1_basics.ipynb +++ b/examples/tutorials/Tutorial_1_basics.ipynb @@ -16,7 +16,7 @@ "id": "35c56a11", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { @@ -986,7 +986,7 @@ "\n", "Let's look at how the LightAutoML model is arranged and what it consists in general.\n", "\n", - "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/tutorial_1_laml_big.png)\n", + "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/tutorial_1_laml_big.png)\n", "\n", "#### 1.3.1 Reader object\n", "\n", @@ -1008,7 +1008,7 @@ "\n", "As a result, after analyzing and processing the data, the ```Reader``` object forms and returns a ```LAMA Dataset```. It contains the original data and markup with metainformation. In this dataset it is possible to see the roles defined by the ```Reader``` object, selected features etc. Then ML pipelines are trained on this data. \n", "\n", - "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/imgs/tutorial_1_ml_pipeline.png)\n", + "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/master/docs/imgs/tutorial_1_ml_pipeline.png)\n", "\n", "Each such pipeline is one or more machine learning algorithms that share one post-processing block and one validation scheme. Several such pipelines can be trained in parallel on one dataset, and they form a level. Number of levels can be unlimited as possible. List of all levels of AutoML pipeline is available via ```.levels``` attribute of ```AutoML``` instance. Level predictions can be inputs to other models or ML pipelines (i. e. stacking scheme). As inputs for subsequent levels, it is possible to use the original data by setting ```skip_conn``` argument in ```True``` when initializing preset instance. At the last level, if there are several pipelines, blending is used to build a prediction. 
\n", "\n", @@ -1036,11 +1036,11 @@ "\n", "Here is a default AutoML pipeline for binary classification and regression tasks (```TabularAutoML``` preset):\n", "\n", - "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/imgs/tutorial_blackbox_pipeline.png)\n", + "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/docs/imgs/tutorial_blackbox_pipeline.png)\n", "\n", "Another example:\n", "\n", - "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/imgs/tutorial_1_pipeline.png)\n", + "![](https://raw.githubusercontent.com/sb-ai-lab/LightAutoML/ac3c1b38873437eb74354fb44e68a449a0200aa6/docs/imgs/tutorial_1_pipeline.png)\n", "\n", "Let's discuss some of the params we can setup:\n", "- `task` - the type of the ML task (the only **must have** parameter)\n", diff --git a/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb b/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb index f4305f16..3cd06a36 100644 --- a/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb +++ b/examples/tutorials/Tutorial_2_WhiteBox_AutoWoE.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { @@ -34,7 +34,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![WB0](../../imgs/tutorial_whitebox_report_1.png)" + "![WB0](../../docs/imgs/tutorial_whitebox_report_1.png)" ] }, { @@ -48,7 +48,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![WB1](../../imgs/tutorial_whitebox_report_2.png)" + "![WB1](../../docs/imgs/tutorial_whitebox_report_2.png)" ] }, { @@ -62,7 +62,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![WB2](../../imgs/tutorial_whitebox_report_3.png)" + "![WB2](../../docs/imgs/tutorial_whitebox_report_3.png)" ] }, { @@ -76,7 +76,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![WB3](../../imgs/tutorial_whitebox_report_4.png)" + "![WB3](../../docs/imgs/tutorial_whitebox_report_4.png)" ] }, { diff --git a/examples/tutorials/Tutorial_3_sql_data_source.ipynb b/examples/tutorials/Tutorial_3_sql_data_source.ipynb index 3009f93d..1ccfa239 100644 --- a/examples/tutorials/Tutorial_3_sql_data_source.ipynb +++ b/examples/tutorials/Tutorial_3_sql_data_source.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { @@ -864,7 +864,7 @@ "source": [ "To create AutoML model here we use `TabularAutoML` preset, which looks like:\n", "\n", - "![TabularAutoML preset pipeline](../../imgs/tutorial_2_pipeline.png)\n", + "![TabularAutoML preset pipeline](../../docs/imgs/tutorial_2_pipeline.png)\n", "\n", "All params we set above can be send inside preset to change its configuration:" ] diff --git a/examples/tutorials/Tutorial_5_uplift.ipynb b/examples/tutorials/Tutorial_5_uplift.ipynb index 6f5357ea..536b7f56 100644 --- a/examples/tutorials/Tutorial_5_uplift.ipynb +++ b/examples/tutorials/Tutorial_5_uplift.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { diff --git a/examples/tutorials/Tutorial_6_custom_pipeline.ipynb b/examples/tutorials/Tutorial_6_custom_pipeline.ipynb index cdb34dd1..1a6bb40e 100644 --- a/examples/tutorials/Tutorial_6_custom_pipeline.ipynb +++ b/examples/tutorials/Tutorial_6_custom_pipeline.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { @@ -744,7 +744,7 @@ 
"cell_type": "markdown", "metadata": {}, "source": [ - "![AutoML pipeline for this task](../../imgs/tutorial_1_pipeline.png)" + "![AutoML pipeline for this task](../../docs/imgs/tutorial_1_pipeline.png)" ] }, { diff --git a/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb b/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb index 47afffa6..36b9fc7b 100644 --- a/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb +++ b/examples/tutorials/Tutorial_7_ICE_and_PDP_interpretation.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { diff --git a/examples/tutorials/Tutorial_8_CV_preset.ipynb b/examples/tutorials/Tutorial_8_CV_preset.ipynb index e23668a5..772ca3a6 100644 --- a/examples/tutorials/Tutorial_8_CV_preset.ipynb +++ b/examples/tutorials/Tutorial_8_CV_preset.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { diff --git a/examples/tutorials/Tutorial_9_neural_networks.ipynb b/examples/tutorials/Tutorial_9_neural_networks.ipynb index 76126d60..e220ac40 100644 --- a/examples/tutorials/Tutorial_9_neural_networks.ipynb +++ b/examples/tutorials/Tutorial_9_neural_networks.ipynb @@ -24,7 +24,7 @@ "tags": [] }, "source": [ - "\"LightAutoML" + "\"LightAutoML" ] }, { @@ -318,12 +318,12 @@ "- `hidden_size` - define hidden layer dimensions\n", "\n", "### 1.2 Dense Light (`\"denselight\"`)\n", - "\n", + "\n", "\n", "- `hidden_size` - define hidden layer dimensions\n", "\n", "### 1.3 Dense (`\"dense\"`)\n", - "\n", + "\n", "\n", "- `block_config` - set number of blocks and layers within each block\n", "- `compression` - portion of neuron to drop after `DenseBlock`\n", @@ -331,7 +331,7 @@ "- `bn_factor` - size of intermediate fc is increased times this factor in layer\n", "\n", "### 1.4 Resnet (`\"resnet\"`)\n", - "\n", + "\n", "\n", "- `hid_factor` - size of intermediate fc is increased times this factor in layer\n", "\n", @@ -339,13 +339,13 @@ "- `hidden_size` - define hidden layer dimensions\n", "\n", "### 1.5 NODE (`\"node\"`)\n", - "\n", + "\n", "\n", "### 1.5 AutoInt (`\"autoint\"`)\n", - "\n", + "\n", "\n", "### 1.5 FTTransformer (`\"fttransformer\"`)\n", - "\n", + "\n", "\n", "- `pooling` - Pooling used for the last step.\n", "- `n_out` - Output dimension, 1 for binary prediction.\n", @@ -657,7 +657,7 @@ "id": "1b0633e5", "metadata": {}, "source": [ - "" + "" ] }, {