update readme
dev-rinchin committed Aug 7, 2024
1 parent 2dc9d8d commit 4122697
Showing 34 changed files with 110 additions and 116 deletions.
226 changes: 110 additions & 116 deletions README.md

# LightAutoML - automatic model creation framework
<img src="docs/imgs/lightautoml_logo_color.png" />

[![GitHub License](https://img.shields.io/github/license/sb-ai-lab/LightAutoML)](https://github.com/sb-ai-lab/LightAutoML/blob/main/LICENSE)
[![PyPI - Version](https://img.shields.io/pypi/v/lightautoml)](https://pypi.org/project/lightautoml)
[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
<br>
[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/sb-ai-lab/lightautoml/CI.yml)](https://github.com/sb-ai-lab/lightautoml/actions/workflows/CI.yml?query=branch%3Amain)


LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- multilabel classification
- regression

The current version of the package handles datasets in which rows are independent samples, i.e. **each row is an object with its own features and target**.
Support for multitable datasets and sequences is a work in progress :)
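
The supported task types differ mostly in the shape and dtype of the target column(s). A toy illustration with hypothetical pandas frames (illustrative data only, not a LightAutoML API):

```python
import pandas as pd

# Each row is one object: its features plus a target.
binary_df = pd.DataFrame({'age': [22, 35, 58], 'survived': [0, 1, 0]})    # binary: two classes
multiclass_df = pd.DataFrame({'age': [22, 35], 'deck': ['A', 'C']})       # multiclass: >2 classes
regression_df = pd.DataFrame({'age': [22, 35], 'fare': [7.25, 53.1]})     # regression: real-valued target
# multilabel: several independent 0/1 target columns per row
multilabel_df = pd.DataFrame({'age': [22], 'tag_x': [1], 'tag_y': [0]})

print(len(binary_df), binary_df['survived'].nunique())
```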

**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.


<a name="toc"></a>
# Table of Contents

* [Installation](#installation)
* [Documentation](https://lightautoml.readthedocs.io/)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Advanced features](#advancedfeatures)
* [Support and feature requests](#support)
* [Contributing to LightAutoML](#contributing)
* [License](#license)

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it locally.


<a name="installation"></a>
# Installation
To install the LAMA framework on your machine from PyPI:
```bash
# Base functionality:
pip install -U lightautoml

# For partial installation use the corresponding option.
# Extra dependencies: [nlp, cv, report], or use 'all' to install everything.
pip install -U lightautoml[nlp]
```
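
On shells where square brackets are glob characters (zsh, for example), the extras suffix should be quoted. How the quoted form tokenizes can be checked with the standard library alone:

```python
import shlex

# Quoting keeps 'lightautoml[nlp]' as a single argument for pip.
cmd = shlex.split('pip install -U "lightautoml[nlp]"')
print(cmd)  # ['pip', 'install', '-U', 'lightautoml[nlp]']
```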

Additionally, run the following command to enable PDF report generation (example for RedHat-based systems):

```bash
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
```

<a name="quicktour"></a>
# Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:
### Use ready preset for tabular data
```python
import pandas as pd
from sklearn.metrics import f1_score

# ... (collapsed in the diff view) ...

pd.DataFrame({
    # ... (collapsed in the diff view) ...
}).to_csv('submit.csv', index=False)
```
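
The submission step in these Titanic examples converts predicted probabilities from `test_pred.data` into a 0/1 `Survived` column. The thresholding rule can be checked in isolation with hypothetical predictions (the PassengerId values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical predictions standing in for test_pred.data:
# a single column of survival probabilities.
pred = np.array([[0.9], [0.2], [0.51]])

# Same rule as in the examples: threshold at 0.5, cast to 0/1.
labels = (pred[:, 0] > 0.5) * 1

submission = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': labels})
print(submission['Survived'].tolist())  # [1, 0, 1]
```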

### LightAutoML as a framework: build your own custom pipeline

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')
N_THREADS = 4

reader = PandasToPandasReader(Task("binary"), cv=5, random_state=42)

# create a feature selector
selector = ImportanceCutoffSelector(
    LGBSimpleFeatures(),
    BoostLGBM(
        default_params={'learning_rate': 0.05, 'num_leaves': 64,
                        'seed': 42, 'num_threads': N_THREADS}
    ),
    ModelBasedImportanceEstimator(),
    cutoff=0
)

# build the first-level pipeline for AutoML
pipeline_lvl1 = MLPipeline([
    # first model, with hyperparameter tuning (stop after 20 trials or 30 seconds)
    (
        BoostLGBM(
            default_params={'learning_rate': 0.05, 'num_leaves': 128,
                            'seed': 1, 'num_threads': N_THREADS}
        ),
        OptunaTuner(n_trials=20, timeout=30)
    ),
    # second model, without hyperparameter tuning
    BoostLGBM(
        default_params={'learning_rate': 0.025, 'num_leaves': 64,
                        'seed': 2, 'num_threads': N_THREADS}
    )
], pre_selection=selector, features_pipeline=LGBSimpleFeatures(), post_selection=None)

# build the second-level pipeline for AutoML
pipeline_lvl2 = MLPipeline(
    [
        BoostLGBM(
            default_params={'learning_rate': 0.05, 'num_leaves': 64,
                            'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
            freeze_defaults=True
        )
    ],
    pre_selection=None,
    features_pipeline=LGBSimpleFeatures(),
    post_selection=None
)

# build the AutoML pipeline from the two levels
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles={'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
```
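
`ImportanceCutoffSelector` keeps the features whose estimated importance exceeds the cutoff. A pure-Python sketch of that filtering rule with hypothetical importance scores (not the library's actual implementation):

```python
# Hypothetical importance scores for four candidate features.
importances = {'Age': 12.3, 'Cabin': 0.0, 'Fare': 7.1, 'Noise': -0.2}
cutoff = 0

# Keep features strictly above the cutoff, in alphabetical order.
kept = [name for name, imp in sorted(importances.items()) if imp > cutoff]
print(kept)  # ['Age', 'Fare']
```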

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#examples) section.

<a name="examples"></a>
# Resources
- (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
- (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)

[Back to top](#toc)
<a name="advancedfeatures"></a>
# Advanced features
### GPU and Spark pipelines
Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:
- the GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
- the Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA)

<a name="contributing"></a>
# Contributing to LightAutoML
If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.

[Back to top](#toc)

<a name="apache"></a>
# License
This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)

<a name="developers"></a>
# For developers

## Build your own custom pipeline:

```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 64,
'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 128,
'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
default_params={'learning_rate': 0.025, 'num_leaves': 64,
'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
(model1, params_tuner1),
model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 64,
'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1,
post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
[pipeline_lvl1],
[pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
'PassengerId':df_test.PassengerId,
'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

[Back to top](#toc)

<a name="support"></a>
# Support and feature requests
Seek prompt advice in our [Telegram group](https://t.me/lightautoml).

Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).

<a name="license"></a>
# License
This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)
2 files renamed without changes.
Binary files removed: imgs/LightAutoML_logo_big.png, imgs/LightAutoML_logo_small.png, imgs/Star_scheme_tables.png, imgs/TabularAutoML_model_descr.png, imgs/TabularUtilizedAutoML_model_descr.png, imgs/autoint.png, imgs/denselight.png, imgs/densenet.png, imgs/fttransformer.png, imgs/node.png, imgs/resnet.png, imgs/swa.png, imgs/tutorial_11_case_problem_statement.png, imgs/tutorial_11_general_problem_statement.png, imgs/tutorial_11_history_step_params.png, imgs/tutorial_11_transformers_params.png, imgs/tutorial_1_initial_report.png, imgs/tutorial_1_laml_big.png, imgs/tutorial_1_ml_pipeline.png, imgs/tutorial_1_pipeline.png, imgs/tutorial_1_unfolded_report.png, imgs/tutorial_2_initial_report.png, imgs/tutorial_2_pipeline.png, imgs/tutorial_2_unfolded_report.png, imgs/tutorial_3_initial_report.png, imgs/tutorial_3_unfolded_report.png, imgs/tutorial_blackbox_pipeline.png, imgs/tutorial_whitebox_report_1.png, imgs/tutorial_whitebox_report_2.png, imgs/tutorial_whitebox_report_3.png, imgs/tutorial_whitebox_report_4.png
