update readme
dev-rinchin committed Aug 7, 2024
1 parent 2dc9d8d commit 4122697
Showing 34 changed files with 110 additions and 116 deletions.
226 changes: 110 additions & 116 deletions README.md

# LightAutoML - automatic model creation framework
<img src="docs/imgs/lightautoml_logo_color.png" />

[![GitHub License](https://img.shields.io/github/license/sb-ai-lab/LightAutoML)](https://github.com/sb-ai-lab/LightAutoML/blob/main/LICENSE)
[![PyPI - Version](https://img.shields.io/pypi/v/lightautoml)](https://pypi.org/project/lightautoml)
[![Telegram](https://img.shields.io/badge/chat-on%20Telegram-2ba2d9.svg)](https://t.me/lightautoml)
![PyPI - Downloads](https://img.shields.io/pypi/dm/lightautoml?color=green&label=PyPI%20downloads&logo=pypi&logoColor=orange&style=plastic)
![Read the Docs](https://img.shields.io/readthedocs/lightautoml?style=plastic)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
<br>
[![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/sb-ai-lab/lightautoml/CI.yml)](https://github.com/sb-ai-lab/lightautoml/actions/workflows/CI.yml?query=branch%3Amain)


LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:
- binary classification
- multiclass classification
- multilabel classification
- regression

The current version of the package handles datasets in which rows are independent samples, i.e. **each row is an object with its own features and target**.
Support for multitable datasets and sequences is a work in progress :)
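
The supported task types differ mostly in the shape and dtype of the target column(s). A toy illustration with hypothetical pandas frames (illustrative data only, not a LightAutoML API):

```python
import pandas as pd

# Each row is one object: its features plus a target.
binary_df = pd.DataFrame({'age': [22, 35, 58], 'survived': [0, 1, 0]})    # binary: two classes
multiclass_df = pd.DataFrame({'age': [22, 35], 'deck': ['A', 'C']})       # multiclass: >2 classes
regression_df = pd.DataFrame({'age': [22, 35], 'fare': [7.25, 53.1]})     # regression: real-valued target
# multilabel: several independent 0/1 target columns per row
multilabel_df = pd.DataFrame({'age': [22], 'tag_x': [1], 'tag_y': [0]})

print(len(binary_df), binary_df['survived'].nunique())
```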

**Note**: we use [`AutoWoE`](https://pypi.org/project/autowoe) library to automatically create interpretable models.

**Authors**: [Alexander Ryzhkov](https://kaggle.com/alexryzhkov), [Anton Vakhrushev](https://kaggle.com/btbpanda), [Dmitry Simakov](https://kaggle.com/simakov), Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.


<a name="toc"></a>
# Table of Contents

* [Installation](#installation)
* [Documentation](https://lightautoml.readthedocs.io/)
* [Quick tour](#quicktour)
* [Resources](#examples)
* [Advanced features](#advancedfeatures)
* [Support and feature requests](#support)
* [Contributing to LightAutoML](#contributing)
* [License](#license)

**Documentation** of LightAutoML is available [here](https://lightautoml.readthedocs.io/); you can also [generate](https://github.com/AILab-MLTools/LightAutoML/blob/master/.github/CONTRIBUTING.md#building-documentation) it locally.


<a name="installation"></a>
# Installation
To install the LAMA framework on your machine from PyPI:
```bash
# Base functionality:
pip install -U lightautoml

# For partial installation use the corresponding option.
# Extra dependencies: [nlp, cv, report], or use 'all' to install everything.
pip install -U lightautoml[nlp]
```
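
On shells where square brackets are glob characters (zsh, for example), the extras suffix should be quoted. How the quoted form tokenizes can be checked with the standard library alone:

```python
import shlex

# Quoting keeps 'lightautoml[nlp]' as a single argument for pip.
cmd = shlex.split('pip install -U "lightautoml[nlp]"')
print(cmd)  # ['pip', 'install', '-U', 'lightautoml[nlp]']
```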

Additionally, run the following command to enable PDF report generation (example for RedHat-based systems):

```bash
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
```

<a name="quicktour"></a>
# Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:
### Use ready preset for tabular data
```python
import pandas as pd
from sklearn.metrics import f1_score

# ... (collapsed in the diff view) ...

pd.DataFrame({
    # ... (collapsed in the diff view) ...
}).to_csv('submit.csv', index=False)
```
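
The submission step in these Titanic examples converts predicted probabilities from `test_pred.data` into a 0/1 `Survived` column. The thresholding rule can be checked in isolation with hypothetical predictions (the PassengerId values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical predictions standing in for test_pred.data:
# a single column of survival probabilities.
pred = np.array([[0.9], [0.2], [0.51]])

# Same rule as in the examples: threshold at 0.5, cast to 0/1.
labels = (pred[:, 0] > 0.5) * 1

submission = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': labels})
print(submission['Survived'].tolist())  # [1, 0, 1]
```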

### LightAutoML as a framework: build your own custom pipeline

```python
import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')
N_THREADS = 4

reader = PandasToPandasReader(Task("binary"), cv=5, random_state=42)

# create a feature selector
selector = ImportanceCutoffSelector(
    LGBSimpleFeatures(),
    BoostLGBM(
        default_params={'learning_rate': 0.05, 'num_leaves': 64,
                        'seed': 42, 'num_threads': N_THREADS}
    ),
    ModelBasedImportanceEstimator(),
    cutoff=0
)

# build the first-level pipeline for AutoML
pipeline_lvl1 = MLPipeline([
    # first model, with hyperparameter tuning (stop after 20 trials or 30 seconds)
    (
        BoostLGBM(
            default_params={'learning_rate': 0.05, 'num_leaves': 128,
                            'seed': 1, 'num_threads': N_THREADS}
        ),
        OptunaTuner(n_trials=20, timeout=30)
    ),
    # second model, without hyperparameter tuning
    BoostLGBM(
        default_params={'learning_rate': 0.025, 'num_leaves': 64,
                        'seed': 2, 'num_threads': N_THREADS}
    )
], pre_selection=selector, features_pipeline=LGBSimpleFeatures(), post_selection=None)

# build the second-level pipeline for AutoML
pipeline_lvl2 = MLPipeline(
    [
        BoostLGBM(
            default_params={'learning_rate': 0.05, 'num_leaves': 64,
                            'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
            freeze_defaults=True
        )
    ],
    pre_selection=None,
    features_pipeline=LGBSimpleFeatures(),
    post_selection=None
)

# build the AutoML pipeline from the two levels
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles={'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
```
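
`ImportanceCutoffSelector` keeps the features whose estimated importance exceeds the cutoff. A pure-Python sketch of that filtering rule with hypothetical importance scores (not the library's actual implementation):

```python
# Hypothetical importance scores for four candidate features.
importances = {'Age': 12.3, 'Cabin': 0.0, 'Fare': 7.1, 'Noise': -0.2}
cutoff = 0

# Keep features strictly above the cutoff, in alphabetical order.
kept = [name for name, imp in sorted(importances.items()) if imp > cutoff]
print(kept)  # ['Age', 'Fare']
```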

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the [resources](#examples) section.

<a name="examples"></a>
# Resources
- (English) [LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)](https://alexmryzhkov.medium.com/lightautoml-preset-usage-tutorial-2cce7da6f936)
- (English) [Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)](https://analyticsindiamag.com/hands-on-python-guide-to-lama-an-automatic-ml-model-creation-framework/?fbclid=IwAR0f0cVgQWaLI60m1IHMD6VZfmKce0ZXxw-O8VRTdRALsKtty8a-ouJex7g)

[Back to top](#toc)
<a name="advancedfeatures"></a>
# Advanced features
### GPU and Spark pipelines
Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:
- the GPU pipeline is [available here](https://github.com/Rishat-skoltech/LightAutoML_GPU)
- the Spark pipeline is [available here](https://github.com/sb-ai-lab/SLAMA)

<a name="contributing"></a>
# Contributing to LightAutoML
If you are interested in contributing to LightAutoML, please read the [Contributing Guide](.github/CONTRIBUTING.md) to get started.

[Back to top](#toc)

<a name="apache"></a>
# License
This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)

<a name="developers"></a>
# For developers

## Build your own custom pipeline:

```python
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 64,
'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 128,
'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
default_params={'learning_rate': 0.025, 'num_leaves': 64,
'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
(model1, params_tuner1),
model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
default_params={'learning_rate': 0.05, 'num_leaves': 64,
'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1,
post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
[pipeline_lvl1],
[pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
'PassengerId':df_test.PassengerId,
'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
```

[Back to top](#toc)

<a name="support"></a>
# Support and feature requests
Seek prompt advice in our [Telegram group](https://t.me/lightautoml).

Open bug reports and feature requests on GitHub [issues](https://github.com/AILab-MLTools/LightAutoML/issues).

<a name="license"></a>
# License
This project is licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/AILab-MLTools/LightAutoML/blob/master/LICENSE) file for more details.

[Back to top](#toc)
2 files renamed without changes.
Binary files removed: imgs/LightAutoML_logo_big.png, imgs/LightAutoML_logo_small.png, imgs/Star_scheme_tables.png, imgs/TabularAutoML_model_descr.png, imgs/TabularUtilizedAutoML_model_descr.png, imgs/autoint.png, imgs/denselight.png, imgs/densenet.png, imgs/fttransformer.png, imgs/node.png, imgs/resnet.png, imgs/swa.png, imgs/tutorial_11_case_problem_statement.png, imgs/tutorial_11_general_problem_statement.png, imgs/tutorial_11_history_step_params.png, imgs/tutorial_11_transformers_params.png, imgs/tutorial_1_initial_report.png, imgs/tutorial_1_laml_big.png, imgs/tutorial_1_ml_pipeline.png, imgs/tutorial_1_pipeline.png, imgs/tutorial_1_unfolded_report.png, imgs/tutorial_2_initial_report.png, imgs/tutorial_2_pipeline.png, imgs/tutorial_2_unfolded_report.png, imgs/tutorial_3_initial_report.png, imgs/tutorial_3_unfolded_report.png, imgs/tutorial_blackbox_pipeline.png, imgs/tutorial_whitebox_report_1.png, imgs/tutorial_whitebox_report_2.png, imgs/tutorial_whitebox_report_3.png, imgs/tutorial_whitebox_report_4.png
