Implement an LGBM->ONNX model conversion + inferencing #147

Open
jfomhover opened this issue Nov 5, 2021 · 0 comments · May be fixed by #271

jfomhover commented Nov 5, 2021

The goal of this task is to add another variant to the inferencing benchmark for LightGBM. We are already comparing lightgbm python, lightgbm C, and treelite; we'd like to try onnxruntime, as it seems applicable.

In particular, we'd like to reproduce the results in this post on Hummingbird and onnxruntime for classical ML models.

Feel free to reach out to the authors of the blog post for collaboration.

The expected impact of this task:

  • increase the value of the benchmark for the lightgbm community, in particular for production scenarios
  • identify better production inferencing technologies

⚠️ It is unknown at this point whether hummingbird allows the conversion of lightgbm>=v3 models to onnx. If that turns out to be impossible, it's still a good thing to know, and to report in the hummingbird issues.
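
For orientation, here is a minimal sketch of the conversion path, assuming hummingbird's convert API and a scikit-learn-style LGBMClassifier; whether this works for lightgbm>=v3 is exactly what this task should establish:

```python
import lightgbm as lgb
import numpy as np
from hummingbird.ml import convert  # assumes the hummingbird-ml package is installed
from sklearn.datasets import make_classification

# Train a small LightGBM model so there is something to convert.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X = X.astype(np.float32)
model = lgb.LGBMClassifier(n_estimators=50).fit(X, y)

# Convert to ONNX; hummingbird needs sample input to infer the graph's input shape.
onnx_container = convert(model, "onnx", X[:1])

# The container wraps the converted model together with an onnxruntime session,
# so predictions can be checked against the original model right away.
preds = onnx_container.predict(X)
```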

Learning Goals

By working on this project you'll be able to learn:

  • how to use onnxruntime for classical ML models
  • how to compare inferencing technologies in a benchmark
  • how to write components and pipelines for AzureML (component sdk + shrike)

Expected Deliverables:

To complete this task, you need to deliver:

  • two working Python scripts: one to convert lightgbm models into onnx (using hummingbird?), and one to run inferencing with onnxruntime (see the sketch after this list)
  • their corresponding working AzureML components
  • a successful run of the lightgbm inferencing benchmark pipeline
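
For the inferencing script, a minimal onnxruntime sketch could look like the following; the model file name is a placeholder, and a real benchmark would time sess.run over many batches rather than a single call:

```python
import numpy as np
import onnxruntime as ort

# Load the converted model (placeholder path).
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# ONNX graphs have named inputs; grab the name of the first (and only) one.
input_name = sess.get_inputs()[0].name

# Converted tree models typically expect float32 features.
X = np.random.rand(100, 20).astype(np.float32)
preds = sess.run(None, {input_name: X})[0]
```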

Instructions

Prepare for coding

  1. Follow the installation process; please report any issues you run into, that will help!
  2. Clone this repo and create your own branch username/onnxruntime (or similar) for your work (commit often!).
  3. In src/scripts/model_transformation, create a folder lightgbm_to_onnx/ and copy the content of src/scripts/samples/ into it.

Local development

Let's start locally first.

To iterate on your python script, you need to consider a couple of constraints:

  • Follow the instructions in the sample script to modify and make your own.
  • Please consider using inputs and outputs that are provided as directories, not single files. There's a helper function that automatically selects the unique file contained in a directory (see the input_file_path function in src/common/io.py), as sketched below.
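
As an illustration, here is a sketch of how a conversion script might consume a directory-style input; the behavior of input_file_path is reimplemented here as an assumption, so check the actual helper in src/common/io.py for its real signature:

```python
import argparse
from pathlib import Path

# Assumed behavior of the repo helper (src/common/io.py): given a directory,
# return the path of the single file it contains, failing otherwise.
def input_file_path(directory: str) -> str:
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    if len(files) != 1:
        raise ValueError(f"expected exactly 1 file in {directory}, found {len(files)}")
    return str(files[0])

parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True, help="directory containing the model file")
parser.add_argument("--output", required=True, help="directory to write the ONNX model into")
args = parser.parse_args()

model_path = input_file_path(args.model)  # resolve directory -> unique file
```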

Here are a couple of pointers to get you started: feel free to check out the current treelite modules (model_conversion/treelite_compile and inferencing/treelite_python), which have a similar behavior. You can also adapt some unit tests from tests/scripts/test_treelite_python.py.
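
One self-contained test worth writing early checks that ONNX predictions match the original model; this sketch assumes the hummingbird conversion from above works, which is part of what the task needs to verify:

```python
import lightgbm as lgb
import numpy as np
from hummingbird.ml import convert
from sklearn.datasets import make_regression

def test_onnx_predictions_match_lightgbm():
    # Train a small reference model on synthetic data.
    X, y = make_regression(n_samples=200, n_features=10, random_state=0)
    X = X.astype(np.float32)
    model = lgb.LGBMRegressor(n_estimators=20).fit(X, y)

    # Convert to ONNX and compare predictions within a float32 tolerance.
    container = convert(model, "onnx", X[:1])
    np.testing.assert_allclose(
        container.predict(X), model.predict(X), rtol=1e-3, atol=1e-3
    )
```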

Develop for AzureML

Component specification

  1. First, unit tests. Edit tests/aml/test_components.py and look for the list of components. Add the relative path to your component spec to this list.
    You can test your component by running

    pytest tests/aml/test_components.py -v -k name_of_component
  2. Edit the spec.yaml file in your component's directory (copied from the sample) and align its arguments with your script's expected arguments until the unit tests pass. A sketch of such a spec follows.
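
For orientation only, here is a minimal sketch of what such a spec.yaml might look like; the schema, field names, and file names below are assumptions patterned on the azure-ml-component CommandComponent format, so copy the actual structure from the sample spec instead:

```yaml
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: lightgbm_to_onnx
version: 0.0.1
display_name: LightGBM to ONNX conversion
type: CommandComponent

inputs:
  model:
    type: path
    description: directory containing the LightGBM model file
outputs:
  output:
    type: path
    description: directory where the ONNX model is written

command: >-
  python lightgbm_to_onnx.py
  --model {inputs.model}
  --output {outputs.output}

environment:
  conda:
    conda_dependencies_file: conda_env.yaml
  os: Linux
```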

Integration in the inferencing pipeline

WORK IN PROGRESS

jfomhover added the good first issue label on Nov 5, 2021
jfomhover added this to the Expansion milestone on Nov 5, 2021
majercakdavid linked a pull request on Oct 17, 2022 that will close this issue
majercakdavid self-assigned this on Oct 17, 2022