[REVIEW] Faster Treelite serialization #2263

hcho3 · 2020-05-14T04:48:40Z

Summary

Speed up serialization of Treelite model objects and reduce overhead in multi-GPU RF prediction.

Features

Flat object layout: a Tree is now an array of POD objects (Node). As a consequence, all queries about a given node should now be made through methods of the Tree class.
Binary serialization without depending on Protobuf
Eliminate the use of tempfile.
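As a rough illustration of why a flat array of POD nodes serializes quickly, here is a minimal Python sketch that uses a NumPy structured array as a stand-in. The field names, the `FlatTree` class, and its methods are hypothetical and do not reflect Treelite's actual C++ layout or API; the point is only that a contiguous array of fixed-size records can be dumped and restored as raw bytes without a per-node encoding step.

```python
import numpy as np

# Hypothetical node record: four 4-byte fields, 16 bytes per node.
# These names are illustrative only, not Treelite's real Node fields.
NODE_DTYPE = np.dtype([
    ("left_child", np.int32),
    ("right_child", np.int32),
    ("split_index", np.uint32),
    ("threshold", np.float32),
])


class FlatTree:
    """A tree stored as one contiguous array of fixed-size node records."""

    def __init__(self, nodes):
        self._nodes = np.ascontiguousarray(nodes, dtype=NODE_DTYPE)

    # Node queries go through the tree, since nodes are plain records.
    def left_child(self, nid):
        return int(self._nodes["left_child"][nid])

    def threshold(self, nid):
        return float(self._nodes["threshold"][nid])

    def is_leaf(self, nid):
        return bool(self._nodes["left_child"][nid] == -1)

    def to_bytes(self):
        # One bulk copy of the node array; no per-node encoding.
        return self._nodes.tobytes()

    @classmethod
    def from_bytes(cls, buf):
        # Reinterpret the raw bytes as node records.
        return cls(np.frombuffer(buf, dtype=NODE_DTYPE))


tree = FlatTree([(1, 2, 0, 0.5), (-1, -1, 0, 0.0), (-1, -1, 0, 0.0)])
blob = tree.to_bytes()
restored = FlatTree.from_bytes(blob)
print(len(blob), restored.left_child(0), restored.is_leaf(1))
```

Keeping node access behind tree methods mirrors the note above: once nodes are plain records, the tree object is the only place that knows how to interpret them.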

Benchmark setup

Using this script by @Salonijain27. See the link for instructions.
AWS EC2 instance g4dn.12xlarge with 4 T4 GPUs (16 GB GDDR6 each)
Benchmark setting: n_gpus=2, n_gb=2, n_features=20, depth=25, n_estimators=10. This leads to a forest consisting of 10 depth-25 trees, and we run through 25 million data rows.
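The row count follows from the data size if we assume 4-byte (float32) feature values and 1 GB = 10^9 bytes; neither assumption is stated in the benchmark description, so treat this as a plausibility check rather than the script's actual logic.

```python
# Benchmark parameters quoted above.
n_gb = 2
n_features = 20

# Assumptions (not stated in the PR): float32 values, decimal gigabytes.
BYTES_PER_VALUE = 4
BYTES_PER_GB = 10**9

bytes_per_row = n_features * BYTES_PER_VALUE
n_rows = n_gb * BYTES_PER_GB // bytes_per_row
print(n_rows)  # 25000000
```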

Benchmark Results

Aggregate

Before	After
269.0 sec	109.4 sec (2.46x speedup)

Breakdown by components

Note that some of the overhead is not yet explained, and that the actual prediction time is a small portion of the total run time. Because the algorithm is distributed, the timing measurements are approximate.

Current cuML

	Master	Worker 0	Worker 1
RF->Treelite		32.6	32.8
Treelite->Protobuf		10.8	11.3
(Copy models workers -> master)
Protobuf->Treelite		26.6	23.8
Concatenate Treelite handles	23.3
(Copy model master -> workers)
Protobuf->Treelite		52.2	52.3
Treelite->FIL		8.6	8.5
FIL predict		2.9	1.1

This PR

	Master	Worker 0	Worker 1
RF->Treelite		22.1	22.5
Treelite->Bytes		< 0.001	< 0.001
(Copy models workers -> master)
Bytes->Treelite		< 0.001	< 0.001
Concatenate Treelite handles	3.1
(Copy model master -> workers)
Bytes->Treelite		< 0.001	< 0.001
Treelite->FIL		4.8	4.7
FIL predict		4.9	1.1
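To make the "unexplained overhead" remark concrete, the sketch below adds up the per-stage numbers from the two tables and compares them with the aggregate run times. It assumes the values are in seconds, that the stages run one after another with the master step in between, that Worker 0 is representative, and that "< 0.001" can be treated as zero. These are simplifications for a back-of-the-envelope check, not a statement about how the timings were collected.

```python
# Stage timings copied from the tables above (Worker 0 column plus the
# master-side concatenation step). "< 0.001" entries are treated as 0.0.
CURRENT = {
    "total": 269.0,
    "master": [23.3],
    "worker0": [32.6, 10.8, 26.6, 52.2, 8.6, 2.9],
}
THIS_PR = {
    "total": 109.4,
    "master": [3.1],
    "worker0": [22.1, 0.0, 0.0, 0.0, 4.8, 4.9],
}


def accounted(run):
    """Time covered by the listed stages, assuming they run sequentially."""
    return sum(run["master"]) + sum(run["worker0"])


def unexplained(run):
    """Portion of the aggregate run time not covered by any listed stage."""
    return run["total"] - accounted(run)


for name, run in (("Current cuML", CURRENT), ("This PR", THIS_PR)):
    print(f"{name}: accounted={accounted(run):.1f}, "
          f"unexplained={unexplained(run):.1f}")

print(f"speedup: {CURRENT['total'] / THIS_PR['total']:.2f}x")
```

Under these assumptions, a large share of the remaining run time in both configurations falls outside the listed stages, which is consistent with the note that some overhead is still unaccounted for.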

cpp/cmake/Dependencies.cmake

JohnZed · 2020-05-14T16:26:07Z

Very promising! I suspect that the RF -> treelite can be accelerated a lot in a future rev too... I don't know why it should have to be much longer than treelite-> fil in general if we move to convert the representation efficiently.

JohnZed

I did a fairly quick pass for the first rev... will review again in more detail as things get wrapped up.
Overall, I think it's great and looks pretty clear. The changes to the core cython code are smaller than I expected - it fits in well with the existing pattern, so I think it's pretty understandable.
I don't fully get the serialization to frames - it looks like it is simply passing through the binary format used within tl? Seems reasonable to me, but I haven't used this style of conversion before with Py_buffers etc.

cpp/cmake/Dependencies.cmake

cpp/src/fil/fil.cu

cpp/test/sg/rf_treelite_test.cu

JohnZed · 2020-05-14T23:35:04Z

python/cuml/ensemble/randomforestclassifier.pyx

@@ -515,18 +509,21 @@ class RandomForestClassifier(Base):
           to a shared file. Cuml issue #1854 has been created to track this.
    """
    def _tl_model_handles(self, model_bytes):
-        cdef ModelHandle cuml_model_ptr = NULL
+        cdef uintptr_t tl_handle_int


In another PR, I proposed renaming this to something like _alloc_and_convert_model to make it clear that the caller needs to free the result.

JohnZed · 2020-05-14T23:37:23Z

python/cuml/ensemble/randomforestregressor.pyx

        """
-        Returns the self.model_pbuf_bytes.
+        Returns the self.model_bytes.


Maybe clarify the type and update the rest of docstring: "Returns the treelite binary format representation of this model."

hcho3 · 2020-05-14T23:56:09Z

I don't fully get the serialization to frames - it looks like it is simply passing through the binary format used within tl?

Treelite objects now expose the Python buffer protocol interface, so that we can transparently convert Treelite objects to memory views with zero overhead. In init_from_frames(), we fetch the buffer interface from the Treelite object and cast it into a memory view. It is O(1) because it amounts to pointer casting.
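The zero-copy behavior can be seen with any object that implements the buffer protocol. The sketch below uses a NumPy array as a stand-in for a Treelite-owned buffer (an assumption for illustration; it does not call Treelite or init_from_frames()). Creating the memoryview does not copy the data, so a later write to the underlying array is visible through the view.

```python
import numpy as np

# Stand-in for a buffer owned by a Treelite object (illustrative only).
nodes = np.arange(8, dtype=np.float32)

# Wrapping the object in a memoryview only records a pointer, shape,
# and format; no element is copied.
view = memoryview(nodes)
print(view.format, view.itemsize, view.nbytes)

# Reinterpreting the same memory as raw bytes is also copy-free.
raw = view.cast("B")
print(len(raw))

# A write through the owner is visible through the view, which shows
# that both refer to the same memory.
nodes[0] = 42.0
print(view[0])
```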

@jakirkham gave me valuable advice for implementing the Python buffer protocol.

Salonijain27

Looks good, it's great to see the time required for predict drop so much! I have a few suggestions and questions.

cpp/test/sg/fil_test.cu

python/cuml/ensemble/randomforestclassifier.pyx

python/cuml/ensemble/randomforestregressor.pyx

python/cuml/ensemble/randomforestclassifier.pyx

cpp/src/decisiontree/decisiontree_impl.cuh

python/cuml/benchmark/bench_helper_funcs.py

python/cuml/ensemble/randomforest_shared.pyx

python/cuml/ensemble/randomforest_shared.pxd

hcho3 · 2020-05-20T02:40:14Z

This is quite a strange error:

=================================== FAILURES ===================================
_________________________ test_real_algos_runner[FIL] __________________________

algo_name = 'FIL'

    @pytest.mark.parametrize('algo_name', ['UMAP-Supervised',
                                           'DBSCAN',
                                           'LogisticRegression',
                                           'ElasticNet',
                                           'FIL'])
    def test_real_algos_runner(algo_name):
        pair = algorithms.algorithm_by_name(algo_name)
    
        if (algo_name == 'UMAP' and not has_umap()) or \
           (algo_name == 'FIL' and not has_xgboost()):
            pytest.xfail()
    
        runner = AccuracyComparisonRunner(
            [20], [5], dataset_name='classification', test_fraction=0.20
        )
        results = runner.run(pair)[0]
        print(results)
>       assert results["cuml_acc"] is not None
E       KeyError: 'cuml_acc'

cuml/test/test_benchmark.py:190: KeyError

hcho3 · 2020-05-21T22:41:20Z

I managed to fix the failing benchmark test. Marking this as ready for review.

python/cuml/ensemble/randomforest_shared.pyx

python/cuml/ensemble/randomforestclassifier.pyx

cpp/src/fil/fil.cu

python/cuml/dask/ensemble/base.py

hcho3 · 2020-06-14T10:47:39Z

List of changes made to Treelite:

Refactor CMake build; create treelite_runtime Python pkg (dmlc/treelite#167)
Upgrade to PEP 8 style (Re-enable formatting check dmlc/treelite#175)
Refactor Python tests with Pytest (dmlc/treelite#176)
Refactor scikit-learn converter to use mixins (Refactor scikit-learn converter dmlc/treelite#179)
Fast binary serialization of Treelite models (dmlc/treelite#178)

Once all tests pass, I will go ahead and release version 0.92 of Treelite.

ci/gpu/build.sh

jakirkham

Exciting to see all of the progress here, @hcho3! 😄

Added a couple of comments about how we might simplify things a bit here. Please let me know if you have any questions 🙂

python/cuml/ensemble/randomforest_shared.pyx

python/cuml/ensemble/randomforestregressor.pyx

python/cuml/dask/ensemble/base.py

python/cuml/ensemble/randomforest_shared.pyx

python/cuml/ensemble/randomforestclassifier.pyx

hcho3 · 2020-06-16T02:34:13Z

I submitted Treelite 0.92 to conda-forge: conda-forge/staged-recipes#11926. Fingers crossed.

JohnZed

Looks great! I have only small questions/suggestions.
Also, are there additional unit tests that would be helpful here? Serializing+deserializing model variants (e.g. classification/regression/multiclass) and ensuring we get the properties right? Not sure...

cpp/cmake/Dependencies.cmake

JohnZed · 2020-06-16T02:51:52Z

cpp/src/randomforest/randomforest.cu

+  if (task_category > 2) {
+    // Multi-class classification
+    TREELITE_CHECK(TreeliteModelBuilderSetModelParam(
+      model_builder, "pred_transform", "max_index"));


Isn't multiclass currently disabled until #2248 goes in?

I think so. I included the line because it was already part of the current codebase.

python/cuml/ensemble/randomforestclassifier.pyx

hcho3 · 2020-06-16T21:16:01Z

@JohnZed The Treelite repo contains several round-trip tests for the new serializer: https://github.com/dmlc/treelite/blob/master/tests/cpp/test_serializer.cc

hcho3 · 2020-06-16T21:26:49Z

@jakirkham I addressed all your comments, except the one about casting buffer frames.

jakirkham

Grouping together suggested format_str encoding changes for clearer discussion.

python/cuml/ensemble/randomforest_shared.pyx

jakirkham · 2020-06-16T22:21:41Z

python/cuml/ensemble/randomforest_shared.pyx

+    model: uintptr_t
+) -> Dict[str, Union[List[str], List[np.ndarray]]]:
+    frames = get_frames(model)
+    header = {'format_str': [x.format.encode('utf-8') for x in frames],


Suggested change

-    header = {'format_str': [x.format.encode('utf-8') for x in frames],
+    header = {'format_str': [x.format for x in frames],

I went ahead and applied your suggestion.

Is it preferable to pass str to pickle, rather than bytes? I'd like to understand your reasoning behind this suggestion.

Was thinking about this in the context of moving from pickle to Dask serialization down the road (assuming that is still the plan). Typically the header consists of things like int, str, dict, list, and tuple. Generally things that are MsgPack serializable. Typically bytes and memoryviews are reserved for frames instead.

Was unsure at first whether bytes would work in the header. However after playing with things a bit bytes may work. MsgPack is at least able to handle them with the flags that Dask is using.

Good point. Thanks for your explanation. I agree that built-in types like str would be well supported by Dask serializer.
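The header/frames split discussed in this thread can be sketched as follows. This is a hedged illustration, not the code in randomforest_shared.pyx: the helper names, the extra 'itemsize' key, and the use of NumPy arrays in place of Treelite buffers are all assumptions. The header holds only plain built-in types (str, int, list, dict), while the bulk data stays in the frames. The round trip at the end also doubles as the kind of serialize/deserialize check suggested earlier in the review.

```python
import pickle

import numpy as np


def to_header_and_frames(arrays):
    """Split buffers into a small plain-Python header and raw byte frames."""
    views = [memoryview(a) for a in arrays]
    header = {
        # Format codes are kept as str, not bytes, so the header contains
        # only types that generic serializers handle well.
        "format_str": [v.format for v in views],
        "itemsize": [v.itemsize for v in views],
    }
    frames = [v.cast("B") for v in views]  # zero-copy byte views
    return header, frames


def from_header_and_frames(header, frames):
    """Rebuild typed arrays by casting each byte frame back to its format."""
    return [
        np.asarray(memoryview(frame).cast(fmt))
        for fmt, frame in zip(header["format_str"], frames)
    ]


original = [
    np.array([0.5, 1.5, 2.5], dtype=np.float32),
    np.array([1, 2, -1, -1], dtype=np.int32),
]
header, frames = to_header_and_frames(original)

# pickle cannot handle memoryview objects directly, so the frames are
# copied to bytes here; a frame-aware transport could send them as-is.
payload = pickle.dumps((header, [bytes(f) for f in frames]))
header2, frames2 = pickle.loads(payload)
restored = from_header_and_frames(header2, frames2)

print(header2["format_str"])
print(all(np.array_equal(a, b) for a, b in zip(original, restored)))
```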

Co-authored-by: jakirkham <jakirkham@gmail.com>

jakirkham · 2020-06-16T23:29:21Z

Thanks for all of the work here @hcho3! 😄 Looks good

Sounds like we are going to handle switching to the treelite Conda package in another PR. Is that right?

hcho3 · 2020-06-16T23:30:32Z

Sounds like we are going to handle switching to the treelite Conda package in another PR. Is that right?

Yes, I’ll work on it after this PR is merged.

hcho3 · 2020-06-17T01:30:50Z

The failing test in the CI should be fixed by #2432

JohnZed

Looks great!

hcho3 commented May 14, 2020

cpp/cmake/Dependencies.cmake

dantegd added the "2 - In Progress" label May 14, 2020

JohnZed requested review from Salonijain27 and JohnZed May 14, 2020 19:47

JohnZed reviewed May 14, 2020

Salonijain27 suggested changes May 15, 2020

hcho3 marked this pull request as ready for review May 15, 2020 20:45

hcho3 requested review from a team as code owners May 15, 2020 20:45

Salonijain27 added the "4 - Waiting on Author" label May 15, 2020

jakirkham reviewed May 18, 2020

cpp/src/decisiontree/decisiontree_impl.cuh

jakirkham reviewed May 18, 2020

python/cuml/benchmark/bench_helper_funcs.py

jakirkham reviewed May 18, 2020

python/cuml/ensemble/randomforest_shared.pyx

Salonijain27 reviewed May 19, 2020

python/cuml/ensemble/randomforest_shared.pxd

hcho3 force-pushed the fast_treelite_serializer branch from 117d313 to a935bee May 20, 2020 00:30

hcho3 changed the title from "[WIP] Faster Treelite serialization" to "[REVIEW] Faster Treelite serialization" May 21, 2020

raydouglass approved these changes May 22, 2020

jakirkham reviewed May 22, 2020

python/cuml/ensemble/randomforest_shared.pyx

jakirkham reviewed May 22, 2020

python/cuml/ensemble/randomforestclassifier.pyx

Salonijain27 reviewed May 26, 2020

cpp/src/fil/fil.cu

Salonijain27 reviewed May 26, 2020

python/cuml/dask/ensemble/base.py

hcho3 changed the title from "[WIP] Faster Treelite serialization" to "[REVIEW] Faster Treelite serialization" Jun 15, 2020

hcho3 commented Jun 15, 2020

ci/gpu/build.sh

Use Treelite 0.92 from Pip (ab0ddbb)

jakirkham reviewed Jun 16, 2020

JohnZed requested changes Jun 16, 2020

hcho3 added 3 commits June 16, 2020 19:50

Use Treelite 0.92 in CMake dependencies (57bc348)

model_serialized -> treelite_serialized_model (4c909fd)

Address reviewer's feedback (e93fd5f)

Style fix (334b086)

jakirkham reviewed Jun 16, 2020

Apply suggestions from code review (fa57994)

Co-authored-by: jakirkham <jakirkham@gmail.com>

hcho3 mentioned this pull request Jun 17, 2020

[REVIEW] Updating benchmark tests for correct UMAP algo name #2432

Merged

JohnZed approved these changes Jun 17, 2020

Merge remote-tracking branch 'upstream/branch-0.15' into fast_treelite_serializer (c065fe0)

Salonijain27 approved these changes Jun 17, 2020

JohnZed merged commit f231111 into rapidsai:branch-0.15 Jun 17, 2020

hcho3 deleted the fast_treelite_serializer branch June 17, 2020 19:35

This was referenced Jun 18, 2020

[FEA] Speed up RF -> FIL conversion for inference #2399

Closed

[BUG] FIL: Treelite's new data-type support breaks protobuf format #1305

Closed

hcho3 mentioned this pull request Jul 10, 2020

[DISCUSS] Remove Protobuf dependency #2538

Merged
