Triton naming #747
Conversation
Hi @deadeyegoodwin. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @ellis-bigelow
@ellis-bigelow @rakelkar
There is also a related PR for the Kubeflow website: kubeflow/website#1846
/ok-to-test
Yeah, we're definitely missing changes for https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha2/framework_tensorrt.go
Good point, gonna add an e2e test in my other Triton example PR.
So do I need to make changes to framework_tensorrt.go, or @yuzisun are you going to handle that in your follow-up PR?
@deadeyegoodwin I can take care of that in my PR; once I am done we can merge together.
Not sure this change makes sense to me. In my understanding, KFServing "Frameworks" are ML frameworks; which inference server we use to run inference on them should be an implementation detail. Triton is an inference server, and we should certainly update the TensorRT framework to use it (and possibly some or all other frameworks as well, e.g. #744), but I don't think Triton is an ML framework. So it seems more logical to me to keep TensorRT as a framework (assuming NVIDIA cares enough to support it) and consider updating the TensorRT sample to use an actual TRT model.plan instead of TF as it currently does. @ellis-bigelow @cliveseldon @animeshsingh @yuzisun
/hold
@rakelkar I think the way we currently organize is really by inference server; maybe the naming is a bit confusing because it sounds like an ML framework, e.g. when you use …
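For concreteness, a minimal sketch of what the renamed predictor spec could look like, assuming the v1alpha2 InferenceService API; the resource name and storageUri are illustrative placeholders, not taken from this PR:

```yaml
# Illustrative sketch only: metadata name and storageUri are placeholders.
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: triton-example
spec:
  default:
    predictor:
      triton:                 # the predictor key this PR renames from "tensorrt"
        storageUri: gs://example-bucket/models/my-model-repo
```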
I agree with Rakesh's point. QQ though: is the TensorRT name unchanged re: the framework?
@yuzisun, I'd lean away from specifying the inference server in the framework. This multilayered model seems more complex than it's worth.
@yuzisun when we say ONNX, it has to be a model serialized in ONNX format; it doesn't matter how you got to that point. And from the user's perspective it doesn't matter whether you use the C++ ONNX server for inference or your own Python server that hosts the ONNX runtime... that is a serving-infra decision. So IMO framework maps directly to the serialized model format.
I believe they only changed the name of the inference server (which makes sense, since it supports multiple model formats)...
So when a user wants to serve a TensorFlow model with the Triton inference server, how do we specify that in our spec?
btw you can achieve this today by changing the configmap to point to a different inference server for the framework, but that is a platform-wide opinion and not per-model... I guess our stance is that the platform chooses for the 90%, and edge cases are covered via custom? This is what cloud providers do too, btw (e.g. SageMaker). wdyt?
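To make the configmap mechanism concrete, here is a rough sketch of the kind of framework-to-serving-image mapping being described; the configmap name, keys, and image references are assumptions for illustration, not the project's exact contents:

```yaml
# Hypothetical sketch: swapping an image here changes the inference server for
# every model of that framework, i.e. a platform-wide (not per-model) choice.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kfserving-system
data:
  predictors: |-
    {
      "tensorflow": { "image": "tensorflow/serving" },
      "tensorrt":   { "image": "nvcr.io/nvidia/tensorrtserver" }
    }
```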
TensorRT "inference accelerator" is still called "TensorRT": https://developer.nvidia.com/tensorrt |
I would agree with you, but I don't think we are there yet. Also, currently it is not simply a change in the configmap to swap the image for a given ML framework; the startup command and parameters are pretty hard-coded per inference server in the controller. Even if that changes in the future, I would still think users need the ability to choose which inference server to use per model. For example, for PyTorch, what if the model cannot be converted to ONNX and the user just wants to serve it on a simple KFServing Python server without going the custom route?
But for that we already support pytorch as a framework and have a PYT server (that loads native PYT serialized models and does not require ONNX)?
ya, just making an example. A more relevant question regarding the Triton inference server: since it is a multi-framework inference server, how can we allow users to serve TensorFlow/PyTorch models on the Triton inference server, leveraging its auto-batching and multi-model co-hosting features, without having to convert to the accelerated TensorRT model format? Currently the only way to use the Triton inference server is to put the model under …
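For background on the multi-framework point (not from this PR): Triton selects a backend per model via a `config.pbtxt` in its model repository, so a TensorFlow model can in principle be served without converting it to a TensorRT plan; the open question above is only how to express that in the KFServing spec. A hedged sketch of such a repository, with the model name, dims, and dtypes as illustrative assumptions:

```
# Hypothetical Triton model repository layout; all names are placeholders.
model-repository/
└── my_tf_model/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/

# my_tf_model/config.pbtxt (Triton's prototext model config):
name: "my_tf_model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [ { name: "INPUT0", data_type: TYPE_FP32, dims: [ 16 ] } ]
output [ { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ 16 ] } ]
```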
I think "model server" might be a more accurate descriptor. If it's capable of serving many different model formats, triton definitely seems clearer than tensorrt to me. The only place I hesitate is that I don't want to change xgboost or sklearn to have a server suffix, nor do I want to remove them; those specs are framework-based. Can we simply treat triton as a slight exception to the framework rule for now, and resolve this more holistically once we have a clearer understanding of where the industry is moving with model servers?
Sounds fair!
/unhold
oh, needs to move the test model from …
Are we planning another 0.3.x release before the v1beta1 API changes? Much of this renaming will happen there.
Also want to know.
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: yuzisun. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel`.
/retest
@yuzisun Does kfserving v0.3.0 support the Triton server now?
* Triton renaming
* Update docs/samples/triton/simple_string/triton.yaml (Co-authored-by: Dan Sun <dsun20@bloomberg.net>)
* Update test_triton.py (Co-authored-by: Dan Sun <dsun20@bloomberg.net>)
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note: