Triton naming #747
Conversation
Hi @deadeyegoodwin. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @ellis-bigelow
@ellis-bigelow @rakelkar
There is also a related PR for the Kubeflow website: kubeflow/website#1846
/ok-to-test
Yeah, we're definitely missing changes for https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha2/framework_tensorrt.go
Good point, gonna add an e2e test in my other Triton example PR.
So do I need to make changes to framework_tensorrt.go, or @yuzisun are you going to handle that in your follow-up PR?
@deadeyegoodwin I can take care of that in my PR; once I am done we can merge together.
Not sure this change makes sense to me. In my understanding, KFServing "Frameworks" are ML frameworks; which inference server we use to run inference on them should be an implementation detail. Triton is an inference server, and we should certainly update the TensorRT framework to use it (and possibly some or all other frameworks as well, e.g. #744), but I don't think Triton is an ML framework. So it seems more logical to me to keep TensorRT as a framework (assuming NVIDIA cares enough to support it) and consider updating the TensorRT sample to use an actual TRT model.plan instead of TF as it currently does. @ellis-bigelow @cliveseldon @animeshsingh @yuzisun
/hold
@rakelkar I think the way we currently organize is really by inference server; maybe the naming is a bit confusing because it sounds like an ML framework, e.g. when you use …
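For concreteness, a minimal sketch of what the renamed predictor spec could look like, assuming the v1alpha2 InferenceService API; the resource name and storageUri are illustrative placeholders, not taken from this PR:

```yaml
# Illustrative sketch only: metadata name and storageUri are placeholders.
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: triton-example
spec:
  default:
    predictor:
      triton:                 # the predictor key this PR renames from "tensorrt"
        storageUri: gs://example-bucket/models/my-model-repo
```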
I agree with Rakesh's point. QQ though: is the TensorRT name unchanged re: the framework?
@yuzisun, I'd lean away from specifying the inference server in the framework. This multilayered model seems more complex than it's worth.
@yuzisun when we say ONNX, it has to be a model serialized in ONNX format; it doesn't matter how you got to that point. And from the user's perspective it doesn't matter whether you use the C++ ONNX server for inference or your own Python server that hosts the ONNX runtime... that is a serving-infra decision. So IMO framework maps directly to the serialized model format.
I believe they only changed the name of the inference server (which makes sense, since it supports multiple model formats)...
So when a user wants to serve a TensorFlow model with the Triton inference server, how do we specify that in our spec?
btw you can achieve this today by changing the configmap to point to a different inference server for the framework, but that is a platform-wide opinion and not per-model... I guess our stance is that the platform chooses for the 90%, and edge cases are covered via custom? This is what cloud providers do too, btw (e.g. SageMaker). wdyt?
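To make the configmap mechanism concrete, here is a rough sketch of the kind of framework-to-serving-image mapping being described; the configmap name, keys, and image references are assumptions for illustration, not the project's exact contents:

```yaml
# Hypothetical sketch: swapping an image here changes the inference server for
# every model of that framework, i.e. a platform-wide (not per-model) choice.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kfserving-system
data:
  predictors: |-
    {
      "tensorflow": { "image": "tensorflow/serving" },
      "tensorrt":   { "image": "nvcr.io/nvidia/tensorrtserver" }
    }
```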
TensorRT "inference accelerator" is still called "TensorRT": https://developer.nvidia.com/tensorrt |
I would agree with you, but I don't think we are there yet. Also, currently it is not simply a change in the configmap to swap the image for a given ML framework; the startup command and parameters are pretty hard-coded per inference server in the controller. Even if that changes in the future, I would still think users need the ability to choose which inference server to use per model. For example, for PyTorch, what if the model cannot be converted to ONNX and the user just wants to serve it on a simple KFServing Python server without going the custom route?
But for that we already support pytorch as a framework and have a PYT server (that loads native PYT serialized models and does not require ONNX)?
ya, just making an example. A more relevant question regarding the Triton inference server: since it is a multi-framework inference server, how can we allow users to serve TensorFlow/PyTorch models on the Triton inference server, leveraging its auto-batching and multi-model co-hosting features, without having to convert to the accelerated TensorRT model format? Currently the only way to use the Triton inference server is to put the model under …
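For background on the multi-framework point (not from this PR): Triton selects a backend per model via a `config.pbtxt` in its model repository, so a TensorFlow model can in principle be served without converting it to a TensorRT plan; the open question above is only how to express that in the KFServing spec. A hedged sketch of such a repository, with the model name, dims, and dtypes as illustrative assumptions:

```
# Hypothetical Triton model repository layout; all names are placeholders.
model-repository/
└── my_tf_model/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/

# my_tf_model/config.pbtxt (Triton's prototext model config):
name: "my_tf_model"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [ { name: "INPUT0", data_type: TYPE_FP32, dims: [ 16 ] } ]
output [ { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ 16 ] } ]
```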
I think "model server" might be a more accurate descriptor. If it's capable of serving many different model formats, triton definitely seems clearer than tensorrt to me. The only place I hesitate is that I don't want to change xgboost or sklearn to have a server suffix, nor do I want to remove them; those specs are framework-based. Can we simply treat triton as a slight exception to the framework rule for now, and resolve this more holistically once we have a clearer understanding of where the industry is moving with model servers?
Sounds fair!
/unhold
oh, needs to move the test model from …
Are we planning another 0.3.x release before the v1beta1 API changes? Much of this renaming will happen there.
Also want to know.
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: yuzisun. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel`.
/retest
@yuzisun Does kfserving v0.3.0 support the Triton server now?
* Triton renaming
* Update docs/samples/triton/simple_string/triton.yaml (Co-authored-by: Dan Sun <dsun20@bloomberg.net>)
* Update test_triton.py (Co-authored-by: Dan Sun <dsun20@bloomberg.net>)
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Release note: