
[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

Closed
dmac opened this issue Jun 12, 2024 · 5 comments
Labels
question Further information is requested

Comments

@dmac

dmac commented Jun 12, 2024

My company runs a Triton fleet that serves multiple models which are continuously retrained and reloaded onto Triton using Model Control Mode EXPLICIT. These are organized into a few different model types, but our convention is that every instance of a model is given a unique name and we don't use the Triton notion of versions.

For example, we have model A training on some cadence and model B training on some cadence, and the latest version of each is loaded on Triton as soon as it's done training. A specific instance of a model is named like "A.1", and when the next iteration of that model is trained, it will be named "A.2", etc. We might also have multiple versions for each type of model loaded onto Triton to support clean transitions from one version of a model to the next. So, at a given moment in time, we might have the following models loaded: A.10, A.11, B.75.

In this example, say we introduce a brand new type of model called C. We also are able to retire model A.10 because it's no longer receiving requests. So at another moment in time we might have these models loaded: A.11, B.75, C.1.

I'm investigating whether it's possible to use the HPS plugin for TensorRT with our existing architecture, without needing to restart Triton whenever our list of active models changes. My understanding is that when we start Triton, we supply a JSON configuration file like:

--backend-config='hps,ps=/path/to/hps.json'
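
For completeness, the full launch command looks roughly like this (the repository path is illustrative):

tritonserver --model-repository=/path/to/models \
    --model-control-mode=explicit \
    --backend-config='hps,ps=/path/to/hps.json'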

And this file contains the configuration for every model, like:

{
  "supportlonglong": false,
  "models": [{
    "model": "A.10",
    ...
  },{
    "model": "A.11",
    ...
  },{
    "model": "B.75",
    ...
  }]
}
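
Based on my reading of the HPS docs, each entry elided above carries fields roughly like these (the values are made up for illustration):

{
  "model": "A.11",
  "sparse_files": ["/models/A.11/embeddings"],
  "deployed_device_list": [0],
  "max_batch_size": 64,
  "embedding_table_names": ["sparse_embedding0"],
  "embedding_vecsize_per_table": [16],
  "maxnum_catfeature_query_per_table_per_sample": [26],
  "default_value_for_each_table": [0.0],
  "gpucache": true,
  "gpucacheper": 0.5
}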

I'm able to explicitly load TRT models that use the HPS plugin, but if I update the JSON file in place and attempt to make inferences on a new model named C.1, Triton reports an error:

2024/06/12 20:21:43 [HCTR][20:21:43.298][ERROR][RK0][tid #139986607271936]: Cannot find the model C.1 in HPS
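
(For reference, we trigger the load itself through the standard Triton repository API, e.g.:

curl -X POST localhost:8000/v2/repository/models/C.1/load

and the load succeeds; it's only the HPS lookup at inference time that fails.)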

Is there a way to supply new model configurations once Triton is already running, and to remove configurations that are no longer relevant? I was optimistic after reading this, which mentions "adding the configuration of a new model to the HPS configuration file," but I'm not sure how to do that.

Thanks!

@dmac dmac added the question Further information is requested label Jun 12, 2024
@yingcanw
Collaborator

@dmac
If I understand correctly, you are using the Triton TRT backend to deploy the model with the HPS TRT plugin. If so, the online update feature is only supported in the HPS Triton backend, because the HPS configuration is loaded only once, when HPS is initialized. To support online HPS configuration updates, the TRT backend would need to re-parse the latest configuration, similar to the logic here.

@dmac
Author

dmac commented Jun 13, 2024

If I understand correctly, you are using the Triton TRT backend to deploy the model with the HPS TRT plugin.

Correct.

To support online HPS configuration updates, the TRT backend would need to re-parse the latest configuration, similar to the logic here.

I see, thanks. Do you know if that feature is planned or has been discussed before? Or is this question better directed at the Triton/TensorRT issues page?

Have you seen anyone using a dynamic, constantly changing HPS configuration like this before, either with the HPS backend, or the TRT backend with the HPS TRT plugin?

@dmac
Author

dmac commented Jun 13, 2024

One more question: do models served by the Triton HPS backend need to be trained with the HugeCTR framework? Or is there a way to convert an existing ONNX model to a format that is able to be served by the HPS backend? (Currently we train a model with PyTorch, export to ONNX, then convert to TRT.)

@yingcanw
Collaborator

Have you seen anyone using a dynamic, constantly changing HPS configuration like this before, either with the HPS backend, or the TRT backend with the HPS TRT plugin?

Yes, there are users with this requirement, which is why we support online updating and unloading of models in the HPS Triton backend. However, those users mainly use HPS as an independent embedding query service, for example by building an HPS+TF/TRT inference pipeline with Triton's ensemble mode, so that they can independently control online updates of the embedding tables in the HPS backend.
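
For example, a rough sketch of such an ensemble config.pbtxt (the model names and tensor names below are illustrative, not the exact HPS backend tensor names):

name: "hps_trt_ensemble"
platform: "ensemble"
max_batch_size: 64
input [ { name: "CAT_IDS", data_type: TYPE_INT64, dims: [ 26 ] } ]
output [ { name: "SCORE", data_type: TYPE_FP32, dims: [ 1 ] } ]
ensemble_scheduling {
  step [
    {
      # HPS backend model: the embedding table can be updated online here
      model_name: "hps_embedding"
      model_version: -1
      input_map { key: "KEYS", value: "CAT_IDS" }
      output_map { key: "OUTPUT0", value: "embeddings" }
    },
    {
      # TRT backend model: the dense part of the network
      model_name: "dense_trt"
      model_version: -1
      input_map { key: "EMB", value: "embeddings" }
      output_map { key: "OUT", value: "SCORE" }
    }
  ]
}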

One more question: do models served by the Triton HPS backend need to be trained with the HugeCTR framework? Or is there a way to convert an existing ONNX model to a format that is able to be served by the HPS backend? (Currently we train a model with PyTorch, export to ONNX, then convert to TRT.)

No, that is not necessary. The input format of HPS can be found here. If you need to convert a PyTorch embedding model to the HPS format, you can refer to the example.
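
For illustration, a minimal sketch of that conversion for a single table, assuming my understanding of the format is right (per table, a raw-binary "key" file of int64 keys and an "emb_vector" file of float32 vectors; the helper name and paths are hypothetical):

import os
import numpy as np
import torch

def export_embedding_to_hps(embedding: torch.nn.Embedding, out_dir: str) -> None:
    # Write the table as HPS expects it: int64 keys in "key" and the
    # matching float32 rows in "emb_vector", both as raw binary.
    os.makedirs(out_dir, exist_ok=True)
    keys = np.arange(embedding.num_embeddings, dtype=np.int64)
    vectors = embedding.weight.detach().cpu().numpy().astype(np.float32)
    keys.tofile(os.path.join(out_dir, "key"))
    vectors.tofile(os.path.join(out_dir, "emb_vector"))

# Hypothetical usage for a new table belonging to model C.1:
emb = torch.nn.Embedding(num_embeddings=1000, embedding_dim=16)
export_embedding_to_hps(emb, "/models/C.1/sparse_embedding0")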

@dmac
Author

dmac commented Jun 13, 2024

Ok, it sounds like what we want is an ensemble model where the embeddings are looked up from the HPS backend, rather than using the HPS TRT plugin. Thanks for the advice.

@dmac dmac closed this as completed Jun 13, 2024