[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

dmac · 2024-06-12T22:41:36Z

My company runs a Triton fleet that serves multiple models which are continuously retrained and reloaded onto Triton using Model Control Mode EXPLICIT. These are organized into a few different model types, but our convention is that every instance of a model is given a unique name and we don't use the Triton notion of versions.

For example, we have model A training on some cadence and model B training on some cadence, and the latest version of each is loaded on Triton as soon as it's done training. A specific instance of a model is named like "A.1", and when the next iteration of that model is trained, it will be named "A.2", etc. We might also have multiple versions for each type of model loaded onto Triton to support clean transitions from one version of a model to the next. So, at a given moment in time, we might have the following models loaded: A.10, A.11, B.75.

In this example, say we introduce a brand new type of model called C. We also are able to retire model A.10 because it's no longer receiving requests. So at another moment in time we might have these models loaded: A.11, B.75, C.1.

I'm investigating whether it's possible to use the HPS plugin for TensorRT with our existing architecture, without needing to restart Triton whenever our list of active models changes. My understanding is that when we start Triton, we supply a JSON configuration file like:

--backend-config='hps,ps=/path/to/hps.json'

And this file contains the configuration for every model, like:

{
  "supportlonglong": false,
  "models": [{
    "model": "A.10",
    ...
  },{
    "model": "A.11",
    ...
  },{
    "model": "B.75",
    ...
  }]
}

I'm able to explicitly load TRT models that use the HPS plugin, but if I update the JSON file in place and attempt to make inferences on a new model named C.1, Triton reports an error:

2024/06/12 20:21:43 [HCTR][20:21:43.298][ERROR][RK0][tid #139986607271936]: Cannot find the model C.1 in HPS

Is there a way to supply new model configurations once Triton is already running, and to remove configurations that are no longer relevant? I was optimistic after reading this, which mentions "adding the configuration of a new model to the HPS configuration file," but I'm not sure how to do that.

Thanks!

The text was updated successfully, but these errors were encountered:

yingcanw · 2024-06-13T02:12:41Z

@dmac
If I understand correctly, you are using the Trition TRT backend to deploy the model with the HPS TRT plugin. If so, the online update feature is only supported in the HPS Triton backend, because the HPS configuration is loaded when the HPS is initialized. For online support of HPS configuration updates, the TRT backend needs to support re-parsing the latest configuration logic like here

dmac · 2024-06-13T04:47:08Z

If I understand correctly, you are using the Trition TRT backend to deploy the model with the HPS TRT plugin.

Correct.

For online support of HPS configuration updates, the TRT backend needs to support re-parsing the latest configuration logic like here

I see, thanks. Do you know if that feature is planned or has been discussed before? Or is this question better directed at the Triton/TensorRT issues page?

Have you seen anyone using a dynamic, constantly changing HPS configuration like this before, either with the HPS backend, or the TRT backend with the HPS TRT plugin?

dmac · 2024-06-13T05:14:41Z

One more question: do models served by the Triton HPS backend need to be trained with the HugeCTR framework? Or is there a way to convert an existing ONNX model to a format that is able to be served by the HPS backend? (Currently we train a model with PyTorch, export to ONNX, then convert to TRT.)

yingcanw · 2024-06-13T07:11:11Z

Have you seen anyone using a dynamic, constantly changing HPS configuration like this before, either with the HPS backend, or the TRT backend with the HPS TRT plugin?

Yes, there are users who have such a requirement, so we support the feature of online updating or unloading models in HPS Triton backend. However, such users mainly use HPS as an independent embedding query service, such as building an HPS+TF/TRT inference pipeline using Triton the ensemble mode, so that they can independently control the online update of the embedding table in HPS backend.

One more question: do models served by the Triton HPS backend need to be trained with the HugeCTR framework? Or is there a way to convert an existing ONNX model to a format that is able to be served by the HPS backend? (Currently we train a model with PyTorch, export to ONNX, then convert to TRT.)

The answer is not necessary, the input format of HPS can be found here here，If you need to convert the torch embedding model to hps format, you can refer to the example.

dmac · 2024-06-13T21:52:58Z

Ok, it sounds like using an ensemble model with the embeddings looked up from the HPS backend instead of using the HPS TRT plugin is probably what we want. Thanks for the advice.

dmac added the question Further information is requested label Jun 12, 2024

dmac closed this as completed Jun 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

dmac commented Jun 12, 2024

yingcanw commented Jun 13, 2024

dmac commented Jun 13, 2024

dmac commented Jun 13, 2024

yingcanw commented Jun 13, 2024

dmac commented Jun 13, 2024

[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

[Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT? #451

Comments

dmac commented Jun 12, 2024

yingcanw commented Jun 13, 2024

dmac commented Jun 13, 2024

dmac commented Jun 13, 2024

yingcanw commented Jun 13, 2024

dmac commented Jun 13, 2024