Releases: triton-inference-server/hugectr_backend

Merlin: HugeCTR Backend V3.4.1

01 Mar 13:23
fc9ddc4

Release Notes

What’s New in Version 3.4.1

  • Support HDFS Parameter Server in Training:

    • Decoupled HDFS from the Merlin containers to make HDFS support more flexible. Users can now optionally compile the HDFS-related functionality.
    • Now supports loading and dumping models and optimizer states from HDFS.
    • Added a notebook to show how to use HugeCTR with HDFS.
  • Support Multi-hot Inference on HugeCTR Backend: Categorical inputs in multi-hot format are now supported for HugeCTR Backend inference; see the sketch below.
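
The sketch below shows what a multi-hot inference request could look like from a Triton client. It assumes the tensor names DES (dense features), CATCOLUMN (flattened categorical keys), and ROWINDEX (per-slot offsets) used in the HugeCTR Backend samples, plus a hypothetical model named "dlrm"; shapes, dtypes, and slot layout depend on your deployment.

```python
# Hedged sketch of a multi-hot request via the Triton HTTP client. The tensor
# names follow the HugeCTR Backend samples; "dlrm" and all values are
# placeholders for illustration only.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

dense = np.random.rand(1, 13).astype(np.float32)       # 13 dense features
# Multi-hot categorical input: slot 0 holds keys [1, 5], slot 1 holds [9, 9, 42].
keys = np.array([[1, 5, 9, 9, 42]], dtype=np.int64)    # flattened keys
row_index = np.array([[0, 2, 5]], dtype=np.int32)      # CSR-style slot offsets

inputs = [
    httpclient.InferInput("DES", list(dense.shape), "FP32"),
    httpclient.InferInput("CATCOLUMN", list(keys.shape), "INT64"),
    httpclient.InferInput("ROWINDEX", list(row_index.shape), "INT32"),
]
inputs[0].set_data_from_numpy(dense)
inputs[1].set_data_from_numpy(keys)
inputs[2].set_data_from_numpy(row_index)

result = client.infer("dlrm", inputs)
print(result.as_numpy("OUTPUT0"))                      # predicted CTR scores
```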

Merlin: HugeCTR Backend V3.4

28 Jan 08:49

Release Notes

What’s New in Version 3.4

  • Hierarchical Parameter Server Enhancements:
    • Missing key insertion feature: Via a simple flag, it is now possible to configure HugeCTR so that embedding-table entries that are missed during lookup are automatically inserted into volatile database layers such as the Redis and Hashmap backends.
    • Asynchronous timestamp refresh: The previous release introduced time-aware eviction policies, which shrink database partitions that grow beyond certain limits by dropping keys. However, the time information used by these policies was the update time, so an embedding was evicted based on the time passed since its last update. When HugeCTR operates in inference mode, the embedding table is typically immutable. With the missing key insertion feature described above, we now support actively tuning the contents of volatile database layers to the data distribution observed during lookup. To allow time-based eviction to take place, timestamp refreshing for frequently used embeddings can now be enabled. Once enabled, refreshing is handled asynchronously by background threads, so it will not block your inference jobs. For most applications, the performance impact of enabling this feature is barely noticeable.
    • Support HDFS Parameter Server in Training (see the sketch after this list):
    1. A new Python API, DataSourceParams, is used to specify the file system and the paths to data and model files.
    2. Support loading data from HDFS to the local file system for HugeCTR training.
    3. Support dumping trained models and optimizer states into HDFS.
    • Seamless online update of the dense part of a model: The HugeCTR Backend supports online model version updates through the Triton Load API, including seamless updates of the dense part and the corresponding embedding inference cache for the same model. The Load API remains fully compatible with the online deployment of new models.
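
As a rough illustration of the new training-side API, the snippet below constructs a DataSourceParams object. It assumes the use_hdfs/namenode/port fields shown in the HugeCTR 3.4 HDFS notebook; the exact module path and field set vary between releases, so treat every name here as a placeholder and consult the bundled notebook.

```python
# Hedged sketch of the DataSourceParams API; field names follow the HugeCTR 3.4
# HDFS notebook but should be verified against your installed version.
import hugectr

data_source_params = hugectr.DataSourceParams(
    use_hdfs=True,          # stage training data from HDFS instead of local disk
    namenode="localhost",   # HDFS namenode host (placeholder)
    port=9000,              # HDFS namenode port (placeholder)
)
# The params object is then handed to the training pipeline so that data can be
# loaded from HDFS to the local file system and trained models/optimizer states
# dumped back; see the notebook for the exact wiring.
```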

Merlin: HugeCTR Backend V3.3.1

10 Jan 06:37

Release Notes

What’s New in Version 3.3.1

  • Hierarchical Parameter Server Enhancements:
    • Online deployment of new models and recycling of old models: In this release, the HugeCTR Backend is fully compatible with the Triton model control protocol. After adding the configuration of a new model to the HPS configuration file, the new model can be deployed online through the Triton Load API, and old models can be recycled online through the Unload API; see the sketch after this list.
    • Simplified database backend: Multi-node, single-node, and all other kinds of volatile database backends can now be configured using the same configuration object.
    • Multi-threaded optimization of the Redis code (~2.3x speedup over HugeCTR v3.3).
    • Fixed issues:
    1. Built an HPS test environment and implemented unit tests for each component.
    2. Fixed an access violation issue with online Kafka updates.
    3. Fixed the Parquet data reader incorrectly parsing the index of categorical features when a model has multiple embedding tables.
    4. Fixed the HPS Redis backend not invoking overflow handling upon single insertions.
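
For illustration, online deployment and recycling can be driven from the standard Triton Python client; the model names below are placeholders, and the server is assumed to run with --model-control-mode=explicit.

```python
# Minimal sketch of online model deployment/recycling via the Triton model
# control APIs; "new_model" and "old_model" are placeholders.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Deploy a model whose configuration was added to the HPS configuration file.
client.load_model("new_model")
print(client.is_model_ready("new_model"))

# Recycle an old model that is no longer needed.
client.unload_model("old_model")
```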

Merlin: HugeCTR Backend V3.3

08 Dec 03:39
224a5d9

Release Notes

What’s New in Version 3.3

  • Hierarchical Parameter Server:
    • Support Incremental Model Updates From Online Training: HPS now supports iterative model updates via Kafka message queues. It is now possible to connect HugeCTR with Apache Kafka deployments to update models in place, in real time. This feature is supported in both training and inference. Please refer to the Demo Notebook.
    • Support Embedding Key Eviction Mechanism: In-memory databases such as Redis, as well as CPU-memory-backed storage, now manage the embedding feature memory. Hence, when performing iterative updates, they automatically evict infrequently used embeddings as training progresses.
    • Support Embedding Cache Asynchronous Refresh Mechanism: We now support asynchronously refreshing incremental embedding keys into the embedding cache. The refresh operation is triggered when a model version iteration completes or when incremental parameters are produced by online training. The distributed database and the persistent database are updated through the distributed event streaming platform (Kafka), after which the GPU embedding cache refreshes the values of the existing embedding keys and replaces them with the latest incremental embedding vectors. Please refer to the HPS README.
    • Other Improvements: We have added support for multiple database interfaces to our parameter server. In particular, we added an "in-memory" database that uses local CPU memory for storing and recalling embeddings and uses multi-threading to accelerate lookup and storage. We also revised the support for "distributed" storage of embeddings in a Redis cluster, so that the combined CPU-accessible memory of your cluster can be used to store embeddings; the new implementation is over two orders of magnitude faster than the previous one. Further, we performance-optimized the "persistent" storage and retrieval of embeddings via RocksDB through the structured use of column families. Creating a hierarchical storage (i.e., using Redis as a distributed cache and RocksDB as a fallback) is supported as well; see the configuration sketch after this list. These improvements come at no cost to end users, as there is no need to adjust the PS configuration.
      We plan to further integrate the hierarchical parameter server with other features, such as the GPU-backed embedding caches, in upcoming releases. Stay tuned!
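
For orientation, a parameter-server configuration along these lines wires the database layers and the Kafka update source together. The key names (volatile_db, persistent_db, update_source) follow the HPS documentation of this era, but the exact schema and defaults should be taken as assumptions and checked against the HPS README; model entries are omitted for brevity.

```json
{
  "supportlonglong": true,
  "volatile_db": {
    "type": "redis_cluster",
    "address": "127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002"
  },
  "persistent_db": {
    "type": "rocksdb",
    "path": "/hugectr/rocksdb"
  },
  "update_source": {
    "type": "kafka_message_queue",
    "brokers": "127.0.0.1:9092"
  },
  "models": []
}
```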

Merlin: HugeCTR Backend V3.2.1

23 Nov 02:18
e82771d

Release Notes

What’s New in Version 3.2.1

  • Embedding cache asynchronous insertion mechanism: We now support asynchronously inserting missing embedding keys into the embedding cache. This feature is activated automatically through a user-defined hit-rate threshold in the configuration file; see the sketch below. When the measured hit rate of the embedding cache is higher than the user-defined threshold, missing keys are inserted asynchronously; otherwise, they are still inserted synchronously to ensure the high accuracy of inference requests. Compared with the previous synchronous method, asynchronous insertion can further improve the measured hit rate of the embedding cache once it reaches the user-defined threshold.
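
A per-model entry in the parameter-server configuration file along the following lines controls this behavior. The hit_rate_threshold and GPU-cache parameters appear in the HugeCTR Backend samples of this era; the model name, paths, and surrounding keys are illustrative and should be checked against the backend README.

```json
{
  "models": [
    {
      "model": "dlrm",
      "sparse_files": ["/model/dlrm/0_sparse_file.model"],
      "dense_file": "/model/dlrm/_dense_file.model",
      "network_file": "/model/dlrm/dlrm.json",
      "gpucache": "true",
      "gpucacheper": "0.5",
      "hit_rate_threshold": "0.8"
    }
  ]
}
```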

Merlin: HugeCTR Backend V3.2

22 Sep 07:31
db826e1

Release Notes

What’s New in Version 3.2

  • HugeCTR Hierarchical Parameter Server POC: In this release, the HugeCTR Backend implements a hierarchical storage mechanism between local SSDs and CPU memory, which breaks the convention that the embedding table must be stored in local CPU memory. A distributed Redis cluster is introduced as a CPU cache that stores larger embedding tables and interacts with the GPU embedding cache directly. A local RocksDB instance serves as a query engine that backs up the complete embedding table on local SSDs and assists the Redis cluster with lookups of missing embedding keys; see the sketch below. See Distributed Deployment for more details. We also provide a new sample showing how to deploy the Hierarchical Parameter Server on the Triton platform.
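
The control flow of the hierarchy can be pictured as follows. This is purely an illustration of the lookup path, not HugeCTR's implementation: plain Python dicts stand in for the GPU embedding cache, the Redis cluster (CPU memory), and RocksDB (local SSD).

```python
# Illustrative control flow of the hierarchical lookup path, NOT HugeCTR's
# actual code: dicts stand in for the three storage levels.
gpu_cache = {1: [0.1, 0.2]}                               # hottest embeddings
redis_cluster = {2: [0.3, 0.4]}                           # CPU-memory cache
rocksdb = {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}   # complete table on SSD

def lookup(keys):
    vectors, missing = {}, []
    for k in keys:                      # 1. try the GPU embedding cache first
        if k in gpu_cache:
            vectors[k] = gpu_cache[k]
        else:
            missing.append(k)
    still_missing = []
    for k in missing:                   # 2. fall through to the Redis cluster
        if k in redis_cluster:
            vectors[k] = redis_cluster[k]
        else:
            still_missing.append(k)
    for k in still_missing:             # 3. RocksDB backs up the full table
        vectors[k] = rocksdb[k]
        redis_cluster[k] = rocksdb[k]   # warm the CPU cache for next time
    return vectors

print(lookup([1, 2, 3]))  # {1: [0.1, 0.2], 2: [0.3, 0.4], 3: [0.5, 0.6]}
```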

Merlin: HugeCTR Backend V3.1

30 Jul 02:11
7a49582

Release Notes

What’s New in Version 3.1

  • Independent Parameter Server Configuration: In this release, the HugeCTR Backend decouples the Parameter Server-related configuration from the Triton configuration file (config.pbtxt), making it easier to configure the embedding-table-related parameters per model. This is especially helpful when configuring multiple embedding tables per model, as it avoids passing an excessive number of command-line parameters when launching the Triton server; see the launch sketch below.
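
In practice this means the Parameter Server settings live in their own JSON file that is referenced at launch time. The sketch below assumes the --backend-config=hugectr,ps=<path> convention from the HugeCTR Backend README; all paths are placeholders.

```shell
# Hedged launch sketch; the model repository and ps.json paths are placeholders.
tritonserver --model-repository=/models \
             --backend-config=hugectr,ps=/models/ps.json
```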

  • Metrics: To report GPU and request statistics, Prometheus can be used to gather the exported metrics into usable, actionable entries, giving you the data you need to manage alerts and performance information in your environment.

  • Multiple Embedding Tables Model Deployment: To support models with varying numbers of embedding tables, the HugeCTR Backend now allows multiple embedding tables to be deployed per model.

Merlin: HugeCTR Backend V3.0.1

12 Apr 03:20
65f4be7

Release Notes

What’s New in Version 3.0.1

  • End-to-End Tutorial: In this release, we provide two tutorials: one on training a standard DLRM model using the HugeCTR high-level Python API, and one on deploying the standard DLRM model to the Triton Inference Server using the HugeCTR Backend. These tutorials explain the steps to train and run inference with HugeCTR and NVTabular within the Merlin framework. Users can collect inference benchmarks with the Triton performance analyzer tool; see the sketch below.
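
As a rough example, an invocation like the following collects latency and throughput numbers; the model name, endpoint, and input file are placeholders, and the flags shown are standard perf_analyzer options.

```shell
# Hedged perf_analyzer invocation; "dlrm" and perf_data.json are placeholders.
perf_analyzer -m dlrm -u localhost:8000 \
              --input-data perf_data.json \
              --concurrency-range 1:4
```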

  • Enhanced Model Scalability: To accommodate the input requirements of different recommendation models, we now support input data without dense features.

  • Enhanced Model Control API: Fixed an issue where the Parameter Server could not release GPU memory when the Triton model control API was used to unload a model at runtime.

Merlin: HugeCTR Backend V3.0

09 Mar 07:12
29ad9a3

Release Notes

What’s New in Version 3.0

  • Hierarchical Framework: HugeCTR adopts a hierarchical design that decouples the model weights from the embedding tables; a GPU cache is used to accelerate embedding vector lookups during inference.
  • Concurrent Model Execution: Multiple models (or multiple instances of the same model) can run simultaneously on the same GPU or on multiple GPUs.
  • Extensible Backends: The inference interface provided by HugeCTR is based on a hierarchical framework and can be easily integrated with the backend API, which allows models to be extended with any execution logic implemented in Python or C++.
  • Easy Deployment of New Models: Updating a model should be as transparent as possible and should not affect inference performance. No matter how many models need to be deployed, as long as a model was trained by HugeCTR, it can be loaded through the same HugeCTR Backend API; the user only needs to change the configuration files for different models.