Push V22.10 integration to main branch (#51)
* support multi-hot cat input

* Update CI.DockerFile to latest v3.5-integration

* support standalone hps lib

* change to new library name

* Build the backbone of the hps backend

* finish backbone building

* make hugectr backend compatible with new HPS

* Modify hugectr inference backend to adapt to new HPS

* update wdl model tutorial

* bug fix

* reset other notebooks

* model replacement

* delete checkpoint files

* Modify hps backend to adapt to new HPS

* support single embedding table lookup for hps backend

* add triton helpers file

* support new hps ctor API

* support multi-table embedding key queries

* add a brief introduction to HPS and its use cases

* Update CI.DockerFile with hugectr master branch

* Fix typos

* fix docker run command

* Add a detailed introduction to hps interface configuration, etc.

* resize image

* Upload configuration management poc demo

* add UI figure

* Config sys demo comments

* fix issue where disabling ec caused a crash

* add test case for switching off ec

* Fix links in README

* delete cm-related content

* update hps docs

* Change inference log info to verbose

* simplify the hugectr backend configuration

* Update CI.DockerFile

* [ready for merge] Adjustments for new HashMap implementation.

* update docs and samples

* Add model online-update introduction

* modified the doc of hps backend building

* modified the doc of hps backend building

* modified the doc of hps backend building

* modified the doc of hps backend building

* modified the doc of hps backend building

* draft for hps TIS backend demo

* modified the doc of hps backend building

* modified the doc of hps backend building

* change CI branch to v4.0

* hps demo update

* hps infer testing

* temp upload

* Delete old HugeCTR_Online_Update.png

* Upload New hugectr update flow

* update CI image

* update CI yml

* triton model ensemble for hps + tf backends

* Remove temporary folder, add hps-triton-ensemble examples

* remove jupyter checkpoint files

* move notebooks to sample folder

* remove example folder

* Modify the links in samples

* Change docker images tag

* Merge V4.0 with main branch

* fix hps batchsize issue

* Update CI.DockerFile with v22.09

* parse new configuration items from ps json

* update the container tag for 22.09

* merge Main with hugectr performance test branch

* fix multi-table lookup result overlap issue

* update container tag

Co-authored-by: kingsleyl <kingsleyl@nvidia.com>
Co-authored-by: zhuwenjing <zhuwenjing@360.cn>
Co-authored-by: Joey Wang <zehuanw@nvidia.com>
Co-authored-by: Jerry Shi <jershi@nvidia.com>
Co-authored-by: Matthias Langer <mlanger@nvidia.com>
Co-authored-by: vgong <vgong@nvidia.com>
7 people authored Oct 25, 2022
1 parent 579e12f commit 6157dec
Showing 7 changed files with 13 additions and 13 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m

Docker images for the HugeCTR Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
```

**NOTE**: As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
@@ -85,7 +85,7 @@ After you've built HugeCTR from scratch, do the following:
$ make install
```

-**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.05`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.
+**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.06`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.

The following Triton repositories, which are required, will be pulled and used in the build. By default, the "main" branch/tag will be used for each repository. However, the
following cmake arguments can be used to override the "main" branch/tag:
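The note above asks for the absolute backend install path via `--backend-directory`, and the cmake arguments mentioned here control which Triton branch or tag is pulled. A minimal sketch of both steps follows; the install prefix, repository tags, and model repository path are illustrative assumptions rather than values taken from this commit:
```
# Assumed sketch: build the backend against a specific Triton release instead of "main".
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/tritonserver/hugectr_backend \
         -DTRITON_COMMON_REPO_TAG=r22.06 \
         -DTRITON_CORE_REPO_TAG=r22.06 \
         -DTRITON_BACKEND_REPO_TAG=r22.06
make install

# Launch Triton, pointing --backend-directory at the absolute path of the local install.
tritonserver --model-repository=/path/to/model_repo \
             --backend-directory=/opt/tritonserver/hugectr_backend
```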
2 changes: 1 addition & 1 deletion hps_backend/README.md
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m

Docker images for the HPS Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
```

**NOTE**: The HPS backend is derived from the HugeCTR backend. As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
10 changes: 5 additions & 5 deletions hps_backend/src/model_instance_state.cpp
@@ -157,13 +157,13 @@ ModelInstanceState::ProcessRequest(std::vector<size_t> num_keys_per_table)
       lookup_result_buf->get_ptr()};
 
   for (size_t index = 0; index < num_keys_per_table.size() - 1; ++index) {
+    const void* current_key_ptr = keys_per_table.back();
     keys_per_table.push_back(reinterpret_cast<const void*>(
-        (long long*)cat_column_index_buf_int64->get_raw_ptr() +
-        num_keys_per_table[index]));
+        (long long*)current_key_ptr + num_keys_per_table[index]));
+    float* current_out_ptr = lookup_buffer_offset_per_table.back();
     lookup_buffer_offset_per_table.push_back(
-        lookup_result_buf->get_ptr() +
-        instance_params_.embedding_vecsize_per_table[index] *
-            num_keys_per_table[index]);
+        current_out_ptr + instance_params_.embedding_vecsize_per_table[index] *
+            num_keys_per_table[index]);
   }
   lookupsession_->lookup(
       keys_per_table, lookup_buffer_offset_per_table, num_keys_per_table);
4 changes: 2 additions & 2 deletions samples/README.md
@@ -48,12 +48,12 @@ You can pull the `Merlin-Training` container by running the following command:
DLRM model training:

```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
```

Wide&Deep model training:
```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
```

The container will open a shell when the run command execution is completed. You'll have to start the jupyter lab on the Docker container. It should look similar to this:
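A common way to start JupyterLab inside the container is sketched below; this is an assumed invocation rather than the exact command from the sample, and the address and port are arbitrary choices:
```
# Assumed invocation; address and port are arbitrary choices.
jupyter-lab --allow-root --ip 0.0.0.0 --port 8888 --no-browser
```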
2 changes: 1 addition & 1 deletion samples/dlrm/HugeCTR_DLRM_Inference.ipynb
@@ -314,7 +314,7 @@
"\n",
"In this tutorial, we will deploy the DLRM to a single V100(32GB)\n",
"\n",
"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
"\n",
"After you enter into the container you can launch triton server with the command below:\n",
"\n",
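For the DLRM notebook above, the Triton launch command itself is not part of the changed lines. As an assumed sketch, analogous to the Wide&Deep launch shown further below, the server could be started along these lines (model name and the control-mode flag are assumptions):
```
# Assumed sketch, mirroring the Wide&Deep sample below; explicit model control is
# required when pre-loading a single named model.
tritonserver --model-repository=/dlrm_infer/model/ --load-model=dlrm --model-control-mode=explicit
```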
2 changes: 1 addition & 1 deletion samples/hierarchical_deployment/README.md
@@ -144,7 +144,7 @@ mkdir -p wdl_infer

Wide&Deep model inference container:
```
-docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.09
+docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.10
```
The container will open a shell when the run command execution is completed. It should look similar to this:
```
2 changes: 1 addition & 1 deletion samples/wdl/HugeCTR_WDL_Inference.ipynb
@@ -303,7 +303,7 @@
"\n",
"In this tutorial, we will deploy the Wide&Deep to a single A100(32GB)\n",
"\n",
"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
"After you enter into the container you can launch triton server with the command below:\n",
"\n",
"tritonserver --model-repository=/wdl_infer/model/ --load-model=wdl \n",
