Push V22.10 integration to main branch #51

Merged
103 commits merged into main on Oct 25, 2022
ac36454
support multi-hot cat input
yingcanw Feb 15, 2022
3996ae7
Update CI.DockerFile to latest v3.5-integration
yingcanw Feb 15, 2022
448f8fc
Merge branch 'yingcan_integration' into 'v3.5-integration'
yingcanw Feb 15, 2022
4c2f4b7
support standalone hps lib
yingcanw Feb 24, 2022
c37deeb
Merge branch 'v3.5-integration' into 'main'
yingcanw Mar 1, 2022
56b72ac
change to new library name
KingsleyLiu-NV Mar 2, 2022
1dcb4f4
Merge branch 'yingcan_integration' into 'hugectr_performance_test'
KingsleyLiu-NV Mar 2, 2022
0a00b05
Merge branch 'hugectr_performance_test' into 'v3.5-integration'
yingcanw Mar 8, 2022
291153f
Merge branch 'v3.5-integration' of https://gitlab-master.nvidia.com/d…
yingcanw Mar 8, 2022
1056a7c
Build the backbone of the hps backend
yingcanw Mar 8, 2022
f1947a2
finish backbone building
yingcanw Mar 10, 2022
c8932e1
make hugectr backend compatible with new HPS
KingsleyLiu-NV Mar 16, 2022
018839e
Modify hugectr inference backend to adapt to new HPS
KingsleyLiu-NV Mar 16, 2022
5739700
Merge branch 'new-hps-interface' into 'hugectr_performance_test'
KingsleyLiu-NV Mar 16, 2022
98ffc5c
update wdl model tutorial
Mar 17, 2022
d944197
bug fix
Mar 17, 2022
0dff0eb
reset other notebooks
Mar 17, 2022
a9e60e2
model replacement
Mar 17, 2022
2717b1c
Merge branch 'v3.5-integration' into add_update_model_tutorial
Mar 17, 2022
b9ca0d7
Merge branch 'add_update_model_tutorial' of https://gitlab-master.nvi…
yingcanw Mar 18, 2022
0f053f3
delete checkoutpoint files
yingcanw Mar 18, 2022
8f6d0d3
Merge branch 'add_update_model_tutorial' into 'v3.5-integration'
yingcanw Mar 18, 2022
e545bf3
Modify hps backend to adapt to new HPS
yingcanw Mar 21, 2022
8d439d6
support single embedding table lookup for hps backend
yingcanw Mar 22, 2022
9496182
add triton helpers file
yingcanw Mar 22, 2022
70a8493
support new hps ctor API
yingcanw Mar 24, 2022
f08e84e
support multi-tables embedding keys query
yingcanw Mar 28, 2022
e1e57da
Merge branch 'hps_backend' into 'v3.5-integration'
yingcanw Apr 1, 2022
7c74721
add a brief introduction about HPS and use cases
yingcanw Apr 1, 2022
47f671b
Update CI.DockerFile with hugectr master branch
yingcanw Apr 1, 2022
b141a97
Merge branch 'hps_backend' into 'v3.5-integration'
yingcanw Apr 1, 2022
3a7f23e
Merge branch 'v3.5-integration' into 'main'
yingcanw Apr 1, 2022
8a7db08
Fix typos
yingcanw Apr 1, 2022
94cf54e
Merge branch 'hugectr_performance_test' of https://gitlab-master.nvid…
yingcanw Apr 1, 2022
ca4e23b
fix docker run command
yingcanw Apr 1, 2022
047f172
Merge branch 'main' into hugectr_performance_test
yingcanw Apr 1, 2022
d93bd4c
Add detailed introduction about hps interface configuration, etc.
yingcanw Apr 4, 2022
83b858d
resize image
yingcanw Apr 4, 2022
7496784
Upload configuration management poc demo
yingcanw Apr 11, 2022
e5db8fe
add UI figure
yingcanw Apr 11, 2022
86ec940
Config sys demo comments
zehuanw Apr 11, 2022
8383303
Merge branch 'config_sys_demo_comments' into 'config_sys_demo'
yingcanw Apr 11, 2022
177cd2c
fix issue about disable ec cause crash
yingcanw Apr 22, 2022
b9990dc
add test case for switch off ec
yingcanw Apr 24, 2022
ee8ac74
Fix links in README
jershi425 Apr 24, 2022
317eb56
Merge branch 'jershi-main-patch-88848' into 'main'
yingcanw Apr 24, 2022
0271ef6
delete cm related
yingcanw Apr 24, 2022
88468e1
Merge branch 'fix_hps_crash_issue' into 'main'
yingcanw Apr 24, 2022
ba4a3ea
Merge branch 'main' into hugectr_performance_test
yingcanw Apr 24, 2022
829fa59
update hps docs
yingcanw Apr 29, 2022
3748fa1
Merge branch 'fea-doc-revise' into 'main'
yingcanw Apr 29, 2022
2976050
Change inference log info to verbose
yingcanw May 23, 2022
50e1ef3
simplify the hugectr backend configuration
yingcanw May 24, 2022
7a3b0cd
Update CI.DockerFile
yingcanw May 27, 2022
08253c7
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw May 27, 2022
03ce124
[ready for merge] Adjustments for new HashMap implementation.
bashimao Jun 7, 2022
ab2ef96
Merge branch 'kafka-client-update-matthias' into 'v3.7-integration'
bashimao Jun 7, 2022
f432414
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw Jun 7, 2022
8ee3d25
update docs and samples
yingcanw Jun 13, 2022
1f4f9d6
Add model online-update introduction
yingcanw Jun 14, 2022
88ae0ed
Merge branch 'v3.7-integration' into 'main'
yingcanw Jun 14, 2022
53903fe
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw Jun 14, 2022
1cec214
modified the doc of hps backend building
Jun 16, 2022
7953824
modified the doc of hps backend building
Jun 16, 2022
0a6d461
modified the doc of hps backend building
Jun 16, 2022
461537c
modified the doc of hps backend building
Jun 16, 2022
be01453
modified the doc of hps backend building
Jun 16, 2022
b452b57
draft for hps TIS backend demo
Jun 21, 2022
a4c70f2
modified the doc of hps backend building
Jun 21, 2022
b06c2d3
modified the doc of hps backend building
Jun 21, 2022
586d16b
change CI branch to v4.0
yingcanw Jun 24, 2022
f081d25
hps demo update
Jun 24, 2022
42a90e6
hps infer testing
Jun 29, 2022
45e666b
temp upload
Jul 13, 2022
84ea459
Delete old HugeCTR_Online_Update.png
yingcanw Aug 2, 2022
e79eadd
Upload New hugectr update flow
yingcanw Aug 2, 2022
1fd36cb
update CI image
yingcanw Aug 2, 2022
41f2772
update CI yml
yingcanw Aug 2, 2022
78e4305
triton model ensemble for hps + tf backends
Aug 4, 2022
584b742
Removed tempary folder, add hps-triton-ensemble examples
Aug 10, 2022
c05431b
remove jupyter checkpoint files
Aug 10, 2022
22c039a
move notebooks to sample folder
Aug 10, 2022
cac2ea2
remove example folder
Aug 10, 2022
aa2c9c5
Merge branch 'hps_backend_model_ensemble_demo' of https://gitlab-mast…
yingcanw Aug 11, 2022
ab0e8f5
Modify the links iin samples
yingcanw Aug 11, 2022
543e81e
Change docker images tag
yingcanw Aug 11, 2022
ef2a243
Merge branch 'hps_backend_model_ensemble_demo' into 'v4.0-integration'
yingcanw Aug 11, 2022
327407d
Merge V4.0 with main branch
yingcanw Aug 17, 2022
7611a6a
Merge branch 'v4.0-integration' into 'main'
yingcanw Aug 17, 2022
217cfbb
fix hps batchsize issue
yingcanw Aug 22, 2022
de7ff89
Update CI.DockerFile with v22.09
yingcanw Aug 22, 2022
56f50bf
parse new congifuration items from ps json
yingcanw Aug 22, 2022
900ce9b
Merge branch 'v22.09-integration' of https://gitlab-master.nvidia.com…
yingcanw Aug 22, 2022
2d84ab7
update the container tag for 22.09
yingcanw Sep 14, 2022
e21e4b5
Merge branch 'main' into 'v22.09-integration'
yingcanw Sep 14, 2022
aca3003
Merge branch 'v22.09-integration' into 'main'
yingcanw Sep 14, 2022
dfd0fe6
merge Main with hugectr performance test branch
yingcanw Sep 14, 2022
052ae0b
Merge branch 'main' into 'hugectr_performance_test'
yingcanw Sep 14, 2022
7d19e57
fix multi-table lookup result overlap issue
yingcanw Oct 25, 2022
00f4c3b
Merge branch 'main' of https://gitlab-master.nvidia.com/dl/hugectr/hu…
yingcanw Oct 25, 2022
cc56360
update container tag
yingcanw Oct 25, 2022
a240831
Merge branch 'hugectr_performance_test' into 'v22.10-integration'
yingcanw Oct 25, 2022
25c8e45
fix conflicts
yingcanw Oct 25, 2022
4 changes: 2 additions & 2 deletions — README.md

````diff
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m
 
 Docker images for the HugeCTR Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
 ```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
 ```
 
 **NOTE**: As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
@@ -85,7 +85,7 @@ After you've built HugeCTR from scratch, do the following:
 $ make install
 ```
 
-**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.05`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.
+**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.06`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.
 
 The following Triton repositories, which are required, will be pulled and used in the build. By default, the "main" branch/tag will be used for each repository. However, the
 following cmake arguments can be used to override the "main" branch/tag:
````
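The `--backend-directory` note in the README change above is easy to trip over in practice. A minimal launch sketch — the install prefix and model name here are hypothetical; use whatever absolute path your `make install` step actually produced:

```shell
# Hypothetical install prefix for the built backend shared library.
BACKEND_DIR=/opt/tritonserver/hugectr_backends

# The backend directory must be an ABSOLUTE path, per the README note.
tritonserver --model-repository=/models \
             --backend-directory="${BACKEND_DIR}" \
             --model-control-mode=explicit \
             --load-model=wdl
```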
2 changes: 1 addition & 1 deletion — hps_backend/README.md

````diff
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m
 
 Docker images for the HPS Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
 ```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
 ```
 
 **NOTE**: The HPS backend is derived from the HugeCTR backend. As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
````
10 changes: 5 additions & 5 deletions — hps_backend/src/model_instance_state.cpp

```diff
@@ -157,13 +157,13 @@ ModelInstanceState::ProcessRequest(std::vector<size_t> num_keys_per_table)
       lookup_result_buf->get_ptr()};
 
   for (size_t index = 0; index < num_keys_per_table.size() - 1; ++index) {
+    const void* current_key_ptr = keys_per_table.back();
     keys_per_table.push_back(reinterpret_cast<const void*>(
-        (long long*)cat_column_index_buf_int64->get_raw_ptr() +
-        num_keys_per_table[index]));
+        (long long*)current_key_ptr + num_keys_per_table[index]));
+    float* current_out_ptr = lookup_buffer_offset_per_table.back();
     lookup_buffer_offset_per_table.push_back(
-        lookup_result_buf->get_ptr() +
-        instance_params_.embedding_vecsize_per_table[index] *
-        num_keys_per_table[index]);
+        current_out_ptr + instance_params_.embedding_vecsize_per_table[index] *
+            num_keys_per_table[index]);
   }
   lookupsession_->lookup(
       keys_per_table, lookup_buffer_offset_per_table, num_keys_per_table);
```
4 changes: 2 additions & 2 deletions — samples/README.md

````diff
@@ -48,12 +48,12 @@ You can pull the `Merlin-Training` container by running the following command:
 DLRM model traning:
 
 ```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
 ```
 
 Wide&Deep model training:
 ```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
 ```
 
 The container will open a shell when the run command execution is completed. You'll have to start the jupyter lab on the Docker container. It should look similar to this:
````
2 changes: 1 addition & 1 deletion — samples/dlrm/HugeCTR_DLRM_Inference.ipynb

```diff
@@ -314,7 +314,7 @@
 "\n",
 "In this tutorial, we will deploy the DLRM to a single V100(32GB)\n",
 "\n",
-"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
+"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
 "\n",
 "After you enter into the container you can launch triton server with the command below:\n",
 "\n",
```
2 changes: 1 addition & 1 deletion — samples/hierarchical_deployment/README.md

````diff
@@ -144,7 +144,7 @@ mkdir -p wdl_infer
 
 Wide&Deep model inference container:
 ```
-docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.09
+docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.10
 ```
 The container will open a shell when the run command execution is completed. It should look similar to this:
 ```
````
2 changes: 1 addition & 1 deletion — samples/wdl/HugeCTR_WDL_Inference.ipynb

```diff
@@ -303,7 +303,7 @@
 "\n",
 "In this tutorial, we will deploy the Wide&Deep to a single A100(32GB)\n",
 "\n",
-"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
+"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
 "After you enter into the container you can launch triton server with the command below:\n",
 "\n",
 "tritonserver --model-repository=/wdl_infer/model/ --load-model=wdl \n",
```