Push V22.10 integration to main branch #51

Merged
103 commits merged into main on Oct 25, 2022
ac36454
support multi-hot cat input
yingcanw Feb 15, 2022
3996ae7
Update CI.DockerFile to latest v3.5-integration
yingcanw Feb 15, 2022
448f8fc
Merge branch 'yingcan_integration' into 'v3.5-integration'
yingcanw Feb 15, 2022
4c2f4b7
support standalone hps lib
yingcanw Feb 24, 2022
c37deeb
Merge branch 'v3.5-integration' into 'main'
yingcanw Mar 1, 2022
56b72ac
change to new library name
KingsleyLiu-NV Mar 2, 2022
1dcb4f4
Merge branch 'yingcan_integration' into 'hugectr_performance_test'
KingsleyLiu-NV Mar 2, 2022
0a00b05
Merge branch 'hugectr_performance_test' into 'v3.5-integration'
yingcanw Mar 8, 2022
291153f
Merge branch 'v3.5-integration' of https://gitlab-master.nvidia.com/d…
yingcanw Mar 8, 2022
1056a7c
Build the backbone of the hps backend
yingcanw Mar 8, 2022
f1947a2
finish backbone building
yingcanw Mar 10, 2022
c8932e1
make hugectr backend compatible with new HPS
KingsleyLiu-NV Mar 16, 2022
018839e
Modify hugectr inference backend to adapt to new HPS
KingsleyLiu-NV Mar 16, 2022
5739700
Merge branch 'new-hps-interface' into 'hugectr_performance_test'
KingsleyLiu-NV Mar 16, 2022
98ffc5c
update wdl model tutorial
Mar 17, 2022
d944197
bug fix
Mar 17, 2022
0dff0eb
reset other notebooks
Mar 17, 2022
a9e60e2
model replacement
Mar 17, 2022
2717b1c
Merge branch 'v3.5-integration' into add_update_model_tutorial
Mar 17, 2022
b9ca0d7
Merge branch 'add_update_model_tutorial' of https://gitlab-master.nvi…
yingcanw Mar 18, 2022
0f053f3
delete checkoutpoint files
yingcanw Mar 18, 2022
8f6d0d3
Merge branch 'add_update_model_tutorial' into 'v3.5-integration'
yingcanw Mar 18, 2022
e545bf3
Modify hps backend to adapt to new HPS
yingcanw Mar 21, 2022
8d439d6
support single embedding table lookup for hps backend
yingcanw Mar 22, 2022
9496182
add triton helpers file
yingcanw Mar 22, 2022
70a8493
support new hps ctor API
yingcanw Mar 24, 2022
f08e84e
support multi-tables embedding keys query
yingcanw Mar 28, 2022
e1e57da
Merge branch 'hps_backend' into 'v3.5-integration'
yingcanw Apr 1, 2022
7c74721
add a brief introduction about HPS and use cases
yingcanw Apr 1, 2022
47f671b
Update CI.DockerFile with hugectr master branch
yingcanw Apr 1, 2022
b141a97
Merge branch 'hps_backend' into 'v3.5-integration'
yingcanw Apr 1, 2022
3a7f23e
Merge branch 'v3.5-integration' into 'main'
yingcanw Apr 1, 2022
8a7db08
Fix typos
yingcanw Apr 1, 2022
94cf54e
Merge branch 'hugectr_performance_test' of https://gitlab-master.nvid…
yingcanw Apr 1, 2022
ca4e23b
fix docker run command
yingcanw Apr 1, 2022
047f172
Merge branch 'main' into hugectr_performance_test
yingcanw Apr 1, 2022
d93bd4c
Add detailed introduction about hps interface configuration, etc.
yingcanw Apr 4, 2022
83b858d
resize image
yingcanw Apr 4, 2022
7496784
Upload configuration management poc demo
yingcanw Apr 11, 2022
e5db8fe
add UI figure
yingcanw Apr 11, 2022
86ec940
Config sys demo comments
zehuanw Apr 11, 2022
8383303
Merge branch 'config_sys_demo_comments' into 'config_sys_demo'
yingcanw Apr 11, 2022
177cd2c
fix issue about disable ec cause crash
yingcanw Apr 22, 2022
b9990dc
add test case for switch off ec
yingcanw Apr 24, 2022
ee8ac74
Fix links in README
jershi425 Apr 24, 2022
317eb56
Merge branch 'jershi-main-patch-88848' into 'main'
yingcanw Apr 24, 2022
0271ef6
delete cm related
yingcanw Apr 24, 2022
88468e1
Merge branch 'fix_hps_crash_issue' into 'main'
yingcanw Apr 24, 2022
ba4a3ea
Merge branch 'main' into hugectr_performance_test
yingcanw Apr 24, 2022
829fa59
update hps docs
yingcanw Apr 29, 2022
3748fa1
Merge branch 'fea-doc-revise' into 'main'
yingcanw Apr 29, 2022
2976050
Change inference log info to verbose
yingcanw May 23, 2022
50e1ef3
simplify the hugectr backend configuration
yingcanw May 24, 2022
7a3b0cd
Update CI.DockerFile
yingcanw May 27, 2022
08253c7
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw May 27, 2022
03ce124
[ready for merge] Adjustments for new HashMap implementation.
bashimao Jun 7, 2022
ab2ef96
Merge branch 'kafka-client-update-matthias' into 'v3.7-integration'
bashimao Jun 7, 2022
f432414
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw Jun 7, 2022
8ee3d25
update docs and samples
yingcanw Jun 13, 2022
1f4f9d6
Add model online-update introduction
yingcanw Jun 14, 2022
88ae0ed
Merge branch 'v3.7-integration' into 'main'
yingcanw Jun 14, 2022
53903fe
Merge branch 'v3.7-integration' into 'hugectr_performance_test'
yingcanw Jun 14, 2022
1cec214
modified the doc of hps backend building
Jun 16, 2022
7953824
modified the doc of hps backend building
Jun 16, 2022
0a6d461
modified the doc of hps backend building
Jun 16, 2022
461537c
modified the doc of hps backend building
Jun 16, 2022
be01453
modified the doc of hps backend building
Jun 16, 2022
b452b57
draft for hps TIS backend demo
Jun 21, 2022
a4c70f2
modified the doc of hps backend building
Jun 21, 2022
b06c2d3
modified the doc of hps backend building
Jun 21, 2022
586d16b
change CI branch to v4.0
yingcanw Jun 24, 2022
f081d25
hps demo update
Jun 24, 2022
42a90e6
hps infer testing
Jun 29, 2022
45e666b
temp upload
Jul 13, 2022
84ea459
Delete old HugeCTR_Online_Update.png
yingcanw Aug 2, 2022
e79eadd
Upload New hugectr update flow
yingcanw Aug 2, 2022
1fd36cb
update CI image
yingcanw Aug 2, 2022
41f2772
update CI yml
yingcanw Aug 2, 2022
78e4305
triton model ensemble for hps + tf backends
Aug 4, 2022
584b742
Removed tempary folder, add hps-triton-ensemble examples
Aug 10, 2022
c05431b
remove jupyter checkpoint files
Aug 10, 2022
22c039a
move notebooks to sample folder
Aug 10, 2022
cac2ea2
remove example folder
Aug 10, 2022
aa2c9c5
Merge branch 'hps_backend_model_ensemble_demo' of https://gitlab-mast…
yingcanw Aug 11, 2022
ab0e8f5
Modify the links iin samples
yingcanw Aug 11, 2022
543e81e
Change docker images tag
yingcanw Aug 11, 2022
ef2a243
Merge branch 'hps_backend_model_ensemble_demo' into 'v4.0-integration'
yingcanw Aug 11, 2022
327407d
Merge V4.0 with main branch
yingcanw Aug 17, 2022
7611a6a
Merge branch 'v4.0-integration' into 'main'
yingcanw Aug 17, 2022
217cfbb
fix hps batchsize issue
yingcanw Aug 22, 2022
de7ff89
Update CI.DockerFile with v22.09
yingcanw Aug 22, 2022
56f50bf
parse new congifuration items from ps json
yingcanw Aug 22, 2022
900ce9b
Merge branch 'v22.09-integration' of https://gitlab-master.nvidia.com…
yingcanw Aug 22, 2022
2d84ab7
update the container tag for 22.09
yingcanw Sep 14, 2022
e21e4b5
Merge branch 'main' into 'v22.09-integration'
yingcanw Sep 14, 2022
aca3003
Merge branch 'v22.09-integration' into 'main'
yingcanw Sep 14, 2022
dfd0fe6
merge Main with hugectr performance test branch
yingcanw Sep 14, 2022
052ae0b
Merge branch 'main' into 'hugectr_performance_test'
yingcanw Sep 14, 2022
7d19e57
fix multi-table lookup result overlap issue
yingcanw Oct 25, 2022
00f4c3b
Merge branch 'main' of https://gitlab-master.nvidia.com/dl/hugectr/hu…
yingcanw Oct 25, 2022
cc56360
update container tag
yingcanw Oct 25, 2022
a240831
Merge branch 'hugectr_performance_test' into 'v22.10-integration'
yingcanw Oct 25, 2022
25c8e45
fix conflicts
yingcanw Oct 25, 2022
4 changes: 2 additions & 2 deletions — README.md

````diff
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m
 
 Docker images for the HugeCTR Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
 ```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
 ```
 
 **NOTE**: As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
@@ -85,7 +85,7 @@ After you've built HugeCTR from scratch, do the following:
 $ make install
 ```
 
-**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.05`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.
+**NOTE**: Where <rxx.yy> is the version of Triton that you want to deploy, like `r22.06`. Please remember to specify the absolute path of the local directory that installs the HugeCTR Backend for the `--backend-directory` argument when launching the Triton server.
 
 The following Triton repositories, which are required, will be pulled and used in the build. By default, the "main" branch/tag will be used for each repository. However, the
 following cmake arguments can be used to override the "main" branch/tag:
````
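The `--backend-directory` note in the README change above is easy to trip over in practice. A minimal launch sketch — the install prefix and model name here are hypothetical; use whatever absolute path your `make install` step actually produced:

```shell
# Hypothetical install prefix for the built backend shared library.
BACKEND_DIR=/opt/tritonserver/hugectr_backends

# The backend directory must be an ABSOLUTE path, per the README note.
tritonserver --model-repository=/models \
             --backend-directory="${BACKEND_DIR}" \
             --model-control-mode=explicit \
             --load-model=wdl
```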
2 changes: 1 addition & 1 deletion — hps_backend/README.md

````diff
@@ -56,7 +56,7 @@ All NVIDIA Merlin components are available as open-source projects. However, a m
 
 Docker images for the HPS Backend are available in the NVIDIA container repository on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr. You can pull and launch the container by running the following command:
 ```
-docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.09 # Start interaction mode
+docker run --gpus=1 --rm -it nvcr.io/nvidia/merlin/merlin-hugectr:22.10 # Start interaction mode
 ```
 
 **NOTE**: The HPS backend is derived from the HugeCTR backend. As of HugeCTR version 3.0, the HugeCTR container is no longer being released separately. If you're an advanced user, you should use the unified Merlin container to build the HugeCTR Training or Inference Docker image from scratch based on your own specific requirements. You can obtain the unified Merlin container by logging into NGC or by going [here](https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr).
````
10 changes: 5 additions & 5 deletions — hps_backend/src/model_instance_state.cpp

```diff
@@ -157,13 +157,13 @@ ModelInstanceState::ProcessRequest(std::vector<size_t> num_keys_per_table)
       lookup_result_buf->get_ptr()};
 
   for (size_t index = 0; index < num_keys_per_table.size() - 1; ++index) {
+    const void* current_key_ptr = keys_per_table.back();
     keys_per_table.push_back(reinterpret_cast<const void*>(
-        (long long*)cat_column_index_buf_int64->get_raw_ptr() +
-        num_keys_per_table[index]));
+        (long long*)current_key_ptr + num_keys_per_table[index]));
+    float* current_out_ptr = lookup_buffer_offset_per_table.back();
     lookup_buffer_offset_per_table.push_back(
-        lookup_result_buf->get_ptr() +
-        instance_params_.embedding_vecsize_per_table[index] *
-        num_keys_per_table[index]);
+        current_out_ptr + instance_params_.embedding_vecsize_per_table[index] *
+            num_keys_per_table[index]);
   }
   lookupsession_->lookup(
       keys_per_table, lookup_buffer_offset_per_table, num_keys_per_table);
```
4 changes: 2 additions & 2 deletions — samples/README.md

````diff
@@ -48,12 +48,12 @@ You can pull the `Merlin-Training` container by running the following command:
 DLRM model traning:
 
 ```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/dlrm_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
 ```
 
 Wide&Deep model training:
 ```
-docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash
+docker run --gpus=all -it --cap-add SYS_NICE -v ${PWD}:/wdl_train/ --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash
 ```
 
 The container will open a shell when the run command execution is completed. You'll have to start the jupyter lab on the Docker container. It should look similar to this:
````
2 changes: 1 addition & 1 deletion — samples/dlrm/HugeCTR_DLRM_Inference.ipynb

```diff
@@ -314,7 +314,7 @@
 "\n",
 "In this tutorial, we will deploy the DLRM to a single V100(32GB)\n",
 "\n",
-"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
+"docker run --gpus=all -it -v /dlrm_infer/:/dlrm_infer -v /dlrm_train/:/dlrm_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
 "\n",
 "After you enter into the container you can launch triton server with the command below:\n",
 "\n",
```
2 changes: 1 addition & 1 deletion — samples/hierarchical_deployment/README.md

````diff
@@ -144,7 +144,7 @@ mkdir -p wdl_infer
 
 Wide&Deep model inference container:
 ```
-docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.09
+docker run -it --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --net=host -v wdl_infer:/wdl_infer/ -v wdl_train:/wdl_train/ -v your_rocksdb_path:/wdl_infer/rocksdb/ nvcr.io/nvidia/merlin/merlin-hugectr:22.10
 ```
 The container will open a shell when the run command execution is completed. It should look similar to this:
 ```
````
2 changes: 1 addition & 1 deletion — samples/wdl/HugeCTR_WDL_Inference.ipynb

```diff
@@ -303,7 +303,7 @@
 "\n",
 "In this tutorial, we will deploy the Wide&Deep to a single A100(32GB)\n",
 "\n",
-"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.09 /bin/bash\n",
+"docker run --gpus=all -it -v /wdl_infer/:/wdl_infer -v /wdl_train/:/wdl_train --net=host nvcr.io/nvidia/merlin/merlin-hugectr:22.10 /bin/bash\n",
 "After you enter into the container you can launch triton server with the command below:\n",
 "\n",
 "tritonserver --model-repository=/wdl_infer/model/ --load-model=wdl \n",
```