From ea51b3705a600d82290bd880ad68c90f0758cb15 Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Wed, 12 Jun 2024 16:20:07 -0400
Subject: [PATCH 1/6] docs: Update documentation on Tensorizer

---
 docs/source/models/tensorizer.rst | 13 +++++++++++++
 vllm/engine/arg_utils.py          |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/models/tensorizer.rst

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
new file mode 100644
index 0000000000000..36209f50348ae
--- /dev/null
+++ b/docs/source/models/tensorizer.rst
@@ -0,0 +1,13 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+================
+vLLM supports loading models with `CoreWeave's Tensorizer `_.
+vLLM model tensors serialized to disk, an HTTP/HTTPS or S3 endpoint can be deserialized
+at runtime extremely quickly and directly to the GPU, allowing for significantly
+shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
+
+For more information on how to use CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation `_ and
+the `example script here `_ for
+how to serialize a vLLM model as well a general usage guide to using Tensorizer with vLLM.
\ No newline at end of file
diff --git a/vllm/engine/arg_utils.py b/vllm/engine/arg_utils.py
index 227de5475b949..ba53b5c86fa72 100644
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -230,7 +230,7 @@ def add_cli_args(
             '* "dummy" will initialize the weights with random values, '
             'which is mainly for profiling.\n'
             '* "tensorizer" will load the weights using tensorizer from '
-            'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+            'CoreWeave. See the Tensorize vLLM Model script in the Examples '
             'section for more information.\n'
             '* "bitsandbytes" will load the weights using bitsandbytes '
             'quantization.\n')

From 2bfe485a4646f22b51f18dc4dbd01cb3b8113a16 Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Wed, 12 Jun 2024 16:32:31 -0400
Subject: [PATCH 2/6] docs: Fix grammar

---
 docs/source/models/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index 36209f50348ae..05a2133473cba 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -3,7 +3,7 @@
 Loading Models with CoreWeave's Tensorizer
 ================
 vLLM supports loading models with `CoreWeave's Tensorizer `_.
-vLLM model tensors serialized to disk, an HTTP/HTTPS or S3 endpoint can be deserialized
+vLLM model tensors serialized to a HTTP/HTTPS endpoint, a S3 endpoint, or disk, can be deserialized
 at runtime extremely quickly and directly to the GPU, allowing for significantly
 shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
 

From f28af644fec12041a7c092665c7dbeea556a88dc Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Wed, 12 Jun 2024 16:41:19 -0400
Subject: [PATCH 3/6] docs: Make `.rst` doc less wordy

---
 docs/source/models/tensorizer.rst | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index 05a2133473cba..b82128442792b 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -3,11 +3,10 @@
 Loading Models with CoreWeave's Tensorizer
 ================
 vLLM supports loading models with `CoreWeave's Tensorizer `_.
-vLLM model tensors serialized to a HTTP/HTTPS endpoint, a S3 endpoint, or disk, can be deserialized
-at runtime extremely quickly and directly to the GPU, allowing for significantly
-shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
+at runtime extremely quickly directly to the GPU, resulting in significantly
+shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.
 
-For more information on how to use CoreWeave's Tensorizer, please refer to
-`CoreWeave's Tensorizer documentation `_ and
-the `example script here `_ for
-how to serialize a vLLM model as well a general usage guide to using Tensorizer with vLLM.
\ No newline at end of file
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation `_. For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
+the `vLLM example script `_.
\ No newline at end of file

From 44a89559fcf50f469df619b059c0b90991e170fe Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Thu, 13 Jun 2024 09:37:46 -0400
Subject: [PATCH 4/6] docs: Fix underline

---
 docs/source/models/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index b82128442792b..ffff10719356e 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -1,7 +1,7 @@
 .. _tensorizer:
 
 Loading Models with CoreWeave's Tensorizer
-================
+==========================================
 vLLM supports loading models with `CoreWeave's Tensorizer `_.
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
 at runtime extremely quickly directly to the GPU, resulting in significantly

From 10665f90cdf434faf53850500ddfa4f0e64a9989 Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Thu, 13 Jun 2024 15:04:43 -0400
Subject: [PATCH 5/6] docs: Resolve review comments

---
 docs/source/index.rst                          | 1 +
 docs/source/{models => serving}/tensorizer.rst | 0
 2 files changed, 1 insertion(+)
 rename docs/source/{models => serving}/tensorizer.rst (100%)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index b7c0d5b880079..f5d8627596a70 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -81,6 +81,7 @@ Documentation
    serving/env_vars
    serving/usage_stats
    serving/integrations
+   serving/tensorizer
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/source/models/tensorizer.rst b/docs/source/serving/tensorizer.rst
similarity index 100%
rename from docs/source/models/tensorizer.rst
rename to docs/source/serving/tensorizer.rst

From eb7740830276f97e188051d340fc6cc4c58f1516 Mon Sep 17 00:00:00 2001
From: Sanger Steel
Date: Thu, 13 Jun 2024 15:14:03 -0400
Subject: [PATCH 6/6] docs: Update Tensorizer website link

---
 docs/source/serving/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/tensorizer.rst b/docs/source/serving/tensorizer.rst
index ffff10719356e..a44696507fb9a 100644
--- a/docs/source/serving/tensorizer.rst
+++ b/docs/source/serving/tensorizer.rst
@@ -2,7 +2,7 @@
 
 Loading Models with CoreWeave's Tensorizer
 ==========================================
-vLLM supports loading models with `CoreWeave's Tensorizer `_.
+vLLM supports loading models with `CoreWeave's Tensorizer `_.
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
 at runtime extremely quickly directly to the GPU, resulting in significantly
 shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.