From ea51b3705a600d82290bd880ad68c90f0758cb15 Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Wed, 12 Jun 2024 16:20:07 -0400
Subject: [PATCH 1/6] docs: Update documentation on Tensorizer

---
 docs/source/models/tensorizer.rst | 13 +++++++++++++
 vllm/engine/arg_utils.py          |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/models/tensorizer.rst

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
new file mode 100644
index 0000000000000..36209f50348ae
--- /dev/null
+++ b/docs/source/models/tensorizer.rst
@@ -0,0 +1,13 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+================
+vLLM supports loading models with `CoreWeave's Tensorizer <https://github.com/coreweave/tensorizer>`_.
+vLLM model tensors serialized to disk, an HTTP/HTTPS or S3 endpoint can be deserialized
+at runtime extremely quickly and directly to the GPU, allowing for significantly
+shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
+
+For more information on how to use CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_ and
+the `example script here <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_ for
+how to serialize a vLLM model as well a general usage guide to using Tensorizer with vLLM.
\ No newline at end of file
diff --git a/vllm/engine/arg_utils.py b/vllm/engine/arg_utils.py
index 227de5475b949..ba53b5c86fa72 100644
--- a/vllm/engine/arg_utils.py
+++ b/vllm/engine/arg_utils.py
@@ -230,7 +230,7 @@ def add_cli_args(
             '* "dummy" will initialize the weights with random values, '
             'which is mainly for profiling.\n'
             '* "tensorizer" will load the weights using tensorizer from '
-            'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+            'CoreWeave. See the Tensorize vLLM Model script in the Examples '
             'section for more information.\n'
             '* "bitsandbytes" will load the weights using bitsandbytes '
             'quantization.\n')

From 2bfe485a4646f22b51f18dc4dbd01cb3b8113a16 Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Wed, 12 Jun 2024 16:32:31 -0400
Subject: [PATCH 2/6] docs: Fix grammar

---
 docs/source/models/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index 36209f50348ae..05a2133473cba 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -3,7 +3,7 @@
 Loading Models with CoreWeave's Tensorizer
 ================
 vLLM supports loading models with `CoreWeave's Tensorizer <https://github.com/coreweave/tensorizer>`_.
-vLLM model tensors serialized to disk, an HTTP/HTTPS or S3 endpoint can be deserialized
+vLLM model tensors serialized to a HTTP/HTTPS endpoint, a S3 endpoint, or disk, can be deserialized
 at runtime extremely quickly and directly to the GPU, allowing for significantly
 shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
 

From f28af644fec12041a7c092665c7dbeea556a88dc Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Wed, 12 Jun 2024 16:41:19 -0400
Subject: [PATCH 3/6] docs: Make `.rst` doc less wordy

---
 docs/source/models/tensorizer.rst | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index 05a2133473cba..b82128442792b 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -3,11 +3,10 @@
 Loading Models with CoreWeave's Tensorizer
 ================
 vLLM supports loading models with `CoreWeave's Tensorizer <https://github.com/coreweave/tensorizer>`_.
-vLLM model tensors serialized to a HTTP/HTTPS endpoint, a S3 endpoint, or disk, can be deserialized
-at runtime extremely quickly and directly to the GPU, allowing for significantly
-shorter pod startup time and CPU memory usage. Tensor encryption is also supported.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
+at runtime extremely quickly directly to the GPU, resulting in significantly
+shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.
 
-For more information on how to use CoreWeave's Tensorizer, please refer to
-`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_ and
-the `example script here <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_ for
-how to serialize a vLLM model as well a general usage guide to using Tensorizer with vLLM.
\ No newline at end of file
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well as a general guide to using Tensorizer with vLLM, see
+the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
\ No newline at end of file

From 44a89559fcf50f469df619b059c0b90991e170fe Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Thu, 13 Jun 2024 09:37:46 -0400
Subject: [PATCH 4/6] docs: Fix underline

---
 docs/source/models/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/models/tensorizer.rst b/docs/source/models/tensorizer.rst
index b82128442792b..ffff10719356e 100644
--- a/docs/source/models/tensorizer.rst
+++ b/docs/source/models/tensorizer.rst
@@ -1,7 +1,7 @@
 .. _tensorizer:
 
 Loading Models with CoreWeave's Tensorizer
-================
+==========================================
 vLLM supports loading models with `CoreWeave's Tensorizer <https://github.com/coreweave/tensorizer>`_.
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
 at runtime extremely quickly directly to the GPU, resulting in significantly

From 10665f90cdf434faf53850500ddfa4f0e64a9989 Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Thu, 13 Jun 2024 15:04:43 -0400
Subject: [PATCH 5/6] docs: Resolve review comments

---
 docs/source/index.rst                          | 1 +
 docs/source/{models => serving}/tensorizer.rst | 0
 2 files changed, 1 insertion(+)
 rename docs/source/{models => serving}/tensorizer.rst (100%)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index b7c0d5b880079..f5d8627596a70 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -81,6 +81,7 @@ Documentation
    serving/env_vars
    serving/usage_stats
    serving/integrations
+   serving/tensorizer
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/source/models/tensorizer.rst b/docs/source/serving/tensorizer.rst
similarity index 100%
rename from docs/source/models/tensorizer.rst
rename to docs/source/serving/tensorizer.rst

From eb7740830276f97e188051d340fc6cc4c58f1516 Mon Sep 17 00:00:00 2001
From: Sanger Steel <sangersteel@gmail.com>
Date: Thu, 13 Jun 2024 15:14:03 -0400
Subject: [PATCH 6/6] docs: Update Tensorizer website link

---
 docs/source/serving/tensorizer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/tensorizer.rst b/docs/source/serving/tensorizer.rst
index ffff10719356e..a44696507fb9a 100644
--- a/docs/source/serving/tensorizer.rst
+++ b/docs/source/serving/tensorizer.rst
@@ -2,7 +2,7 @@
 
 Loading Models with CoreWeave's Tensorizer
 ==========================================
-vLLM supports loading models with `CoreWeave's Tensorizer <https://github.com/coreweave/tensorizer>`_.
+vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
 at runtime extremely quickly directly to the GPU, resulting in significantly
 shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.