From ad162a602f846832262231dcd4471de01debd76e Mon Sep 17 00:00:00 2001
From: syaseen-rh
Date: Tue, 20 Aug 2024 13:13:43 -0400
Subject: [PATCH] removing step 7 as per SME feedback

---
 modules/optimizing-the-vllm-runtime.adoc | 38 +-----------------------
 1 file changed, 1 insertion(+), 37 deletions(-)

diff --git a/modules/optimizing-the-vllm-runtime.adoc b/modules/optimizing-the-vllm-runtime.adoc
index 22312291..78408293 100644
--- a/modules/optimizing-the-vllm-runtime.adoc
+++ b/modules/optimizing-the-vllm-runtime.adoc
@@ -75,7 +75,7 @@ containers:
 ----
 +
 .. Replace `` and `` with the paths to the speculative model and original model on your S3-compatible object storage.
-.. Replace all other placeholder values with your own.
+.. Replace `` with your own value.
 . To configure the vLLM model-serving runtime for multi-modal inferencing, add the following arguments:
 +
 [source]
 ----
@@ -92,42 +92,6 @@ Only use the `--trust-remote-code` argument with models from trusted sources.
 . Click *Update*.
 +
 The *Serving runtimes* page opens and shows the list of runtimes that are installed. Confirm that the custom model-serving runtime you updated is shown.
-. For speculative decoding, you must additionally redeploy the `InferenceService` custom resource definition (CRD) for the vLLM model-serving runtime as follows:
-.. Log in to the OpenShift CLI.
-.. List the available inference services in your namespace:
-+
-[source]
-----
-oc get -n isvc
-----
-.. Note the name of the `InferenceService` that needs to be redeployed.
-.. Save the `InferenceService` manifest to a YAML file:
-+
-[source]
-----
-oc get -n isvc -o yaml > inferenceservice.yml
-----
-.. Replace the placeholder values with your own.
-.. Delete the exisiting `InferenceService` CRD:
-+
-[source]
-----
-oc delete -n isvc
-----
-.. Replace the placeholder values with your own.
-.. Deploy the modified `InferenceService` CRD using the YAML file that you saved:
-+
-[source]
-----
-oc apply -f inferenceservice.yml
-----
-.. Optional: Check the status of the `InferenceService` deployment as follows:
-+
-[source]
-----
-oc get pod,isvc -n
-----
-.. Replace placeholder values with your own.
 ifdef::upstream[]
 . Deploy the model by using the custom runtime as described in {odhdocshome}/serving-models/#deploying-models-using-the-single-model-serving-platform_serving-large-models[Deploying models on the single-model serving platform].
 endif::[]
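
For reference, a vLLM `ServingRuntime` configured for speculative decoding along the lines kept in the module above might look like the following sketch. This is an illustrative assumption, not content from the patched module: the runtime name, model paths, and token count are placeholders, and the exact argument names (for example `--speculative-model` and `--num-speculative-tokens`) depend on the vLLM version bundled with your runtime image.

[source,yaml]
----
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime                         # hypothetical runtime name
spec:
  containers:
  - name: kserve-container
    args:
    - --port=8080
    - --model=/mnt/models                    # original model mounted from S3-compatible storage
    - --speculative-model=/mnt/models/draft  # hypothetical path to the speculative (draft) model
    - --num-speculative-tokens=5             # draft tokens proposed per decoding step (example value)
    - --trust-remote-code                    # only use with models from trusted sources
----

With a runtime like this, `--model` and `--speculative-model` point at the object-storage locations referenced in the substep that the patch retains.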