Commit d83e937: updating README
antunsz committed Mar 17, 2024 · 1 parent 11ddf41
Showing 1 changed file with 7 additions and 6 deletions: README.md

### Serve HuggingFace Models

With FastServe you can serve any HuggingFace Transformer model, deploying it flexibly across computing environments, from CPU-only machines to single- and multi-GPU setups.

For some models, a HuggingFace API token must be set up correctly in your environment to access them from the HuggingFace Hub. This is not required for every model, but gated models may involve extra steps, such as accepting their terms of use. Check your model's page on the Hub for specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```
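As a quick sanity check (a minimal sketch, not part of FastServe), you can confirm the exported token is actually visible to your Python process before starting the server:

```python
import os

# Report whether the token exported in the shell above is visible to
# this process; downloads of gated models will fail without it.
token = os.environ.get("HUGGINGFACE_TOKEN")
print("HUGGINGFACE_TOKEN is set" if token else "HUGGINGFACE_TOKEN is not set")
```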

The server can be started with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `use_gpu` parameter, which defaults to `False` for CPU usage.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `use_gpu=True`.
# For CPU usage, you can omit `use_gpu` or set it to `False`.
app = ServeHuggingFace(model_name="gpt2", use_gpu=True)
app.run_server()
```

or, run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --use_gpu 1` from
terminal.

To make a request to the server, send a JSON payload containing the prompt you want the model to generate text for. Here's an example using `requests` in Python:
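A minimal client sketch follows. The base URL, the `/endpoint` path, and the `prompt` field name are assumptions for illustration; verify the actual endpoint and request schema against your FastServe version.

```python
import requests

# Hypothetical server address; adjust host/port to your deployment.
BASE_URL = "http://localhost:8000"

def build_payload(prompt: str) -> dict:
    # JSON body with the text the model should continue (assumed schema).
    return {"prompt": prompt}

def generate(prompt: str):
    # POST the prompt and return the parsed JSON response.
    response = requests.post(f"{BASE_URL}/endpoint", json=build_payload(prompt))
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(generate("def fibonacci(n):"))
```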
