Commit d83e937: updating README
antunsz committed Mar 17, 2024 · 1 parent 11ddf41
Showing 1 changed file with 7 additions and 6 deletions: README.md

### Serve HuggingFace Models

With FastServe you can serve any HuggingFace Transformer model, deploying it flexibly across computing environments, from CPU-only machines to single- and multi-GPU setups.

For some models, a HuggingFace API token must be set up correctly in your environment to access them from the HuggingFace Hub. This is not required for every model, but gated models may involve extra steps, such as accepting their terms of use. Check your model's page on the Hub for specific requirements.
```bash
export HUGGINGFACE_TOKEN=<your hf token>
```
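As a quick sanity check (a minimal sketch, not part of FastServe), you can confirm the exported token is actually visible to your Python process before starting the server:

```python
import os

# Report whether the token exported in the shell above is visible to
# this process; downloads of gated models will fail without it.
token = os.environ.get("HUGGINGFACE_TOKEN")
print("HUGGINGFACE_TOKEN is set" if token else "HUGGINGFACE_TOKEN is not set")
```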

The server can be started with a specific model. The example below uses `gpt2`; replace it with your model of choice. The `model_name` parameter is optional: if it is not provided, the class attempts to read the model name from the `HUGGINGFACE_MODEL_NAME` environment variable. You can also enable GPU acceleration with the `use_gpu` parameter, which defaults to `False` for CPU usage.

```python
from fastserve.models import ServeHuggingFace

# Initialize with GPU support if desired by setting `use_gpu=True`.
# For CPU usage, you can omit `use_gpu` or set it to `False`.
app = ServeHuggingFace(model_name="gpt2", use_gpu=True)
app.run_server()
```

or, run `python -m fastserve.models --model huggingface --model_name bigcode/starcoder --batch_size 4 --timeout 1 --use_gpu 1` from
terminal.

To make a request to the server, send a JSON payload containing the prompt you want the model to generate text for. Here's an example using `requests` in Python:
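A minimal client sketch follows. The base URL, the `/endpoint` path, and the `prompt` field name are assumptions for illustration; verify the actual endpoint and request schema against your FastServe version.

```python
import requests

# Hypothetical server address; adjust host/port to your deployment.
BASE_URL = "http://localhost:8000"

def build_payload(prompt: str) -> dict:
    # JSON body with the text the model should continue (assumed schema).
    return {"prompt": prompt}

def generate(prompt: str):
    # POST the prompt and return the parsed JSON response.
    response = requests.post(f"{BASE_URL}/endpoint", json=build_payload(prompt))
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(generate("def fibonacci(n):"))
```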
