Add RTD #321

Merged: 29 commits, Feb 15, 2024

Commits
1715320
add RTD template
mrwyattii Nov 21, 2023
fea34b9
add conf.py
mrwyattii Nov 21, 2023
687c945
update autodoc
mrwyattii Nov 21, 2023
87b69c4
fleshing out docs
mrwyattii Nov 29, 2023
64231aa
added some extensions
mrwyattii Nov 29, 2023
22d1a6f
filling out API docs, adding sphinx extensions
mrwyattii Nov 29, 2023
9de0959
continuing to add more docs
mrwyattii Nov 29, 2023
61093fa
finish pipeline docs
mrwyattii Nov 29, 2023
042ebd6
update install and persistent deployment
mrwyattii Nov 29, 2023
1e6985b
add persistent deployment docs
mrwyattii Dec 4, 2023
0681f25
updating config docs
mrwyattii Dec 4, 2023
5e2289a
add REST API docs
mrwyattii Dec 4, 2023
fa5f948
adding more pages, links, etc.
mrwyattii Dec 4, 2023
a5b4b06
added text for configs
mrwyattii Jan 6, 2024
5240a3f
final updates to RTD
mrwyattii Feb 10, 2024
26c9e04
updated requirements
mrwyattii Feb 10, 2024
8d39591
updated requirements
mrwyattii Feb 10, 2024
7e57a5f
updated requirements
mrwyattii Feb 10, 2024
dcdecee
remove DSConfigModel base
mrwyattii Feb 10, 2024
e5dcad7
resolve dependency problem for RTD build
mrwyattii Feb 10, 2024
3ca5eb3
Merge branch 'main' into rtd-staging
mrwyattii Feb 13, 2024
82c8f5e
fix problems from merge
mrwyattii Feb 10, 2024
669fe7c
fix links to generate config
mrwyattii Feb 14, 2024
abb01c4
fix broken unit test
mrwyattii Feb 14, 2024
50840d4
fix broken unit test
mrwyattii Feb 15, 2024
bf94e10
clean up changes around
mrwyattii Feb 15, 2024
a3a30b2
pass to Pipeline object
mrwyattii Feb 15, 2024
180da23
Merge branch 'main' into rtd-staging
mrwyattii Feb 15, 2024
0012e33
Merge branch 'rtd-staging' of https://github.com/microsoft/DeepSpeed-…
mrwyattii Feb 15, 2024
13 changes: 13 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,13 @@
version: "2"

build:
  os: "ubuntu-22.04"
  tools:
    python: "3.10"

python:
  install:
    - requirements: docs/requirements.txt

sphinx:
  configuration: docs/source/conf.py
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
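For reference, the same build that ``make html`` performs can also be invoked from Python (a sketch, assuming Sphinx is installed and the working directory is ``docs/``; ``sphinx.cmd.build.main`` is the function behind the ``sphinx-build`` command):

.. code-block:: python

    from sphinx.cmd import build

    # equivalent to `sphinx-build -M html source build`, i.e. `make html`
    build.main(["-M", "html", "source", "build"])
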
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
14 changes: 14 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,14 @@
asyncio
autodoc_pydantic<2.0.0
deepspeed>=0.13.0
grpcio
grpcio-tools
sphinx==7.1.2
sphinx-prompt
sphinx-rtd-theme==1.3.0rc1
sphinx_autodoc_typehints
sphinx_copybutton
torch
transformers
ujson
zmq
25 changes: 25 additions & 0 deletions docs/source/api.rst
@@ -0,0 +1,25 @@
API
===

DeepSpeed-MII provides a very simple API to deploy your LLM:

.. autofunction:: mii.pipeline

The :func:`mii.pipeline` API is a great way to try DeepSpeed-MII with ragged
batching and Dynamic SplitFuse. The pipeline is non-persistent and exists only
for the lifetime of the Python script where it is used. For examples of how to
use :func:`mii.pipeline` please see :doc:`pipeline`.
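
For instance, a minimal non-persistent pipeline looks like this (a sketch; the model name is illustrative):

.. code-block:: python

    import mii
    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
    response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
    print(response)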

.. autofunction:: mii.serve

The :func:`mii.serve` API is intended for production use cases, where a
persistent model deployment is necessary. The persistent deployment utilizes
ragged batching and Dynamic SplitFuse to deliver high throughput and low latency
to multiple clients in parallel. For examples of how to use :func:`mii.serve`
please see :doc:`deployment`.

.. autofunction:: mii.client

The :func:`mii.client` API allows multiple processes to connect to a persistent
deployment created with :func:`mii.serve`. For examples of how to use
:func:`mii.client` please see :doc:`deployment`.
79 changes: 79 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,79 @@
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
# Configuration file for the Sphinx documentation builder.
import os
import sys

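# Make the repository root importable so Sphinx autodoc can find the mii package.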
sys.path.insert(0, os.path.abspath('../../'))

# -- Project information

project = 'DeepSpeed-MII'
copyright = '2023, Microsoft'
author = 'Microsoft'

with open("../../version.txt", "r") as f:
    release = f.readline().rstrip()

# -- General configuration

extensions = [
    'sphinx.ext.duration',
    'sphinx.ext.doctest',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
    'sphinx.ext.viewcode',
    'sphinx_autodoc_typehints',
    'sphinx_copybutton',
    'sphinx-prompt',
    'sphinxcontrib.autodoc_pydantic',
]

intersphinx_mapping = {
    'python': ('https://docs.python.org/3/', None),
    'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
}
intersphinx_disabled_domains = ['std']

# sphinx_autodoc_typehints config
typehints_defaults = "braces"

# autodoc_pydantic config
autodoc_pydantic_model_show_field_summary = False
autodoc_pydantic_field_signature_prefix = ' '
autodoc_pydantic_model_signature_prefix = 'class'
autodoc_pydantic_model_show_json = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_config_member = False
autodoc_pydantic_model_show_validator_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_summary_list_order = 'bysource'
autodoc_pydantic_model_member_order = 'bysource'
autodoc_pydantic_field_list_validators = False

# sphinx_copybutton config
copybutton_prompt_text = r">>> |\$ |\(.venv\) \$ "
copybutton_prompt_is_regexp = True
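# e.g. ">>> ", "$ ", and "(.venv) $ " prefixes are stripped when a reader
# copies a snippet, so examples paste cleanly into a REPL or shell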

#autodoc_mock_imports = ["deepspeed", "torch"]
autodoc_member_order = 'bysource'
autosummary_generate = True

templates_path = ['_templates']

# -- Options for HTML output

html_theme = 'sphinx_rtd_theme'
html_theme_options = {
    "logo_only": True,
}
html_logo = "../images/mii-dark.svg"
logo_only = True

# -- Options for EPUB output
epub_show_urls = 'footnote'
72 changes: 72 additions & 0 deletions docs/source/config.rst
@@ -0,0 +1,72 @@
Configuration
=============

The config classes described here are used to customize :doc:`pipeline` and :doc:`deployment`.

.. _model_configuration:

Model Configuration
-------------------

The :class:`ModelConfig <mii.config.ModelConfig>` is used to stand up a
DeepSpeed inference engine and provides a large amount of control to users. This
class is automatically generated from user-provided arguments to
:func:`mii.pipeline` and :func:`mii.serve`. The fields can be provided in a
``model_config`` dictionary or as keyword arguments.

For example, to change the default ``max_length`` for token generation of a
pipeline, the following are equivalent:

As a keyword argument:

.. code-block:: python

    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1", max_length=2048)

As a ``model_config`` dictionary:

.. code-block:: python

    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1", model_config={"max_length": 2048})

.. autopydantic_model:: mii.config.ModelConfig

.. _mii_configuration:

MII Server Configuration
------------------------

The :class:`MIIConfig <mii.config.MIIConfig>` is used to stand up a
DeepSpeed-MII `gRPC <https://grpc.io/>`_ server and provide a large amount of
control to users. This class is automatically generated from user-provided
arguments to :func:`mii.serve`. The fields can be provided in a ``mii_config``
dictionary or as keyword arguments.

For example, to change the base port number used to communicate with a
persistent deployment and the default ``max_length`` for token generation, the
following are equivalent:

As keyword arguments:

.. code-block:: python

    client = mii.serve("mistralai/Mistral-7B-v0.1", port_number=50055, max_length=2048)

As ``model_config`` and ``mii_config`` dictionaries:

.. code-block:: python

    client = mii.serve("mistralai/Mistral-7B-v0.1", mii_config={"port_number": 50055}, model_config={"max_length": 2048})

.. autopydantic_model:: mii.config.MIIConfig

Text-Generation Configuration
-----------------------------

The :class:`GenerateParamsConfig <mii.config.GenerateParamsConfig>` is used to
process user-provided keyword arguments passed to :class:`MIIPipeline
<mii.batching.ragged_batching.MIIPipeline>` and :class:`MIIClient
<mii.backend.client.MIIClient>` when doing text generation.

.. autopydantic_model:: mii.config.GenerateParamsConfig
    :exclude-members: prompt_length
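
For illustration, generation parameters are passed per call (a sketch; ``pipe`` is a pipeline created with :func:`mii.pipeline`, and ``do_sample`` and ``top_p`` are assumed to be sampling fields of :class:`GenerateParamsConfig <mii.config.GenerateParamsConfig>`):

.. code-block:: python

    # per-call keyword arguments are validated against GenerateParamsConfig
    response = pipe("DeepSpeed is", max_new_tokens=64, do_sample=True, top_p=0.9)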
128 changes: 128 additions & 0 deletions docs/source/deployment.rst
@@ -0,0 +1,128 @@
Persistent Deployments
======================

A persistent model deployment can be created with the :func:`mii.serve` API.
This stands up a gRPC server and returns a :class:`MIIClient
<mii.backend.client.MIIClient>` object that can be used to send generation
requests to the inference server. The inference server will persist after the
Python script exits and until it is explicitly terminated.

To connect to an existing deployment, use the :func:`mii.client` API. It
connects to the running gRPC server and returns a :class:`MIIClient
<mii.backend.client.MIIClient>` object.

MIIClient
---------

.. autoclass:: mii.backend.client.MIIClient

    .. automethod:: __call__

    .. automethod:: generate

    .. automethod:: terminate_server

:class:`MIIClient <mii.backend.client.MIIClient>` is a callable class that
provides a simplified interface for generating text from prompt inputs on a
persistent model deployment. To create a persistent deployment, you need only
provide the HuggingFace model name (or path to a locally stored model) to the
:func:`mii.serve` API. DeepSpeed-MII will automatically load the model weights,
create an inference engine, stand up a gRPC server, and return the callable
client. An example is provided below:

.. code-block:: python

    import mii
    client = mii.serve("mistralai/Mistral-7B-v0.1")
    response = client(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
    print(response)

Because the deployment is persistent, this server will continue running until it
is explicitly shut down. This allows users to connect to a deployment from other
processes using the :func:`mii.client` API:

.. code-block:: python

    import mii
    client = mii.client("mistralai/Mistral-7B-v0.1")
    response = client(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
    print(response)

When a server needs to be shut down, this can be done from any client object:

.. code-block:: python

    import mii
    client = mii.client("mistralai/Mistral-7B-v0.1")
    client.terminate_server()

Deployment Configuration
------------------------

While we prioritize offering a simple interface for loading models into
production-ready persistent deployments, we also provide many configuration
options to customize them.

**Any of the fields in** :class:`ModelConfig <mii.config.ModelConfig>` **and**
:class:`MIIConfig <mii.config.MIIConfig>` **can be passed as keyword
arguments or in respective** ``model_config`` **and** ``mii_config``
**dictionaries to the** :func:`mii.serve` **API. Please see** :ref:`Model
Configuration <model_configuration>` **and** :ref:`MII Server Configuration
<mii_configuration>` **for more information.**
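
For example, server and model options can be combined in a single call (a sketch; field placement follows the config classes linked above):

.. code-block:: python

    client = mii.serve(
        "mistralai/Mistral-7B-v0.1",
        mii_config={"port_number": 50055},
        model_config={"max_length": 2048, "tensor_parallel": 2},
    )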


Generate Options
----------------

Text-generation behavior using the callable :class:`MIIClient
<mii.backend.client.MIIClient>` class can be customized with several keyword
arguments. A full list of the available options can be found in
:class:`GenerateParamsConfig <mii.config.GenerateParamsConfig>`.

The generate options affect only the prompt(s) passed in a given call to the
client. For example, the generation length can be controlled on a per-prompt
basis and override the default ``max_length``:

.. code-block:: python

    response_long = client(prompt, max_length=1024)
    response_short = client(prompt, max_length=128)

.. _deployment_model_parallelism:

Model Parallelism
-----------------

Our persistent deployment supports splitting models across multiple GPUs using
tensor parallelism. To enable model parallelism, pass the ``tensor_parallel``
argument to :func:`mii.serve`:

.. code-block:: python

    client = mii.serve("mistralai/Mistral-7B-v0.1", tensor_parallel=2)

.. _deployment_model_replicas:

Model Replicas
--------------

The persistent deployment can also create multiple model replicas. Passing the
``replica_num`` argument to :func:`mii.serve` enables this feature:

.. code-block:: python

    client = mii.serve("mistralai/Mistral-7B-v0.1", replica_num=2)

With multiple model replicas, incoming requests from clients will be forwarded
to the replicas in round-robin fashion by an intermediate load-balancer process.
For example, if 4 requests with ids ``0, 1, 2, 3`` are sent to the persistent
deployment, then ``replica 0`` will process requests ``0`` and ``2`` while
``replica 1`` will process requests ``1`` and ``3``.

Model replicas also compose with model parallelism. For example, 2 replicas,
each split across 2 GPUs, can be created on a system with 4 GPUs total:

.. code-block:: python

    client = mii.serve("mistralai/Mistral-7B-v0.1", replica_num=2, tensor_parallel=2)