Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r1.0][server] update readme #1852

Merged
merged 1 commit into from
May 6, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 6 additions & 5 deletions demos/speech_server/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,15 @@ This demo is an implementation of starting the voice service and accessing the s
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

It is recommended to use **paddlepaddle 2.2.1** or above.
It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.

### 2. Prepare config File
The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference)
**Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.


The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Expand Down Expand Up @@ -51,8 +52,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

```

Expand All @@ -74,8 +75,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

```

Expand Down
19 changes: 10 additions & 9 deletions demos/speech_server/README_cn.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,28 @@
([简体中文](./README_cn.md)|English)
(简体中文|[English](./README.md))

# 语音服务

## 介绍
这个demo是一个启动语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
这个demo是一个启动离线语音服务和访问服务的实现。它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。


## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从 medium,hard 三中方式中选择一种方式安装 PaddleSpeech。
推荐使用 **paddlepaddle 2.2.2** 或以上版本。
你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。


### 2. 准备配置文件
配置文件可参见 `conf/application.yaml` 。
其中,`engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
目前服务集成的语音任务有: asr(语音识别)、tts(语音合成)以及cls(音频分类)。
目前引擎类型支持两种形式:python 及 inference (Paddle Inference)
**注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。


这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
ASR client 的输入是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。

可以下载此 ASR client的示例音频:
```bash
Expand Down Expand Up @@ -52,8 +53,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

```

Expand All @@ -75,8 +76,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
INFO: Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

```

Expand Down
6 changes: 3 additions & 3 deletions demos/speech_server/conf/application.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# This is the parameter configuration file for PaddleSpeech Serving.
# This is the parameter configuration file for PaddleSpeech Offline Serving.

#################################################################################
# SERVER SETTING #
Expand All @@ -7,8 +7,8 @@ host: 127.0.0.1
port: 8090

# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference']

# task choices = ['asr_python', 'asr_inference', 'tts_python', 'tts_inference', 'cls_python', 'cls_inference']
protocol: 'http'
engine_list: ['asr_python', 'tts_python', 'cls_python']


Expand Down
6 changes: 5 additions & 1 deletion demos/streaming_tts_server/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ This demo is an implementation of starting the streaming speech synthesis servic
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

It is recommended to use **paddlepaddle 2.2.1** or above.
It is recommended to use **paddlepaddle 2.2.2** or above.
You can choose one way from meduim and hard to install paddlespeech.


Expand All @@ -29,6 +29,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- When the voc model is mb_melgan, when voc_pad=14, the synthetic audio for streaming inference is consistent with the non-streaming synthetic audio; the minimum voc_pad can be set to 7, and the synthetic audio has no abnormal hearing. If the voc_pad is less than 7, the synthetic audio sounds abnormal.
- When the voc model is hifigan, when voc_pad=20, the streaming inference synthetic audio is consistent with the non-streaming synthetic audio; when voc_pad=14, the synthetic audio has no abnormal hearing.
- Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
- **Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.



### 3. Streaming speech synthesis server and client using http protocol
Expand Down Expand Up @@ -120,6 +122,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` do not take effect in streaming speech synthesis service temporarily.

Output:
```bash
Expand Down Expand Up @@ -254,6 +257,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
- `play`: Whether to play audio, play while synthesizing, default value: False, which means not playing. **Playing audio needs to rely on the pyaudio library**.
- `spk_id, speed, volume, sample_rate` do not take effect in streaming speech synthesis service temporarily.


Output:
Expand Down
30 changes: 17 additions & 13 deletions demos/streaming_tts_server/README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,25 +10,27 @@
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

推荐使用 **paddlepaddle 2.2.1** 或以上版本。
推荐使用 **paddlepaddle 2.2.2** 或以上版本。
你可以从 medium,hard 两种方式中选择一种方式安装 PaddleSpeech。


### 2. 准备配置文件
配置文件可参见 `conf/tts_online_application.yaml` 。
- `protocol`表示该流式TTS服务使用的网络协议,目前支持 **http 和 websocket** 两种。
- `engine_list`表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
- 该demo主要介绍流式语音合成服务,因此语音任务应设置为tts
- 目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用onnxruntime进行推理的引擎。其中,online-onnx的推理速度更快
- 流式TTS引擎的AM模型支持:**fastspeech2 以及fastspeech2_cnndecoder**; Voc 模型支持:**hifigan, mb_melgan**
- 流式am推理中,每次会对一个chunk的数据进行推理以达到流式的效果。其中`am_block`表示chunk中的有效帧数,`am_pad` 表示一个chunk中am_block前后各加的帧数。am_pad的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- fastspeech2不支持流式am推理,因此am_pad与am_block对它无效
- fastspeech2_cnndecoder 支持流式推理,当am_pad=12时,流式推理合成音频与非流式合成音频一致
- 流式voc推理中,每次会对一个chunk的数据进行推理以达到流式的效果。其中`voc_block`表示chunk中的有效帧数,`voc_pad` 表示一个chunk中voc_block前后各加的帧数。voc_pad的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- hifigan, mb_melgan 均支持流式voc 推理
- 当voc模型为mb_melgan,当voc_pad=14时,流式推理合成音频与非流式合成音频一致;voc_pad最小可以设置为7,合成音频听感上没有异常,若voc_pad小于7,合成音频听感上存在异常。
- 当voc模型为hifigan,当voc_pad=20时,流式推理合成音频与非流式合成音频一致;当voc_pad=14时,合成音频听感上没有异常。
- `protocol` 表示该流式 TTS 服务使用的网络协议,目前支持 **http 和 websocket** 两种。
- `engine_list` 表示即将启动的服务将会包含的语音引擎,格式为 <语音任务>_<引擎类型>。
- 该 demo 主要介绍流式语音合成服务,因此语音任务应设置为 tts
- 目前引擎类型支持两种形式:**online** 表示使用python进行动态图推理的引擎;**online-onnx** 表示使用 onnxruntime 进行推理的引擎。其中,online-onnx 的推理速度更快
- 流式 TTS 引擎的 AM 模型支持:**fastspeech2 以及fastspeech2_cnndecoder**; Voc 模型支持:**hifigan, mb_melgan**
- 流式 am 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `am_block` 表示 chunk 中的有效帧数,`am_pad` 表示一个 chunk 中 am_block 前后各加的帧数。am_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- fastspeech2 不支持流式 am 推理,因此 am_pad 与 m_block 对它无效
- fastspeech2_cnndecoder 支持流式推理,当 am_pad=12 时,流式推理合成音频与非流式合成音频一致
- 流式 voc 推理中,每次会对一个 chunk 的数据进行推理以达到流式的效果。其中 `voc_block` 表示chunk中的有效帧数,`voc_pad` 表示一个 chunk 中 voc_block 前后各加的帧数。voc_pad 的存在用于消除流式推理产生的误差,避免由流式推理对合成音频质量的影响。
- hifigan, mb_melgan 均支持流式 voc 推理
- 当 voc 模型为 mb_melgan,当 voc_pad=14 时,流式推理合成音频与非流式合成音频一致;voc_pad 最小可以设置为7,合成音频听感上没有异常,若 voc_pad 小于7,合成音频听感上存在异常。
- 当 voc 模型为 hifigan,当 voc_pad=20 时,流式推理合成音频与非流式合成音频一致;当 voc_pad=14 时,合成音频听感上没有异常。
- 推理速度:mb_melgan > hifigan; 音频质量:mb_melgan < hifigan
- **注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。


### 3. 使用http协议的流式语音合成服务端及客户端使用方法
#### 3.1 服务端使用方法
Expand Down Expand Up @@ -119,6 +121,7 @@
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值:0,表示与模型采样率相同
- `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值:False,表示不播放。**播放音频需要依赖pyaudio库**。
- `spk_id, speed, volume, sample_rate` 在流式语音合成服务中暂时不生效。


输出:
Expand Down Expand Up @@ -254,6 +257,7 @@
- `sample_rate`: 采样率,可选 [0, 8000, 16000],默认值:0,表示与模型采样率相同
- `output`: 输出音频的路径, 默认值:None,表示不保存音频到本地。
- `play`: 是否播放音频,边合成边播放, 默认值:False,表示不播放。**播放音频需要依赖pyaudio库**。
- `spk_id, speed, volume, sample_rate` 在流式语音合成服务中暂时不生效。


输出:
Expand Down
4 changes: 3 additions & 1 deletion paddlespeech/server/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,9 @@
paddlespeech_server help
```
### Start the server
First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started
First set the service-related configuration parameters, similar to `./conf/application.yaml`. Set `engine_list`, which represents the speech tasks included in the service to be started.
**Note:** If the service can be started normally in the container, but the client access IP is unreachable, you can try to replace the `host` address in the configuration file with the local IP address.

Then start the service:
```bash
paddlespeech_server start --config_file ./conf/application.yaml
Expand Down
1 change: 1 addition & 0 deletions paddlespeech/server/README_cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
```
### 启动服务
首先设置服务相关配置文件,类似于 `./conf/application.yaml`,设置 `engine_list`,该值表示即将启动的服务中包含的语音任务。
**注意:** 如果在容器里可正常启动服务,但客户端访问 ip 不可达,可尝试将配置文件中 `host` 地址换成本地 ip 地址。
然后启动服务:
```bash
paddlespeech_server start --config_file ./conf/application.yaml
Expand Down