
Performance Benchmarks


🚀 Model Size and Benchmarks

Our design philosophy is to provide compact, high-quality models that balance model size and compute against adequate quality.

It is widely known (and confirmed by our own research) that you can get incremental improvements by scaling a model a further 2x, 5x or 10x. But we firmly believe that performance gains should be achieved on a similar or lower computation budget.

There are novel techniques that might enable packing models close to EE performance into packages as small as 10-20 MB, but for now the smallest we could achieve for EE models was about 50 MB.

Latest

Model Sizes

| Model | Params, M | Model size CE, MB | Model size EE, MB |
|-------|-----------|-------------------|-------------------|
| EN V1 | 45.6 | 182 | ~45.5 |
| DE V1 | 45.6 | 182 | ~45.5 |
| ES V1 | 52.8 | 211 | ~52.75 |
| New | ~20-30 | ? | ~35 |

Speed Benchmarks

It is customary to publish multiply-adds or FLOPs as a measure of the compute required, but we prefer to simply share model sizes and test results on commodity hardware.

All of the benchmarks and estimates below were run on 6 cores (12 threads) of an AMD Ryzen Threadripper 1920X 12-Core Processor (3500 MHz). Scale accordingly for your device. These tests were run as is using native PyTorch, without any clever batching or concurrency techniques. You are welcome to submit your own test results!
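To make "per core" numbers comparable, PyTorch should be pinned to a fixed number of cores. A minimal sketch, assuming thread pinning via `torch.set_num_threads` (the exact thread settings behind the numbers below are an assumption here):

```python
import torch

# Pin PyTorch to a fixed number of cores so that "per core" figures are meaningful.
# Our runs used 6 physical cores (12 threads); adjust for your machine.
torch.set_num_threads(6)          # intra-op parallelism
torch.set_num_interop_threads(1)  # keep inter-op parallelism out of the picture
```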

Test procedure:

  • We take 100 10-second audio files
  • Split them into batches of 1, 5, 10 and 25 files
  • Measure how long it takes to process a batch of a given size on CPU
  • On GPU our models are so fast that batch size and audio length do not really matter (in practical cases)
  • We measure how many seconds of audio one processor core can process per second. This is similar to 1 / RTF per core; see the measurement sketch after this list
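The metric itself is easy to reproduce. Below is a minimal sketch of the measurement loop; `model`, `batches`, the 10-second file length and the core count are placeholders for whatever you are testing:

```python
import time
import torch

def audio_seconds_per_second_per_core(model, batches, file_seconds=10, n_cores=6):
    """Seconds of audio processed per wall-clock second per core (~ 1 / RTF per core)."""
    total_audio_seconds = sum(len(batch) for batch in batches) * file_seconds
    start = time.perf_counter()
    with torch.no_grad():
        for batch in batches:
            model(batch)  # batch is a padded tensor of waveforms / features
    elapsed = time.perf_counter() - start
    return total_audio_seconds / elapsed / n_cores
```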

We report results for the following types of models (a generic quantization / compilation sketch follows the list):

  • FP32 (baseline)
  • FP32 + Fused (CE v1)
  • FP32 + INT8
  • FP32 Fused + INT8
  • Full INT8 + Fused (EE, small)
  • Best / xsmall (EE, xsmall: quantized, compiled, further improved and optimized)
  • xxsmall (cutting-edge model used in EE distros)
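The exact CE / EE recipes are not published on this page, but the fused / INT8 variants correspond to standard PyTorch post-training techniques. A generic sketch, with an illustrative stand-in model and file name (not the actual production pipeline):

```python
import torch
import torch.nn as nn

# Stand-in for an already loaded FP32 acoustic model (illustrative only).
model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))

# Post-training dynamic INT8 quantization of the heavy layers.
int8_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript compilation (enables operator fusion) and export.
scripted = torch.jit.script(int8_model)
torch.jit.save(scripted, "model_int8_fused.pt")
```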

Seconds of audio per second per core (1 / RTF per core):

| Batch size | FP32 | FP32 + Fused | FP32 + INT8 | FP32 Fused + INT8 | Full INT8 + Fused | New Best (xsmall) | xxsmall |
|---|---|---|---|---|---|---|---|
| 1 | 7.7 | 8.8 | 8.8 | 9.1 | 11.0 | 22.6 | 33.8 |
| 5 | 11.8 | 13.6 | 13.6 | 15.6 | 17.5 | 29.8 | 50.4 |
| 10 | 12.8 | 14.6 | 14.6 | 16.7 | 18.0 | 29.3 | 53.3 |
| 25 | 12.9 | 14.9 | 14.9 | 17.9 | 18.7 | 29.8 | 49.2 |


We have not yet decided which speed improvements should trickle down from EE to CE, and for which languages.

Full End-To-End Distro Sizing and Performance

GPU Accelerated Sizing

| Sizing | Minimal | Recommended |
|---|---|---|
| Disk | NVMe, 256+ GB | NVMe, 256+ GB |
| RAM | 32 GB | 32 GB |
| CPU cores | 8+ | 12+ |
| Core frequency | 3+ GHz | 3.5+ GHz |
| Hyper-threading | + | + |
| AVX2 instructions | Not necessary | Not necessary |
| Compatible GPUs | (*) | (*) |
| GPU count | 1 | 1 |
| Metrics | 8 "threads" | 16 "threads" |
| Mean latency, ms | 280 | 320 |
| 95th percentile, ms | 430 | 476 |
| 99th percentile, ms | 520 | 592 |
| Files per 1000 ms | 25.0 | 43.4 |
| Files per 500 ms | 12.5 | 21.7 |
| 1 / RTF | 85.6 | 145.0 |
| Billing / gRPC threads | 12 - 18 | 22 - 30 |
| 1 / RTF / CPU cores | 10.7 | 12.1 |

(*) Suitable GPUs:

  • Any Nvidia GPU above the GTX 1070 with 8+ GB RAM (blower fan);
  • Any single-slot Nvidia Quadro with 8+ GB RAM (TDP 100-150 W, blower fan or passive);
  • Nvidia Tesla T4 (passive, TDP 75 W).

CPU Accelerated Sizing

| Sizing | Minimal | Recommended |
|---|---|---|
| Disk | SSD, 256+ GB | SSD, 256+ GB |
| RAM | 32 GB | 32 GB |
| CPU cores | 8+ | 12+ |
| Core frequency | 3.5+ GHz | 3.5+ GHz |
| Hyper-threading | + | + |
| AVX2 instructions | + | + |
| Metrics | 8 "threads" | 16 "threads" |
| Mean latency, ms | 320 | 470 |
| 95th percentile, ms | 580 | 760 |
| 99th percentile, ms | 720 | 890 |
| Files per 1000 ms | 11.1 | 15.9 |
| Files per 500 ms | 5.6 | 8.0 |
| 1 / RTF | 37.0 | 53.0 |
| Billing / gRPC threads | 6 - 9 | 8 - 10 |
| 1 / RTF / CPU cores | 4.6 | 4.4 |
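For both sizing tables the derived rows appear to follow directly from the throughput rows, for example:

```python
# "1 / RTF / CPU cores" is the per-core share of the overall throughput.
print(round(85.6 / 8, 1))  # 10.7 - GPU-accelerated, minimal sizing (8 cores)
print(round(37.0 / 8, 1))  # 4.6  - CPU-accelerated, minimal sizing (8 cores)
```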