Buy hardware | Install | Discord | Join Us

TT-NN is a Python & C++ Neural Network OP library.

API Reference | Model Demos

Grayskull (GS) Models

Model	Batch	End-to-end throughput [1]	Device throughput [2]	Target throughput
ResNet-50 (fps)	20	5,100	6,600	10,000
BERT-Large (sen/s)	12	370	406	410
Falcon7B-decode (t/s)	32	135	135	140
ViT (fps)	8	860	1,570	2,000
T5 small (sen/s)		140
Bloom (sen/s)		70
U-Net	coming soon

[1] - Observed from the host. Includes dispatch overhead and kernel execution time. For LLMs, token-to-token decode throughput is reported.

[2] - Ignoring host overhead. Kernel execution time only. For LLMs, token-to-token decode throughput is reported.
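One way to read the two throughput columns together is to compute how far the end-to-end figure falls below the device figure. The sketch below does that for the Grayskull rows that report both numbers. The helper name `host_overhead_fraction` is made up for this illustration and is not part of TT-NN; the figures are copied from the table above.

```python
def host_overhead_fraction(end_to_end: float, device: float) -> float:
    """Return the share of device throughput that is not seen end to end."""
    if device <= 0:
        raise ValueError("device throughput must be positive")
    return 1.0 - end_to_end / device


# (end-to-end throughput, device throughput), copied from the Grayskull table.
grayskull = {
    "ResNet-50 (fps)": (5100, 6600),
    "BERT-Large (sen/s)": (370, 406),
    "Falcon7B-decode (t/s)": (135, 135),
    "ViT (fps)": (860, 1570),
}

overhead = {
    name: host_overhead_fraction(end_to_end, device)
    for name, (end_to_end, device) in grayskull.items()
}

for name, fraction in overhead.items():
    print(f"{name}: end-to-end is {fraction:.1%} below device throughput")
```

A gap of zero, as in the Falcon7B-decode row, suggests that host dispatch is fully hidden behind kernel execution for that workload; a large gap, as in the ViT row, suggests the host side is the limiting factor.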

Wormhole (WH) Models

Note

All model demos in this table run on both N150 and N300 Wormhole cards unless otherwise stated.

All performance numbers here were measured on, or are based on, an N300 Wormhole card.

Model	Last verified release	Gen. Token [3]	Batch	Time to first token [4]	End-to-end throughput [1]	Device throughput [2]	Target throughput
Falcon7B	v0.51.0-rc24	129th	32	0.08 s	16.7 t/s/u - 534 t/s	19.6 t/s/u - 627 t/s	26 t/s/u
Mistral-7B	v0.51.0-rc28	129th	32	coming soon	9.9 t/s/u - 317 t/s	11.0 t/s/u - 352 t/s	25 t/s/u
Mamba-2.8B	v0.51.0-rc26	any	32	0.04 s	12.3 t/s/u - 394 t/s	17.1 t/s/u - 547 t/s	41 t/s/u
LLaMA-3.1-8B	v0.51.0-rc28	129th	1	coming soon	8.3 t/s/u - 8.3 t/s	9.7 t/s/u - 9.7 t/s	23 t/s/u
BERT-Large (sen/s) [5]		-	8	-	270	340	400
Stable Diffusion 1.4 512x512 (sec/img) [6]		-	1	-	6	5	3
ResNet-50 (fps)		-	16	-	4,100	5,010	7,000

[1] - Observed from the host. Includes dispatch overhead and kernel execution time. For LLMs, token-to-token decode throughput is reported.

[2] - Ignoring host overhead. Kernel execution time only. For LLMs, token-to-token decode throughput is reported.

[3] - Generating the i-th token in a sequence while the kv_cache is filled with i-1 rows.

[4] - Time to fill the kv_cache and generate the first output token (1st user).

[5] - This model demo does not work on N150. It does work on N300.

[6] - This model demo does not work on N300. It does work on N150.
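The LLM rows report throughput in two units: tokens per second per user (t/s/u) and aggregate tokens per second (t/s). From the numbers in the table, the aggregate figure appears to be the per-user figure multiplied by the batch size, rounded to the nearest token. The sketch below checks that reading against the end-to-end column. The function name `aggregate_throughput` is hypothetical and only serves this illustration.

```python
def aggregate_throughput(per_user_tps: float, batch: int) -> float:
    """Aggregate tokens/s, assuming every user in the batch decodes in lockstep."""
    return per_user_tps * batch


# (per-user t/s/u, batch, reported aggregate t/s), copied from the Wormhole table.
wormhole_end_to_end = {
    "Falcon7B": (16.7, 32, 534),
    "Mistral-7B": (9.9, 32, 317),
    "Mamba-2.8B": (12.3, 32, 394),
    "LLaMA-3.1-8B": (8.3, 1, 8.3),
}

# Difference between the computed and the reported aggregate for each model.
differences = {
    model: abs(aggregate_throughput(per_user, batch) - reported)
    for model, (per_user, batch, reported) in wormhole_end_to_end.items()
}

for model, diff in differences.items():
    print(f"{model}: computed and reported aggregate differ by {diff:.1f} t/s")
```

The small residual differences come from the per-user figures being rounded to one decimal place in the table.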

TT-QuietBox & TT-LoudBox (2x4 mesh of WHs) Models

Model	Last verified release	Technique	Gen. Token [3]	Batch	Time to first token [4]	End-to-end throughput [1]	Device throughput [2]	Target throughput
Falcon7B	v0.51.0-rc36	Data Parallel	129th	256	0.11 s	13.4 t/s/u - 3430 t/s	19.6 t/s/u - 5018 t/s	26 t/s/u
LLaMA-2-70B	v0.51.0-rc36	Tensor Parallel	129th	32	coming soon	10.4 t/s/u - 333 t/s	16.6 t/s/u - 531 t/s	20 t/s/u
LLaMA-3.1-70B	v0.51.0-rc36	Tensor Parallel	129th	32	coming soon	10.4 t/s/u - 333 t/s	15.8 t/s/u - 506 t/s	20 t/s/u
Falcon40B	v0.51.0-rc35	Tensor Parallel	129th	32	coming soon	5.3 t/s/u - 168 t/s	12.2 t/s/u - 390 t/s	36 t/s/u
Mixtral7Bx8	v0.51.0-rc33	Tensor Parallel	129th	32	0.19 s	15.7 t/s/u - 502 t/s	21.4 t/s/u - 685 t/s	33 t/s/u
ResNet-50 (fps)		Data Parallel	-	128	-	31,250	40,080	56,000

Single Galaxy (8x4 mesh of WHs) Models

Model	Last verified release	Technique	Gen. Token [3]	Batch	Time to first token [4]	End-to-end throughput [1]	Device throughput [2]	Target throughput
Falcon7B	v0.51.0-rc30	Data Parallel	129th	1024	0.30 s	4.0 t/s/u - 4096 t/s	17.7 t/s/u - 18125 t/s	26 t/s/u
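The Data Parallel rows for Falcon7B use batch 256 on a 2x4 mesh and batch 1024 on an 8x4 mesh. Assuming data parallelism here means each device holds a full copy of the model and receives an equal slice of the batch (an assumption, since the tables do not spell it out), both configurations work out to the same per-device batch as the single-card Falcon7B row. The helper `per_device_batch` is hypothetical and not a TT-NN function.

```python
def per_device_batch(total_batch: int, mesh_shape: tuple[int, int]) -> int:
    """Split a global batch evenly across every device in a rows x cols mesh."""
    rows, cols = mesh_shape
    devices = rows * cols
    if total_batch % devices != 0:
        raise ValueError("batch does not divide evenly across the mesh")
    return total_batch // devices


# (global batch, mesh shape) for the Falcon7B Data Parallel rows above.
falcon7b_data_parallel = {
    "TT-QuietBox / TT-LoudBox": (256, (2, 4)),
    "Single Galaxy": (1024, (8, 4)),
}

per_device = {
    system: per_device_batch(batch, mesh)
    for system, (batch, mesh) in falcon7b_data_parallel.items()
}

for system, batch in per_device.items():
    print(f"{system}: {batch} users per device")
```

Under this reading, the larger systems scale the number of users served rather than the speed seen by any one user, which is consistent with the per-user device throughput staying close to the single-card figure.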

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md

Using TT-NN ops and tensors

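import ttnn
import torch

# Open device 0; the context manager closes it again on exit.
with ttnn.manage_device(device_id=0) as device:
    # Create the inputs on the host as torch tensors.
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move both tensors to the device as bfloat16 in tile layout.
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Add on the device, then convert the result back to a torch tensor.
    output = a + b
    output = ttnn.to_torch(output)

print(output)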
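For readers without Tenstorrent hardware at hand, the host-only sketch below shows the result the example above would be expected to produce, assuming the TT-NN `+` operator broadcasts the `(1, 7)` operand across the rows of the `(5, 7)` operand the way torch does. NumPy is used here only as a stand-in because it applies the same broadcasting rule to these shapes; it is not part of the original example.

```python
import numpy as np

# Same shapes as the TT-NN example: a is (5, 7), b is (1, 7).
a = np.ones((5, 7), dtype=np.float32)
b = np.ones((1, 7), dtype=np.float32)

# b is broadcast along the first axis, so every row of a gets b added to it.
expected = a + b

print(expected.shape)
print(expected)
```

Every element of the result is 2, a value bfloat16 represents exactly, so the reduced precision used on the device should not change the outcome for this input.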

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Programming Guide | API Reference

Getting started

Tech Reports

tenstorrent/tt-metal