Replies: 1 comment
-
It sounds like you might be hitting your GPU compute/VRAM limits. Unfortunately, there isn't an easy fix for this other than getting a bigger GPU or using a smaller model. I think the Neuro STT and TTS models take up roughly 5-6 GB of VRAM, so try to find an LLM that fits in whatever you have left.
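For sizing that budget, one way is to query free VRAM directly before picking an LLM. A minimal sketch using PyTorch (assuming a single CUDA GPU and that `torch` is installed; `nvidia-smi` gives the same numbers from the command line):

```python
import torch

# Free/total VRAM on the first CUDA device, in bytes.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
free_gb = free_bytes / 1024**3
total_gb = total_bytes / 1024**3

# Rough budget: per the comment above, assume ~6 GB goes to the STT/TTS
# models once loaded; whatever remains is what the LLM weights + KV cache
# have to fit into.
STT_TTS_RESERVE_GB = 6.0  # assumption, not a measured value

print(f"VRAM: {free_gb:.1f} GiB free of {total_gb:.1f} GiB total")
print(f"Estimated headroom for the LLM: ~{max(free_gb - STT_TTS_RESERVE_GB, 0):.1f} GiB "
      "(if STT/TTS are not yet loaded)")
```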
-
Hey, any advice on increasing the tokens per second when using this?
My typical rate is around 8 t/s when using the webui with TTS; however, when using Neuro I'm only getting around 1 t/s, often dropping to 0.2 t/s.
Any help / pointers would be appreciated!