By default OpenAI returns an mp3 while Kokoros seems to return a wav #63

rikhuijzer · 2025-02-22T18:07:32Z

From the OpenAI docs, OpenAI returns an mp3:

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Today is a wonderful day to build something people love!",
    "voice": "alloy"
  }' \
  --output speech.mp3

But if I look at the file that I get from the Kokoros OpenAI compatible endpoint, it looks like it's a wav file?

$ file tests/tmp-openai-compatible.mp3
tests/tmp-openai-compatible.mp3: RIFF (little-endian) data, WAVE audio, IEEE Float, mono 24000 Hz

This is the request I made:

$ export HOST="http://localhost:3000"

$ curl "$HOST/v1/audio/speech" \
  --verbose \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Today is a wonderful day to build something people love!",
    "voice": "am_adam"
  }' \
  --output speech.mp3

VLC is unable to play it, and ffmpeg also crashes when trying to convert it (while it worked with OpenAI's mp3). The VS Code audio-preview extension can play it without problems.
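The `file` check above can also be scripted, which is handy in a test that should fail when the extension and the actual container disagree. This is only a minimal sketch: `detect_audio_format` is a hypothetical helper, not part of Kokoros, and it only looks at the first 12 bytes (RIFF/WAVE header, ID3 tag, or an MPEG frame sync), so it can misclassify other formats that start with similar bytes.

```shell
# Hypothetical helper (not part of Kokoros): guess the audio container
# from the first 12 bytes instead of trusting the file extension.
detect_audio_format() {
  magic=$(head -c 12 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    52494646????????57415645) echo "wav" ;;      # "RIFF" <size> "WAVE"
    494433*)                  echo "mp3" ;;      # ID3v2 tag header
    ffe*|fff*)                echo "mp3" ;;      # MPEG frame sync (approximate)
    *)                        echo "unknown" ;;
  esac
}

# Usage: detect_audio_format speech.mp3
```

For the file in this report, such a check would be expected to report wav rather than mp3, matching what `file` shows.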

lucasjinreal · 2025-02-23T01:28:33Z

Hello, the format should be selectable when the user requests it. WAV is the default format used when saving. Would you like to open a PR that adds mp3 output for the OpenAI-compatible endpoint?

7jrxt42BxFZo4iAnN4CX · 2025-03-17T12:06:25Z

#68

curl -X POST http://localhost:3000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anything can go here",
    "input": "Hello, this is a test of the Kokoro TTS system!",
    "voice": "af_sky",
    "response_format": "mp3"
  }' --output sky-says-hello.mp3

gnusupport · 2025-03-22T19:19:10Z

ffmpeg -i "${output_file}" "${m4a}" 2>&1

I normally convert the mp3 to an m4a file so that it is ready to paste or share on chat systems, where people can easily listen to the voice message.
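The command above leaves `output_file` and `m4a` undefined. One way to fill them in, as a sketch only: the helper names `m4a_name` and `to_m4a` are hypothetical, and the name derivation assumes the input path ends in a file extension.

```shell
# Hypothetical helpers: derive the .m4a name from the input, then convert.
m4a_name() {
  # Strip the last extension and append .m4a (assumes the path has one).
  printf '%s\n' "${1%.*}.m4a"
}

to_m4a() {
  out=$(m4a_name "$1")
  # ffmpeg picks the output container from the .m4a extension.
  ffmpeg -i "$1" "$out" 2>&1
}

# Usage: to_m4a sky-says-hello.mp3   (writes sky-says-hello.m4a)
```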
