Transcribe audio and video files with speaker diarization and logically grouped timestamps

mikeesto/gemini-transcribe

Gemini Transcribe

https://gemini-transcribe.fly.dev/

A web application for transcribing audio and video files using Google's Gemini Flash model.

Flash is a very interesting model to explore for audio transcription because:

  • We can prompt for specific transcription outputs, as it processes both audio and text inputs
  • It has built-in speaker diarization
  • It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
  • It can translate the transcription, in particular to languages other than English
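
The points above come down to steering the model with a text instruction sent alongside the audio. As a rough sketch of what such an instruction might look like, the hypothetical helper below assembles one from a few options. The function name, its parameters, and the wording are all assumptions made for illustration; this is not the prompt this project actually uses, and the step of uploading the audio and calling the model is left out.

```python
from typing import Optional


def build_transcription_prompt(
    target_language: Optional[str] = None,
    note_non_speech: bool = False,
) -> str:
    """Assemble a plain-text instruction to send alongside an audio file.

    Hypothetical helper for illustration only; the wording is an assumption.
    """
    parts = [
        "Transcribe the attached audio.",
        # Ask for speaker diarization and timestamps in the output.
        "Label each speaker (Speaker 1, Speaker 2, ...) and start each "
        "segment with a timestamp in MM:SS format.",
    ]
    if note_non_speech:
        # Optionally ask for silence and sounds beyond human voices.
        parts.append(
            "Note long silences and non-speech sounds in square brackets, "
            "for example [laughter]."
        )
    if target_language:
        # Optionally ask for a translated transcription.
        parts.append(f"Translate the transcription into {target_language}.")
    return " ".join(parts)


prompt = build_transcription_prompt(target_language="Spanish", note_non_speech=True)
print(prompt)
```

Keeping the instruction as plain text makes it easy to experiment: each optional sentence can be toggled independently to see how the transcription changes.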

Google claims that Flash 1.5 has a word error rate of 9.6% on the FLEURS benchmark (September 2024). This project now uses the experimental Flash 2.0, which does not appear to have been benchmarked yet.
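
For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the transcription into the reference text, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table. It assumes simple lowercasing and whitespace tokenization, which is not necessarily the normalization behind the benchmark figure quoted above, and the function name is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes lowercasing and whitespace tokenization; real benchmarks
    may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference word missing (deletion)
                    curr[j - 1] + 1,     # extra hypothesis word (insertion)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown dog jumps"))  # 0.2
```

Read this way, a 9.6% word error rate corresponds to roughly one word-level error for every ten reference words.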
