This project provides a FastAPI-based web service to convert audio files to text using the SpeechRecognition library.
Features:
- Convert various audio formats (MP3, WAV, etc.) to text.
- Support for multiple languages (currently set to Thai).
- Update speech recognition engine and language dynamically via API.
- Support for both file upload and base64 encoded audio input.
Requirements:
- Python 3.9+
- FastAPI
- SpeechRecognition
- PyDub
- MoviePy
- Clone the repository:
  git clone https://github.com/PongpreechaSuea/Speech2Text.git
  cd Speech2Text
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows use venv\Scripts\activate
- Install the required packages:
  pip install -r requirements.txt
Update the config.py file with your desired settings:
- Speech recognition engine
- Default language
- File paths for temporary storage
- Server host and port
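To make the four settings above concrete, here is a minimal sketch of what such a config.py might look like. Every variable name and value below is an assumption for illustration; the actual file in the repository may use different names, so check it before editing. The only values taken from this README are the default port (3000) and Thai as the current language.

```python
# config.py (hypothetical sketch; names and values are assumptions)
import os

# Speech recognition engine to use (assumed setting name and value)
ENGINE = "google"

# Default recognition language; "th-TH" is the language tag for Thai
LANGUAGE = "th-TH"

# Directory for temporary audio files (assumed location)
TEMP_DIR = os.path.join(os.getcwd(), "temp")

# Server host and port; 3000 matches the default URL used in this README
HOST = "0.0.0.0"
PORT = 3000
```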
- Start the server:
  python app.py
- The API will be available at http://localhost:3000 (or the port you configured).
- Use the following endpoints:
  - GET /: Get API information
  - PUT /v1/api/using/engine: Update speech-to-text engine
  - PUT /v1/api/using/language: Update speech-to-text language
  - POST /v1/api/using/speech2text: Convert speech to text (file upload)
  - POST /v1/api/using_base64/speech2text_base64: Convert speech to text (base64 encoded audio)
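If you call several of these endpoints from a client script, it can help to keep the paths in one place so that a changed host or port only needs updating once. The sketch below is one possible way to do that; the `ENDPOINTS` dictionary keys and the `endpoint_url` helper are hypothetical names, and only the paths themselves come from the list above.

```python
from urllib.parse import urljoin

# Default base URL from this README; change it if you configured another port
BASE_URL = "http://localhost:3000"

# Endpoint paths as listed above (the short keys are made up for this sketch)
ENDPOINTS = {
    "info": "/",
    "engine": "/v1/api/using/engine",
    "language": "/v1/api/using/language",
    "speech2text": "/v1/api/using/speech2text",
    "speech2text_base64": "/v1/api/using_base64/speech2text_base64",
}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Return the full URL for a named endpoint."""
    return urljoin(base_url, ENDPOINTS[name])


print(endpoint_url("speech2text"))
```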
Once the server is running, you can access the API documentation at http://localhost:3000/docs
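FastAPI applications also publish a machine-readable OpenAPI schema by default, normally at /openapi.json, unless the app disables it. That schema is a useful way to confirm the request format each endpoint expects, which this README does not spell out. The sketch below lists the operations found in such a schema; `list_operations` is a hypothetical helper, and the sample dictionary is a stand-in for the JSON you would download from the running server.

```python
# HTTP methods that can appear as keys of an OpenAPI path item
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(schema: dict) -> list:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    operations = []
    for path, path_item in schema.get("paths", {}).items():
        for key in path_item:
            # Skip non-method keys such as "parameters" or "summary"
            if key in HTTP_METHODS:
                operations.append(f"{key.upper()} {path}")
    return sorted(operations)


# Stand-in for the parsed contents of /openapi.json
sample_schema = {
    "paths": {
        "/": {"get": {}},
        "/v1/api/using/engine": {"put": {}, "summary": "Update engine"},
    }
}

print(list_operations(sample_schema))
```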
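Example request (file upload):
import requests

url = "http://localhost:3000/v1/api/using/speech2text"

# Open the audio file in binary mode; the context manager closes it after the upload
with open("audio.mp3", "rb") as audio_file:
    response = requests.post(url, files={"file": audio_file})

print(response.json())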
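The base64 endpoint takes the audio as encoded text in the request body rather than as a file upload. The sketch below shows only the encoding step. The JSON field name `"audio"` and the helper `build_base64_payload` are assumptions, since this README does not document the expected request body; check http://localhost:3000/docs for the real field name before using it.

```python
import base64


def build_base64_payload(audio_bytes: bytes, field_name: str = "audio") -> dict:
    """Encode raw audio bytes as base64 text inside a JSON-serializable dict.

    The default field name is an assumption, not the documented API.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {field_name: encoded}


# In practice the bytes would come from open("audio.mp3", "rb").read();
# a short literal keeps this example self-contained
payload = build_base64_payload(b"RIFF")
print(payload)
```

The resulting dictionary could then be sent with `requests.post(url, json=payload)` to the `/v1/api/using_base64/speech2text_base64` endpoint, assuming the server accepts a JSON body of this shape.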