
Compatibility with OpenAI API and Whisper Webservice. #13

Merged: 14 commits merged into master from feature/04-self-hosted-server on Nov 26, 2023

Conversation

@tigerpaws01 (Collaborator) commented Nov 21, 2023

What This Branch Accomplished

  • The app is now compatible with both the OpenAI API and the Whisper Webservice. It can work with official, unofficial, and self-hosted servers, as long as they follow either standard.
  • The app now offers configuration options, such as the choice of endpoint.
  • Fixed: the cursor position after text input. Previously, the cursor ended up several characters ahead of the end of the committed text.

Future Directions

Several possible refactorings, optimizations, and improvements:

  • HTTPS: Because self-hosted servers need to be tested, clear-text (HTTP) transmission is currently allowed explicitly. This should be disallowed in the near future.
  • Refactoring: Configuration UI class, DataStore class, etc.
  • Motormouth, Automatic Sentence Break, Silent Audio Clips, Traditional / Simplified Chinese, In-Memory Recording (first mentioned in Microphone & OpenAI Transcribe API #8)
  • App Localization
  • Manual Cursor Displacement (first mentioned in Customized Keyboard View #6)
  • UX: The configuration "set" buttons do not behave intuitively (there is no way to tell whether changes have been saved, etc.).

Testing This Branch

OpenAI API

Should work as before.

  • Open app and make the following configurations:
    • Endpoint: https://api.openai.com/v1/audio/transcriptions
    • Language Code: zh or en
    • Request Style: OpenAI API
    • API Key: A valid, official OpenAI API Key
  • Remember to hit each "set" button for input fields!
  • Perform transcription normally.
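The OpenAI API style described above is a multipart/form-data POST carrying the audio file, the model name, and the language, authorized with a Bearer token. The following is a minimal stdlib sketch of such a request (not the app's actual code); the audio.m4a filename and the whisper-1 model value are illustrative assumptions.

```python
# Sketch of an OpenAI-style transcription request (multipart/form-data).
# The file name, content type, and default model below are illustrative
# assumptions; only the endpoint URL and field names follow the API.
import uuid


def build_transcription_request(audio: bytes, model: str = "whisper-1",
                                language: str = "en"):
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: model and language.
    for name, value in [("model", model), ("language", language)]:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file itself, sent under the "file" field.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="audio.m4a"\r\nContent-Type: audio/mp4\r\n\r\n'.encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    body = b"".join(parts)
    headers = {
        "Authorization": "Bearer <API_KEY>",  # the key configured in the app
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return "https://api.openai.com/v1/audio/transcriptions", headers, body
```

A request built this way can be sent with, e.g., `urllib.request.Request(url, data=body, headers=headers, method="POST")`.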

Self-hosted Whisper Webservice

  • Find a hosted Whisper Webservice, or locally host one yourself.
    • CPU: docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
    • GPU: docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
    • Recommended ASR_MODEL values are base, small, or medium for locally hosted servers.
    • At the time this PR was created, onerahmet/openai-whisper-asr-webservice was at version v1.2.0.
    • After setting up a locally hosted server, it is recommended to visit localhost:9000 and make one request through the web interface, since the container may not download the necessary models until the first request is made and completed.
  • Open app and make the following configurations:
    • Endpoint: For locally hosted servers, set the app endpoint to http://<local-ip>:9000/asr. Obtain the local IP via tools like ipconfig; it typically looks like 192.168.xxx.xxx. For remotely hosted servers, use their URL endpoints.
    • Language Code: zh or en
    • Request Style: Whisper Webservice
    • API Key: Anything works. It won't be used.
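The Whisper Webservice request style differs from the OpenAI one mainly in that options travel as URL query parameters rather than form fields. A minimal sketch of building the /asr URL, assuming the parameter names (task, language, output) used by onerahmet/openai-whisper-asr-webservice around v1.2.0; verify them against your server's OpenAPI docs at http://<host>:9000/docs.

```python
# Sketch of a Whisper Webservice request URL. The query parameter
# names (task, language, output) are assumptions based on the
# onerahmet webservice; check your server's /docs page to confirm.
from urllib.parse import urlencode


def build_asr_url(host: str, language: str = "zh",
                  task: str = "transcribe", output: str = "json") -> str:
    query = urlencode({"task": task, "language": language, "output": output})
    return f"http://{host}:9000/asr?{query}"


# Example: build_asr_url("192.168.1.10", language="en")
```

The audio itself would then be POSTed to this URL as a multipart form file (the webservice expects a field named audio_file; again, an assumption to verify against your server's docs). No Authorization header is needed, which is why any API key works in the app.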

Closes: #3

Ideally, both configuration handling and Datastore access should be refactored in the future.

Since OpenAI API and Whisper Webservice have different request styles.
@j3soon (Owner) left a comment


Thanks for opening this PR. Just confirmed this works on a Pixel_3a_API_34_extension_level_7_x86_64 simulator.

Most of the comments are notes for myself. Only the comment on the formatting of <T> may be useful to you.

@j3soon j3soon merged commit 4f96e4b into master Nov 26, 2023
@j3soon j3soon deleted the feature/04-self-hosted-server branch November 26, 2023 12:03
Successfully merging this pull request may close these issues:

  • Create a custom backend server for performing speech-to-text