
Compatibility with OpenAI API and Whisper Webservice. #13

Merged: 14 commits merged into master from feature/04-self-hosted-server on Nov 26, 2023

Conversation

@tigerpaws01 (Collaborator) commented Nov 21, 2023

What This Branch Accomplished

  • The app is now compatible with both the OpenAI API and the Whisper Webservice. It can work with official, unofficial, and self-hosted servers, as long as they follow either standard.
  • The app now offers configuration options, such as the choice of endpoint.
  • Fixed: the cursor position after text input. Previously, the cursor ended up several characters ahead of the end of the committed text.

Future Directions

Several possible refactorings, optimizations, and improvements:

  • HTTPS: Because self-hosted servers need to be tested, clear-text (HTTP) transmission is currently allowed explicitly. This should be disallowed in the near future.
  • Refactoring: Configuration UI class, DataStore class, etc.
  • Motormouth, Automatic Sentence Break, Silent Audio Clips, Traditional / Simplified Chinese, In-Memory Recording (first mentioned in Microphone & OpenAI Transcribe API #8)
  • App Localization
  • Manual Cursor Displacement (first mentioned in Customized Keyboard View #6)
  • UX: The configuration "set" buttons do not behave intuitively (there is no way to tell whether changes have been saved, etc.).

Testing This Branch

OpenAI API

Should work as before.

  • Open app and make the following configurations:
    • Endpoint: https://api.openai.com/v1/audio/transcriptions
    • Language Code: zh or en
    • Request Style: OpenAI API
    • API Key: A valid, official OpenAI API Key
  • Remember to hit each "set" button for input fields!
  • Perform transcription normally.
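The OpenAI API style described above is a multipart/form-data POST carrying the audio file, the model name, and the language, authorized with a Bearer token. The following is a minimal stdlib sketch of such a request (not the app's actual code); the audio.m4a filename and the whisper-1 model value are illustrative assumptions.

```python
# Sketch of an OpenAI-style transcription request (multipart/form-data).
# The file name, content type, and default model below are illustrative
# assumptions; only the endpoint URL and field names follow the API.
import uuid


def build_transcription_request(audio: bytes, model: str = "whisper-1",
                                language: str = "en"):
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: model and language.
    for name, value in [("model", model), ("language", language)]:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file itself, sent under the "file" field.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="audio.m4a"\r\nContent-Type: audio/mp4\r\n\r\n'.encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    body = b"".join(parts)
    headers = {
        "Authorization": "Bearer <API_KEY>",  # the key configured in the app
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return "https://api.openai.com/v1/audio/transcriptions", headers, body
```

A request built this way can be sent with, e.g., `urllib.request.Request(url, data=body, headers=headers, method="POST")`.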

Self-hosted Whisper Webservice

  • Find a hosted Whisper Webservice, or locally host one yourself.
    • CPU: docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
    • GPU: docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
    • Recommended ASR_MODEL values are base, small, or medium for locally hosted servers.
    • At the time this PR was created, onerahmet/openai-whisper-asr-webservice was at version v1.2.0.
    • After setting up a locally hosted server, it is recommended to visit localhost:9000 and make one request through the web interface, since the container may not download the necessary models until the first request is made and completed.
  • Open app and make the following configurations:
    • Endpoint: For locally hosted servers, set the app endpoint to http://<local-ip>:9000/asr. Obtain the local IP via tools like ipconfig; it typically looks like 192.168.xxx.xxx. For remotely hosted servers, use their URL endpoints.
    • Language Code: zh or en
    • Request Style: Whisper Webservice
    • API Key: Anything works. It won't be used.
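The Whisper Webservice request style differs from the OpenAI one mainly in that options travel as URL query parameters rather than form fields. A minimal sketch of building the /asr URL, assuming the parameter names (task, language, output) used by onerahmet/openai-whisper-asr-webservice around v1.2.0; verify them against your server's OpenAPI docs at http://<host>:9000/docs.

```python
# Sketch of a Whisper Webservice request URL. The query parameter
# names (task, language, output) are assumptions based on the
# onerahmet webservice; check your server's /docs page to confirm.
from urllib.parse import urlencode


def build_asr_url(host: str, language: str = "zh",
                  task: str = "transcribe", output: str = "json") -> str:
    query = urlencode({"task": task, "language": language, "output": output})
    return f"http://{host}:9000/asr?{query}"


# Example: build_asr_url("192.168.1.10", language="en")
```

The audio itself would then be POSTed to this URL as a multipart form file (the webservice expects a field named audio_file; again, an assumption to verify against your server's docs). No Authorization header is needed, which is why any API key works in the app.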

Closes: #3

Ideally, both configuration handling and Datastore access should be refactored in the future.

Since OpenAI API and Whisper Webservice have different request styles.
@j3soon (Owner) left a comment


Thanks for opening this PR. Just confirmed this works on a Pixel_3a_API_34_extension_level_7_x86_64 simulator.

Most of the comments are notes for myself. Only the comment on the formatting of <T> may be useful to you.

@j3soon j3soon merged commit 4f96e4b into master Nov 26, 2023
@j3soon j3soon deleted the feature/04-self-hosted-server branch November 26, 2023 12:03
Successfully merging this pull request may close these issues:

  • Create a custom backend server for performing speech-to-text