Microphone & OpenAI Transcribe API #8

tigerpaws01 · 2023-11-17T12:06:10Z

What This Branch Did

Microphone Integration: The keyboard now takes input from the microphone.
OpenAI Transcribe API: The keyboard now transcribes recorded audio clips.
Permissions & Settings:
- If microphone permissions are not granted when recording starts, a message is shown, and the user is redirected to the app.
- In the app's MainActivity, one can set an API Key, and navigate to the app settings panel with a button (for manual permission configuration).
- Upon entering MainActivity, the user is prompted to grant microphone permissions.
Exception Handling: Basic handling via Toast messages. Common exceptions will (very likely) not block or crash the keyboard.

Known Issues & Future Directions

Motormouth Countering: Prevent recording an overly long audio clip.
Are You Done?: Implement automatic sentence break detection.
Be My Spokesman?: A completely silent audio clip seems to produce spurious sentences such as 多謝您收睇時局新聞,再會! ("Thank you for watching the current affairs news, goodbye!"), among many others.
Not A Province: The whisper-1 model produces both simplified and traditional Chinese (Mandarin) characters.
Whisper To My Ear: Currently, audio clips are recorded and saved as files (with hardcoded names). Whether they can be stored in memory / streams, and whether this is a better option, is unknown.
Configurations: Several settings may have room for improvement.
- Ktor Engine: OkHttp
- Output format: MPEG4
- Audio Encoder: AMR_NB

Testing This Branch

As with previous branches, start an emulator.
Connect the microphone to the host audio input.
Configure an API Key.
Test the transcription utility.

Notes

It is recommended NOT to read all the references thoroughly. There are a lot. Reading only the paragraphs of interest should suffice.

Closes: #2

This is essential for requesting microphone usage. `RECORD_AUDIO` is considered a dangerous permission. Unlike normal permissions, it also has to be requested explicitly at runtime via `ActivityCompat`.
Refs:
- MediaRecorder: https://developer.android.com/guide/topics/media/platform/mediarecorder
- Permissions: https://developer.android.com/training/permissions/requesting#normal-dangerous

It seems that permissions cannot be requested from a service, but only from an Activity (via `ActivityCompat`). Therefore, the user will be redirected to either the `MainActivity` or the App Settings Panel. It is unclear whether this is the best practice, or a recommended one at all, but it works rather intuitively.

Requests permission via `ActivityCompat.requestPermissions`. A request code is given to distinguish between requests. It has no other meaning. `onRequestPermissionsResult` is overridden to process request results. In this case, if the permission is not granted, a toast message shows up.
Refs:
- https://developer.android.com/guide/topics/media/platform/mediarecorder
- https://developer.android.com/training/permissions/requesting
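A minimal sketch of the decision made inside `onRequestPermissionsResult`, written as plain Kotlin so it can be run without the Android SDK. The request code value and the function name are hypothetical; `PERMISSION_GRANTED` is redeclared here only to mirror the value of `PackageManager.PERMISSION_GRANTED`.

```kotlin
// Mirrors the value of android.content.pm.PackageManager.PERMISSION_GRANTED.
const val PERMISSION_GRANTED = 0

// Hypothetical request code; it only needs to distinguish this request from others.
const val RECORD_AUDIO_REQUEST_CODE = 200

/**
 * Returns true when the "permission not granted" toast should be shown
 * for a result delivered to onRequestPermissionsResult.
 */
fun shouldShowDeniedToast(requestCode: Int, grantResults: IntArray): Boolean {
    // Results that belong to a different request are ignored.
    if (requestCode != RECORD_AUDIO_REQUEST_CODE) return false
    // An empty result array means the request was cancelled: treat it as not granted.
    return grantResults.isEmpty() || grantResults.any { it != PERMISSION_GRANTED }
}

fun main() {
    println(shouldShowDeniedToast(RECORD_AUDIO_REQUEST_CODE, intArrayOf(PERMISSION_GRANTED)))
    println(shouldShowDeniedToast(RECORD_AUDIO_REQUEST_CODE, intArrayOf(-1)))
}
```

In an Activity, the same check would sit in the `onRequestPermissionsResult` override, with the toast shown when it returns true.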

The default (no-argument) constructor of MediaRecorder is deprecated, but the one with a Context argument was added only in API Level 31.
Refs:
- https://developer.android.com/guide/topics/media/platform/mediarecorder
- https://developer.android.com/reference/android/media/MediaRecorder#MediaRecorder()

- Checks permission upon microphone usage (as suggested in https://developer.android.com/training/permissions/requesting#principles).
- If permission is not granted, opens up the `MainActivity`, where the permission can be set either automatically or manually.
- Otherwise, starts the MediaRecorder.

…API.
Followed the setup in the (un?)official OpenAI API for Kotlin: https://github.com/aallam/openai-kotlin/tree/main
- `mavenCentral()` is omitted.
  - It's included in settings.gradle. Also refer to the following link, which describes a change in the Gradle standards.
  - https://stackoverflow.com/questions/69163511/build-was-configured-to-prefer-settings-repositories-over-project-repositories-b
- Setting up a Ktor engine: OkHttp is chosen based on the information here:
  - https://ktor.io/docs/http-client-engines.html
  - The version is the latest entry found here: https://mvnrepository.com/artifact/io.ktor/ktor-client-okhttp (under the Central tag)
  - Without setting up a client engine, exceptions will be thrown. See: ktorio/ktor#1070
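For orientation, the dependency setup described above might look roughly like the following in `build.gradle.kts`. This is only a sketch: the version strings are placeholders, since the exact versions used are not stated here, and the surrounding build file is omitted.

```kotlin
dependencies {
    // OpenAI API client for Kotlin (aallam/openai-kotlin). Version is a placeholder.
    implementation("com.aallam.openai:openai-client:<openai-client-version>")

    // Ktor HTTP client engine. Without an engine on the classpath,
    // the client throws at runtime (see ktorio/ktor#1070).
    implementation("io.ktor:ktor-client-okhttp:<ktor-version>")
}
```

No `repositories { mavenCentral() }` block is needed in this file when the repository is already declared in settings.gradle.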

…format.
- Passes the recorded audio file name to `WhisperJobManager` so it can make transcription calls with that filename.
  - Renamed the variable to be consistent.
- Changed audio output format to MPEG4 (.m4a) so it's supported by OpenAI (['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']).
  - Whether this is the best format remains to be checked.
  - Whether `AMR_NB` is the best audio encoder remains to be checked.
  - Whether there are other configs to improve the audio / performance remains to be checked.
Refs:
- https://developer.android.com/reference/android/media/MediaRecorder.AudioSource
- https://developer.android.com/reference/android/media/MediaRecorder.AudioEncoder
- https://developer.android.com/reference/android/media/MediaRecorder.OutputFormat
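Since the hardcoded filename has to end in one of the extensions listed above, a small guard could catch a mismatch before any API call is made. The sketch below is not part of this branch; the function name and the example filenames are hypothetical, and the extension list is copied from the list quoted above.

```kotlin
// File extensions accepted for transcription, as listed in the commit message above.
val SUPPORTED_EXTENSIONS = setOf(
    "flac", "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm"
)

/** Returns true if the file name ends in one of the supported extensions. */
fun hasSupportedExtension(filename: String): Boolean {
    // substringAfterLast returns "" when the name contains no dot at all.
    val extension = filename.substringAfterLast('.', "").lowercase()
    return extension in SUPPORTED_EXTENSIONS
}

fun main() {
    println(hasSupportedExtension("recorded.m4a"))
    println(hasSupportedExtension("recorded.3gp"))
}
```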

Refs:
- https://github.com/aallam/openai-kotlin/tree/main
- https://github.com/aallam/openai-kotlin/blob/main/guides/GettingStarted.md#audio
- https://platform.openai.com/docs/guides/speech-to-text?lang=curl
- https://platform.openai.com/docs/api-reference/audio/createTranscription

… list of required permissions. Kotlin does not have the `static` keyword. Instead, using `companion object`s is advised. Ref: https://stackoverflow.com/questions/40352684/what-is-the-equivalent-of-java-static-methods-in-kotlin
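As a rough illustration of the `companion object` approach, the sketch below keeps a permission list at class level so it can be read without an instance. The class name is hypothetical, and the permission is written as its string value (the value of `Manifest.permission.RECORD_AUDIO`) so the snippet runs without the Android SDK.

```kotlin
// Hypothetical holder class; the actual class in this branch may differ.
class PermissionHolder {
    companion object {
        // Accessible as PermissionHolder.requiredPermissions, similar to a Java static field.
        val requiredPermissions: List<String> = listOf("android.permission.RECORD_AUDIO")
    }
}

fun main() {
    // No instance of PermissionHolder is needed to read the list.
    println(PermissionHolder.requiredPermissions)
}
```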

DataStore is a data storage solution. It provides two interfaces:
- Preferences: key-value pairs
- Proto: protocol buffer based typed objects
This will be used to store the API key (from user input). For simplicity, Preferences DataStore is used.
Ref: https://developer.android.com/topic/libraries/architecture/datastore#preferences-create

1. First, disable the API key input field and the "set API key" button. Apply a "loading" hint to the input field.
2. Retrieve the stored API key from the dataStore in the IO thread.
   - dataStore seems to be a (static-like?) variable accessible under a Context. It is defined with `val Context.dataStore: ...` using a "delegate".
   - dataStore.data is a `Flow<Preferences>`.
     - A `Flow` has emitters and collectors working asynchronously, decoupled from each other.
     - Emitters can emit data into the flow, while collectors can collect data from the flow.
     - dataStore uses this model to implement event- or data-driven programming.
   - `map` transforms `Flow<T1>` into `Flow<T2>`. Here, a flow of `Preferences` is transformed into a flow of the data stored in each `Preferences`.
   - `first()` captures the first element emitted by the flow.
     - Using `last()` would capture the last element emitted by the flow. This call never returns, leaving the coroutine suspended. Therefore, it seems like DataStore keeps emitting `Preferences` without terminating.
     - Using `collect` specifies a function to process the collected data. This also seems never to return.
     - `first()` would throw an error if the flow is empty, but it seems like DataStore always has data ready in the flow.
     - `first()` is a suspending call that waits for the first element, so it is run in a coroutine on the IO thread.
   - This variant of DataStore (Preferences DataStore) offers no data type safety. `stringPreferencesKey` is used to tell DataStore that the expected stored data of key "api-key" is a String.
3. After the stored API key is retrieved, the input field is set depending on whether a stored API key exists.
   - If null or empty, the hint displays an "Enter API Key" message.
   - Otherwise, display the stored API key.
   - These situations have been tested:
     - DataStore can retrieve newer data, if the data is updated.
     - DataStore can retrieve older data multiple times (i.e., the stored data won't be eliminated or exhausted after reading).
4. Finally, re-enable the input field and button, and assign the "set API key" button's onclick event (to avoid setting the API key before retrieval).
Refs: (Recommended NOT to be thoroughly read. There are quite a lot.)
- Using DataStore: https://developer.android.com/topic/libraries/architecture/datastore
- Using Flows (generally, official): https://developer.android.com/kotlin/flow
- Using Flows (generally): https://www.baeldung.com/kotlin/flow-intro
- Flow.first: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/first.html
- DataStore vs. SharedPreferences: https://juejin.cn/post/7112486451626901540
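The `map` + `first()` pattern from step 2 can be tried outside Android with a stand-in flow. The sketch below assumes `kotlinx-coroutines-core` is on the classpath; `FakePreferences`, `fakeDataFlow`, and `readApiKey` are hypothetical names, and the flow only imitates the observed behaviour of `dataStore.data` (emit the current snapshot, then stay open).

```kotlin
import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for Preferences: a plain map of key-value pairs.
typealias FakePreferences = Map<String, String>

// Emits the current snapshot and then never completes, so last() on this
// flow would stay suspended forever while first() returns right away.
fun fakeDataFlow(stored: FakePreferences): Flow<FakePreferences> = flow {
    emit(stored)
    awaitCancellation()
}

// Maps each snapshot to the value under "api-key" and takes the first result.
suspend fun readApiKey(data: Flow<FakePreferences>): String? =
    data.map { prefs -> prefs["api-key"] }.first()

fun main(): Unit = runBlocking {
    println(readApiKey(fakeDataFlow(mapOf("api-key" to "example-key"))))
    println(readApiKey(fakeDataFlow(emptyMap())))
}
```

A missing key comes back as `null`, which corresponds to the "Enter API Key" hint case in step 3.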

…ll. Reformatted code. Transcription results can be null in case of cancellation or an exception. As the callback function expects a nullable `String?`, it makes more sense to have the callback handle the null case, instead of preventing it from running at all.
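A small sketch of that design choice: the callback is always invoked and decides for itself what to do with `null`. The function names are hypothetical and the job is a plain synchronous lambda here, which simplifies away the coroutine-based job handling.

```kotlin
// Hypothetical runner: always invokes the callback, passing null when the job fails.
fun runTranscription(job: () -> String, onDone: (String?) -> Unit) {
    val result: String? = try {
        job()
    } catch (e: Exception) {
        null // failure is reported to the callback as null
    }
    onDone(result)
}

fun main() {
    // The callback handles both outcomes, so the caller can always reset its UI state.
    val callback: (String?) -> Unit = { text ->
        if (text.isNullOrEmpty()) println("nothing to commit") else println("commit: $text")
    }
    runTranscription({ "hello" }, callback)
    runTranscription({ throw IllegalStateException("request failed") }, callback)
}
```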

j3soon

Thanks for opening this PR. Just confirmed this works on a Pixel_3a_API_34_extension_level_7_x86_64 emulator. @ijsun has also tested the exported APK on a physical Android device.

I only have some minor comments, listed below.

android/app/src/main/java/com/example/whispertoinput/MainActivity.kt

android/app/src/main/java/com/example/whispertoinput/WhisperInputService.kt

android/app/build.gradle.kts

android/app/src/main/java/com/example/whispertoinput/WhisperJobManager.kt

j3soon

Looks good to me. I appreciate the well-organized and intuitive code. Thank you!

tigerpaws01 added 30 commits November 15, 2023 17:11

feat(values/strings): Added the extracted string resource (mic_permis…

0e94a4f

…sion_required).

feat(.MainActivity): Checks permission before requesting.

6f11044

Refs:
- https://developer.android.com/training/permissions/requesting
- https://developer.android.com/reference/androidx/core/content/ContextCompat#checkSelfPermission(android.content.Context,%20java.lang.String)

feat(.MainActivity): Added an onClick event for the grant permission …

5ae94e0

…button. The button opens up the application settings panel for the user to manually configure microphone settings. Ref: https://stackoverflow.com/a/32822298

feat(.WhisperInputService): Added named functions for UI event callba…

960a9ba

…cks.

fix(.WhisperInputService): missing parenthesis.

407d9c1

feat(.WhisperInputService): Added members & variables for recording.

bd713db

This is planned to be refactored later into a specialized class, just like keyboard and job manager were. Ref: https://developer.android.com/guide/topics/media/platform/mediarecorder

feat(.WhisperInputService): Assigned a filename for recorded audio file.

c21d886

Ref: https://developer.android.com/guide/topics/media/platform/mediarecorder

feat(.WhisperInputService): Added function to check whether permissio…

3b97794

…n is granted. The same code as in `MainActivity`.

feat(.WhisperInputService): Added a function to launch the MainActivi…

c220087

…ty of Whisper To Input.
Refs:
- https://stackoverflow.com/questions/3606596/start-activity-from-service-in-android
- https://developer.android.com/training/basics/firstapp/starting-activity?hl=zh-cn

feat(.WhisperInputService): Added stopRecording() whereever necessary.

ba483c2

Including recording cancellation & window events.

feat(AndroidManifest): Added INTERNET permission.

382291b

This is required to make OpenAI API calls, as indicated by an exception encountered without it: `SecurityException: Permission denied (missing INTERNET permission?)`

feat(.WhisperJobManager): Changed language to 'zh'.

b8dddda

Refs:
- https://platform.openai.com/docs/api-reference?lang=node
- https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

fix(): Automatic code reformat.

d79757f

Done via Android Studio (Ctrl + Alt + Shift + L).

feat(.RecorderManager): Added interface for RecorderManager.

c8592ee

This class is responsible for encapsulating the process of starting and stopping a MediaRecorder.

refactor(.RecorderManager): Implemented start, stop, and setup.

baf5b85

The code is almost the same as in `WhisperInputService`.

refactor(): Replace MediaRecorder usage with RecorderManager.

6e3ccd0

refactor(.RecorderManager): Implemented isAllPermissionsGranted.

1e53729

Code is almost the same as in `WhisperInputService`, but works on multiple permissions.
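A sketch of what a multi-permission check could look like, with the platform call injected as a function so the snippet runs without the Android SDK. In the app, the checker would presumably wrap `ContextCompat.checkSelfPermission`; the parameter names and the fake checker in `main` are hypothetical.

```kotlin
// Mirrors the value of android.content.pm.PackageManager.PERMISSION_GRANTED.
const val PERMISSION_GRANTED = 0

/**
 * Returns true only if every required permission is reported as granted.
 * checkPermission stands in for the platform's per-permission check.
 */
fun allPermissionsGranted(
    requiredPermissions: List<String>,
    checkPermission: (String) -> Int
): Boolean = requiredPermissions.all { checkPermission(it) == PERMISSION_GRANTED }

fun main() {
    // Fake checker that grants only RECORD_AUDIO and denies everything else.
    val fakeChecker: (String) -> Int = { permission ->
        if (permission == "android.permission.RECORD_AUDIO") PERMISSION_GRANTED else -1
    }
    println(allPermissionsGranted(listOf("android.permission.RECORD_AUDIO"), fakeChecker))
    println(allPermissionsGranted(listOf("android.permission.RECORD_AUDIO", "android.permission.CAMERA"), fakeChecker))
}
```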

refactor(): Moved permission control inside RecorderManager, and fi…

0b6f132

…lename control outside of it.

feat(activity_main): Adjusted layout. Added API Key Input Field.

c8d9096

feat(activity_main): View ids renamed.

e7329e0

tigerpaws01 added 8 commits November 17, 2023 17:16

feat(.MainActivity): Implemented writing API Key to DataStore.

08573cb

Uses coroutine scope structures similar to reading. When finished, show a Toast message. Ref: https://developer.android.com/topic/libraries/architecture/datastore#preferences-write

feat(): Retrieve stored api key before making api requests.

6474ad7

Ref: https://developer.android.com/topic/libraries/architecture/datastore#preferences-read

feat(): Added exception callback to display error messages.

338dbdb

Ref (Kotlin Pairs): https://www.educba.com/kotlin-pair/

feat(.RecorderManager): Changed AudioSource config from MIC to VOICE_…

28fa83f

…RECOGNITION. Ref: https://developer.android.com/reference/android/media/MediaRecorder.AudioSource#VOICE_RECOGNITION

Merge branch 'master' of https://github.com/j3soon/whisper-to-input i…

f630d02

…nto feature/03-mic-integration change sync with master

fix(.RecorderManager): Actually checks permissions in `allPermissions…

24e7449

…Granted`.

tigerpaws01 requested a review from j3soon November 17, 2023 12:13

tigerpaws01 assigned j3soon Nov 17, 2023

j3soon requested changes Nov 18, 2023

refactor(): Renamed WhisperJobManager.

5819504

tigerpaws01 requested a review from j3soon November 19, 2023 03:52

j3soon approved these changes Nov 19, 2023

j3soon merged commit 379cb8c into master Nov 19, 2023

j3soon deleted the feature/03-mic-integration branch November 19, 2023 06:13

tigerpaws01 mentioned this pull request Nov 21, 2023

Compatibility with OpenAI API and Whisper Webservice. #13

Merged
