Microphone & OpenAI Transcribe API #8
This is essential for requesting microphone usage. `RECORD_AUDIO` is considered a dangerous permission: unlike normal permissions, it also has to be requested explicitly at runtime through `ActivityCompat`.

Refs:
- MediaRecorder: https://developer.android.com/guide/topics/media/platform/mediarecorder
- Permissions: https://developer.android.com/training/permissions/requesting#normal-dangerous
It seems that permission cannot be requested from a service, only from an activity via `ActivityCompat`. Therefore, the user is redirected to either the `MainActivity` or the app settings panel. It is unclear whether this is best practice, or even a recommended one, but it works rather intuitively.
Requests permission via `ActivityCompat.requestPermissions`. A request code is given to distinguish between requests; it has no other meaning. `onRequestPermissionsResult` is overridden to process the request results. In this case, if the permission is not granted, a toast message shows up.

Refs:
- https://developer.android.com/guide/topics/media/platform/mediarecorder
- https://developer.android.com/training/permissions/requesting
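A minimal sketch of the request/result pair described above. The request code value, the toast text, and the use of `AppCompatActivity` as the base class are assumptions for illustration, not taken from the branch.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.widget.Toast
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat

// Arbitrary value (assumption); it only has to match between request and result.
private const val MICROPHONE_PERMISSION_REQUEST_CODE = 200

class MainActivity : AppCompatActivity() {

    // Asks the system to show the runtime permission dialog for RECORD_AUDIO.
    private fun requestMicrophonePermission() {
        ActivityCompat.requestPermissions(
            this,
            arrayOf(Manifest.permission.RECORD_AUDIO),
            MICROPHONE_PERMISSION_REQUEST_CODE
        )
    }

    override fun onRequestPermissionsResult(
        requestCode: Int,
        permissions: Array<out String>,
        grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)
        if (requestCode != MICROPHONE_PERMISSION_REQUEST_CODE) return

        // An empty grantResults array means the request was cancelled.
        val granted = grantResults.isNotEmpty() &&
            grantResults[0] == PackageManager.PERMISSION_GRANTED
        if (!granted) {
            Toast.makeText(this, "Microphone permission was not granted", Toast.LENGTH_SHORT).show()
        }
    }
}
```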
…button. The button opens up the application settings panel for the user to manually configure microphone settings. Ref: https://stackoverflow.com/a/32822298
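Opening the application settings panel could look roughly like the sketch below. The function name `openAppSettings` is hypothetical; the intent action and the `package` URI scheme are the standard Android ones.

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri
import android.provider.Settings

// Hypothetical helper: opens this app's page in the system settings,
// where the user can toggle the microphone permission manually.
fun openAppSettings(context: Context) {
    val intent = Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS).apply {
        data = Uri.fromParts("package", context.packageName, null)
        // Needed when the context is not an Activity (for example, a service).
        addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
    }
    context.startActivity(intent)
}
```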
This is planned to be refactored later into a specialized class, just like keyboard and job manager were. Ref: https://developer.android.com/guide/topics/media/platform/mediarecorder
The default (no-argument) constructor of MediaRecorder is deprecated, but the one with a Context argument was added only in API level 31.

Refs:
- https://developer.android.com/guide/topics/media/platform/mediarecorder
- https://developer.android.com/reference/android/media/MediaRecorder#MediaRecorder()
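One common way to handle this split is to branch on the API level, as sketched here. The helper name `createMediaRecorder` is hypothetical, and this is not necessarily how the branch does it.

```kotlin
import android.content.Context
import android.media.MediaRecorder
import android.os.Build

// Uses the Context constructor where it exists (API 31+, Build.VERSION_CODES.S)
// and falls back to the deprecated no-argument constructor on older devices.
fun createMediaRecorder(context: Context): MediaRecorder =
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
        MediaRecorder(context)
    } else {
        @Suppress("DEPRECATION")
        MediaRecorder()
    }
```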
…n is granted. The same code as in `MainActivity`.
- Checks permission upon microphone usage (as suggested in https://developer.android.com/training/permissions/requesting#principles).
- If permission is not granted, opens up the `MainActivity`, where the permission can be set either automatically or manually.
- Otherwise, starts the MediaRecorder.
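The check-then-branch flow above can be sketched as follows. The function name and its parameters are hypothetical; the activity class and the recorder start routine are passed in so the sketch does not assume how the service is structured.

```kotlin
import android.Manifest
import android.content.Context
import android.content.Intent
import android.content.pm.PackageManager
import androidx.core.content.ContextCompat

// Hypothetical helper: starts recording if RECORD_AUDIO is granted,
// otherwise sends the user to the given activity to grant it.
fun startRecordingOrRequestPermission(
    context: Context,
    mainActivityClass: Class<*>,
    startRecorder: () -> Unit
) {
    val granted = ContextCompat.checkSelfPermission(
        context, Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED

    if (granted) {
        startRecorder()
        return
    }

    val intent = Intent(context, mainActivityClass).apply {
        // A service context is not an Activity, so a new task is required.
        addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
    }
    context.startActivity(intent)
}
```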
Including recording cancellation & window events.
This is required to make OpenAI API calls. Without it, the following exception is thrown: `SecurityException: Permission denied (missing INTERNET permission?)`
…API. Followed the setup in the (un?)official OpenAI API for Kotlin: https://github.com/aallam/openai-kotlin/tree/main
- `mavenCentral()` is omitted.
  - It's included in settings.gradle. Also refer to the following link, stating a change in the Gradle standards.
    - https://stackoverflow.com/questions/69163511/build-was-configured-to-prefer-settings-repositories-over-project-repositories-b
- Setting up a Ktor engine: OkHttp is chosen due to information here
  - https://ktor.io/docs/http-client-engines.html
  - Version is from the latest entry found in here: https://mvnrepository.com/artifact/io.ktor/ktor-client-okhttp (under the Central tag)
  - Without setting up a client engine, exceptions will be thrown. See: ktorio/ktor#1070
…format.
- Passes the recorded audio file name to `WhisperJobManager` so it can make transcription calls with that filename.
  - Renamed the variable to be consistent.
- Changed audio output format to MPEG4 (.m4a) so it's supported by OpenAI (['flac', 'm4a', 'mp3', 'mp4', 'mpeg', 'mpga', 'oga', 'ogg', 'wav', 'webm']).
  - Whether this is the best format remains to be checked.
  - Whether `AMR_NB` is the best audio encoder remains to be checked.
  - Whether there are other configs to improve the audio / performance remains to be checked.

Refs:
- https://developer.android.com/reference/android/media/MediaRecorder.AudioSource
- https://developer.android.com/reference/android/media/MediaRecorder.AudioEncoder
- https://developer.android.com/reference/android/media/MediaRecorder.OutputFormat
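For reference, a recorder configured with the output format and encoder named above might be set up as in this sketch. The `MIC` audio source, the function name, and the `outputFilePath` parameter are assumptions; only `MPEG_4` and `AMR_NB` come from the commit message.

```kotlin
import android.media.MediaRecorder

// Hypothetical helper: configures and starts an existing MediaRecorder.
// prepare() can throw IOException, which the caller is expected to handle.
fun startRecording(recorder: MediaRecorder, outputFilePath: String) {
    recorder.apply {
        setAudioSource(MediaRecorder.AudioSource.MIC)        // assumed source
        setOutputFormat(MediaRecorder.OutputFormat.MPEG_4)   // .m4a container
        setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB)
        setOutputFile(outputFilePath)
        prepare()
        start()
    }
}
```

The calls are ordered as the MediaRecorder state machine requires: source first, then output format, then encoder and output file, then `prepare()` before `start()`.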
Done via Android Studio (Ctrl + Alt + Shift + L).
This class is responsible for encapsulating the process of starting and stopping a MediaRecorder.
… list of required permissions. Kotlin does not have the `static` keyword. Instead, using `companion object`s is advised. Ref: https://stackoverflow.com/questions/40352684/what-is-the-equivalent-of-java-static-methods-in-kotlin
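As an illustration of the `companion object` pattern, the sketch below exposes a static-like permission list. The class name `RecorderManager` and the function name `requiredPermissions` are hypothetical, and the permission is spelled out as a string so the snippet runs without the Android SDK.

```kotlin
class RecorderManager {
    companion object {
        // Same value as android.Manifest.permission.RECORD_AUDIO.
        private const val RECORD_AUDIO = "android.permission.RECORD_AUDIO"

        // Callable as RecorderManager.requiredPermissions(), with no instance needed.
        fun requiredPermissions(): Array<String> = arrayOf(RECORD_AUDIO)
    }
}
```

Callers then write `RecorderManager.requiredPermissions()` much as they would call a Java static method.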
The code is almost the same as in `WhisperInputService`.
Code is almost the same as in `WhisperInputService`, but works on multiple permissions.
…lename control outside of it.
DataStore is a data storage solution. It provides two interfaces:
- Preference: key-value pairs
- Proto: protocol buffer based typed objects

This will be used to store the API key (from user input). For simplicity, Preferences DataStore is used.

Ref: https://developer.android.com/topic/libraries/architecture/datastore#preferences-create
1. First, disable the API key input and the "set API key" button. Apply a "loading" hint to the input field.
2. Retrieve the stored API key from the dataStore in the IO thread.
   - dataStore seems to be a (static-like?) variable accessible under a Context. This is defined with `val Context.dataStore: ...` using a "delegate".
   - dataStore.data is a `Flow<Preferences>`.
     - A `Flow` has emitters and collectors working asynchronously, decoupled from each other.
     - Emitters can emit data into the flow, while collectors can collect data from the flow.
     - dataStore uses this model to implement event- or data-driven programming.
   - `map` transforms `Flow<T1>` into `Flow<T2>`. Here, a flow of `Preferences` is transformed into a flow of the data stored in each `Preferences`.
   - `first()` captures the first element emitted by the flow.
     - Using `last()` would capture the last element emitted by the flow. This never returns and holds up the coroutine, so it seems that DataStore keeps emitting `Preferences` without ever completing the flow.
     - Using `collect` specifies a function to process the collected data. This also seems never to return.
     - `first()` would throw an error if the flow is empty, but it seems that DataStore always has data ready in the flow.
     - `first()` suspends until the first element is available, and is thus run in the IO thread.
   - This variant of DataStore (Preferences DataStore) offers no data type safety. `stringPreferencesKey` is used to tell DataStore that the expected stored data of key "api-key" is a String.
3. After the stored API key is retrieved, the input field is set depending on whether a stored API key exists.
   - If null or empty, the hint displays an "Enter API Key" message.
   - Otherwise, display the stored API key.
   - These situations have been tested:
     - DataStore can retrieve newer data, if the data is updated.
     - DataStore can retrieve older data multiple times (i.e., the stored data won't be eliminated or exhausted after reading).
4. Finally, re-enable the input field and button, and assign the "set API key" button's onclick event (to avoid setting the API key before retrieval).

Refs: (Recommended NOT to be thoroughly read. There are quite a lot.)
- Using DataStore: https://developer.android.com/topic/libraries/architecture/datastore
- Using Flows (generally, official): https://developer.android.com/kotlin/flow
- Using Flows (generally): https://www.baeldung.com/kotlin/flow-intro
- Flow.first: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines.flow/first.html
- DataStore vs. SharedPreferences: https://juejin.cn/post/7112486451626901540
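The retrieval in step 2 can be sketched as below. The store name `"settings"`, the constant name `API_KEY`, and the function name `readApiKey` are assumptions; the key string "api-key" is the one mentioned above.

```kotlin
import android.content.Context
import androidx.datastore.core.DataStore
import androidx.datastore.preferences.core.Preferences
import androidx.datastore.preferences.core.stringPreferencesKey
import androidx.datastore.preferences.preferencesDataStore
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.map

// Property delegate: one DataStore per name, reachable from any Context.
// The name "settings" is an assumption.
val Context.dataStore: DataStore<Preferences> by preferencesDataStore(name = "settings")

// Tells DataStore that the value stored under "api-key" is a String.
val API_KEY = stringPreferencesKey("api-key")

// Maps each Preferences snapshot to the stored key (null if absent) and
// takes only the first snapshot, since the flow itself never completes.
suspend fun readApiKey(context: Context): String? =
    context.dataStore.data
        .map { preferences -> preferences[API_KEY] }
        .first()
```

Because `readApiKey` is a suspending function, it has to be called from a coroutine; the UI updates in steps 3 and 4 would then happen back on the main thread.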
Uses coroutine scope structures similar to reading. When finished, show a Toast message. Ref: https://developer.android.com/topic/libraries/architecture/datastore#preferences-write
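A minimal sketch of the write side, assuming the same "api-key" key as on the read side. The function name `writeApiKey` is hypothetical, and the Toast shown afterwards is left to the caller.

```kotlin
import androidx.datastore.core.DataStore
import androidx.datastore.preferences.core.Preferences
import androidx.datastore.preferences.core.edit
import androidx.datastore.preferences.core.stringPreferencesKey

// edit() applies the block to a mutable copy of the preferences and
// persists the result; it suspends until the write has completed.
suspend fun writeApiKey(dataStore: DataStore<Preferences>, apiKey: String) {
    val key = stringPreferencesKey("api-key")
    dataStore.edit { preferences ->
        preferences[key] = apiKey
    }
}
```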
…ll. Reformatted code. Transcription results can be null in case of cancellation and exception. As the callback function expects a nullable `String?`, it makes more sense to have the callback handle it being null, instead of preventing it from running at all.
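The design choice can be illustrated with plain Kotlin. `runJob` and its parameters are hypothetical stand-ins for the job manager, not the branch's actual code: the callback is always invoked and receives `null` when the transcription step fails.

```kotlin
// Hypothetical stand-in for the transcription job.
fun runJob(transcribe: () -> String, onDone: (String?) -> Unit) {
    val result: String? = try {
        transcribe()
    } catch (e: Exception) {
        null // failure is reported as null instead of skipping the callback
    }
    onDone(result) // always invoked, so the caller can reset its UI state
}
```

A caller can then decide what a `null` result means, for example returning the keyboard to its idle state without committing any text.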
…nto feature/03-mic-integration change sync with master
Thanks for opening this PR. Just confirmed this works on a Pixel_3a_API_34_extension_level_7_x86_64 simulator. @ijsun has also tested the exported APK on a physical Android device.
I only have some minor comments as below.
android/app/src/main/java/com/example/whispertoinput/WhisperInputService.kt
android/app/src/main/java/com/example/whispertoinput/WhisperJobManager.kt
Looks good to me. I appreciate the well-organized and intuitive code. Thank you!
What This Branch Did
- … `MainActivity`, one can set an API Key, and navigate to the app settings panel with a button (for manual permission configuration).
- … `MainActivity`, the user will be prompted with the option to grant microphone permissions.
- … `Toast`s. Common exceptions will (very likely) not block or crash the keyboard.

Known Issues & Future Directions
- … 多謝您收睇時局新聞,再會! ("Thank you for watching the current affairs news, goodbye!") among many.
- … `whisper-1` model produces both simplified and traditional Chinese (Mandarin) characters.
- … `OkHttp` …
- … `MPEG4` …
- … `AMR_NB` …
Testing This Branch
Notes
It is recommended NOT to read all the references thoroughly. There are a lot. Reading only the paragraphs of interest would suffice.
Closes: #2