Either fully support or remove audio capture entirely: "MAY" re audio capture is ambiguous #140
For example, a change to unambiguously support audio capture using "If the platform supports system-wide or specific application audio capture for which permissions have been granted to capture the user agent MUST capture audio output from that device when the |
How could the user agent possibly know that when |
Consider playing a video at The fact that audio is being output at the system before
which appears to be impossible to determine - how does the user agent "know" anything about the entire lifetime of the stream, particularly when the constraint Why would audio not be captured in the case of a local media player application playing a video with audio track of the resource being output, if in fact the "Who" is responsible for not capturing audio when the system supports such capture; the specification, the implementation? |
AFAIK, only system-wide audio capture has been implemented (in Chromium), but audio capture from applications isn't precluded. Since this isn't a widely implemented feature, getting consensus to make it mandatory could be hard. On the other hand, it is used, so getting consensus to remove it is also unlikely. So it's optional, in the absence of further implementation progress. |
Not by default at *nix. It is not possible to get system wide audio output at Chromium at *nix without setting Firefox does list Consider
unless the user physically selects The fix here is very simple. Either remove MAY language to not compel implementers to be in conformance with this specification re audio capture at all, or substitute MUST for MAY to compel implementers to allow selection of If you are aware of a canonical means to select system wide audio output at Chromium/Chrome other than the procedure described above, kindly share the complete procedure here. As it stands, after experimenting and testing various approaches, it appears to be impossible to capture audio output at Chromium using an official API. |
At *nix, by default , Chromium only provides access to
Right now the specification is not clear at all re audio capture; thus, there is no unambiguous language in the specification that a user can point to when filing an implementer issue. The implementer, if the issue is not closed, could simply point to the MAY language, if at all, and let the issue sit. How can a user cite the current iteration of this specification as authority for audio capture when MAY is used? |
This comment attests to system wide audio capture being possible at *indows OS by default w3c/mediacapture-main#694 (comment). *nix has the technical capability to capture system wide audio output, from PulseAudio/Examples - Arch Wiki https://wiki.archlinux.org/index.php/PulseAudio/Examples#ALSA_monitor_source
however, there is no official algorithm or constraint described demonstrating the canonical procedure to do so. Firefox 75 and Nightly 78 list |
@aboba Browsing Chromium issues in the wild for
|
The API is not ambiguous: it clearly states that applications may not rely on audio being returned. This means an app cannot force a user to share audio, which was intentional. The use case we had consensus to include was complementary audio for screen-sharing, at a user's discretion. Audio-only capture was deemed out of scope. |
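To make "applications may not rely on audio being returned" concrete, here is a minimal sketch of how a page could check what it actually received after requesting both video and audio. The helper name `classifyCapture` and the `TrackLike`/`StreamLike` types are hypothetical, introduced only so the sketch runs outside a browser; in a page the stream would come from `navigator.mediaDevices.getDisplayMedia({ video: true, audio: true })`.

```typescript
// Minimal structural types (assumptions for illustration) so the sketch
// type-checks without the DOM library. A real MediaStream satisfies them.
interface TrackLike { kind: string; readyState: string; }
interface StreamLike {
  getAudioTracks(): TrackLike[];
  getVideoTracks(): TrackLike[];
}

type CaptureOutcome = "video-and-audio" | "video-only" | "no-video";

// Hypothetical helper: report which kinds of live tracks the user agent returned.
// Requesting audio does not guarantee an audio track, so the page has to inspect
// the result instead of assuming it.
function classifyCapture(stream: StreamLike): CaptureOutcome {
  const hasLiveVideo = stream.getVideoTracks().some(t => t.readyState === "live");
  if (!hasLiveVideo) return "no-video";
  const hasLiveAudio = stream.getAudioTracks().some(t => t.readyState === "live");
  return hasLiveAudio ? "video-and-audio" : "video-only";
}

// Browser usage (not executed here):
// const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
// console.log(classifyCapture(stream));
```

A "video-only" result is what the page sees whenever the user or the user agent declines to share audio; the page cannot distinguish between those two causes from the stream alone.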
The goal is not to force a user to capture audio. The goal for disambiguation is to allow a user to capture audio.
That use case is not possible at Firefox or Chromium without using The use case that you describe as being consensus is not possible right now. Consider
then
at a browser, which satisfies
and nullifies
However, given the use case consensus agreed upon, if issues are filed at Chromium and Firefox right now, the implementers could simply say "we don't want to capture audio, for no reason" and the specification cannot be cited as a primary source for the requirement to capture both video and the audio output by |
It is not clear how
can possibly be determined? How can the user agent possibly know if "no audio will be shared for the lifetime of the stream"? What is the algorithm to determine that initial state and the real-time state of the If the application being captured is, for example, from Yet, given the specification and implementations at Firefox and Chromium at *nix it is impossible to capture that shared audio using |
Solution: Leave it up to the user not the user agent to decide whether or not
which cannot rationally be determined by any algorithm run by any user agent where A prototype example is already implemented at *indows: A simple checkbox at Individual browsers' behaviour is not infrequently mentioned at media capture main as an implementation to model other browser behaviour on or not. In this case *indows just lets the user decide to capture ("Share") audio or not. Then "MAY" is in the hands of the user, not a user agent that cannot "know" anything about the lifetime of a |
The term of art "MAY" is precisely ambiguous, capable of more than one interpretation and application, as evidenced by the *indows implementation; in coding parlance, "flaky". Not taking one side or the other. The codified rules of statutory construction can be used to determine what the words meant when decreed. There are other similar terms of art all too often used when a legislative body ran out of time, got lazy, or wanted the ability for the other branches to apply the rule ambiguously on purpose, unbeknownst to the uninitiated, typically those the rule applies to, not for. In this case "MAY" tells the implementer (the consumer of the language): disregard capturing audio if you want. A user at *nix, reading
takes that literally. The user has the requirement to do just that, "a particular application", "the entire system audio or any combination thereof." whether There is a "MAY". That means there is potential for implementation, which is not strictly precluded. An ambiguous term. YMMV depending on who is interpreting that word. Could land on either side, or both, depending on the arbiter and an individual's willingness or lack thereof to accept fuzzy logic outcomes for a static input value. The system user agent (however that term is defined; browser; OS; machine) technically supports capturing any of the listed items. If an issue is filed at implementers (browsers) they can cite "MAY", file here and here
Yes, exercising that discretion in the affirmative by filing this issue. How to achieve that use case at Firefox and Chromium at Linux? Or, in the case of Firefox and Chromium at Linux does "MAY" really mean "Not Implemented"? |
Revisited and tested the concepts at https://gist.github.com/guest271314/59406ad47a622d19b26f8a8c1e1bdfd5 several hundred times and have a working example (prototype) of starting and stopping capture of system audio output ("What U Hear") at Linux, tentatively At the current working version the captured audio is piped through If in fact system audio capture is not intended to be part and parcel of this specification, in spite of the language
and
what is the best way to proceed to get the process/algorithm specified? Or is the prevailing consensus that application and system audio capture are a matter of implementer discretion and interest (that is what "MAY" is intended to mean in the specification), so that one should "abandon all hope" of getting this formally specified in a form implementers are used to (e.g., W3C template, group, etc.), just publish the procedure and code at a GitHub repository, and close this issue? Thanks, /guest271314/ |
Closing based on #140 (comment) |
@jan-ivar I just saw this comment you made a few months back:
Can you elaborate on why that was the case, and whether or not audio capture of other applications is still out-of-scope? |
This entire paragraph
is ambiguous, capable of producing confusion for both implementers and front-end users, see w3c/mediacapture-screen-share-extensions#12.
For *nix users with PulseAudio installed the platform supports system wide audio capture at device named "Monitor of ", see https://bugs.chromium.org/p/chromium/issues/detail?id=1032815.
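As a rough illustration of the "Monitor of " naming mentioned above, the sketch below filters a device list for audio inputs whose label starts with that prefix. The helper `findMonitorSources` and the `DeviceLike` type are hypothetical; in a page the list would come from `navigator.mediaDevices.enumerateDevices()`, labels are typically empty until the user has granted a capture permission, and whether a given browser exposes monitor sources at all is implementation-dependent.

```typescript
// Structural subset of MediaDeviceInfo (an assumption for illustration),
// so the sketch runs without the DOM library.
interface DeviceLike { kind: string; label: string; deviceId: string; }

// The prefix PulseAudio uses for monitor sources, as quoted in this issue.
const MONITOR_PREFIX = "Monitor of ";

// Hypothetical helper: keep only audio input devices that look like
// PulseAudio monitor sources, judged purely by their label.
function findMonitorSources(devices: DeviceLike[]): DeviceLike[] {
  return devices.filter(
    d => d.kind === "audioinput" && d.label.startsWith(MONITOR_PREFIX)
  );
}

// Browser usage (not executed here):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const monitors = findMonitorSources(devices);
```

Matching on the label is a heuristic, not a specified mechanism, which is part of the point being made here: there is no constraint in the specification that selects such a device.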
Right now, *nix users with PulseAudio installed have the technical capability to capture system-wide audio, yet due to lack of clear and affirmative language in the specification implementers are obviously not encouraged or compelled to make sure that audio capture is implemented to conform with the specification: no mandatory language tells them to do so.
Either change the language to state the user agent MUST capture audio when the constraint is passed where supported at the architecture and platform; or, if audio capture is really not intended for this API, remove all audio language from the specification completely.