Update getusermedia.html #651
Include a reference to "It also allows the manipulation of audio output devices (speakers and headphones)." (https://w3c.github.io/mediacapture-main/#privacy-and-security-considerations) and to the `MediaDeviceKind` enumeration description of `audiooutput`, "Represents an audio output device; for example a pair of headphones.", at the "sources" definition https://w3c.github.io/mediacapture-main/#dfn-source.
“This work is dedicated to the public domain” |
While this spec does define enumeration of "audiooutput" output devices, they're not "sources", as defined in this terminology section. They're sinks, and defined in detail in https://w3c.github.io/mediacapture-output/ |
It should be made unequivocally clear in the specification that it is possible to capture audio output - not exclusively microphone input - even if the terminology used is "input" and "sink" - with an example of the canonical means to do so using the APIs in the specification. The suggestion to use Whether the term of art used is "sink" or "source", "input" or "output", when the code at #650 (comment) is run at Firefox 70 and Nightly 73 the observable result is that only audio output is captured and recorded, not microphone. Is your position that capturing audio output using the above linked code is still considered to be capturing a "sink"? |
No. The model is "Browsers provide a media pipeline from sources to sinks". "A MediaStreamTrack object represents a media source". Looking at your code:
The The spec allows us to do this, saying "User Agents MAY allow users to use any media source, including pre-recorded media files." Importantly, it is not an
const devices = await navigator.mediaDevices.enumerateDevices();
const speaker = devices.find(({kind}) => kind == "audiooutput");
await element.setSinkId(speaker.deviceId);
A physical device that is both a source and a sink, like a mic'd headset, shows as two devices: one This supports having devices that are only sinks, only sources, and both. Makes sense? |
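The source/sink split described above can be sketched as follows. This is only an illustration, not something taken from the spec or from this thread: `splitSourcesAndSinks` is a hypothetical helper that takes the array returned by `enumerateDevices()` and separates entries by `kind`. It then uses `groupId`, which two devices share when they belong to the same physical device, to list the groups that appear on both sides, such as a mic'd headset.

```javascript
// Hypothetical helper (not defined by any spec): partition MediaDeviceInfo-like
// objects into sources (inputs) and sinks (outputs) based on their `kind`.
function splitSourcesAndSinks(devices) {
  const sources = devices.filter(
    ({kind}) => kind === "audioinput" || kind === "videoinput"
  );
  const sinks = devices.filter(({kind}) => kind === "audiooutput");

  // Collect groupIds that occur among both sources and sinks, i.e. physical
  // devices that are exposed as two separate entries.
  const sinkGroups = new Set(sinks.map(({groupId}) => groupId));
  const sharedGroupIds = [
    ...new Set(
      sources
        .map(({groupId}) => groupId)
        .filter((id) => id && sinkGroups.has(id))
    ),
  ];

  return {sources, sinks, sharedGroupIds};
}

// Browser usage (assumes a secure context):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const {sources, sinks, sharedGroupIds} = splitSourcesAndSinks(devices);
```

Even when a `groupId` is shared, the two entries remain separate devices with separate `deviceId` values; the helper only reports the relationship and does not turn a sink into a source.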
The source in that case is audio output to headphones or speakers, correct? Your attempt to clarify the capabilities described in the specification does provide clarity to some extent. The point of this PR and linked issue is to make it unambiguously clear that it is possible to capture audio output, as a "source", to speakers or headphones using the methods defined in this specification, that is, precisely under which cases a "sink" becomes a "source", with accompanying canonical code example to do so added to the specification - or, make it clear that is not possible per this specification, so that users who have that requirement can abandon all hope of achieving that requirement using |
Per your previous comment
incorporates by reference any media source, and supersedes the restrictive language
which is what this PR initially changes to recognize the case of
where it cannot logically be the case that capture of
|
"That case" being the special "Monitor" audio input device in Firefox on Linux? Sure. However, this is achieved at the OS/device driver level—Specs generally can't, nor should they, attempt to restrict what an OS or user agent can provide as audio input in the form of virtual devices. Importantly, the existence of such a device, in no way changes the fact that no general mechanism for capturing system audio output exists in this spec.
I think the spec is clear: "Sinks" never become "sources" in this model. They're entirely separate devices from the viewpoint of this API. Any real-world connection between browser-provided devices in this API exists outside of this spec. E.g. there's no constraint or property for this.¹
This spec is written with extension specs in mind, so it rarely rules things out, only in. As such, if something isn't mentioned in this spec, it isn't covered by this spec. 1. The only relationship between exposed "devices" in this spec is |
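For readers who want to try the special case discussed above, here is a hedged sketch. It assumes the OS exposes its monitor device as an ordinary `audioinput` whose label contains "Monitor"; labels vary by system and are only populated after the user has granted capture permission. `monitorConstraints` is a hypothetical helper, not something the spec defines. It only builds a `getUserMedia()` constraints object and returns `null` when no matching device is listed.

```javascript
// Hypothetical helper: find an audio *input* device whose label looks like an
// OS-provided monitor device and build getUserMedia() constraints for it.
// The /monitor/i pattern is an assumption about how the OS names the device.
function monitorConstraints(devices, pattern = /monitor/i) {
  const monitor = devices.find(
    ({kind, label}) => kind === "audioinput" && pattern.test(label)
  );
  // `exact` makes getUserMedia() reject instead of silently falling back to
  // another input such as the default microphone.
  return monitor ? {audio: {deviceId: {exact: monitor.deviceId}}} : null;
}

// Browser usage (assumes a secure context):
// await navigator.mediaDevices.getUserMedia({audio: true}); // prompt once so labels are exposed
// const devices = await navigator.mediaDevices.enumerateDevices();
// const constraints = monitorConstraints(devices);
// if (constraints) {
//   const stream = await navigator.mediaDevices.getUserMedia(constraints);
// }
```

This still selects a source from the API's point of view: the monitor device is listed as `audioinput`, and whether it exists at all depends on the OS and user agent rather than on this spec.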
Technically, that case is also possible at Chromium on Linux following the procedure described at guest271314/SpeechSynthesisRecorder#14 (comment) once. Can use Your reply appears to indicate that users in the field should abandon all hope of this specification making it clear that capturing audio output is possible per the specification even though the functionality is already possible in the field, given adequate experimentation and testing with an end to achieve that goal. Given the inclusion of the term Am banned from W3C indefinitely and from WICG for 1,000 years so am not able to propose a specification under the auspices of those entities to unequivocally capture audio output. This specification provides the infrastructure to do so, yet for the reasons you are relying on, are essentially stating to abandon all hope of this specification being amended to reflect the state of the art in the field, rather that this specification has implicit foundational restrictions forbidding such functionality (direct, unambiguous capture of audio output devices) which prevent extensibility with regard to capturing audio output; can you kindly state the above or similar language specifically? |
@jan-ivar Since the inference is that implementers and users of the resulting API of this specification should abandon all hope of explicitly capturing audio output under the language of this specification - even though such requirement is already possible at Firefox, Nightly and Chrome, Chromium at *nix - how do you suggest to proceed to realize that goal (either per specification or not) in the field? |
I think step 1 is providing a compelling use case. Frankly, capturing browser audio output at that late a stage only to bring it back into the browser, smells like a workaround for something that should be doable directly using components like For instance—did I see you mention web speech earlier?—I recently made some design recommendations in mozilla/standards-positions#170 (comment) to make I'd hope we could do better for any use case short of media device stack testing (which is what we use the Monitor device for in Firefox). If you do find compelling use cases, and can get vendor interest in solving those use cases, then I guess an extension spec would be the way to go. However, IMHO specs don't always drive buy-in, buy-in drives specs. |
Can compile a list of issues and bugs relevant to Web Speech API and Web Audio API working in conjunction. Those requests have been ongoing for several years now. In brief,
It depends on how thorough and lengthy you want the list to be to satisfy the as-yet undefined term "compelling". Notice the local direct text to
|
Essentially, it is nearly impossible to improve SST/TTS technology, for example, screen readers, accessibility for persons that might not have all of their faculties functioning adequately, etc. (see https://lists.w3.org/Archives/Public/public-speech-api/2017Jul/0004.html for some use cases), locally, without the ability to capture, compare, and modify input and output. Currently TTS is not specified to be captured anywhere. SST is not specified to be exclusive to local processing. When last checked Chrome, Chromium send users' biometric data (their voice) to a remote service. The web platform should provide a means for users to analyze input and output TTS/SST locally. The code at the linked repositories re capturing |
I sympathize with the frustration of APIs that don't (yet) work well with tracks, but we seem to be in agreement this PR is not the way to fix it, so I'm closing this. |
@jan-ivar How do you suggest to proceed to specify that audio output can be captured? Can #629 be incorporated into https://github.com/w3c/mediacapture-main/issues/652, and #640? Or, is your closure of this PR effectively the closure of #629 and "Abandon all hope" of capturing audio output only under this specification? |
Fixes #650.