
Expose an explicit set/get low-latency versus "smoothing" MSE API rather than relying on implementation-specific, implicit bytestream hints that the stream might be "live" #21

Open
wolenetz opened this issue Oct 13, 2015 · 11 comments

@wolenetz
Member

Migrated from w3c bugzilla tracker. For history prior to migration, please see:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28379

It was previously assigned to Adrian Bateman. Editors will sync soon to determine who will take this bug.

@wolenetz wolenetz self-assigned this Oct 15, 2015
@wolenetz wolenetz added this to the V.Next milestone Oct 15, 2015
@wolenetz
Member Author

It sounds like a set/get low latency API might solve this.

@jdsmith3000
Contributor

This issue requests app control over the latency model, and that's clearly a new feature request. It might be possible to detect a live stream and set lower latency buffering, but it's not clear that would be the best thing to do on all live streams. An API that lets the app communicate intent is likely needed to resolve this adequately.

On V.Next already.

@paulbrucecotton

The Media Task Force has agreed to designate this issue as V.Next:
https://lists.w3.org/Archives/Public/public-html-media/2015Nov/0027.html

@greentorus

Feature proposals/"requests":

a) The low latency mode should also support video streams (e.g. H.264) with an initial single key frame followed by P-frames only.
Key frames at regular intervals mean significantly larger packets from time to time, which take longer to transmit and therefore arrive later at the client. That is no issue for buffered VOD situations, but it causes stuttering in low-latency situations with close to zero buffering. Having only P-frames loses seeking, but low-latency use cases like video chat or cloud gaming do not need seeking anyway.

b) The low-latency mode should work well with adding each new video frame individually to the source buffer.
Because adding multiple video frames to the source buffer at once would introduce unnecessary buffering and therefore increase the delay.
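The per-frame appending in (b) can be sketched as below. This is a hypothetical illustration, not an existing API: `createFrameAppender` and its queue are invented names; the only SourceBuffer behaviors assumed are `appendBuffer()` and the `updateend` event from the current spec.

```javascript
// Sketch: append each coded frame individually to a SourceBuffer-like object.
// Appends are serialized because appendBuffer() is asynchronous and a new
// append may only start once the previous one has fired 'updateend'.
// `createFrameAppender` is a hypothetical helper, not part of MSE.
function createFrameAppender(sourceBuffer) {
  const queue = [];
  let busy = false;

  function pump() {
    if (busy || queue.length === 0) return;
    busy = true;
    sourceBuffer.appendBuffer(queue.shift()); // one frame per append
  }

  sourceBuffer.addEventListener("updateend", () => {
    busy = false;
    pump(); // continue with the next queued frame, if any
  });

  return {
    appendFrame(frameBytes) {
      queue.push(frameBytes);
      pump();
    },
  };
}
```

Whether per-frame appends actually reduce delay depends on how much internal batching the implementation does; the sketch only shows the application-side sequencing.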

@jyavenard
Member

What you want to do, and the type of video data you use (a single starting keyframe followed by P-frames), is currently fundamentally incompatible with the sourcebuffer architecture and spirit.

MSE requires regularly spaced keyframes to work, in particular to be able to evict data from the sourcebuffer.
The concept of dealing with individual frames would have to be removed, and eviction would have to be done by byte offset only.

An alternative would be for sourcebuffer::remove to take either a percentage or a byte offset. Seeking would have to be disallowed, and the live seekable attribute would always return an empty range.
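The eviction constraint above can be illustrated with a toy model (JavaScript; the frame list and the `evictableBefore` helper are invented for illustration, not MSE API): because every P-frame depends on the preceding keyframe, the buffer can only be trimmed at a keyframe boundary, so a stream with a single initial keyframe can never be trimmed at all.

```javascript
// Toy model of MSE's keyframe-aligned eviction. Each frame is
// { time, keyframe }; P-frames depend on the last keyframe before them,
// so only frames strictly before a keyframe at or before `targetTime`
// can be removed without breaking decoding of what remains.
function evictableBefore(frames, targetTime) {
  let cut = 0; // index of the last keyframe at or before targetTime
  for (let i = 0; i < frames.length; i++) {
    if (frames[i].time > targetTime) break;
    if (frames[i].keyframe) cut = i;
  }
  return frames.slice(0, cut); // frames safe to evict
}
```

With regular keyframes, `cut` advances and old data can be reclaimed; with one initial keyframe, `cut` stays at 0 and nothing is ever evictable, which is why the proposal conflicts with the current buffering/GC model.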

@greentorus

Yes, I see the point that it is for now fundamentally incompatible with the current MSE philosophy. But what I have in mind is: low-latency MSE is a very interesting feature for many applications, and as this issue shows we are not the first ones interested in it ;-) And single-keyframe video streams are one important aspect of good low latency, I think. So extending the MSE architecture to make that possible would be useful and worth it.
Maybe there are very simple approaches, simpler than percentages or byte offsets: for example (as mentioned on the Mozilla board), low-latency use cases are usually personalized and interactive and therefore don't need seeking anyway. So one simple solution could be that seeking and sourcebuffer::remove are officially simply not possible (returning an error) if the video has only one keyframe (so far).

@andrewmd5

andrewmd5 commented Sep 26, 2018

Have there been any updates on this or a real live low latency mode for MSE vNext?

@wolenetz
Member Author

wolenetz commented Oct 2, 2018

Not tangible, though I have discussed some approaches face-to-face with @jyavenard earlier this year.

@wolenetz
Member Author

wolenetz commented Oct 2, 2018

@greentorus / #21 (comment): It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

I propose we keep this issue (renamed and refocused) to be more like what #133 wants (an explicit MSE API to set/get the implementation's low vs "smoothing" latency model). Please file a separate issue if the "single keyframe plus lots of P frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.
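For a sense of what such an explicit API might look like, here is a purely hypothetical sketch: `latencyMode` and its values do not exist in the MSE spec, and the feature-detection fallback mirrors how an app would cope with implementations that only honor implicit bytestream hints.

```javascript
// Hypothetical set/get latency-model API. `latencyMode` is an invented
// attribute name: "low" would render frames as they arrive, while
// "smoothing" would keep the implementation's jitter/smoothing buffer
// (assumed to be the default).
function setLatencyMode(mediaSource, mode /* "low" | "smoothing" */) {
  if ("latencyMode" in mediaSource) {
    mediaSource.latencyMode = mode;
    return mediaSource.latencyMode; // echo what the implementation accepted
  }
  return null; // no explicit API: implementation relies on implicit hints
}
```

The point of an explicit attribute is exactly the issue title: the app states its intent once, instead of the implementation guessing "liveness" from bytestream details.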

@wolenetz wolenetz changed the title should buffering model be an option? Expose an explicit set/get low-latency versus "smoothing" MSE API rather than relying on implementation-specific, implicit bytestream hints that the stream might be "live" Oct 2, 2018
@mmmmichael

@greentorus / #21 (comment): It sounds like you're requesting a different feature (though for live low latency as goal): seeking and sourcebuffer::remove (and background video suspension, and video track de/re-selection) would need to be constrained to not involve reconfiguring the decoder, because the implementation would be unable to pre-roll from an ancient (and likely no longer buffered) keyframe to satisfy those scenarios. Have you considered using the MediaStream API to satisfy those constraints without involving major change to MSE buffering/GC (nor HTMLMediaElement extension) behavior?

Yes, we are also considering the MediaStream/WebRTC API.

However, compared to MSE, MediaStream/WebRTC involves a lot of unnecessary high-level complexity and protocol restrictions merely to display a live video stream.

Also, as a minor secondary reason, it seems the MSE video pipeline is better optimized for higher resolution in many browser implementations. For example, the MSE implementation in Firefox under Windows seems to use hardware decoding based on the Windows Media Foundation, but its MediaStream implementation seems to use software-only decoding.

I propose we keep this issue (renamed and refocused) to be more like what #133 wants (an explicit MSE API to set/get the implementation's low vs "smoothing" latency model). Please file a separate issue if the "single keyframe plus lots of P frames" scenario is not a better fit for the MediaStream API than a vNext MSE API.

We don't care what the solution is, as long as it provides low latency. So an explicit latency model sounds good.

However, real low latency does not seem possible without "a single keyframe plus lots of P-frames".

For example, suppose the user has a 20 Mbps network connection. This supports a 2160p 60 fps video stream, typically with 1 keyframe per second. Depending on the scenario, a keyframe often consumes up to 1/2 of that total bandwidth or even more (in this case around 10 Mbit), while the P-frames are very small (around 150 kbit). This is no problem when using high-latency buffering, but it means that transferring a keyframe takes 1/2 second, so the minimum possible latency is also 1/2 second. When only using P-frames, the minimum possible latency is 1/2 of 1/60 = 1/120 second. Note that decreasing the number of keyframes per second decreases bandwidth but doesn't decrease latency, which stays at 1/2 second. The only exception seems to be not sending any keyframes after a single initial one. Then, after some initialization hiccup, the minimum latency is 1/120 second.
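The arithmetic above reduces to transfer time = frame size / link bandwidth; a small sketch with the comment's figures (20 Mbps link, ~10 Mbit keyframes, ~150 kbit P-frames):

```javascript
// Transfer time of one frame over the link, in seconds.
function transferSeconds(frameBits, linkBitsPerSecond) {
  return frameBits / linkBitsPerSecond;
}

const LINK = 20e6; // 20 Mbps connection
const keyframeDelay = transferSeconds(10e6, LINK);  // 10 Mbit keyframe -> 0.5 s
const pFrameDelay = transferSeconds(150e3, LINK);   // 150 kbit P-frame -> 0.0075 s

// A P-frame transfers well within one 60 fps frame interval (1/60 s ≈ 0.0167 s),
// while each keyframe alone imposes a 0.5 s floor on end-to-end latency.
```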

This problem was the reason why we started experimenting with "a single keyframe plus lots of P-frames".

How could this problem be avoided in the "low vs 'smoothing' latency model" proposal when still having regular keyframes?

@fernando-80

How could this problem be avoided in the "low vs 'smoothing' latency model" proposal when still having regular keyframes?

That's a good point! This sounds to me like a transport-layer issue rather than an issue with the proposed "low latency model". From my understanding of the feature need and the discussion, the low latency model would conceptually disable the MSE receiver jitter buffer, so that frames are rendered as they arrive. The handling of any artifacts and data loss, whose consequence is loss of playback "smoothness", is by definition pushed outside MSE, perhaps to the application/system layer.
Getting back to the comparison between WebRTC and MSE with a "low latency model" as I see it: WebRTC is a complete (but not flexible) solution that transports video to one of the peers, which just needs a simple HTML5 video element to render it. MSE with a "low latency model" should be suitable for applications needing further refined control over transport, smoothness, video formats, enhancements, etc.

@wolenetz wolenetz removed this from the VNext milestone Jun 9, 2020
@mwatson2 mwatson2 added the agenda (Topic should be discussed in a group call) label and removed the feature request and agenda labels Sep 21, 2020
@mwatson2 mwatson2 added this to the V2 milestone Sep 21, 2020
9 participants