Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSE-for-WebCodecs draft feature specification #302

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

wolenetz
Copy link
Member

@wolenetz wolenetz commented Oct 1, 2021

* Updates MediaSource and SourceBuffer sections' IDL to reference the
  new methods and types.
* Updates SOTD substantives list format and content to include this
  feature.

See w3c#184 for the spec issue tracking this feature's addition.

(Remove this set of lines during eventual squash and merge:)
* Adds placeholder notes including references to the WebCodecs spec to
  let the updated IDLs' references to definitions from that spec succeed.
Upcoming commits will remove the placeholders and include exposition on
the behavior of the updated IDL, possibly also refactoring reused steps
into subalgorithms.
Needs verification later of correct linkage to WebCodecs dfns
for "valid AudioDecoderConfig" and "valid VideoDecoderConfig" once
WebRef updates. (See their associated exports added in
w3c/webcodecs#372).
Adds TODO items to track more work necessary in this draft.
Fixes the addSourceBuffer parameter type to be TypeOrConfig.
References a not-yet-defined internal slot that tracks the pending
append chunks promise, and adds steps to reject that promise on abort()
or removeSourceBuffer(), if necessary.
Updates various places that describe BSF to also mention the possibility
that a SourceBuffer may instead be configured to expect WebCodecs
encoded chunks appends.
Adds a note to isTypeSupported() for how to use the WebCodecs
isConfigSupported() methods to proactively check support for buffering
encoded chunks into a SourceBuffer.
Fixes reference to the sourceBuffer in removeSourceBuffer()'s promise
abort step.
Updates the coded frame eviction algorithm's definition of |new data| to
possibly be EncodedChunks being appended.
Adds top-level steps for appendEncodedChunks().
Updates abort() and removeSourceBuffer() handling to conditionally
reject promise vs abort / updateend firing.
Updates reset parser state algorithm to clear any EncodedChunks from the
input configs and chunks slot, but leave any WebCodecs config there.
References w3c#301 for discussion from each of appendBuffer() and
appendEncodedChunks().
Defined [[pending append chunks promise]].

Defined [[input webcodecs configs and chunks]]

Enforced all EncodedChunks have non-null duration during synchronous
appendEncodedChunks(), after the step that runs the Prepare Append
algorithm.

Updated init-segment-received to mention [=chunks append=] algorithm can
also call it.

Updated coded-frame-processing to mention [=chunks append=] algorithm
can also call it.

Defined [=chunks append=] algorithm,

Defined generation of DTS for frames from EncodedChunks (needed by coded
frame processing algorithm) to be simply 0.  Noted this assumes no
discontinuity within or across appendEncodedChunks() calls, except as
may be explicitly signalled by the app calling abort() or changeType().

Disambiguated (slightly) "SourceBuffer track configuration" versus
"SourceBufferConfig".

Updated changeType.

Some notes on the current (M95) Chrome prototype implementation of this
feature; all are expected to be fixed very soon:
* crbug.com/1255048: it doesn't support changeType(SourceBufferConfig))
* crbug.com/1255050: it doesn't support h.264 EncodedVideoChunks' append
  support, and it assumes encoded chunks' DTS==PTS instead of using just
  0 for all DTS of EncodedChunks' frames sent to the coded frame
  processing algorithm: this may require further refinement as noted in
  the spec as well if it is not working as expected once the prototype
  is updated.
* crbug.com/1255052: it still hardcodes EncodedAudioChunk durations to
  be 22ms coded frames due to duration field originally not in
  EncodedAudioChunk specification, and it checks for EncodedVideoChunk
  duration in the middle of the Prepare Append steps instead of after
  that subalgorithm.
@wolenetz
Copy link
Member Author

wolenetz commented Oct 1, 2021

@mwatson2, please review this full feature. There are probably some edge cases in pre-existing text that I missed that might still need some updating. Hopefully, the sequence of changes I put together in this PR enable easier piecemeal review.

@tidoust, please review for respec and standards-track aspects.

Thank you both!

appended in the same {{SourceBuffer/appendEncodedChunks()}} call: the application should instead wait for the
{{SourceBuffer}} to no longer be {{SourceBuffer/updating}}, and then call {{SourceBuffer/abort()}} or
{{SourceBuffer/changeType()}} before appending {{EncodedChunks}} that are discontinuous with those in the
previous {{SourceBuffer/appendEncodedChunks()}} call.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why EncodedChunks cannot be appended in any order like media segments can ?

Does the above mean that to append, say, overlapping media (replacing some existing frames) we need to call changeType even if the type has not changed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is not ergonomic/compatible with previous behavior, and should be improved.

If I recall from when I wrote this draft and the prototype in Chromium, sequences of out-of-order presentation time versus the decode sequence must still be supported. The issue, I think, arises from the lack of explicit "decode timestamp" in EncodedChunks, and the prototype/draft API instead assumes the order of chunk appends describes their decode sequence. There is no ability of the coded frame processing algorithm to detect discontinuities in EncodedChunk decode time, unlike bytestream-formatted media. Without some ability to determine actual discontinuity implicitly, explicit discontinuity signaling is needed (abort() or changeType()). Even so, changeType isn't required if type isn't changing, but there is indeed a discontinuity. Instead, abort() signals the discontinuity explicitly without changing type.

This could be improved/simplified for "segments" mode: keyframes themselves could be used as the implicit signal of potential discontinuity, especially since there is allowance for only 1 audio config or only 1 video config in a SourceBuffer configured to allow EncodedChunk appends (and therefore there is no need to have such per-keyframe discontinuity also impacting the processing of coded frames for any other track buffers in that SourceBuffer.)

I might be missing some other detail in my recollection of this old PR and prototype though. For example, "sequence" mode appends that are not discontinuous at keyframes, but might appear to be if they are from SAP-Type-2 encode sequences, could cause interop issues if keyframes are used as implicit potential discontinuities. Conversely, without implicit discontinuity signaling, lack of app explicitly signaling could impair reaching the intent of "sequence" mode buffering.

One solution might be to use keyframes as implicit potential discontinuity signaling, but disallow use of appending chunks in "sequence" mode.

// that constraint: the implementation MUST reject a sequence containing both
// audio and video chunks.
typedef (sequence< (EncodedAudioChunk or EncodedVideoChunk) >
or EncodedAudioChunk or EncodedVideoChunk) EncodedChunks;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we write sequence<EncodedAudioChunk> or sequence<EncodedVideoChunk> or EncodedAudioChunk or EncodedVideoChunk ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I tried that originally in the prototype, but the IDL bindings generator failed. I fuzzily recall maybe it was due to an identifier length limit on the generated type name. Worth reconsidering.

Copy link
Member Author

@wolenetz wolenetz Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like leaving this with just a fuzzy recollection, so I checked the Chromium prototype's commit description for the change I landed that added this IDL:

Caveat: "sequence<A> or sequence<V>" cannot be disambiguated by the
bindings, so sequence<A or V> is used in this change. Regardless, the
eventual implementation would need to validate that all in the
sequence are either A or all are V (along with the usual validation
that appended chunks or frames also appear to use the most recent
SourceBufferConfig).

Likewise, my fuzzy recollection of identifier length was fuzzy. In that same change I did run into identifier length limits, but solved that by adding the long identifier into an exception list for that check in the generator:

Caveat: The bindings generator requires help when generated union type
identifiers are too long for some platforms. This change adds a
seventh case to the existing hard-coded lists of names that need
shortening with the generator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants