Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying and optimizing VideoFrame readback #157

Closed
chcunningham opened this issue Mar 26, 2021 · 10 comments
Closed

Simplifying and optimizing VideoFrame readback #157

chcunningham opened this issue Mar 26, 2021 · 10 comments
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).

Comments

@chcunningham
Copy link
Collaborator

(Full credit to @sandersdan for the following analysis and design. I'm just turning his doc into a bug.)

The VideoFrame.planes API allows reading back individual planes, but it has drawbacks:

  • The implementation may map and unmap the underlying frame for each plane, which can be slow when all planes are read.
  • It’s overly complicated to read all planes (partners have filed bugs showing its easy to get wrong)
  • Compared to its complexity, it doesn’t actually offer much control.

This issue proposes a simpler API for copying all planes at once.

Concepts

Region

A region is a {top, left, width, height} dictionary. VideoFrame should be updated to use this structure:

  • Add codedRegion, which is always {top: 0, left: 0, width: codedWidth, height: codedHeight}.
    • We could deprecate codedWidth and codedHeight, but this only seems to complicate constructing VideoFrames.
  • Add visibleRegion, replacing the crop parameters.

The default region is visibleRegion.

elementSize

The number of bytes in a sample. Each Plane has an elementSize attribute.

Stride

The number of bytes from the start of one row to the start of the next row, must be at least elementSize * region.width.

The default stride is exactly elementSize * region.width.

Layout

A layout is a list of {offset, stride} dictionaries, one for each plane. offset is the location in a buffer that a plane starts.

The default offset is tightly packed (each plane immediately follows the previous), and either all offsets should be provided or none should be.

API

// Returns the size in bytes of a buffer that can hold a copy of |region|.
VideoFrame.allocationSize({region});
Plane.allocationSize({region, stride});

// Copies all planes into an output buffer |dst|.
VideoFrame.readInto(dst, {region, layout});  // returns |layout|
Plane.readInto(dst, {region, stride});       // returns |stride|

// Returns a new frame containing a copy of a region. Holding the new frame for an
// extended period of time will not stall decoding.
VideoFrame.copy({region});

Changes to the planes API

  • Plane
    • Add codedRegion, visibleRegion.
    • Add elementSize.
    • Rename columns to codedWidth and rows to codedHeight.
    • Deprecate length. Apps should call allocationSize().
    • Deprecate stride. It’s not relevant anymore.
  • PlaneInit
    • Deprecate rows.
    • Add offset.

Examples

Copy visibleRegion to a new buffer (packed)

let buf = new ArrayBuffer(frame.allocationSize());
let layout = frame.readInto(buf);

// For 1080p I420, |layout| = [{offset: 0, stride: 1920},
//                             {offset: 2073600, stride: 960},
//                             {offset: 2592000, stride: 960}]

Copy codedRegion to a WASM frame pool

let region = frame.codedRegion;
let layout = wasmApp.allocateYUV(region.width, region.height);
frame.readInto(wasmApp.memory, {region, layout});
@chcunningham chcunningham changed the title Simplifying and optimizing VideoFrame Simplifying and optimizing VideoFrame read-back Mar 29, 2021
@chcunningham chcunningham changed the title Simplifying and optimizing VideoFrame read-back Simplifying and optimizing VideoFrame readback Mar 29, 2021
@chcunningham chcunningham added this to the 2021 Q1 milestone Mar 29, 2021
@dalecurtis
Copy link
Contributor

dalecurtis commented Apr 28, 2021

One of the key aspects of the current Planes.readInto() is that it's synchronous. Since frames are already being delivered asynchronously via decoding callbacks and the MediaStreamTrackProcessor, it would be unfortunate to add another hop in order to access information that may be readily available without an async hop.

I think this API needs to have a readIntoSync() method and potentially a readable attribute that indicates when readInto can be completed synchronously. E.g., the frame is a simple cpu memory backed frame or otherwise synchronously mappable. A texture backed frame would not be available for synchronous readback.

Without a readable attribute readIntoSync() would need to return null or otherwise fail, so I defer to the common spec idioms for whether we'd want an attribute versus just returning null.

@sandersdan
Copy link
Contributor

sandersdan commented Apr 28, 2021

Clarification: since the proposal @chcunningham posted, we determined that the default implementation of readInto() should be async so that we can do async texture readback in the implementation. Synchronous texture readback is strongly discouraged by Chrome's GPU experts.

That leaves a gap where some frames can be read synchronously but there is no way to express that without adding new methods. It's not yet clear how important the difference is, but the proposal to close the gap is as described by @dalecurtis above; add a fallible readIntoSync() and a readable attribute to signal when readIntoSync() is expected to work.

I'll also throw in my own bike shed, I have a preference to call the attribute readableSync as just readable is rather generic.

@dalecurtis
Copy link
Contributor

Additional bike shed colors:

  • canReadSync
  • mappable

@chcunningham
Copy link
Collaborator Author

chcunningham commented May 4, 2021

I opened some design discussion comments in the PR, but I'm resolving those in favor of focusing the discussion here where it's more visible/readable. Three points for discussion:

  1. We need asynchronous readback (as Dan notes above). However, we may also want synchronous readback where feasible (with a naming bikeshed above for the method and capability signaling).

    1. We are working to collect performance data on sync vs async to see if it really matters since one async method is much cleaner if there is no real performance loss.
  2. Both sync and async methods under consideration in point 1 live on VideoFrame. Do we still have a need for a readInto() method (currently sync) on planes? Planes.readInto() affords the power to read just one plane when desired. A bit niche, and can be solved via other mechanisms using videoFrame.readInto().

    1. example of another mechanism:

      let options = {layout: [{},                // y plane, default layout
                              {discard: true},   // u plane, discarded
                              {discard: true}]}; // v plane, discarded
      frame.readInto(buff, options);
      
  3. If the outcome of point 2 is to remove planes.readInto(), should we also remove videoFrame.planes entirely? The remaining Planes attributes were mainly provided to assist in calling planes.readInto().

@chcunningham chcunningham added the breaking Interface changes that would break current usage (producing errors or undesired behavior). label May 12, 2021
@chcunningham
Copy link
Collaborator Author

Triage note: marked 'breaking' as, as we are proposing a whole new readback API and discussing removal of old paths.

@chcunningham
Copy link
Collaborator Author

Another consideration: naming bikeshed method to get the pixel data

  • currently frame.readInto(dest)
  • encoded*Chunk interfaces now have chunk.copyTo(dest). Decided against copy*In*to(), as described in this thread. So maybe this should be simply frame.readTo(dest)
  • Open question whether read prefix is useful vs just calling this copy.. folks seem to lean keeping read as part of history (video frame "read back") and hint that this isn't always a simple copy from cpu memory (e.g. may need copy and unswizzle from gpu / texture etc). Personally, I like 'read' prefix ok.

In conclusion, I propose we call it videoFrame.readTo(dest).

@dalecurtis
Copy link
Contributor

I think the difference is subtle enough that if every other interface will have copyTo() instead, then calling this one copyTo() sounds better to me in terms of consistency.

@chcunningham
Copy link
Collaborator Author

I'm also fine with that.

@padenot
Copy link
Collaborator

padenot commented May 19, 2021

3\. If the outcome of point 2 is to remove planes.readInto(), should we also remove videoFrame.planes entirely? The remaining Planes attributes were mainly provided to assist in calling planes.readInto().

Access to each plane individually is also needed for WebGPU/WebGL interrop, although we don't really know the preferred API shape.

copyTo is fine to me if we want uniformity, yes.

@chcunningham
Copy link
Collaborator Author

Fixed by #208

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Interface changes that would break current usage (producing errors or undesired behavior).
Projects
None yet
Development

No branches or pull requests

4 participants