This repository has been archived by the owner on Jul 25, 2020. It is now read-only.

Adaptive streaming

Brion Vibber edited this page Jun 25, 2015 · 5 revisions

Adaptive streaming for OGVKit

You know how Netflix will bump the resolution of your TV shows down and up to match your network conditions? And how it mostly does that smoothly without annoying you as much as it used to when it would stop and go "BUFFERING" for 30 seconds? Yeah, that's nice. Let's try to do that too.

Performance budget

The key to any sort of adaptive streaming is measuring something to adapt to. We have two main resources to budget in streaming video playback:

  1. Network bandwidth
  2. Processor time

Network bandwidth is obviously super important: on a mobile device you're fairly likely to be on a cellular network with limited bandwidth. And however you're connected, varying network conditions may require bumping our bandwidth usage down (or allow bumping it back up) to maintain smooth streaming.

We also care about processor time: the codecs use the main ARM CPU cores, and we can only shove so many pixels per second down the pipe. On slower devices or videos with high frame rates, the highest resolutions might be too slow for the CPU to decode and play back in real time.

Both figures can be measured in terms of throughput (activity per time): it takes 300ms to download 2MB, or one frame is decoded in 8ms.

Where things become useful is when we compare those numbers to a 'budget' target; this can be expressed in terms of playback time: it takes 150ms to download the amount of data needed to play 1s of media, or it takes 8ms to decode a frame that is on screen for 33ms.

That gives us a ratio which tells us how well we can maintain this throughput, and if we can/should switch to another input stream.
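As a concrete sketch, the two budget ratios might be computed like this; the function names are illustrative, not OGVKit API, and the numbers are the examples from above:

```python
# Hypothetical sketch (not OGVKit API): the two budget ratios.
# A ratio below 1.0 means we're within budget; above 1.0 means we can't keep up.

def network_budget_ratio(download_seconds: float, media_seconds: float) -> float:
    """Time spent downloading the data for some span of media, divided by
    the playback time of that span. 150 ms download per 1 s of media -> 0.15."""
    return download_seconds / media_seconds

def cpu_budget_ratio(decode_seconds: float, frame_duration_seconds: float) -> float:
    """Time spent decoding a frame, divided by how long it is on screen.
    8 ms decode for a frame shown for 33 ms -> ~0.24."""
    return decode_seconds / frame_duration_seconds

net = network_budget_ratio(0.150, 1.0)   # 0.15: plenty of headroom
cpu = cpu_budget_ratio(0.008, 0.033)     # ~0.24: plenty of headroom
```

A sustained ratio near or above 1.0 on either axis is the signal to drop to a cheaper source; comfortably low ratios suggest we could try a better one.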

Prep: multi-source data

Instead of passing a single URL, we want to pass a source set with multiple items. Basically, the set of available transcodes with their formats, bitrates, and resolutions.

For MediaWiki items this is available via data attributes on the source elements of a video element in content, or through the videoinfo API for a lookup by title.
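For illustration, a source set could be modeled as a plain list of entries, with selection by bitrate against the network budget. The type and field names here are hypothetical, not actual OGVKit classes:

```python
# Illustrative model of a multi-item source set (not actual OGVKit types).
from dataclasses import dataclass

@dataclass
class MediaSource:
    url: str
    format: str     # e.g. "ogv" or "webm"
    bitrate: int    # bits per second
    width: int
    height: int

def best_source(sources, bandwidth_bps):
    """Pick the highest-bitrate source that fits the bandwidth budget,
    falling back to the smallest one if nothing fits."""
    by_rate = sorted(sources, key=lambda s: s.bitrate)
    fitting = [s for s in by_rate if s.bitrate <= bandwidth_bps]
    return fitting[-1] if fitting else by_rate[0]

# A made-up set of transcodes, roughly like Commons derivatives:
commons_set = [
    MediaSource("video_240p.webm", "webm", 250_000, 426, 240),
    MediaSource("video_360p.webm", "webm", 500_000, 640, 360),
    MediaSource("video_480p.webm", "webm", 1_000_000, 854, 480),
]
# On a ~600 kbps connection this picks the 360p / 500 kbps transcode.
```

A real selector would also weigh the CPU budget (resolution and frame rate), not just bitrate.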

For MPEG-DASH streaming it should be available from the manifest. (Research me)

Stage 1: pause-and-seek

The first stage for adding adaptive streaming is to detect when we're exceeding our bandwidth or CPU budget so badly that we're pausing, and to take advantage of those pauses to switch sources.

Primitive way:

  • pause
  • open a new input stream with the new source
  • open a new decoder instance with the new stream
  • save the current playback position
  • close old decoder
  • attach new decoder
  • seek to the saved position on the new decoder/stream
  • play

There will be a pause for buffering, and probably a jump back to the nearest keyframe.
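The primitive sequence above might look roughly like this toy model (hypothetical classes, not OGVKit API), which also shows the jump back to the nearest keyframe — assumed here to occur every 5 seconds:

```python
# Toy model of the primitive pause-and-seek switch (hypothetical classes,
# not OGVKit API). Assumes a keyframe every 5 seconds, so seeking snaps
# back to the nearest earlier keyframe.

KEYFRAME_INTERVAL = 5.0

class ToyDecoder:
    def __init__(self, stream_url):
        self.url = stream_url
        self.position = 0.0

    def seek(self, t):
        # Seeking lands on the nearest keyframe at or before t.
        self.position = (t // KEYFRAME_INTERVAL) * KEYFRAME_INTERVAL

class ToyPlayer:
    def __init__(self, url):
        self.decoder = ToyDecoder(url)
        self.playing = True

    def switch_source(self, url):
        self.playing = False                 # pause
        saved = self.decoder.position        # save current playback position
        self.decoder = ToyDecoder(url)       # close old decoder, attach new
        self.decoder.seek(saved)             # jumps back to a keyframe
        self.playing = True                  # play

player = ToyPlayer("video_720p.webm")
player.decoder.position = 12.3
player.switch_source("video_480p.webm")
# Playback resumes at 10.0 s, the previous keyframe, not at 12.3 s.
```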

Stage 2: switch at keyframe

Cleaner switching should be doable by swapping the decoder instance at a keyframe boundary. Ideally something like:

  • estimate when current buffered data will run out, and save that time
  • open a new input stream with the new source
  • open a new decoder instance with the new stream
  • seek to the saved position on the new decoder/stream, while playback continues on the first
  • save the actual keyframe position we seeked to on the new stream
  • wait for playback on the first decoder to reach the keyframe position
  • pause
  • close old decoder
  • attach new decoder
  • play

This should in many cases avoid a significant pause, but will have audio discontinuities and may not be totally smooth.
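Picking the keyframe-aligned switch point can be sketched as a small calculation; the fixed 5-second keyframe interval is an assumption for illustration, not a property of real transcodes:

```python
# Sketch of choosing the keyframe-aligned switch point (assumes a fixed
# 5-second keyframe interval for illustration; not OGVKit API).
import math

KEYFRAME_INTERVAL = 5.0

def next_keyframe(t):
    """First keyframe at or after time t: where old and new decoders meet."""
    return math.ceil(t / KEYFRAME_INTERVAL) * KEYFRAME_INTERVAL

# 1. Estimate when currently buffered data runs out: say 2 s from now at t = 12.3.
buffer_runs_out = 12.3 + 2.0
# 2-4. Open the new stream/decoder and seek it there while the old one plays on.
switch_point = next_keyframe(buffer_runs_out)   # 15.0
# 5-9. When the old decoder reaches switch_point: pause, swap decoders, play.
```

In a real container the keyframe positions come from the seek index (Ogg skeleton / WebM cues) rather than a fixed interval, so the new decoder reports the actual keyframe it landed on.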

Stage 3: seamless audio

To be fancier, we need to determine where we are in the audio queues with near sample accuracy. We may have to divide up audio buffers to avoid discontinuities if the audio packets don't line up exactly.
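A sample-accurate split might look like this hypothetical helper (not OGVKit API), which cuts one audio packet at the switch time so the old stream's audio ends exactly where the new stream's begins:

```python
# Hypothetical helper (not OGVKit API): split one audio packet at a
# sample-accurate switch time.

def split_buffer(samples, start_time, sample_rate, switch_time):
    """Return (head, tail) where head ends exactly at switch_time.
    start_time is the presentation time of the packet's first sample."""
    offset = round((switch_time - start_time) * sample_rate)
    cut = max(0, min(len(samples), offset))
    return samples[:cut], samples[cut:]

# A 1024-sample packet starting at t = 10.0 s, 48 kHz audio, switch at 10.01 s:
packet = [0.0] * 1024
head, tail = split_buffer(packet, 10.0, 48_000, 10.01)
# head: 480 samples (10 ms) played from the old stream; tail: 544 discarded.
```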

Metadata? DASH?

Plans above are fairly basic and should work with existing transcodes produced on Wikimedia Commons, for both .ogv and .webm.

One obvious downside is that prepping a new stream switch requires a few seeks to get the headers and cue data filled out. These can be pre-fetched though.

Things might be smoother/faster/more consistent with more metadata or specially-formatted streams. Research MPEG-DASH more...!