We illustrate HTTP Live Streaming using hls.js.
HLS is a streaming protocol developed by Apple to deliver media content over the internet using HTTP. It breaks the overall stream into a sequence of small HTTP-based file downloads, each download loading one short chunk of a potentially unbounded transport stream. It uses a (unique) "playlist" file that describes the "segment" files to be played. Once these files are available for reading (in the browser), the hls.js library will download the playlist and then the segments to be played; it handles the playback process entirely. Elixir will serve these files.
❗ This protocol has high latency: you may experience up to a 20-second delay.
Server: An HLS stream originates from a server where (in on-demand streaming) the media file is stored, or where (in live streaming) the stream is created. Because HLS is based on HTTP, any ordinary web server can originate the stream.
Two main processes take place on the server:
- Encoding: The video data is reformatted so that any device can recognize and interpret the data. HLS must use H.264 or H.265 encoding.
- Segmenting: The video is divided up into segments a few seconds in length. The length of the segments can vary, although the default length is 6 seconds (until 2016 it was 10 seconds).
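For reference, here is what a live playlist can look like while the stream is running. The segment names and durations below are illustrative; note that the "#EXT-X-ENDLIST" tag marking the end of the stream is only appended once the stream is finished.

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:2.000000,
segment_000.ts
#EXTINF:2.000000,
segment_001.ts
#EXTINF:2.000000,
segment_002.ts
```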
Our job here is to:
- capture the built-in webcam stream
- transform the images server-side. We run the "hello world" of computer vision, namely face detection with the Haar Cascade model. This is powered by Evision (OpenCV). The model is present by default in the source code of Evision, which has a loader for it.
- send the transformed images back to the browser. They are played back by the JavaScript library hls.js, which is based on the MediaSource API.
This relies heavily on FFmpeg to get frames from the input video source and build HLS segments and the playlist.
We propose two versions: a Livebook and an Elixir/Plug app.
Why? The Livebook is an easy and self-contained app, but its code is slightly different from the web app from which you may want to borrow the code.
❗ We need FFmpeg installed, as well as fsevent on macOS or inotify on Linux, on which FileSystem depends.
❗ You might sometimes encounter a "segmentation fault" error. We have no further explanation for this.
This gives a nice introduction to using the amazing Kino.JS.Live module.
The default directory in which all the files are saved is your home directory. The files are saved in three folders: "./priv/input", "./priv/output" and "./priv/hls". You need to clean these folders yourself.
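A minimal snippet to reset them between runs (paths assumed relative to the working directory):

```elixir
# Remove and recreate the three working folders used by the pipeline.
for dir <- ["./priv/input", "./priv/output", "./priv/hls"] do
  File.rm_rf!(dir)
  File.mkdir_p!(dir)
end
```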
To run the web app, fork the repo and run:
open http://localhost:4000 && mix run --no-halt
The web app is minimal in the sense that it is a Plug app.
We run a TCP listener on port 4000 with Bandit to communicate with the browser.
We use a raw WebSocket in the browser. The backend uses the websock_adapter library. We use it to send binary data (an ArrayBuffer) from the browser to the Elixir backend. Check this blog.
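As a sketch, the WebSock callback receiving these binary messages could look like the following; the module shape and the `capture` key in the state are ours (it refers to the FFmpeg "capture" process described further down), and only the relevant callback is shown:

```elixir
defmodule WebSocketHandler do
  @behaviour WebSock

  # Binary frames sent by the browser arrive here (one ~300 kB WebM chunk per second).
  @impl true
  def handle_in({chunk, [opcode: :binary]}, state) do
    # Pipe the chunk into the long-lived FFmpeg "capture" process (see below).
    :ok = ExCmd.Process.write(state.capture, chunk)
    {:ok, state}
  end
end
```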
We secured the WebSocket connection with a CSRF token.
We have a Plug router that:
- serves the static files: the (unique) HTML page, the associated JavaScript module and the HLS files,
- handles the WebSocket connection.
We run FFmpeg as a "kept alive" process with ExCmd.Process. This is crucial, as we pipe data into FFmpeg's stdin.
We run a file watcher process with file_system. It detects when FFmpeg has built the HLS playlist and segments.
The process is able to handle "small" frames (640x480) at 30 fps. We use approximately 300 I/O operations per second. The memory seems stable at around 70 MB during a 20-minute test.
The browser will ask for permission to use your webcam.
Once you click on "start", a WebSocket connection is established with the backend. The browser produces video chunks and sends them to the server. The server extracts all the frames, saves them into files, and passes each file to the Evision process for face detection. The frames are then glued together (respecting the order) to produce video chunks ready for the browser to consume. A file watching process detects when the playlist is ready, and we pass this message to the browser. The HLS JavaScript library then reads the playlist and downloads the new segments it needs on a regular basis.
graph TD
F -->|mediaRecorder <br>send video chunk| A
A -- WebSocketHandler.init --> A
A[WebSocketHandler] -->|ffmpeg_capture: <br>make frames| B[WebSocketHandler]
B -->|send: process_queue| C[WebSocketHandler]
C -->|send: ffmpeg_rebuild<br>make segments & playlist| E[WebSocketHandler]
E -->|send: playlist_ready| F[Browser]
G[FileWatcher] -->|send: playlist_created| E[WebSocketHandler]
F -- hls.js <br> GET playlist / segments--> H[Elixir / Plug]
We define four routes.
The root "/" sends the HTML page. It is parsed to add a CSRFToken.
The "/js/main.js" route sends the Javascript file when the browser calls it.
The ""/hls/:file" route sends the HLS segments when the browser calls them.
The "/socket" route upgrade to a WebSocket connection after the token validation (comparison between the token saved in the session and the one received in the query string).
This module serves the files. Since we want to pass a "CSRFToken" to the JavaScript, we use EEx.eval_file to parse the "index.html.heex" text.
def serve_homepage(conn, %{csrf_token: token}) do
  html = EEx.eval_file("./priv/index.html.heex", csrf_token: token)
  Plug.Conn.send_resp(conn, 200, html)
end
It essentially uses FileSystem. We declare which folder we want to monitor; whenever a change occurs in this folder, an event is emitted. We use it to send a message to the caller process (the WebSocketHandler).
More precisely, we monitor the creation of the "playlist.m3u8" file.
graph LR
A[FileWatcher.init] --pid = FileSystem.start_link <br>dirs: priv/hls <br> --> D[FileSystem.subscribe pid]
E[handle_info] -- :file_event <br> .m3u8--> G[send: playlist_created<br> to WebSocketHandler]
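A minimal sketch of such a watcher, assuming a GenServer started by the WebSocketHandler with a `caller:` option; module and option names are ours, and event names vary slightly between the macOS and Linux backends:

```elixir
defmodule FileWatcher do
  @moduledoc "Notifies the caller once FFmpeg has created the HLS playlist."
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    caller = Keyword.fetch!(opts, :caller)
    {:ok, watcher} = FileSystem.start_link(dirs: ["./priv/hls"])
    FileSystem.subscribe(watcher)
    {:ok, %{caller: caller}}
  end

  @impl true
  def handle_info({:file_event, _pid, {path, _events}}, state) do
    # We only care about the playlist file appearing in priv/hls.
    # (The real handler only reacts once: see "init: true" in the diagrams.)
    if Path.basename(path) == "playlist.m3u8", do: send(state.caller, :playlist_created)
    {:noreply, state}
  end

  def handle_info({:file_event, _pid, :stop}, state), do: {:noreply, state}
end
```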
It runs FFmpeg as "kept alive" processes via ExCmd. This is crucial, as we pipe data into FFmpeg's stdin.
When we receive a video chunk in binary form via the WebSocket (approx. 300 KB, depending on the height/width of your video element), we pass it to FFmpeg to extract all the frames at 30 fps. Each frame is saved into a file (approx. 10 kB).
The other FFmpeg process rebuilds HLS video chunks from the transformed frames. FFmpeg updates the playlist and produces HLS segments. This FFmpeg process must stay alive, otherwise FFmpeg appends "#EXT-X-ENDLIST" to the end of the playlist, which would signal to the player that the stream is over.
The general form of an FFmpeg command is:
ffmpeg [GeneralOptions] [InputFileOptions] -i input [OutputFileOptions] output
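As an illustration, the two long-lived processes could be started with ExCmd.Process along these lines; the exact options (frame rate, segment duration, codec, file patterns) are assumptions, not the app's actual command lines:

```elixir
defmodule FFmpegSketch do
  # "capture" process: kept alive, reads the WebM chunks from stdin and
  # extracts individual frames as JPEG files at 30 fps.
  def start_capture do
    ExCmd.Process.start_link(
      ~w(ffmpeg -loglevel error -i pipe:0 -vf fps=30 ./priv/input/frame_%06d.jpg)
    )
  end

  # Each binary chunk received on the WebSocket is piped into FFmpeg's stdin.
  def push_chunk(capture, webm_chunk), do: ExCmd.Process.write(capture, webm_chunk)

  # "rebuild" process: turns the transformed frames back into HLS segments and a
  # playlist. Keeping it alive prevents FFmpeg from appending #EXT-X-ENDLIST.
  def start_rebuild do
    ExCmd.Process.start_link(
      ~w(ffmpeg -loglevel error -f image2 -framerate 30
         -i ./priv/output/frame_%06d.jpg -c:v libx264
         -f hls -hls_time 2 -hls_list_size 0 ./priv/hls/playlist.m3u8)
    )
  end
end
```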
We use Evision to detect faces and draw rectangles around the regions of interest found.
❗ When you run the code, you will see that the Haar Cascade model produces a lot of false positives.
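A hedged sketch of the detection step, using Evision's CascadeClassifier bindings; the module and function names are ours, and the path to the bundled "haarcascade_frontalface_default.xml" must be adapted to your setup:

```elixir
defmodule FaceDetector do
  # Load the Haar Cascade model once (the XML file ships with OpenCV/Evision).
  def load_haar_cascade(xml_path) do
    Evision.CascadeClassifier.cascadeClassifier(xml_path)
  end

  # Read a frame from disk, detect faces, draw green rectangles, write it back.
  def detect_and_draw_faces(cascade, input_path, output_path) do
    img = Evision.imread(input_path)
    gray = Evision.cvtColor(img, Evision.Constant.cv_COLOR_BGR2GRAY())

    faces = Evision.CascadeClassifier.detectMultiScale(cascade, gray)

    img_with_boxes =
      Enum.reduce(faces, img, fn {x, y, w, h}, acc ->
        Evision.rectangle(acc, {x, y}, {x + w, y + h}, {0, 255, 0}, thickness: 2)
      end)

    Evision.imwrite(output_path, img_with_boxes)
  end
end
```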
A visualisation may help to understand the flow.
graph LR
E[handle_in: binary_msg] --> F[FFmpeg process<br> save frames into files]
F -- create files <br>priv/input--> G[send: ffmpeg_process]
H[handle_info: ffmpeg_process] --read files <br>priv/input--> J[Enqueue new files]
J --> K[send: process_queue]
L[handle_info: process_queue] --> M[Process files with Evision<br>
detect_and_draw_faces]
M -- create files<br>priv/output--> O[send: ffmpeg_rebuild <br> when queue empty]
P[handle_info: ffmpeg_rebuild when chunk_id == 5] -- files in<br>priv/output--> Q[Pass files to FFmpeg segment process]
Q--> R[files in priv/hls]
UU[FileWatcher] -- event<br>playlist created-->U[handle_info: playlist_created, init: true]
U --> V[send playlist_ready to browser]
When the browser initiates a WebSocket connection, the backend responds by setting up the following:
graph TD
A[WebSocketHandler.init] --> B[FileWatcher]
A --> C[FFmpegProcessor]
C --> D[FFmpeg Capture Process]
C --> E[FFmpeg Rebuild Process]
A --> F[load_haar_cascade]
D --> G[State]
E-->G
F-->G
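Completing the handler sketch from above, `init/1` could set up these pieces roughly as follows; every helper it calls refers to the hypothetical sketches from the previous sections, not to the app's exact modules:

```elixir
defmodule WebSocketHandler do
  @behaviour WebSock

  @impl true
  def init(_opts) do
    # File watcher for priv/hls, reporting back to this process.
    {:ok, _watcher} = FileWatcher.start_link(caller: self())

    # Long-lived FFmpeg processes (see the FFmpegSketch module above).
    {:ok, capture} = FFmpegSketch.start_capture()
    {:ok, rebuild} = FFmpegSketch.start_rebuild()

    # Haar Cascade model for Evision (adjust the path to your setup).
    cascade = FaceDetector.load_haar_cascade("./priv/haarcascade_frontalface_default.xml")

    state = %{
      capture: capture,
      rebuild: rebuild,
      cascade: cascade,
      queue: [],
      chunk_id: 0,
      init: false
    }

    {:ok, state}
  end
end
```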
For clarity, the JavaScript module is separated into its own file. It is served by Elixir/Plug.
We load hls.js
from a CDN:
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest" defer></script>
We need to use DOMContentLoaded
, so the "main.js" module starts with:
window.addEventListener('DOMContentLoaded', ()=> {...})
We use the WebSocket API:
let socket = new WebSocket(
  `ws://localhost:4000/socket?csrf_token=${csrfToken}`
);
We use the WebRTC API method getUserMedia to display the built-in webcam stream in a <video> element:
let stream = await navigator.mediaDevices.getUserMedia({
  video: { width: 640, height: 480 },
  audio: false,
});

video1.srcObject = stream;
We use the MediaRecorder API to record the stream and push chunks every 1000 ms to the server via the WebSocket:
let mediaRecorder = new MediaRecorder(stream);

mediaRecorder.ondataavailable = async ({ data }) => {
  if (!isReady) return;
  if (data.size > 0) {
    console.log(data.size);
    const buffer = await data.arrayBuffer();
    if (socket.readyState === WebSocket.OPEN) {
      socket.send(buffer);
    }
  }
};

mediaRecorder.start(1_000);