Driver/Input: Migrate audio backend to Symphonia (serenity-rs#89)
This extensive PR rewrites the internal mixing logic of the driver to use symphonia for parsing and decoding audio data, and rubato to resample audio. Existing logic to decode DCA and Opus formats/data has been reworked as plugins for symphonia. The main benefit is that we no longer need to keep yt-dlp and ffmpeg processes alive, saving a lot of memory and CPU: all decoding can be done in Rust! In exchange, we now need to do a lot of the HTTP handling and resumption ourselves, but this is still a huge net positive.

`Input`s have been completely reworked such that all non-cached sources are lazy by default, and are no longer covered by a special-case `Restartable`. These now span a gamut from a `Compose` (lazy), to a live source, to a fully `Parsed` source. As mixing is still sync, this includes adapters for `AsyncRead`/`AsyncSeek`, and HTTP streams.
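
The lazy-to-live progression described above can be sketched with the standard library alone. This is a minimal illustration, not songbird's actual API: the `Compose` trait here is a simplified stand-in with a single hypothetical `create` method, the `Input` enum has only two illustrative states, and `InMemory` and `read_all` are made-up helpers.

```rust
use std::io::{Cursor, Read};

/// Hypothetical, simplified stand-in for a "compose" step: a cheap recipe
/// that can build a readable byte source on demand.
trait Compose {
    fn create(&mut self) -> std::io::Result<Box<dyn Read + Send>>;
}

/// Illustrative input states: a stored recipe, or an opened byte stream.
enum Input {
    Lazy(Box<dyn Compose + Send>),
    Live(Box<dyn Read + Send>),
}

impl Input {
    fn is_live(&self) -> bool {
        matches!(self, Input::Live(_))
    }

    /// Promote a lazy input to a live one; a live input is returned as-is.
    fn make_live(self) -> std::io::Result<Input> {
        match self {
            Input::Lazy(mut recipe) => Ok(Input::Live(recipe.create()?)),
            live => Ok(live),
        }
    }
}

/// Toy recipe which yields an in-memory buffer each time it is asked.
struct InMemory(Vec<u8>);

impl Compose for InMemory {
    fn create(&mut self) -> std::io::Result<Box<dyn Read + Send>> {
        Ok(Box::new(Cursor::new(self.0.clone())))
    }
}

/// Open the input if needed, then read it to the end.
fn read_all(input: Input) -> std::io::Result<Vec<u8>> {
    match input.make_live()? {
        Input::Live(mut src) => {
            let mut out = Vec::new();
            src.read_to_end(&mut out)?;
            Ok(out)
        }
        Input::Lazy(_) => unreachable!("make_live always returns a live input"),
    }
}

fn main() -> std::io::Result<()> {
    // Nothing is opened until the input is actually needed.
    let input = Input::Lazy(Box::new(InMemory(vec![1, 2, 3])));
    assert!(!input.is_live());
    let bytes = read_all(input)?;
    println!("{bytes:?}");
    Ok(())
}
```

Storing only the recipe keeps an idle source cheap, and because the recipe can be invoked again, a source can in principle be recreated without a dedicated wrapper type.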

`Track`s have been reworked so that they only contain initialisation state for each track. `TrackHandle`s are only created once a `Track`/`Input` has been handed over to the driver, replacing `create_player` and related functions. `TrackHandle::action` now acts on a `View` of (im)mutable state, and can request seeks/readying via `Action`.

Per-track event handling has also been improved -- we can now determine and propagate the reason behind individual track errors due to the new backend. Some `TrackHandle` commands (seek etc.) benefit from this, and now use internal callbacks to signal completion.

Due to associated PRs on felixmcfelix/songbird from avid testers, this includes general clippy tweaks, API additions, and other repo-wide cleanup. Thanks go out to the below co-authors.

Co-authored-by: Gnome! <45660393+GnomedDev@users.noreply.github.com>
Co-authored-by: Alakh <36898190+alakhpc@users.noreply.github.com>
3 people committed Nov 19, 2023
1 parent 6c6ffa7 commit 8cc7a22
Showing 136 changed files with 9,693 additions and 4,823 deletions.
15 changes: 12 additions & 3 deletions .github/workflows/ci.yml
@@ -39,7 +39,6 @@ jobs:
- Windows
- driver only
- gateway only
- legacy tokio

include:
- name: beta
@@ -75,6 +74,16 @@ jobs:
sudo apt-get update
sudo apt-get install -y libopus-dev
- name: Install yt-dlp (Unix)
if: runner.os != 'Windows'
run: |
sudo wget https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -O /usr/local/bin/yt-dlp
sudo chmod a+rx /usr/local/bin/yt-dlp
- name: Install yt-dlp (Windows)
if: runner.os == 'Windows'
run: choco install yt-dlp

- name: Setup cache
if: runner.os != 'macOS'
uses: actions/cache@v2
@@ -175,9 +184,9 @@ jobs:
- name: 'Build serenity/voice_receive'
working-directory: examples
run: cargo build -p voice_receive
- name: 'Build serenity/voice_storage'
- name: 'Build serenity/voice_cached_audio'
working-directory: examples
run: cargo build -p voice_storage
run: cargo build -p voice_cached_audio
- name: 'Build twilight'
working-directory: examples
run: cargo build -p twilight
17 changes: 9 additions & 8 deletions ARCHITECTURE.md
@@ -29,12 +29,14 @@ Songbird's **driver** is a mixed sync/async system for running voice connections
Audio processing remains synchronous for the following reasons:
* Encryption, encoding, and mixing are compute bound tasks which cannot be subdivided cleanly by the Tokio executor. Having these block the scheduler's finite thread count has a significant impact on servicing other tasks.
* `Read` and `Seek` are considerably more user-friendly to use, implement, and integrate than `AsyncRead`, `AsyncBufRead`, and `AsyncSeek`.
* Symphonia implements all of its functionality based on synchronous I/O.
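
The reasoning above (compute-bound work kept off the async executor) can be sketched as a dedicated OS thread driven over channels. This is only an illustration using the standard library: `Command`, `mixer_loop`, and `run_once` are hypothetical names rather than songbird's API, and the 960-sample frame assumes 20 ms of mono audio at 48 kHz.

```rust
use std::sync::mpsc::{self, Receiver, Sender, TryRecvError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical control messages sent to the synchronous thread.
enum Command {
    SetVolume(f32),
    Stop,
}

/// Runs on its own OS thread so compute-bound work never blocks an async
/// executor. Returns the number of frames produced before stopping.
fn mixer_loop(commands: Receiver<Command>, frames: Sender<Vec<f32>>, tick: Duration) -> u64 {
    let mut volume = 1.0_f32;
    let mut ticks = 0_u64;
    let mut deadline = Instant::now() + tick;

    loop {
        // Drain pending commands without blocking the audio cadence.
        loop {
            match commands.try_recv() {
                Ok(Command::SetVolume(v)) => volume = v,
                Ok(Command::Stop) | Err(TryRecvError::Disconnected) => return ticks,
                Err(TryRecvError::Empty) => break,
            }
        }

        // Placeholder for the real work (mixing, encoding, encryption).
        let frame = vec![0.5 * volume; 960];
        if frames.send(frame).is_err() {
            return ticks; // The receiving side has gone away.
        }
        ticks += 1;

        // Sleep until the next deadline to hold a steady tick rate.
        let now = Instant::now();
        if deadline > now {
            thread::sleep(deadline - now);
        }
        deadline += tick;
    }
}

/// Spawn the worker, take one frame, then ask it to stop.
fn run_once() -> (Vec<f32>, u64) {
    let (cmd_tx, cmd_rx) = mpsc::channel();
    let (frame_tx, frame_rx) = mpsc::channel();

    // Queued before the thread starts, so the first frame already uses it.
    cmd_tx.send(Command::SetVolume(0.5)).unwrap();
    let worker = thread::spawn(move || mixer_loop(cmd_rx, frame_tx, Duration::from_millis(20)));

    let first = frame_rx.recv().unwrap();
    cmd_tx.send(Command::Stop).unwrap();
    (first, worker.join().unwrap())
}

fn main() {
    let (first, ticks) = run_once();
    println!("first frame: {} samples, {} tick(s) before stop", first.len(), ticks);
}
```

Commands are drained with `try_recv` so that control traffic never delays a frame; blocking only happens in the sleep between ticks.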

## Tasks
Songbird subdivides voice connection handling into several long- and short-lived tasks.

* **Core**: Handles and directs commands received from the driver. Responsible for connection/reconnection, and creates network tasks.
* **Mixer**: Combines audio sources together, Opus encodes the result, and encrypts the built packets every 20ms. Responsible for handling track commands/state. ***Synchronous***.
* **Thread Pool**: A dynamically sized thread-pool for I/O tasks. Creates lazy tracks using `Compose` if sync creation is needed, otherwise spawns a tokio task. Seek operations always go to the thread pool. ***Synchronous***.
* **Disposer**: Used by mixer thread to dispose of data with potentially long/blocking `Drop` implementations (i.e., audio sources). ***Synchronous***.
* **Events**: Stores and runs event handlers, tracks event timing, and handles
* **Websocket**: *Network task.* Sends speaking status updates and keepalives to Discord, and receives client (dis)connect events.
@@ -52,23 +54,22 @@ src/driver/*
## Audio handling

### Input
Inputs are raw audio sources: composed of a `Reader` (which can be `Read`-only or `Read + Seek`), a framing mechanism, and a codec.
Several wrappers exist to add `Seek` capabilities to one-way streams via storage or explicitly recreating the struct.
Inputs are audio sources supporting lazy initialisation, being either:
* **lazy inputs**: a trait object which allows instructions to create an audio source to be cheaply stored. This will be initialised when needed, either synchronously or asynchronously, based on which methods the trait object supports.
* **live inputs**: a usable audio object implementing `MediaSource: Read + Seek`. `Seek` support may be dummied in, as seek use and support is gated by `MediaSource`. These can be passed in at various stages of processing by symphonia.

Framing is not always needed (`Raw`), but makes it possible to consume the correct number of bytes needed to decode one audio packet (and/or simplify skipping through the stream).
Currently, Opus and raw (`i16`/`f32`) audio sources are supported, though only the DCA framing for Opus is implemented.
At present, the use of the FFmpeg executable allows us to receive raw input, but at heavy memory cost.
Further implementations are possible in the present framework (e.g., WebM/MKV and Ogg containers, MP3 and linked FFI FFmpeg as codecs).
Several wrappers exist: some add `Seek` capabilities to one-way streams (via storage or by explicitly recreating the struct), while others adapt `AsyncRead` sources or raw audio input.

Internally, the mixer uses floating-point audio to prevent clipping and allow more granular volume control.
If a source is known to use the Opus codec (and is the only source), then it can bypass mixing altogether.
Symphonia is used to demux and decode input files in a variety of formats into this floating-point buffer: songbird supports all codecs and containers which are part of the symphonia project, while adding support for Opus decoding and DCA1 container files.
If a source uses the Opus codec (and is the only source), then it can bypass mixing and re-encoding altogether, saving CPU cycles per server.

```
src/input/*
```
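
The floating-point mixing mentioned in the Input section can be illustrated as follows. This is a simplified sketch rather than the driver's mixing code: `mix_into` and `to_i16` are hypothetical helpers, and the scaling constants are one common convention for converting between `i16` and `f32` samples.

```rust
/// Accumulate one `i16` source into a floating-point buffer, scaled by a
/// per-source volume. Sums may exceed [-1.0, 1.0] without wrapping.
fn mix_into(acc: &mut [f32], source: &[i16], volume: f32) {
    for (out, &sample) in acc.iter_mut().zip(source) {
        *out += (f32::from(sample) / 32768.0) * volume;
    }
}

/// Convert the mixed buffer back to `i16`, limiting the range once at the end.
fn to_i16(acc: &[f32]) -> Vec<i16> {
    acc.iter()
        .map(|&x| (x.clamp(-1.0, 1.0) * 32767.0) as i16)
        .collect()
}

fn main() {
    let loud = [i16::MAX; 4];
    let mut acc = [0.0_f32; 4];

    // Two full-scale sources: the float sum is roughly 2.0, which is still
    // representable, so the result can be limited (or scaled) afterwards.
    mix_into(&mut acc, &loud, 1.0);
    mix_into(&mut acc, &loud, 1.0);
    let out = to_i16(&acc);
    println!("float mix: {out:?}");

    // The same sum in wrapping integer arithmetic flips sign instead.
    println!("wrapping i16 sum: {}", i16::MAX.wrapping_add(i16::MAX));
}
```

Keeping the accumulator in `f32` also means that a volume is a plain multiplication, with no intermediate rounding to integer steps.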

### Tracks
Tracks hold additional state which is expected to change over the lifetime of a track: position, play state, and modifiers like volume.
Tracks hold additional state which is expected to change over the lifetime of a track: position, play state, and modifiers like volume.
Tracks (and their handles) also allow per-source events to be inserted.

Tracks are defined in user code, where they are fully modifiable, before being passed into the driver.
112 changes: 81 additions & 31 deletions Cargo.toml
@@ -2,23 +2,28 @@
authors = ["Kyle Simpson <kyleandrew.simpson@gmail.com>"]
description = "An async Rust library for the Discord voice API."
documentation = "https://docs.rs/songbird"
edition = "2018"
edition = "2021"
homepage = "https://github.com/serenity-rs/songbird"
include = ["src/**/*.rs", "Cargo.toml", "build.rs"]
keywords = ["discord", "api", "rtp", "audio"]
license = "ISC"
name = "songbird"
readme = "README.md"
repository = "https://github.com/serenity-rs/songbird.git"
version = "0.3.0"
version = "0.2.2"
rust-version = "1.61"

[dependencies]
derivative = "2"
pin-project = "1"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tracing = { version = "0.1", features = ["log"] }
tracing-futures = "0.2"
symphonia-core = "0.5"

[dependencies.once_cell]
version = "1"
optional = true

[dependencies.async-trait]
optional = true
@@ -45,28 +50,50 @@ version = "5"
[dependencies.discortp]
features = ["discord-full"]
optional = true
version = "0.4"
version = "0.5"

# Temporary hack to pin MSRV.
[dependencies.flume]
optional = true
version = "0.10"

[dependencies.futures]
version = "0.3"

[dependencies.parking_lot]
[dependencies.lazy_static]
optional = true
version = "0.12"
version = "1"

[dependencies.pin-project]
[dependencies.parking_lot]
optional = true
version = "1"
version = "0.12"

[dependencies.rand]
optional = true
version = "0.8"

[dependencies.reqwest]
optional = true
default-features = false
features = ["stream"]
version = "0.11"

[dependencies.ringbuf]
optional = true
version = "0.2"

[dependencies.rubato]
optional = true
version = "0.12"

[dependencies.rusty_pool]
optional = true
version = "0.7"

[dependencies.serde-aux]
default-features = false
optional = true
version = "3"

[dependencies.serenity]
optional = true
version = "0.11"
@@ -81,11 +108,29 @@ version = "0.1"
optional = true
version = "1"

[dependencies.symphonia]
optional = true
default-features = false
version = "0.5"
git = "https://github.com/FelixMcFelix/Symphonia"
branch = "songbird-fixes"

[dependencies.symphonia-core]
optional = true
version = "0.5"
git = "https://github.com/FelixMcFelix/Symphonia"
branch = "songbird-fixes"

[dependencies.tokio]
optional = true
version = "1.0"
default-features = false

[dependencies.tokio-util]
optional = true
version = "0.7"
features = ["io"]

[dependencies.twilight-gateway]
optional = true
version = "0.12.0"
@@ -106,7 +151,7 @@ version = "2"

[dependencies.uuid]
optional = true
version = "0.8"
version = "1"
features = ["v4"]

[dependencies.xsalsa20poly1305]
@@ -116,7 +161,10 @@ features = ["std"]

[dev-dependencies]
criterion = "0.3"
ntest = "0.8"
symphonia = { version = "0.5", features = ["mp3"], git = "https://github.com/FelixMcFelix/Symphonia", branch = "songbird-fixes" }
utils = { path = "utils" }
tokio = { version = "1", features = ["rt", "rt-multi-thread"] }

[features]
# Core features
@@ -126,45 +174,48 @@ default = [
"gateway",
]
gateway = [
"gateway-core",
"tokio/sync",
"tokio/time",
]
gateway-core = [
"dashmap",
"flume",
"once_cell",
"parking_lot",
"pin-project",
]
driver = [
"async-tungstenite",
"driver-core",
"tokio/fs",
"tokio/io-util",
"tokio/macros",
"tokio/net",
"tokio/process",
"tokio/rt",
"tokio/sync",
"tokio/time",
]
driver-core = [
driver = [
"async-trait",
"async-tungstenite",
"audiopus",
"byteorder",
"discortp",
"reqwest",
"flume",
"lazy_static",
"parking_lot",
"rand",
"ringbuf",
"rubato",
"serde-aux",
"serenity-voice-model",
"streamcatcher",
"symphonia",
"symphonia-core",
"rusty_pool",
"tokio-util",
"tokio/fs",
"tokio/io-util",
"tokio/macros",
"tokio/net",
"tokio/process",
"tokio/rt",
"tokio/sync",
"tokio/time",
"typemap_rev",
"url",
"uuid",
"xsalsa20poly1305",
]
rustls = ["async-tungstenite/tokio-rustls-webpki-roots", "rustls-marker"]
native = ["async-tungstenite/tokio-native-tls", "native-marker"]
rustls = ["async-tungstenite/tokio-rustls-webpki-roots", "reqwest/rustls-tls", "rustls-marker"]
native = ["async-tungstenite/tokio-native-tls", "native-marker", "reqwest/native-tls"]
serenity-rustls = ["serenity/rustls_backend", "rustls", "gateway", "serenity-deps"]
serenity-native = ["serenity/native_tls_backend", "native", "gateway", "serenity-deps"]
twilight-rustls = ["twilight", "twilight-gateway/rustls-native-roots", "rustls", "gateway"]
@@ -178,8 +229,6 @@ rustls-marker = []
native-marker = []

# Behaviour altering features.
youtube-dlc = []
yt-dlp = []
builtin-queue = []

# Used for docgen/testing/benchmarking.
@@ -189,6 +238,7 @@ internals = []
[[bench]]
name = "base-mixing"
path = "benches/base-mixing.rs"
required-features = ["internals"]
harness = false

[[bench]]
4 changes: 4 additions & 0 deletions Makefile.toml
@@ -26,6 +26,10 @@ dependencies = ["format"]
[tasks.build-variants]
dependencies = ["build", "build-gateway", "build-driver"]

[tasks.check]
args = ["check", "--features", "full-doc"]
dependencies = ["format"]

[tasks.clippy]
args = ["clippy", "--features", "full-doc", "--", "-D", "warnings"]
dependencies = ["format"]