This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Waveform research for audio files #62

Closed

zackkrida opened this issue May 17, 2021 · 5 comments

@zackkrida
Member

For each audio track, we will want to display a waveform preview. This is both aesthetically pleasing and informative to users. Individuals looking for audio can get a lot of information from the waveform. Some examples:

  • Get a sense of the dynamic changes within a piece of music
  • Tell if something is a one-shot (sound effects, single notes, etc.) or a full track
  • Tell if something is of poor quality (too quiet, too loud)

Most waveform previews require an entire track to be downloaded and analyzed client-side. We should look into solutions that support server-side rendering or can take metadata in some simple text format. I'm not sure how much spectral data it even takes to render an audio waveform; if the data required is too large, we might not want to include it as an API field, but do something similar to the image thumbnail cache instead.

Prior art

I found a gist in which a developer discusses the different approaches they took for performance.

@obulat
Contributor

obulat commented Jun 15, 2021

There are three strategies we can use to show audio waveforms on the front end:

1. Generate the waveform entirely on the frontend.

To do this, we would use the Web Audio API and its ability to analyze the audio data after it has been downloaded to the user's device. The libraries we can use for this are wavesurfer.js and peaks.js.
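For illustration, a rough sketch of what this involves under the hood (the function name and bucket count are placeholders, not part of either library):

```js
// Minimal client-side peak extraction with the Web Audio API.
// Downloads the whole file, decodes it, and reduces the first channel
// to one peak value per horizontal "bucket" of the waveform.
async function getPeaks(audioUrl, numBuckets = 800) {
  const response = await fetch(audioUrl) // fails if the source lacks CORS headers
  const arrayBuffer = await response.arrayBuffer()

  const audioContext = new AudioContext()
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer)

  const samples = audioBuffer.getChannelData(0)
  const bucketSize = Math.floor(samples.length / numBuckets)
  const peaks = []
  for (let i = 0; i < numBuckets; i++) {
    let max = 0
    for (let j = i * bucketSize; j < (i + 1) * bucketSize; j++) {
      max = Math.max(max, Math.abs(samples[j]))
    }
    peaks.push(max) // values in [0, 1], ready to scale into bar heights
  }
  return peaks
}
```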

Pros

  • Loading the waveform on the frontend doesn't require us to download and/or analyze the audio files from the sources, so the server processing time is minimized.
  • wavesurfer.js also provides play/pause/seek functionality for the frontend, and is very flexible so we can adjust its appearance according to our designs.
  • Customizable style of the waveform graphics.

Cons

  • Performance: On the POC page you can see the waveforms appearing one by one over a couple of seconds (on a good network connection). The user's device needs to download all the audio files before creating the waveforms, and the waveform generation process can be CPU-intensive.
  • Without CORS headers on the source, the audio cannot be downloaded and the waveform cannot be displayed (as with ccmixter.org on the POC page).

2. Generate the waveform data on the backend and draw it on the frontend

This approach would require us to download the audio files, analyze them to create a .dat or a .json file, and fetch this data together with the audio metadata when displaying the search results.
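For illustration, the backend step could look roughly like this (a minimal Node sketch assuming the audiowaveform CLI mentioned below is installed on the server; the paths and settings are placeholders):

```js
// Sketch: shell out to the audiowaveform CLI to produce a .dat file
// for an already-downloaded audio file (8-bit output, 256 samples per pixel).
const { execFile } = require('child_process')

function generateWaveformData(audioPath, outputPath) {
  return new Promise((resolve, reject) => {
    execFile(
      'audiowaveform',
      ['-i', audioPath, '-o', outputPath, '-b', '8', '-z', '256'],
      (error) => (error ? reject(error) : resolve(outputPath))
    )
  })
}

// generateWaveformData('/tmp/track.mp3', '/tmp/track.dat')
```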

Pros

  • Smaller download size, as the user will only download the smaller waveform data file, not the entire audio file.
  • Customizable style of the waveform graphics.

Cons

  • We would need an additional audio analysis / waveform generation step on the backend. This could be similar to the way we create image thumbnails.
  • This would consume more resources on the user's device than simply displaying an image as in solution 3.

3. Generate waveform images in a process similar to the way we create image thumbnails, and use the images on the frontend.

We could use an approach similar to #2, but instead of creating the JSON/.dat files, we would create an image.

Pros

  • This could be the most performant strategy for the user, as there is no waveform drawing step on the user's device.

Cons

  • Difficult to change the style of the waveform: if the design needs change, we would have to recreate all the images.
  • We would either have to create images at different resolutions for different device sizes, or use incorrect resolutions on some devices.

I would suggest setting up the backend to generate waveform data in the .dat binary format using the [audiowaveform](https://github.com/bbc/audiowaveform) library, similar to the way we create the image thumbnails now.
While working on that, we could keep using wavesurfer.js or peaks.js to generate waveforms on the frontend as a fallback while the waveform data is not yet available from the backend.
Both wavesurfer.js and peaks.js have complex functionality that we do not need (resampling, drawing several channels, zooming in, adding annotations, etc.). They are also browser-based JavaScript libraries that freely call browser-specific APIs (window, navigator) at import time (not at element initialization), so when loading them in Nuxt SSR we have to handle errors caused by window or navigator not being available on the server. For these reasons, we might consider creating our own SSR-ready implementation that uses the binary data and <canvas> for rendering.
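As a starting point for such an implementation, a rough sketch of reading the binary data, assuming the version 1 layout described in the audiowaveform DataFormat.md (a real implementation should check the version field and handle version 2, multi-channel files):

```js
// Parse audiowaveform's binary .dat format (version 1) into its header fields
// plus a flat array of alternating min/max values.
function parseWaveformDat(arrayBuffer) {
  const view = new DataView(arrayBuffer)
  const version = view.getInt32(0, true) // little-endian
  const flags = view.getUint32(4, true)
  const sampleRate = view.getInt32(8, true)
  const samplesPerPixel = view.getInt32(12, true)
  const length = view.getUint32(16, true) // number of min/max pairs
  const is8Bit = (flags & 1) === 1

  const peaks = []
  let offset = 20
  for (let i = 0; i < length * 2; i++) {
    peaks.push(is8Bit ? view.getInt8(offset) : view.getInt16(offset, true))
    offset += is8Bit ? 1 : 2
  }
  return { version, sampleRate, samplesPerPixel, length, is8Bit, peaks }
}
```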

Generating waveforms on the frontend: Technical details

On the frontend, audio can be played using the <audio> HTML element or the Web Audio API. The main advantage of the Web Audio API in our case is that it provides a way to analyze the audio and create the waveform from that analysis on the frontend. On the other hand, when using the <audio> element, wavesurfer.js allows loading pre-generated peaks data.
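For example, wavesurfer.js can be backed by the <audio> element and given pre-generated peaks, so it never analyzes the audio itself (the selector, URL and peaks values below are placeholders):

```js
import WaveSurfer from 'wavesurfer.js'

// Placeholder peaks, normalized to [-1, 1]; in practice these would come
// from the waveform data generated on the backend.
const peaks = [0, 0.4, -0.3, 0.8, -0.6, 0.2]

const wavesurfer = WaveSurfer.create({
  container: '#waveform',
  backend: 'MediaElement', // stream via <audio> instead of decoding with Web Audio
})

// Passing peaks as the second argument skips client-side audio analysis.
wavesurfer.load('https://example.com/track.mp3', peaks)
```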

Data file format

The data we need to draw the waveform is an array of numbers that indicate how high each line should go. wavesurfer.js expects peaks to be in the range [-1, 1], whereas peaks.js uses values in the range [-128, +127]. The BBC audiowaveform library can analyze an audio file and generate waveform data in two formats: binary .dat and JSON. There is a good description of these formats in the [audiowaveform documentation](https://github.com/bbc/audiowaveform/blob/master/doc/DataFormat.md).
peaks.js can use both the .dat and .json formats, while wavesurfer.js can only use JSON.
peaks.js uses the waveform-data.js library to handle the binary format. Along with reading the binary file, it also supports resampling to enable zooming, which is unnecessary for our purposes.

For a sample with 1 channel, a sample rate of 44100, 256 samples_per_pixel, 8 bits and a length of 33472, the .json data file is 227 KB. The .dat file for the same audio is only 67 KB, so the binary format would be preferable both for server costs and for users' data usage.
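If we store the 8-bit .dat values but draw with wavesurfer.js, converting between the two ranges is trivial (and the 67 KB figure is consistent with this layout: 33472 min/max pairs × 2 bytes per pair, plus a small header):

```js
// Convert audiowaveform's 8-bit values ([-128, 127]) into the [-1, 1]
// floats that wavesurfer.js expects for pre-generated peaks.
function normalizePeaks(intPeaks) {
  return intPeaks.map((value) => value / 128)
}
```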

Drawing the waveforms

Both peaks.js and wavesurfer.js use <canvas> to draw the audio waveforms. wavesurfer.js draws with vanilla JavaScript, while the drawing logic of peaks.js is based on konvajs. This might be because peaks.js is an older library and needed more compatibility functions in the past.
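For reference, a vanilla-canvas version of that drawing step is small enough that we would not need either library for it; a rough sketch (sizing, colours and scaling are illustrative):

```js
// Draw one vertical line per min/max pair onto a canvas.
// `peaks` alternates min, max values normalized to [-1, 1].
function drawWaveform(canvas, peaks) {
  const ctx = canvas.getContext('2d')
  const middle = canvas.height / 2
  const pairCount = peaks.length / 2
  const step = canvas.width / pairCount

  ctx.clearRect(0, 0, canvas.width, canvas.height)
  ctx.beginPath()
  for (let i = 0; i < pairCount; i++) {
    const min = peaks[i * 2]
    const max = peaks[i * 2 + 1]
    const x = i * step
    // Canvas y grows downwards, so the max value maps above the middle line.
    ctx.moveTo(x, middle - max * middle)
    ctx.lineTo(x, middle - min * middle)
  }
  ctx.stroke()
}
```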

Other alternatives reviewed

There is a Vue-based library, vue-audio-visual, that generates waveforms using the Web Audio API. However, it does not allow customizing the audio controls to match our requirements, or using the waveform to play/pause the audio.

I have added a page with audio waveforms generated using vue-audio-visual and wavesurfer.js in the waveform_exploration branch. It is also available as a preview on Heroku: just click on the 'Audio' tab in the search results.

@dhruvkb
Member

dhruvkb commented Aug 19, 2021

Approach 1 cannot be used, as the Web Audio API does not allow reading the contents of the audio when the request is made to a cross-origin resource.

Approach 3 is not server-friendly because keeping a library of waveform images for every audio file in the catalog is not feasible. Also, the images would neither be interactive (required for seeking) nor scalable along both axes (required for a good UI).

References for implementations of Approach 2:

@zackkrida
Member Author

If there's consensus amongst the @WordPress/openverse-developers that Approach 2 is the way to move forward, we can close this.

I personally think it's the best balance between performance and visual quality.

@obulat
Contributor

obulat commented Aug 19, 2021

Closing this issue, as Approach 2 is being implemented in the linked PRs.

@obulat closed this as completed Aug 19, 2021
@zackkrida
Member Author

Will re-open if someone is opposed to Approach 2 then 😄
