AFEC is a cross-platform audio feature extraction and sound classification CLI tool written in C++. It analyzes audio files and saves a set of musically interesting audio features into a sqlite database, which can then be used for other tasks, e.g. to organize sample libraries or to find sounds with specific audio features (key, BPM, sound classes, RMS and so on).
It is a CLI tool that only generates data: it does NOT provide a GUI to view the analyzed audio features or to preview audio files. A few basic Python Dash based GUIs for debugging AFEC's results are available in the AFEC-Visualizers repository.
An experimental GUI with a 2D t-SNE classification cluster is available in the AFEC-Explorer repository.
The AFEC Crawler was initially created for the Sononym project. This open-source version was forked from the initial release of Sononym, version 1.0. It is not compatible with Sononym's internal sample crawler and will not try to be in the future. AFEC was released as an open-source project in the hope that it will be useful for other audio projects. The original authors of this project are no longer part of the Sononym project.
AFEC uses a subset of the analyzed low-level audio features to evaluate a pretrained, bootstrap-aggregated, gradient-boosted machine (LightGBM) classification model.
The source tree also contains other experimental classification models, such as a simple ANN, RBM, SVM, KNN, Naive Bayes and Random Forest Tree implemented in Shark C++, but only the pretrained LightGBM model is used in production. AFEC can therefore also be used to experiment with audio classification.
See ./Scripts/ModelCreator for scripts that train and test various classification models.
There are also various Keras / TensorFlow experiments in the AFEC-Classifiers repository, which use the same data as the internal C++ tools.
Prebuilt binaries can be downloaded here.
Usage:
Crawler[.exe] [options] <paths...>
Synopsis:
Recursively search for audio files in the path(s) and write high or low-level
audio features into the given sqlite database.
Options:
-h [ --help ] Show help message.
-v [ --version ] Show version, build number and other infos.
-l [ --level ] arg (=high) Create a 'high' or 'low' level database.
-m [ --model ] arg Specify the 'Classifiers' and 'OneShot-Categories'
model files that should be used for level='high'.
When not specified, the default models from the
crawler's resource dir are used. Set to 'none' to
explicitly avoid loading a default model -
e.g. --model "None" --model "None" will disable
both.
-o [ --out ] arg Set destination directory/db_name.db or just a
directory. When only a directory is specified, the
database filename will be: 'afec-ll.db' or
'afec.db', depending on the level. When no
directory or file is specified, the database will
be written into the current working dir.
--paths arg One or more paths to a folder or single audio file
which should be analyzed. Can also be passed as
last (positional) argument.
When all given paths are sub paths of the 'out' db
path, all file paths within the database will be
relative to the out dir, else absolute paths.
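As a rough illustration of the options above, the following Python sketch assembles a Crawler command line that could be passed to `subprocess.run`. The helper name `build_crawler_command`, the default executable name and the example paths are hypothetical; only the `--level` and `--out` options and the positional paths are taken from the help text.

```python
def build_crawler_command(paths, out, level="high", executable="Crawler"):
    """Assemble an argument list for the AFEC crawler (hypothetical helper)."""
    if level not in ("high", "low"):
        raise ValueError("level must be 'high' or 'low'")
    if not paths:
        raise ValueError("at least one path is required")
    # Options first, then the audio paths as positional arguments.
    return [executable, "--level", level, "--out", str(out)] + [str(p) for p in paths]


if __name__ == "__main__":
    command = build_crawler_command(["Samples/Drums"], "Samples/afec.db")
    print(" ".join(command))
    # To actually run the crawler (assuming the binary is on the PATH):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.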
Usage:
ModelTester[.exe] [options] <input.db>
Synopsis:
Train and evaluate various classification models to see how they perform against
the given input. Input is an AFEC low-level descriptor sqlite database which is
used as train and test set.
Options:
-h [ --help ] Show help message.
-a [ --all ] arg (=0) When enabled, test all models instead of just the
default model.
-r [ --repeat ] arg (=10) Number of times the test should be repeated.
-s [ --seed ] arg (=-1) Set random seed, if any, in order to replicate tests.
-b [ --bagging ] arg (=0) When enabled, test bagging ensemble models instead
of 'raw' ones.
-i [ --src_database ] arg The low-level descriptor db file to create the
train and test data from. Can also be passed as
last (positional) argument.
Usage:
ModelCreator[.exe] [options] <input.db>
Synopsis:
Train and evaluate the default classification model which is defined in
`Source/Crawler/Export/DefaultClassificationModel.h` and create an ensemble
model from the best performing ones.
Options:
-h [ --help ] Show help message.
-r [ --repeat ] arg (=8) Number of times to repeat the model creation with
different training set folds, to choose the best
one along all runs.
-s [ --seed ] arg (=-1) Random seed, if any, in order to replicate runs.
-o [ --dest_model ] arg Destination name and path of the resulting model
file.
When not specified, the model file will be written
into the crawler's resource directory.
-i [ --src_database ] arg The low-level descriptor db file to create the
train and test data from. Can also be passed as
last (positional) argument.
AFEC can read and thus analyze the following audio file-formats:
- Waveform Audio File (.wav): Windows, OSX, Linux
- Audio Interchange (.aif, .aiff, .aifc): Windows, OSX, Linux
- Free Lossless Audio Codec (.fla, .flac): Windows, OSX, Linux
- OGG Vorbis (.ogg): Windows, OSX, Linux
- MPEG-1 Audio Layer 2 (.mp2): Windows, OSX, Linux
- MPEG-1 Audio Layer 3 (.mp3): Windows, OSX, Linux
- MPEG-4 Part 14 (.mp4, .mp4a, .m4a): Windows & OSX only
- Core Audio Format (.caf): OSX only
- Windows Media Audio (.wma): Windows only
- NeXT/Sun Audio (.au): Windows & OSX only
- Advanced Audio Coding (.aac): Windows & OSX only
- Apple SouND (.snd): Windows & OSX only
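The list above can be turned into a small lookup, for example to pre-filter files before handing them to the crawler. This is a minimal sketch: the names `SUPPORTED_EXTENSIONS` and `is_supported` are made up for illustration, and the check is purely extension based, which is an assumption and not necessarily how AFEC itself detects file formats.

```python
from pathlib import Path

ALL = frozenset({"windows", "osx", "linux"})
WIN_OSX = frozenset({"windows", "osx"})

# Extension -> platforms on which the format can be read, as listed above.
SUPPORTED_EXTENSIONS = {
    ".wav": ALL, ".aif": ALL, ".aiff": ALL, ".aifc": ALL,
    ".fla": ALL, ".flac": ALL, ".ogg": ALL, ".mp2": ALL, ".mp3": ALL,
    ".mp4": WIN_OSX, ".mp4a": WIN_OSX, ".m4a": WIN_OSX,
    ".caf": frozenset({"osx"}),
    ".wma": frozenset({"windows"}),
    ".au": WIN_OSX, ".aac": WIN_OSX, ".snd": WIN_OSX,
}


def is_supported(filename, platform):
    """Return True if the file extension is listed for the given platform."""
    extension = Path(filename).suffix.lower()
    return platform.lower() in SUPPORTED_EXTENSIONS.get(extension, frozenset())
```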
The internal analysis sample rate is currently hardcoded to 44100 Hz.
The FFT Frame Size is 2048 samples.
The FFT Hop Size is 1024 samples.
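To put these numbers into perspective, the sketch below derives the frame length, hop length and FFT bin spacing from them. The function names are illustrative, and `frame_start_seconds` assumes that frame `i` simply starts `i` hops into the file, without any padding or offset, which is not confirmed here.

```python
SAMPLE_RATE = 44100  # Hz, internal analysis sample rate
FRAME_SIZE = 2048    # samples per FFT frame
HOP_SIZE = 1024      # samples between the starts of two frames


def frame_duration_ms():
    """Length of one FFT frame in milliseconds."""
    return 1000.0 * FRAME_SIZE / SAMPLE_RATE


def hop_duration_ms():
    """Time between the starts of two consecutive frames in milliseconds."""
    return 1000.0 * HOP_SIZE / SAMPLE_RATE


def frequency_resolution_hz():
    """Spacing of the FFT bins in Hz."""
    return SAMPLE_RATE / FRAME_SIZE


def frame_start_seconds(frame_index):
    """Start time of a frame, assuming frame i starts at i * HOP_SIZE samples."""
    return frame_index * HOP_SIZE / SAMPLE_RATE


if __name__ == "__main__":
    print(f"frame: {frame_duration_ms():.2f} ms")
    print(f"hop: {hop_duration_ms():.2f} ms")
    print(f"bin spacing: {frequency_resolution_hz():.2f} Hz")
```

With a hop size of half the frame size, consecutive frames overlap by 50%.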
High-level features are written in a sqlite database which uses the following column names and types.
The column name ending specifies the data type (except for the first 3 columns):
- S: String
- R: Real number or integer
- VR: Vector of real numbers in JSON format
- VVR: Vector of a vector of real numbers in JSON format
- ...
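The suffix convention can be checked programmatically, for instance when reading columns generically. A minimal sketch: `column_kind` is a hypothetical helper, and the `_VS` entry is inferred from the `classes_VS` and `categories_VS` columns, which are described as JSON arrays of strings.

```python
# Column name suffix -> kind of data, following the convention above.
SUFFIX_KINDS = [
    ("_VVR", "vector of vectors of real numbers"),
    ("_VR", "vector of real numbers"),
    ("_VS", "vector of strings"),
    ("_R", "real number or integer"),
    ("_S", "string"),
]


def column_kind(name):
    """Return the kind of data implied by a column name, or None if unknown."""
    for suffix, kind in SUFFIX_KINDS:
        if name.endswith(suffix):
            return kind
    # filename, modtime and status do not follow the suffix convention.
    return None
```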
- `filename` (TEXT): Path and name of the analyzed file, either absolute or relative to the database path.
- `modtime` (INTEGER): File modification date in time_t units (Unix timestamp).
- `status` (TEXT): "succeeded" or a human-readable error message, in case the file could not be opened or read.
- `file_type_S` (TEXT): The file's normalized file extension.
- `file_size_R` (INTEGER): The file's original raw size in bytes.
- `file_length_R` (REAL): Audio stream's total length in seconds.
- `file_sample_rate_R` (INTEGER): Sampling rate in Hz.
- `file_channel_count_R` (INTEGER): Number of audio channels in the file.
- `file_bit_depth_R` (INTEGER): Audio file-format bit depth.
- `class_signature_VR` (TEXT): JSON array of real numbers. Prediction result of the classification model. Can be used instead of the normalized `class_strengths_VR` to find sounds with a similar class signature.
- `classes_VS` (TEXT: JSON_STRING_ARRAY): JSON array of strings. Names of the "strong" predicted classes, strongest ones first.
- `class_strengths_VR` (TEXT: JSON_NUMBER_ARRAY): JSON array of real numbers. Normalized, clamped prediction result of the "strong" predicted classes, strongest ones first.
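Since `classes_VS` and `class_strengths_VR` are both stored as JSON text and both list the strongest classes first, they can be paired up after decoding. The following is a minimal sketch under the assumption that the two arrays have the same length and are aligned by index; `strong_classes` is a hypothetical helper and the class names in the example are made up.

```python
import json


def strong_classes(classes_json, strengths_json):
    """Pair class names with their strengths (assumes index-aligned arrays)."""
    names = json.loads(classes_json)
    strengths = json.loads(strengths_json)
    if len(names) != len(strengths):
        raise ValueError("class names and strengths differ in length")
    return dict(zip(names, strengths))


if __name__ == "__main__":
    # Made-up column values, for illustration only.
    print(strong_classes('["Drum", "Bass"]', '[0.9, 0.4]'))
```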
- `category_signature_VR` (TEXT: JSON_NUMBER_ARRAY): JSON array of real numbers. Prediction result of the categorization model. Can be used instead of the normalized `category_strengths_VR` to find sounds with a similar category signature.
- `categories_VS` (TEXT: JSON_STRING_ARRAY): JSON array of strings. Names of the "strong" predicted categories, strongest ones first.
- `category_strengths_VR` (TEXT: JSON_NUMBER_ARRAY): JSON array of real numbers. Normalized, clamped prediction result of the "strong" predicted categories, strongest ones first.
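One possible way to compare two class or category signatures is cosine similarity. This is only a sketch of one common similarity measure, not necessarily what AFEC-based tools use; `signature_similarity` is a hypothetical helper that takes the JSON text of two signature columns.

```python
import json
import math


def signature_similarity(signature_a_json, signature_b_json):
    """Cosine similarity of two JSON-encoded signature vectors."""
    a = json.loads(signature_a_json)
    b = json.loads(signature_b_json)
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero signature as dissimilar to everything.
    return dot / norm if norm else 0.0
```

A value close to 1 means that both files received a similar prediction pattern. Sorting all files by this value relative to a reference file gives a simple "find similar sounds" ranking.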
- `base_note_R` (REAL): Most dominant key note (if any) in the entire file. Should only be used in combination with `base_note_confidence_R`.
- `base_note_confidence_R` (REAL): Normalized detection confidence value of the base note.
- `peak_db_R` (REAL): Peak value across all channels in dB.
- `rms_db_R` (REAL): RMS value across all channels in dB.
- `bpm_R` (REAL): Most dominant BPM (if any) in the entire file across all channels. Should be used in combination with `bpm_confidence_R` to be useful.
- `bpm_confidence_R` (REAL): Normalized detection confidence value of the BPM detection.
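The following sketch shows how `bpm_R` could be combined with `bpm_confidence_R` in a query, using Python's built-in `sqlite3` module. The table name `assets`, the demo rows, the tolerance and the confidence threshold of 0.5 are all assumptions made for illustration; check the actual table name of your database, e.g. via `SELECT name FROM sqlite_master`.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (filename TEXT, status TEXT, "
        "bpm_R REAL, bpm_confidence_R REAL)"
    )
    conn.executemany(
        "INSERT INTO assets VALUES (?, ?, ?, ?)",
        [
            ("loops/a.wav", "succeeded", 120.0, 0.9),
            ("loops/b.wav", "succeeded", 120.5, 0.2),  # low confidence
            ("loops/c.wav", "succeeded", 90.0, 0.95),  # different tempo
        ],
    )
    return conn


def find_by_bpm(conn, table, bpm, tolerance=1.0, min_confidence=0.5):
    """Return (filename, bpm, confidence) rows near the given tempo."""
    # Table names cannot be bound as SQL parameters, so validate the name.
    if not table.isidentifier():
        raise ValueError("invalid table name")
    query = (
        f'SELECT filename, bpm_R, bpm_confidence_R FROM "{table}" '
        "WHERE status = 'succeeded' AND bpm_confidence_R >= ? "
        "AND bpm_R BETWEEN ? AND ? ORDER BY bpm_confidence_R DESC"
    )
    args = (min_confidence, bpm - tolerance, bpm + tolerance)
    return conn.execute(query, args).fetchall()


if __name__ == "__main__":
    print(find_by_bpm(make_demo_db(), "assets", 120.0))
```

The same pattern applies to `base_note_R` and `base_note_confidence_R`.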
- `brightness_R` (REAL): Overall perceived brightness of the sound, calculated from the spectral centroid and rolloff.
- `noisiness_R` (REAL): Overall noisiness level of the sound, calculated from the spectral flatness.
- `harmonicity_R` (REAL): Overall harmonicity level of the sound, calculated from the auto correlation, pitch confidence and spectral flatness.
- `spectral_flatness_R` (REAL): Mean of the audible low-level `spectral_flatness_VR` (spectral flatness).
- `spectral_flux_R` (REAL): Mean of the audible low-level `spectral_flux_VR` (spectral flux).
- `spectral_complexity_R` (REAL): Mean of the audible low-level `spectral_complexity_VR` (spectral complexity measure based on a sharpened spectrum).
- `spectral_contrast_R` (REAL): Mean of the audible low-level `spectral_contrast_VR` (spectral contrast).
- `spectral_inharmonicity_R` (REAL): Mean of the audible low-level `spectral_inharmonicity_VR` (inharmonicity based on a sharpened spectrum).
- `spectrum_signature_VVR` (TEXT: JSON_NUMBER_ARRAY_ARRAY): JSON array of arrays of real numbers. 14 bands for 64 time frames (resampled), which can be used as an icon-like signature view of the entire audio file's spectrum. The 14 spectral bands end at 50, 100, 200, 400, 630, 920, 1270, 1720, 2320, 3150, 4400, 6400, 9500 and 15500 Hz.
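To find out which of the 14 bands a given frequency falls into, the band end frequencies can be searched with `bisect`. This is a minimal sketch: `band_index` is a hypothetical helper, and treating each end frequency as an inclusive upper edge is an assumption.

```python
from bisect import bisect_left

# End frequencies of the 14 spectrum signature bands in Hz, as listed above.
BAND_END_FREQUENCIES_HZ = [
    50, 100, 200, 400, 630, 920, 1270, 1720, 2320, 3150, 4400, 6400, 9500, 15500,
]


def band_index(frequency_hz):
    """Return the 0-based band index, or None above the last band."""
    if frequency_hz < 0:
        raise ValueError("frequency must not be negative")
    # First band whose end frequency is >= the given frequency.
    index = bisect_left(BAND_END_FREQUENCIES_HZ, frequency_hz)
    return index if index < len(BAND_END_FREQUENCIES_HZ) else None
```

For example, 440 Hz lies between the band ends 400 and 630 Hz and therefore maps to index 4.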
- `pitch_VR` (TEXT: JSON_NUMBER_ARRAY): JSON array of real numbers. Cleaned pitch note values for each FFT time frame.
- `pitch_confidence_R` (REAL): Mean value of all pitch note value detection confidences.
- `peak_VR` (TEXT: JSON_NUMBER_ARRAY): JSON array of real numbers. Peak value in dB for each FFT time frame.
Low-level features are written in a sqlite database which uses the following column names and types.
Just like for high-level features, the column name ending specifies the data type.
`_VVR` columns in the database are saved as binary msgpack blobs to save disk space.
Note: All vector features contain the following additional statistical features as well: `min`, `max`, `median`, `mean`, `gmean` (geometric mean), `variance`, `centroid`, `spread`, `skewness`, `kurtosis`, `flatness`, `dmean`, `dvariance` (1st deviation).
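For orientation, a few of these statistics can be reproduced with Python's `statistics` module (Python 3.8 or later). This is only a sketch: `basic_statistics` is a hypothetical helper, the use of the population variance is an assumption, and the remaining statistics are left out because their exact definitions are not given here.

```python
import statistics


def basic_statistics(values):
    """Compute a subset of the listed statistics for one vector feature."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        # The geometric mean is only defined for positive values.
        "gmean": statistics.geometric_mean(values),
        # Population variance; AFEC may use a different normalization.
        "variance": statistics.pvariance(values),
    }


if __name__ == "__main__":
    print(basic_statistics([1.0, 2.0, 4.0]))
```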
- `filename` (absolute or relative path to the analyzed file)
- `modtime` (file modification date in unix timestamps)
- `status` ("succeeded" or some human readable error message)
- `file_type_S` (normalized file extension)
- `file_size_R` (bytes)
- `file_length_R` (seconds)
- `file_sample_rate_R` (Hz)
- `file_channel_count_R`
- `file_bit_depth_R`
- `effectve_length_48dB_R` (gate > 48dB)
- `effectve_length_24dB_R`
- `effectve_length_12dB_R`
- `amplitude_silence_VR` (1 for silence, 0 for non silence)
- `amplitude_peak_VR`
- `amplitude_rms_VR`
- `amplitude_envelope_VR`
- `spectral_rms_VR` (spectral rms)
- `spectral_centroid_VR` (spectral centroid)
- `spectral_spread_VR` (spectral spread)
- `spectral_skewness_VR` (spectral skewness)
- `spectral_kurtosis_VR` (spectral kurtosis)
- `spectral_flatness_VR` (spectral flatness)
- `spectral_rolloff_VR` (spectral rolloff)
- `spectral_flux_VR` (spectral flux)
- `spectral_inharmonicity_VR` (inharmonicity based on a sharpened spectrum)
- `spectral_complexity_VR` (spectral complexity measure based on a sharpened spectrum)
- `spectral_contrast_VR` (spectral contrast)
- `f0_VR` (in Hz for each FFT frame)
- `f0_confidence_VR` (0-1 for each F0)
- `failsafe_f0_VR` (falling back to last stable F0)
- `tristimulus1_VR` (mixture of harmonics, timbre based on the F0 detection)
- `tristimulus2_VR`
- `tristimulus3_VR`
- `auto_correlation_VR`
- `rhythm_complex_onsets_VR` (onset value for each FFT frame)
- `rhythm_complex_onset_count_R` (number of detected onsets)
- `rhythm_complex_onset_contrast_R`
- `rhythm_complex_onset_frequency_mean_R`
- `rhythm_complex_onset_strength_R` (overall strength)
- `rhythm_percussive_onsets_VR` (see rhythm_complex)
- `rhythm_percussive_onset_count_R`
- `rhythm_percussive_onset_contrast_R`
- `rhythm_percussive_onset_frequency_mean_R`
- `rhythm_percussive_onset_strength_R`
- `rhythm_complex_tempo_R` (BPM)
- `rhythm_complex_tempo_confidence_R` (0-1)
- `rhythm_percussive_tempo_R`
- `rhythm_percussive_tempo_confidence_R`
- `rhythm_final_tempo_R`
- `rhythm_final_tempo_confidence_R`
- `spectral_rms_bands_VVR` (14 RMS values for every band - see also Spectral Features)
- `spectral_flatness_bands_VVR`
- `spectral_flux_bands_VVR`
- `spectral_complexity_bands_VVR`
- `spectral_contrast_bands_VVR`
- `frequency_bands_VVR` (50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500, 19000, 22050 Hz)
- `cepstrum_bands_VVR` (MFCC values)
- cmake 2.8 or later
- git-lfs (download at https://git-lfs.github.com/)
- VisualStudio 2015 or later with C++ support (C++14)
- cmake 2.8 or later
- git-lfs (download at https://git-lfs.github.com/)
- OSX 10.11 or later
- XCode 7 or later with OSX 10.11 SDK and command line tools installed
- cmake 2.8 or later (`apt-get install cmake` on Ubuntu)
- ninja build system (`apt-get install ninja-build` on Ubuntu)
- git-lfs (`apt-get install git-lfs` on Ubuntu)
- gcc-7.4 (Ubuntu 18.04's default compiler, `apt-get install build-essential` on Ubuntu)
- PkgConfig and Threads (usually already installed)
- libmpg123 headers and library (`apt-get install libmpg123-dev` on Ubuntu)
AFEC uses the following third-party libraries, which are bundled in the `3rdParty` folder, including precompiled static libraries for Windows (Visual C++), OSX (Clang) and Linux (GCC). Note: if you are trying to build AFEC on Linux with gcc-8 or later, you may get linker errors and then need to recompile a few of the C++ third-party libraries. Each third-party library has build scripts in its `Linux/` sub folder to do so.
- SharkC++: Used for various classification test models and for the model ensemble generation.
- LightGBM: The default classification model.
- TinyDNN: DNN experiments (should be removed).
- Aubio: For pitch/key detection and partly for BPM detection.
- LibXtract: To calculate Mel Frequency Cepstral Coefficients.
- Resample: Normalize sample rates of analyzed audio files.
- Flac: Flac file decoding.
- OGG & OGG Vorbis: OGG file decoding.
- Mpg123: Mpg file decoding on Linux.
- Boost: Dependency of SharkC++ and used in various places internally.
- Sqlite: SQLite database support.
- ZLib: Dependency of Sqlite.
- OpenBLAS: Used on Windows as a dependency of SharkC++.
- CTPL: Enables parallel processing in the crawler via a thread pool.
- Msgpack: Optional packing of JSON in the sqlite database (disabled).
- Iconv: Conversion of Unicode strings between UTF-8 and platform encodings.
The precompiled third-party libraries are stored via git lfs, so please ensure that the lfs files are checked out:
`git lfs pull`
Then go to `./Build` and run:
`./Build/build.sh|bat`
The resulting Visual Studio (Windows), XCode (OSX) or Makefiles (Linux) files can then be found at `./Build/Out`.
After building, the produced binaries can be found at `./Dist/[Debug|Release]`.
AFEC was originally created by Eduard Müller and Ingolf Wagner.
GNU General Public License v3.0 or later. See COPYING for the full text.
The bundled third-party libraries may use different licenses. Please have a look at the `3rdParty` folder to see which ones.
Patches are welcome: please fork the latest git repository and create a feature branch.