Using pre-trained audio classification models available with this library

Jyotika Singh edited this page May 20, 2022 · 5 revisions

Classifying with Pre-trained Models

This project provides three pre-trained models, described below.

Music genre

Contains a pre-trained SVM classifier that classifies audio into 10 music genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock. This classifier was trained using mfcc, gfcc, spectral and chroma features. The baseline dataset used is GTZAN.

The following commands in Python can be used to classify your data.

from pyAudioProcessing.run_classification import classify_genre

# Classify a single file
results = classify_genre(file="/Users/xyz/Documents/audio.wav")

# Classify multiple files with known paths
results = classify_genre(
    file_names={
        "audios_1": ["/Users/xyz/Documents/audio.wav", "/Users/xyz/Desktop/sound.wav"], 
        "audios_2": ["/Users/xyz/Downloads/sound_4.wav"]
    }
)

# Classify multiple files stored in the directory structure as specified in the readme
# folder -> sub-folder/s -> audio files
results = classify_genre(folder_path="/Users/xyz/Documents/audios")
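If your audio is already organized as folder -> sub-folder/s -> audio files but you would rather pass explicit paths, the `file_names` dictionary can be built from the directory tree. The following is a minimal sketch using only the Python standard library. `build_file_names` is a hypothetical helper, not part of pyAudioProcessing, and it assumes the audio files use the `.wav` extension.

```python
from pathlib import Path


def build_file_names(folder_path, extension=".wav"):
    """Map each sub-folder name to the sorted list of audio file paths in it.

    Hypothetical helper for illustration; not part of pyAudioProcessing.
    """
    file_names = {}
    for sub_folder in sorted(Path(folder_path).iterdir()):
        # Only sub-folders are treated as groups; loose files are skipped.
        if not sub_folder.is_dir():
            continue
        paths = sorted(
            str(path)
            for path in sub_folder.iterdir()
            if path.is_file() and path.suffix.lower() == extension
        )
        # Leave out sub-folders that contain no matching audio files.
        if paths:
            file_names[sub_folder.name] = paths
    return file_names
```

The returned dictionary has the same shape as the `file_names` argument shown above, so it could be passed along as `classify_genre(file_names=build_file_names("/Users/xyz/Documents/audios"))`.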

Music versus Speech

Contains a pre-trained SVM classifier that classifies audio into two possible classes: music and speech. This classifier was trained using mfcc, spectral and chroma features. The baseline dataset used was curated specifically for this purpose.

from pyAudioProcessing.run_classification import classify_ms

# Classify a single file
results = classify_ms(file="/Users/xyz/Documents/audio.wav")

# Classify multiple files with known paths
results = classify_ms(
    file_names={
        "audios_1": ["/Users/xyz/Documents/audio.wav", "/Users/xyz/Desktop/sound.wav"], 
        "audios_2": ["/Users/xyz/Downloads/sound_4.wav"]
    }
)

# Classify multiple files stored in the directory structure as specified in the readme
# folder -> sub-folder/s -> audio files
results = classify_ms(folder_path="/Users/xyz/Documents/audios")

Music versus Speech versus Birds

Contains a pre-trained SVM classifier that classifies audio into three possible classes: music, speech and birds. This classifier was trained using mfcc, spectral and chroma features. The baseline dataset used was curated specifically for this purpose.

from pyAudioProcessing.run_classification import classify_msb

# Classify a single file
results = classify_msb(file="/Users/xyz/Documents/audio.wav")

# Classify multiple files with known paths
results = classify_msb(
    file_names={
        "audios_1": ["/Users/xyz/Documents/audio.wav", "/Users/xyz/Desktop/sound.wav"], 
        "audios_2": ["/Users/xyz/Downloads/sound_4.wav"]
    }
)

# Classify multiple files stored in the directory structure as specified in the readme
# folder -> sub-folder/s -> audio files
results = classify_msb(folder_path="/Users/xyz/Documents/audios")

Sample results from classify_msb look like this:

{'audios_1': {'audio.wav': {'probabilities': [0.8899067858599712, 0.011922234412695229, 0.0981709797273336], 'classes': ['music', 'speech', 'birds']}, ...}, ...}
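To turn these probabilities into a single predicted label per file, each class can be paired with its probability and the largest one kept. The sketch below works on a literal dictionary copied from the sample above, so it runs without the library. `top_predictions` is a hypothetical helper, and it assumes every entry has `probabilities` and `classes` lists in matching order, as in the sample.

```python
# Sample output in the shape shown above (values copied from the example).
results = {
    "audios_1": {
        "audio.wav": {
            "probabilities": [0.8899067858599712, 0.011922234412695229, 0.0981709797273336],
            "classes": ["music", "speech", "birds"],
        }
    }
}


def top_predictions(results):
    """Return {folder: {file: (best_class, probability)}} for a results dict.

    Hypothetical helper for illustration; not part of pyAudioProcessing.
    """
    summary = {}
    for folder, files in results.items():
        summary[folder] = {}
        for file_name, prediction in files.items():
            # Pair each class with its probability and keep the most likely one.
            best_class, best_prob = max(
                zip(prediction["classes"], prediction["probabilities"]),
                key=lambda pair: pair[1],
            )
            summary[folder][file_name] = (best_class, best_prob)
    return summary


for folder, files in top_predictions(results).items():
    for file_name, (label, prob) in files.items():
        print(f"{folder}/{file_name}: {label} ({prob:.2f})")
# prints: audios_1/audio.wav: music (0.89)
```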