INVESTIGATING THE GENERALIZATION ABILITIES OF A DEEP LEARNING METHOD FOR SOUND SOURCE LOCALIZATION USING SMALL-SIZED MICROPHONE ARRAYS

Description

In this work, we test whether a Convolutional Neural Network (CNN) trained under one specific environmental condition can estimate the Direction Of Arrival (DOA) of a sound source in settings that differ from the training ones. To this end, we generated 3 datasets by simulation, each varying a different acoustic parameter: room volume, microphone array position in the room, and distance of the source.

Prerequisites

RIR-Generator: ehabets/RIR-Generator, a tool for generating room impulse responses (github.com)

Trained models

Model for resolution 30°: https://drive.google.com/drive/folders/1vfMAvJECAkNPA6yMZTZlBgEz5T0uRLjF?usp=sharing

Model for resolution 10°: https://drive.google.com/drive/folders/1hWLM7Omebq4AG1-U4RkYXROunNbwzj7Y?usp=sharing

How to use

Source Codes

The Matlab script and the two notebooks are based on the paper "Deep learning assisted sound source localization using two orthogonal first-order differential microphone arrays" [Nian Liu, Huawei Chen, Kunkun Songgong, et al].

Dataset Generator:

dataset_gen_livescript.mlx generates the dataset used to train a neural network that performs source localization. We used rir-generator to simulate the room acoustics.
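As a rough illustration of what the simulation step produces, the Python sketch below convolves a dry signal with one impulse response per microphone to obtain a multichannel recording. The impulse responses here are synthetic placeholders (decaying noise), not output of rir-generator, and the function name `simulate_array_recording`, the sampling rate, the RIR length, and the four-microphone count are assumptions made only for this example.

```python
import numpy as np

def simulate_array_recording(dry, rirs):
    """Convolve a mono signal with one RIR per microphone.

    dry:  1-D array, the anechoic source signal
    rirs: 2-D array of shape (n_mics, rir_len)
    Returns an array of shape (n_mics, len(dry) + rir_len - 1).
    """
    return np.stack([np.convolve(dry, h) for h in rirs])

# Placeholder data: every value below is an illustrative assumption.
rng = np.random.default_rng(0)
fs = 16000                        # sampling rate in Hz (assumed)
dry = rng.standard_normal(fs)     # 1 s of noise standing in for a TIMIT sentence
n_mics, rir_len = 4, 2048         # assumed array size and RIR length
decay = np.exp(-np.arange(rir_len) / 400.0)
rirs = rng.standard_normal((n_mics, rir_len)) * decay  # synthetic, NOT from rir-generator

recording = simulate_array_recording(dry, rirs)
print(recording.shape)
```

In the actual pipeline the impulse responses would come from the room simulator for each source and microphone position; only the convolution step is shown here.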

The script has the following structure:

  • Room simulation Parameters -> Initialization of all the parameters used to simulate the room acoustics;

  • Train dataset generation -> To generate the training dataset, we first randomly select 300 sentences from the TIMIT/TRAIN dataset. Since we want to reach a dataset size of 6000 datapoints, we locate 1 sentence in 30 different DOAs;

  • Validation dataset generation -> Same procedure as above, but with 100 sentences, each associated with 10 DOAs, which gives a dataset of size 1000;

  • Test dataset generation -> We randomly select 10 sentences

    • Test dataset for resolution 10° -> reaches a size of 360,

    • Test dataset for resolution 30° -> reaches a size of 120;

  • Custom tests -> we generate the datasets for testing the generalization abilities of the NN

    • Receiver position -> we move the experimental instrumentation from the center of the room at mid height to the top-right corner at floor height,

    • Source distance -> we move the sources radially, from positions near the center of the microphone array to positions far from it,

    • Room volume -> we maintain the shape of the room while linearly increasing its three dimensions.
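The validation and test set sizes listed above follow from rendering each sentence once per DOA. The short Python sketch below reproduces that arithmetic; it assumes the DOA grid starts at 0° and covers the full circle, which the script may or may not do, and the helper names `doa_grid` and `dataset_size` are hypothetical.

```python
def doa_grid(resolution_deg):
    """Return the DOA angles (in degrees) on a full circle at the given resolution.

    Assumption: the grid starts at 0 degrees and the resolution divides 360.
    """
    if 360 % resolution_deg != 0:
        raise ValueError("resolution must divide 360")
    return list(range(0, 360, resolution_deg))

def dataset_size(n_sentences, n_doas):
    """Each sentence is rendered once for every DOA."""
    return n_sentences * n_doas

test_10 = dataset_size(10, len(doa_grid(10)))   # 10 sentences x 36 DOAs
test_30 = dataset_size(10, len(doa_grid(30)))   # 10 sentences x 12 DOAs
validation = dataset_size(100, 10)              # 100 sentences x 10 DOAs
print(test_10, test_30, validation)  # 360 120 1000
```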

Training Convolutional Neural Network

Neural_Net_Training.ipynb is subdivided into 4 sections:

  1. Imports and auxiliary functions where we import libraries and define the core functions for feature extraction
  2. Feature extraction and data exploration where we plot and play some simulated examples together with the SI features
  3. Training where we show the learning curves
  4. Testing where we evaluate the models' performance
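For readers who want to reproduce the testing step outside the notebook, the following Python sketch shows one plausible way to score DOA predictions: classification accuracy plus a mean angular error that wraps around at 360°. The angular-error metric and the function name `doa_metrics` are assumptions for illustration; the notebook may report different quantities.

```python
import numpy as np

def doa_metrics(true_deg, pred_deg):
    """Return (accuracy, mean absolute angular error in degrees).

    The error is measured on the circle, so 330 vs 0 counts as 30 degrees.
    """
    true_deg = np.asarray(true_deg, dtype=float)
    pred_deg = np.asarray(pred_deg, dtype=float)
    diff = np.abs(true_deg - pred_deg) % 360.0
    err = np.minimum(diff, 360.0 - diff)   # shortest way around the circle
    accuracy = float(np.mean(err == 0))    # fraction of exact class matches
    return accuracy, float(np.mean(err))

# Toy labels and predictions on a 30-degree grid (made-up values).
acc, mae = doa_metrics([0, 30, 330, 90], [0, 60, 0, 90])
print(acc, mae)  # 0.5 15.0
```

Reporting the angular error alongside accuracy distinguishes a prediction that lands in the neighbouring class from one that points in the opposite direction.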

PLEASE NOTE: The neural networks were actually trained on a local machine with a GPU. In this Colab notebook we uploaded the training output so that the loss and accuracy curves can be consulted easily.

Testing CNN's soundness

In Various_Room_Tests.ipynb we explore the generalization properties of the NNs by testing their ability to adapt to the following conditions:

  • Room volume: changing the room volume while maintaining the same shape
  • Source distance: changing the distance (radially) of the sources from the microphones
  • Receiver position: changing the receivers' position by moving them from the center of the room to the top-right corner, near the floor
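The three test conditions above can be sketched as simple geometric sweeps. The Python code below is a minimal illustration, not the notebook's implementation: the base room size, scale factors, distances, and receiver coordinates are all made-up values, and the helper names are hypothetical.

```python
import numpy as np

def scaled_rooms(base_dims, factors):
    """Scale all three room dimensions by the same factor, preserving the shape."""
    base = np.asarray(base_dims, dtype=float)
    return [base * f for f in factors]

def radial_sources(array_center, doa_deg, distances):
    """Place a source along one DOA at increasing distances from the array center."""
    center = np.asarray(array_center, dtype=float)
    theta = np.deg2rad(doa_deg)
    direction = np.array([np.cos(theta), np.sin(theta), 0.0])  # horizontal plane
    return [center + d * direction for d in distances]

def receiver_path(start, end, n_steps):
    """Linearly interpolate receiver positions from start to end (both included)."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    return [start + t * (end - start) for t in np.linspace(0.0, 1.0, n_steps)]

# Illustrative values only (metres); none of them come from the repository.
rooms = scaled_rooms([6.0, 5.0, 3.0], factors=[1.0, 1.5, 2.0])
sources = radial_sources([3.0, 2.5, 1.5], doa_deg=90, distances=[0.5, 1.0, 2.0])
receivers = receiver_path([3.0, 2.5, 1.5], [5.5, 4.5, 0.3], n_steps=5)

print([float(np.prod(r)) for r in rooms])  # room volumes in cubic metres
```

Note that scaling the three dimensions by a factor f multiplies the volume by f cubed, so a linear sweep of the dimensions is not a linear sweep of the volume.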

Report

A detailed description of our work can be found here

Contacts