Sound Event Detection with FilterAugment

Last update: Aug 28, 2022

Official implementation of:

Heavily Augmented Sound Event Detection utilizing Weak Predictions (DCASE2021 Challenge Task 4 technical report)
by Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park
- The arXiv version corrects some minor errors.

FilterAugment: An Acoustic Environmental Data Augmentation Method (submitted to ICASSP 2022)
by Hyeonuk Nam, Seong-Hu Kim, Yong-Hwa Park
- The implementation for this second paper, which includes an updated version of FilterAugment, is incomplete for now. It will be updated soon.

Ranked 3rd place in IEEE DCASE 2021 Task 4.

FilterAugment

FilterAugment is an audio data augmentation method proposed in the papers above for training acoustic models in audio and speech tasks. It applies random weights to randomly selected frequency bands. For more details, refer to the papers.

This example shows two types of FilterAugment applied to the log mel spectrogram of a 10-second audio clip: (a) the original log mel spectrogram, (b) the log mel spectrogram after step type FilterAugment, and (c) the log mel spectrogram after linear type FilterAugment.
The applied filters are shown below. Filter (d) is applied to (a) to produce (b), and filter (e) is applied to (a) to produce (c).

Step type FilterAugment produces several frequency bands whose amplitudes are uniformly increased or decreased, while linear type FilterAugment produces a continuous filter with distinct peaks and dips.
For our participation in DCASE2021 Challenge Task 4, we used a prototype FilterAugment, which is step type FilterAugment without the minimum bandwidth hyperparameter. The code for this prototype is defined as "filt_aug_dcase" in utils/data_aug.py @ line 107.
The code for the updated FilterAugment, including the step and linear types for the ICASSP submission, is defined as "filt_aug_icassp" in utils/data_aug.py @ line 126.
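
As a rough illustration of the two filter shapes described above, here is a minimal NumPy sketch. It is not the repository's "filt_aug_dcase" or "filt_aug_icassp". The function name `filter_augment_sketch`, the parameters `db_range`, `n_band` and `min_bw`, and their default values are all assumptions chosen for illustration. The input is assumed to be a log mel spectrogram in dB with shape (n_mels, n_frames), so applying a gain simply means adding it.

```python
import numpy as np


def filter_augment_sketch(log_mel, filter_type="step", db_range=(-6.0, 6.0),
                          n_band=(3, 6), min_bw=6, rng=None):
    """Apply a random frequency filter to a dB-scaled log mel spectrogram.

    All parameter names and defaults are hypothetical, for illustration only.
    Returns the filtered spectrogram and the per-bin filter in dB.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_mels = log_mel.shape[0]

    # Pick how many frequency bands to use (inclusive range).
    n_bands = int(rng.integers(n_band[0], n_band[1] + 1))

    # Every band gets min_bw bins; the leftover bins are split at random.
    slack = n_mels - n_bands * min_bw
    if slack < 0:
        raise ValueError("n_mels is too small for n_band and min_bw")
    cuts = np.sort(rng.integers(0, slack + 1, size=n_bands - 1))
    widths = min_bw + np.diff(np.concatenate(([0], cuts, [slack])))

    if filter_type == "step":
        # One constant gain per band.
        gains_db = rng.uniform(db_range[0], db_range[1], size=n_bands)
        filt_db = np.repeat(gains_db, widths)
    elif filter_type == "linear":
        # One gain per band edge, linearly interpolated across mel bins.
        edges = np.concatenate(([0], np.cumsum(widths)))
        gains_db = rng.uniform(db_range[0], db_range[1], size=n_bands + 1)
        filt_db = np.interp(np.arange(n_mels), edges, gains_db)
    else:
        raise ValueError("filter_type must be 'step' or 'linear'")

    # Adding dB gains in the log domain scales the linear-domain amplitudes.
    return log_mel + filt_db[:, None], filt_db
```

Setting `min_bw=1` in this sketch removes the minimum bandwidth constraint, which loosely corresponds to the prototype behaviour described above. The same filter is applied to every time frame because it is broadcast along the frame axis.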

Requirements

Python 3.7.10 is used with the following libraries:

pytorch==1.8.0
pytorch-lightning==1.2.4
torchaudio==0.8.0
scipy==1.4.1
pandas==1.1.3
numpy==1.19.2

Other requirements are listed in requirements.txt.

Datasets

You can download the datasets by referring to the DCASE 2021 Task 4 description page or the DCASE 2021 Task 4 baseline. Then set the dataset directories in the config YAML files accordingly. You need the DESED real datasets (weak / unlabeled in domain / validation / public eval) and the DESED synthetic datasets (train / validation).
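
Before training, it can help to confirm that every dataset directory set in the config actually exists. The sketch below uses only the standard library and works on a plain dictionary. The key names and paths are hypothetical placeholders, not the keys used in this repository's YAML files, so substitute whatever your config defines.

```python
import os


def find_missing_dirs(dataset_dirs):
    """Return the names of entries whose directory does not exist.

    dataset_dirs maps a descriptive name to a directory path.
    """
    return [name for name, path in dataset_dirs.items()
            if not os.path.isdir(path)]


if __name__ == "__main__":
    # Hypothetical names and paths; replace with the values from your config.
    dataset_dirs = {
        "weak_folder": "/path/to/DESED/real/weak",
        "unlabeled_folder": "/path/to/DESED/real/unlabel_in_domain",
        "real_validation_folder": "/path/to/DESED/real/validation",
        "public_eval_folder": "/path/to/DESED/real/public_eval",
        "synth_train_folder": "/path/to/DESED/synthetic/train",
        "synth_validation_folder": "/path/to/DESED/synthetic/validation",
    }
    for name in find_missing_dirs(dataset_dirs):
        print(f"missing dataset directory for '{name}': {dataset_dirs[name]}")
```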

Training

You can train a model and save it in the exps folder by running:

python main.py

model settings:

There are 5 configuration files in this repo. The default is the ICASSP setting (./configs/config_icassp.yaml), the optimal linear type FilterAugment described in the paper submitted to ICASSP. The other 4 are the model settings from the DCASE tech report. To train model 1, 2, 3 or 4 from the DCASE tech report, or the ICASSP setting, run the following instead:

# for example, to train model 3:
python main.py --config model3
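
One way a short name such as model3 could be mapped to a configuration file is sketched below. This is an illustration under assumptions, not the repository's main.py: the helper `resolve_config_path` and the `config_<name>.yaml` naming pattern are inferred only from the default path ./configs/config_icassp.yaml mentioned above, so check the configs folder for the actual file names.

```python
import argparse
import os


def resolve_config_path(name, config_dir="./configs"):
    """Map a short setting name (e.g. 'model3') to a YAML path.

    The 'config_<name>.yaml' pattern is an assumption for illustration.
    """
    return os.path.join(config_dir, f"config_{name}.yaml")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Select a training setting")
    # 'icassp' mirrors the default setting named in the text.
    parser.add_argument("--config", default="icassp",
                        help="short setting name, e.g. model1 ... model4 or icassp")
    return parser.parse_args(argv)


# Example: simulate the command line "--config model3".
args = parse_args(["--config", "model3"])
print("would load:", resolve_config_path(args.config))
```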

Results of DCASE settings (model 1~4) on DESED Real Validation dataset:

Model	PSDS-scenario1	PSDS-scenario2	Collar-based F1
1	0.408	0.628	49.0%
2	0.414	0.608	49.2%
3	0.381	0.660	31.8%
4	0.052	0.783	19.8%

These results are based on models trained with a single run for each setting.

Results of ICASSP settings on DESED Real Validation dataset:

Methods	PSDS-scenario1	PSDS-scenario2	Collar-based F1	Intersection-based F1
w/o FiltAug	0.387	0.598	47.7%	70.8%
step FiltAug	0.412	0.634	47.4%	71.2%
linear FiltAug	0.413	0.636	49.0%	73.5%

These results are the maximum values of each metric over 3 separate runs for each setting (refer to the paper for details).

Reference

DCASE 2021 Task 4 baseline

Citation & Contact

If this repository helped your work, please cite the papers below.

@techreport{Nam2021,
    Author = "Nam, Hyeonuk and Ko, Byeong-Yun and Lee, Gyeong-Tae and Kim, Seong-Hu and Jung, Won-Ho and Choi, Sang-Min and Park, Yong-Hwa",
    title = "Heavily Augmented Sound Event Detection utilizing Weak Predictions",
    institution = "DCASE2021 Challenge",
    year = "2021",
    month = "June",
}

@article{nam2021filteraugment,
  title={FilterAugment: An Acoustic Environmental Data Augmentation Method},
  author={Hyeonuk Nam and Seong-Hu Kim and Yong-Hwa Park},
  journal={arXiv preprint arXiv:2107.13260},
  year={2021}
}

Please contact Hyeonuk Nam at [email protected] for any query.
