Code for the paper "Unsupervised Contrastive Learning of Sound Event Representations", ICASSP 2021.

Related tags

Deep Learninguclser20
Overview

Unsupervised Contrastive Learning of
Sound Event Representations

This repository contains the code for the following paper. If you use this code or part of it, please cite:

Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra, "Unsupervised Contrastive Learning of Sound Event Representations", ICASSP 2021.

arXiv slides poster blog post video

We propose to learn sound event representations using the proxy task of contrasting differently augmented views of sound events, inspired by SimCLR [1]. The different views are computed by:

  • sampling TF patches at random within every input clip,
  • mixing resulting patches with unrelated background clips (mix-back), and
  • other data augmentations (DAs) (RRC, compression, noise addition, SpecAugment [2]).

Our proposed system is illustrated in the figure.

system

Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels. Please check our paper for more details, or have a quicker look at our slide deck, poster, blog post, or video presentation (see links above).

This repository contains the framework that we used for our paper. It comprises the basic stages to learn an audio representation via unsupervised contrastive learning, and then evaluate the representation via supervised sound event classifcation. The system is implemented in PyTorch.

Dependencies

This framework is tested on Ubuntu 18.04 using a conda environment. To duplicate the conda environment:

conda create --name <envname> --file spec-file.txt

Directories and files

FSDnoisy18k/ includes folders to locate the FSDnoisy18k dataset and a FSDnoisy18k.py to load the dataset (train, val, test), including the data loader for contrastive and supervised training, applying transforms or mix-back when appropriate
config/ includes *.yaml files defining parameters for the different training modes
da/ contains data augmentation code, including augmentations mentioned in our paper and more
extract/ contains feature extraction code. Computes an .hdf5 file containing log-mel spectrograms and associated labels for a given subset of data
logs/ folder for output logs
models/ contains definitions for the architectures used (ResNet-18, VGG-like and CRNN)
pth/ contains provided pre-trained models for ResNet-18, VGG-like and CRNN
src/ contains functions for training and evaluation in both supervised and unsupervised fashion
main_train.py is the main script
spec-file.txt contains conda environment specs

Usage

(0) Download the dataset

Download FSDnoisy18k [3] from Zenodo through the dataset companion site, unzip it and locate it in a given directory. Fix paths to dataset in ctrl section of *.yaml. It can be useful to have a look at the different training sets of FSDnoisy18k: a larger set of noisy labels and a small set of clean data [3]. We use them for training/validation in different ways.

(1) Prepare the dataset

Create an .hdf5 file containing log-mel spectrograms and associated labels for each subset of data:

python extract/wav2spec.py -m test -s config/params_unsupervised_cl.yaml

Use -m with train, val or test to extract features from each subset. All the extraction parameters are listed in params_unsupervised_cl.yaml. Fix path to .hdf5 files in ctrl section of *.yaml.

(2) Run experiment

Our paper comprises three training modes. For convenience, we provide yaml files defining the setup for each of them.

  1. Unsupervised contrastive representation learning by comparing differently augmented views of sound events. The outcome of this stage is a trained encoder to produce low-dimensional representations. Trained encoders are saved under results_models/ using a folder name based on the string experiment_name in the corresponding yaml (make sure to change it).
CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_unsupervised_cl.yaml &> logs/output_unsup_cl.out
  1. Evaluation of the representation using a previously trained encoder. Here, we do supervised learning by minimizing cross entropy loss without data agumentation. Currently, we load the provided pre-trained models sitting in pth/ (you can change this in main_train.py, search for select model). We follow two evaluation methods:

    • Linear Evaluation: train an additional linear classifier on top of the pre-trained unsupervised embeddings.

      CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_supervised_lineval.yaml &> logs/output_lineval.out
      
    • End-to-end Fine Tuning: fine-tune entire model on two relevant downstream tasks after initializing with pre-trained weights. The two downstream tasks are:

      • training on the larger set of noisy labels and validate on train_clean. This is chosen by selecting train_on_clean: 0 in the yaml.
      • training on the small set of clean data (allowing 15% for validation). This is chosen by selecting train_on_clean: 1 in the yaml.

      After choosing the training set for the downstream task, run:

      CUDA_VISIBLE_DEVICES=0 python main_train.py -p config/params_supervised_finetune.yaml &> logs/output_finetune.out
      

The setup in the yaml files should provide the best results reported in our paper. JFYI, the main flags that determine the training mode are downstream, lin_eval and method in the corresponding yaml (they are already adequately set in each yaml).

(3) See results:

Check the logs/*.out for printed results at the end. Main evaluation metric is balanced (macro) top-1 accuracy. Trained models are saved under results_models/models* and some metrics are saved under results_models/metrics*.

Model Zoo

We provide pre-trained encoders as described in our paper, for ResNet-18, VGG-like and CRNN architectures. See pth/ folder. Note that better encoders could likely be obtained through a more exhaustive exploration of the data augmentation compositions, thus defining a more challenging proxy task. Also, we trained on FSDnoisy18k due to our limited compute resources at the time, yet this framework can be directly applied to other larger datasets such as FSD50K or AudioSet.

Citation

@inproceedings{fonseca2021unsupervised,
  title={Unsupervised Contrastive Learning of Sound Event Representations},
  author={Fonseca, Eduardo and Ortego, Diego and McGuinness, Kevin and O'Connor, Noel E. and Serra, Xavier},
  booktitle={2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2021},
  organization={IEEE}
}

Contact

You are welcome to contact [email protected] should you have any question/suggestion. You can also create an issue.

Acknowledgment

This work is a collaboration between the MTG-UPF and Dublin City University's Insight Centre. This work is partially supported by Science Foundation Ireland (SFI) under grant number SFI/15/SIRG/3283 and by the Young European Research University Network under a 2020 mobility award. Eduardo Fonseca is partially supported by a Google Faculty Research Award 2018. The authors are grateful for the GPUs donated by NVIDIA.

References

[1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” in Int. Conf. on Mach. Learn. (ICML), 2020

[2] Park et al., SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. InterSpeech 2019

[3] E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, X. Serra, "Learning Sound Event Classifiers from Web Audio with Noisy Labels", In proceedings of ICASSP 2019, Brighton, UK

Owner
Eduardo Fonseca
Returning research intern at Google Research | PhD candidate at Music Technology Group, Universitat Pompeu Fabra
Eduardo Fonseca
Official Repository for the ICCV 2021 paper "PixelSynth: Generating a 3D-Consistent Experience from a Single Image"

PixelSynth: Generating a 3D-Consistent Experience from a Single Image (ICCV 2021) Chris Rockwell, David F. Fouhey, and Justin Johnson [Project Website

Chris Rockwell 95 Nov 22, 2022
Sleep staging from ECG, assisted with EEG

Sleep_Staging_Knowledge Distillation This codebase implements knowledge distillation approach for ECG based sleep staging assisted by EEG based sleep

2 Dec 12, 2022
OpenVisionAPI server

🚀 Quick start An instance of ova-server is free and publicly available here: https://api.openvisionapi.com Checkout ova-client for a quick demo. Inst

Open Vision API 93 Nov 24, 2022
a project for 3D multi-object tracking

a project for 3D multi-object tracking

155 Jan 04, 2023
Yoga - Yoga asana classifier for python

Yoga Asana Classifier Description Hi welcome to my new deep learning project "Yo

Programminghut 35 Dec 12, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
Benchmark VAE - Library for Variational Autoencoder benchmarking

Documentation pythae This library implements some of the most common (Variational) Autoencoder models. In particular it provides the possibility to pe

1.1k Jan 02, 2023
FOSS Digital Asset Distribution Platform built on Frappe.

Digistore FOSS Digital Assets Marketplace. Distribute digital assets, like a pro. Video Demo Here Features Create, attach and list digital assets (PDF

Mohammad Hussain Nagaria 30 Dec 08, 2022
Source for the paper "Universal Activation Function for machine learning"

Universal Activation Function Tensorflow and Pytorch source code for the paper Yuen, Brosnan, Minh Tu Hoang, Xiaodai Dong, and Tao Lu. "Universal acti

4 Dec 03, 2022
Deeper DCGAN with AE stabilization

AEGeAN Deeper DCGAN with AE stabilization Parallel training of generative adversarial network as an autoencoder with dedicated losses for each stage.

Tyler Kvochick 36 Feb 17, 2022
HyperPose is a library for building high-performance custom pose estimation applications.

HyperPose is a library for building high-performance custom pose estimation applications.

TensorLayer Community 1.2k Jan 04, 2023
"Neural Turing Machine" in Tensorflow

Neural Turing Machine in Tensorflow Tensorflow implementation of Neural Turing Machine. This implementation uses an LSTM controller. NTM models with m

Taehoon Kim 1k Dec 06, 2022
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions

A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,

Athanasios Ch. Kapoutsis 5 Oct 15, 2022
Accurate identification of bacteriophages from metagenomic data using Transformer

PhaMer is a python library for identifying bacteriophages from metagenomic data. PhaMer is based on a Transorfer model and rely on protein-based vocab

Kenneth Shang 9 Nov 30, 2022
Adaptive FNO transformer - official Pytorch implementation

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers This repository contains PyTorch implementation of the Adaptive Fourier Neu

NVIDIA Research Projects 77 Dec 29, 2022
Doing the asl sign language classification on static images using graph neural networks.

SignLangGNN When GNNs 💜 MediaPipe. This is a starter project where I tried to implement some traditional image classification problem i.e. the ASL si

10 Nov 09, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022
text_recognition_toolbox: The reimplementation of a series of classical scene text recognition papers with Pytorch in a uniform way.

text recognition toolbox 1. 项目介绍 该项目是基于pytorch深度学习框架,以统一的改写方式实现了以下6篇经典的文字识别论文,论文的详情如下。该项目会持续进行更新,欢迎大家提出问题以及对代码进行贡献。 模型 论文标题 发表年份 模型方法划分 CRNN 《An End-t

168 Dec 24, 2022
Tools for investing in Python

InvestOps Original repository on GitHub Original author is Magnus Erik Hvass Pedersen Introduction This is a Python package with simple and effective

24 Nov 26, 2022
Re-implementation of the vector capsule with dynamic routing

VectorCapsule Re-implementation of the vector capsule with dynamic routing We implement the vector capsule and dynamic routing via graph neural networ

ZhenchaoTang 10 Feb 10, 2022