A whale detector design for the Kaggle whale-detector challenge!

Overview

CNN (InceptionV1) + STFT based Whale Detection Algorithm

So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The objective of this challenge was to basically do a binary classification, (hence really a detection), on the existance of whale signals in the water.

It's a pretty cool problem that resonates with prior work I have done in underwater perception algorithm design - a freakishly hard problem I may add. (The speed of sound changes on you, multiple reflections from the environment, but probably the hardest of all being that it's hard to gather ground-truth). (<--- startup idea? πŸ’₯ )

Anyway! My approach is to first transform the 1D acoustic time-domain signal into a 2D time-frequency representation via the Short-Time-Fourier-Transform (STFT). We do this in the following way:

(Where K_F is the raw number of STFT frequency bands, n is the discrete time index, m is the temporal index of each STFT pixel, x[n] the raw audio signal being transformed, and k representing the index of each STFT pixel's frequency). In this way, we break the signal down into it's constituent time-frequency energy cells, (which are now pixels), but more crucially, we get a representation that has distinct features across time and frequency that will be correlated with each other. This then makes it ripe for a Convolutional Neural Network (CNN) to chew into.

Here is what a whale-signal's STFT looks like:

Pos whale spectrogram

Similarly, here's what a signal's STFT looks like without any whale signal. (Instead, there seems to be some short-time but uber wide band interference at some point in time).

Neg whale spectrogram

It's actually interesting, because there are basically so many more ways in which a signal can manifest itself as not a whale signal, VS as actually being a whale signal. Does that mean we can also frame the problem as learning the manifold of whale-signals and simply do outlier analysis on that? Something to think about. :)

Code Usage:

Ok - let us now talk about how to use the code:

The first thing you need to do is install PyTorch of course. Do this from here. I use a conda environment as they recommend, and I recommend you do the same.

Once this is done, activate your PyTorch environment.

Now we need to download the raw data. You can get that from Kaggle's site here. Unzip this data at a directory of your choosing. For the purpose of this tutorial, I am going to assume that you placed and unzipped the data as such: /Users/you/data/whaleData/. (We will only be using the training data so that we can split it into train/val/test. The reason is that we do not have access to Kaggle's test labels).

We are now going to do the following steps:

  • Convert the audio files into numpy STFT tensors:
    • python whaleDataCreatorToNumpy.py -s 1 -dataDir /Users/you/data/whaleData/train/ -labelcsv /Users/you/data/whaleData/train.csv -dataDirProcessed /Users/you/data/whaleData/processedData/ -ds 0.42 -rk 20 200
    • The -s 1 flag says we want to save the results, the -ds 0.42 says we want to downsample the STFT image by this amount, (to help with computation time), and the -rk 20 200 says that we want the "rows kept" to be indexed from 20 to 200. This is because the STFT is conjugate symmetric, but also because we make a determination by first swimming in the data, (I swear this pun is not intentional), that most of the informational content lies between those bands. (Again, the motivation is computational here as well).
  • Convert and split the STFT tensors into PyTorch training/val/test Torch tensors:
    • python whaleDataCreatorNumpyToTorchTensors.py -numpyDataDir /Users/you/data/whaleData/processedData/
    • Here, the original numpy tensors are first split and normalized, and then saved off into PyTorch tensors. (The split percentages are able to be user defined, I set the defaults set 20% for validation and 10% test). The PyTorch tensors are saved in the same directory as above.
  • Run the CNN classifier!
    • We are now ready to train the classifier! I have already designed an Inception-V1 CNN architecture, that can be loaded up automatically, and we can use this as so. The input dimensions are also guaranteed to be equal to the STFT image sizes here. At any rate, we do this like so:
    • python whaleClassifier.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -g 0 -e 1 -lr 0.0002 -L2 0.01 -mb 4 -dp 0 -s 3 -dnn 'inceptionModuleV1_75x45'
    • The g term controls whether or not we want to use a GPU to trian, e controls the number of epochs we want to train over, lr is the learning rate, L2 is the L2 penalization amount for regularization, mb is the minibatch size, (which will be double this as the training composes a mini-batch to have an equal number of positive and negative samples), dp controls data parallelism (moot without multiple GPUs, and is really just a flag on whether or not to use multiple GPUs), s controls when and how often we save the net weights and validation losses, (option 3 saves the best performing model), and finally, -dnn is a flag that controls which DNN architecture we want to use. In this way, you can write your own DNN arch, and then simply call it by whatever name you give it for actual use. (I did this after I got tired of hard-coding every single DNN I designed).
    • If everything is running smoothly, you should see something like this as training progresses:
    • The "time" here just shows how long it takes between the reporting of each validation score. (Since I ran this on my CPU, it's 30 seconds / report, but expect this to be at least an order of magnitude faster on a respectable GPU).
  • Evauluate the results!
    • When your training is complete, you can then then run this script to give you automatically generated ROC and PR curves for your network's performance:
    • python resultsVisualization.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -netDir .
    • After a good training session, you should get results that look like so:
    • I also show the normalized training / validation likelihoods and accuracies for the duration of the session:

So wow! An AUC of 0.9669! Not too shabby! Can still be improved, but considering the data looks like this below, our InceptionV1-CNN isn't doing too bad either. πŸ’₯

Owner
Tarin Ziyaee
Eng Manager @Facebook FRL neural interfaces | Director R&D @CTRL-labs neural inferfaces. | CTO @Voyage, autonomous vehicles | Perception @Apple Autonomous
Tarin Ziyaee
DeceFL: A Principled Decentralized Federated Learning Framework

DeceFL: A Principled Decentralized Federated Learning Framework This repository comprises codes that reproduce experiments in Ye, et al (2021), which

Huazhong Artificial Intelligence Lab (HAIL) 10 May 31, 2022
This repository contains several jupyter notebooks to help users learn to use neon, our deep learning framework

neon_course This repository contains several jupyter notebooks to help users learn to use neon, our deep learning framework. For more information, see

Nervana 92 Jan 03, 2023
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes" 🌟 🌟 . πŸŽ“ Re

Shuai Shen 87 Dec 28, 2022
This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning].

CG3 This is the repository for the AAAI 21 paper [Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning]. R

12 Oct 28, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs This is an implemetation of the paper Few-shot Relation Extraction via Baye

MilaGraph 36 Nov 22, 2022
Manipulation OpenAI Gym environments to simulate robots at the STARS lab

Manipulator Learning This repository contains a set of manipulation environments that are compatible with OpenAI Gym and simulated in pybullet. In par

STARS Laboratory 5 Dec 08, 2022
This porject is intented to build the most accurate model for predicting the porbability of loan default

Estimating-Loan-Default-Probability IBA ML2 Mid-project / Kaggle Competition This porject is intented to build the most accurate model for predicting

Adil Gahramanov 1 Jan 24, 2022
[NeurIPS 2021] SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

SSUL - Official Pytorch Implementation (NeurIPS 2021) SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning Sun

Clova AI Research 44 Dec 27, 2022
Adversarially Learned Inference

Adversarially Learned Inference Code for the Adversarially Learned Inference paper. Compiling the paper locally From the repo's root directory, $ cd p

Mohamed Ishmael Belghazi 308 Sep 24, 2022
An implementation of EWC with PyTorch

EWC.pytorch An implementation of Elastic Weight Consolidation (EWC), proposed in James Kirkpatrick et al. Overcoming catastrophic forgetting in neural

Ryuichiro Hataya 166 Dec 22, 2022
Large-scale language modeling tutorials with PyTorch

Large-scale language modeling tutorials with PyTorch μ•ˆλ…•ν•˜μ„Έμš”. μ €λŠ” TUNiBμ—μ„œ λ¨Έμ‹ λŸ¬λ‹ μ—”μ§€λ‹ˆμ–΄λ‘œ 근무 쀑인 κ³ ν˜„μ›…μž…λ‹ˆλ‹€. 이 μžλ£ŒλŠ” λŒ€κ·œλͺ¨ μ–Έμ–΄λͺ¨λΈ κ°œλ°œμ— ν•„μš”ν•œ μ—¬λŸ¬κ°€μ§€ κΈ°μˆ λ“€μ„ μ†Œκ°œλ“œλ¦¬κΈ° μœ„ν•΄ λ§ˆλ ¨ν•˜μ˜€μœΌλ©° 기본적으둜

TUNiB 172 Dec 29, 2022
Related resources for our EMNLP 2021 paper

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Yixuan Su 61 Jan 03, 2023
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate

News 05/17/2021 To make the comparison on ZJU-MoCap easier, we save quantitative and qualitative results of other methods at here, including Neural Vo

ZJU3DV 748 Jan 07, 2023
Microscopy Image Cytometry Toolkit

Cytokit Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets with a

Hammer Lab 106 Jan 06, 2023
A state-of-the-art semi-supervised method for image recognition

Mean teachers are better role models Paper ---- NIPS 2017 poster ---- NIPS 2017 spotlight slides ---- Blog post By Antti Tarvainen, Harri Valpola (The

Curious AI 1.4k Jan 06, 2023
Code for "Multi-Compound Transformer for Accurate Biomedical Image Segmentation"

News The code of MCTrans has been released. if you are interested in contributing to the standardization of the medical image analysis community, plea

97 Jan 05, 2023
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
An OpenAI Gym environment for Super Mario Bros

gym-super-mario-bros An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) us

Andrew Stelmach 1 Jan 05, 2022
The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.

ISC21-Descriptor-Track-1st The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track. You can check our solution

lyakaap 75 Jan 08, 2023