An official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL)

Related tags

Deep Learningbassl
Overview

KakaoBrain pytorch pytorch-lightning

BaSSL

This is an official PyTorch Implementation of Boundary-aware Self-supervised Learning for Video Scene Segmentation (BaSSL) [arxiv]

  • The method is a self-supervised learning algorithm that learns a model to capture contextual transition across boundaries during the pre-training stage. To be specific, the method leverages pseudo-boundaries and proposes three novel boundary-aware pretext tasks effective in maximizing intra-scene similarity and minimizing inter-scene similarity, thus leading to higher performance in video scene segmentation task.

1. Environmental Setup

We have tested the implementation on the following environment:

  • Python 3.7.7 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.0 / Ubuntu 18.04

Also, the code is based on pytorch-lightning (==1.3.8) and all necessary dependencies can be installed by running following command.

$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install -r requirements.txt

# (optional) following installation of pillow-simd sometimes brings faster data loading.
$ pip uninstall pillow && CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

2. Prepare Data

We provide data download script for raw key-frames of MovieNet-SSeg dataset, and our re-formatted annotation files applicable for BaSSL. FYI, our script will automatically download and decompress data---1) key-frames (160G), 2) annotations (200M)---into /bassl/data/movienet .

# download movienet data
$ cd <path-to-root>
$ bash script/download_movienet_data.sh
# 
   
    /bassl/data
   
movienet
│─ 240P_frames
│    │─ tt0120885                 # movie id (or video id)
│    │    │─ shot_0000_img_0.jpg
│    │    │─ shot_0000_img_1.jpg
│    │    │─ shot_0000_img_2.jpg  # for each shot, three key-frames are given.
|    |    ::    │─ shot_1256_img_2.jpg
│    |    
│    │─ tt1093906
│         │─ shot_0000_img_0.jpg
│         │─ shot_0000_img_1.jpg
│         │─ shot_0000_img_2.jpg
|         :
│         │─ shot_1270_img_2.jpg
│
│─anno
     │─ anno.pretrain.ndjson
     │─ anno.trainvaltest.ndjson
     │─ anno.train.ndjson
     │─ anno.val.ndjson
     │─ anno.test.ndjson
     │─ vid2idx.json

3. Train (Pre-training and Fine-tuning)

We use Hydra to provide flexible training configurations. Below examples explain how to modify each training parameter for your use cases.
We assume that you are in (i.e., root of this repository).

3.1. Pre-training

(1) Pre-training BaSSL
Our pre-training is based on distributed environment (multi-GPUs training) using ddp environment supported by pytorch-lightning.
The default setting requires 8-GPUs (of V100) with a batch of 256. However, you can set the parameter config.DISTRIBUTED.NUM_PROC_PER_NODE to the number of gpus you can use or change config.TRAIN.BATCH_SIZE.effective_batch_size. You can run a single command cd bassl; bash ../scripts/run_pretrain_bassl.sh or following full command:

cd <path-to-root>/bassl
EXPR_NAME=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.DISTRIBUTED.NUM_NODES=1 \
    config.DISTRIBUTED.NUM_PROC_PER_NODE=8 \
    config.TRAIN.BATCH_SIZE.effective_batch_size=256

Note that the checkpoints are automatically saved in bassl/pretrain/ckpt/ and log files (e.g., tensorboard) are saved in `bassl/pretrain/logs/ .

(2) Running with various loss combinations
Each objective can be turned on and off independently.

cd <path-to-root>/bassl
EXPR_NAME=bassl_all_pretext_tasks
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.LOSS.shot_scene_matching.enabled=true \
    config.LOSS.contextual_group_matching.enabled=true \
    config.LOSS.pseudo_boundary_prediction.enabled=true \
    config.LOSS.masked_shot_modeling.enabled=true

(3) Pre-training shot-level pre-training baselines
Shot-level pre-training methods can be trained by setting config.LOSS.sampling_method.name as one of followings:

  • instance (Simclr_instance), temporal (Simclr_temporal), shotcol (Simclr_NN).
    And, you can choose two more options: bassl (BaSSL), and bassl+shotcol (BaSSL+ShotCoL).
    Below example is for Simclr_NN, i.e., ShotCoL. Choose your favorite option ;)
cd <path-to-root>/bassl
EXPR_NAME=Simclr_NN
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/main.py \
    config.EXPR_NAME=${EXPR_NAME} \
    config.LOSS.sampleing_method.name=shotcol \

3.2. Fine-tuning

(1) Simple running a single command to fine-tune pre-trained models
Firstly, download the checkpoints provided in Model Zoo section and move them into bassl/pretrain/ckpt.

cd <path-to-root>/bassl

# for fine-tuning BaSSL (10 epoch)
bash ../scripts/finetune_bassl.sh

# for fine-tuning Simclr_NN (i.e., ShotCoL)
bash ../scripts/finetune_shot-level_baseline.sh

The full process (i.e., extraction of shot-level representation followed by fine-tuning) is described in below.

(2) Extracting shot-level features from shot key-frames
For computational efficiency, we pre-extract shot-level representation and then fine-tune pre-trained models.
Set LOAD_FROM to EXPR_NAME used in the pre-training stage and change config.DISTRIBUTED.NUM_PROC_PER_NODE as the number of GPUs you can use. Then, the extracted shot-level features are saved in /bassl/data/movienet/features/ .

cd <path-to-root>/bassl
LOAD_FROM=bassl
WORK_DIR=$(pwd)
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/pretrain/extract_shot_repr.py \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	+config.LOAD_FROM=${LOAD_FROM}

(3) Fine-tuning and evaluation

cd <path-to-root>/bassl
WORK_DIR=$(pwd)

# Pre-training methods: bassl and bassl+shotcol
# which learn CRN network during the pre-training stage
LOAD_FROM=bassl
EXPR_NAME=transfer_finetune_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
	config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
	config.EXPR_NAME=${EXPR_NAME} \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	config.TRAIN.OPTIMIZER.lr.base_lr=0.0000025 \
	+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}

# Pre-training methods: instance, temporal, shotcol
# which DO NOT learn CRN network during the pre-training stage
# thus, we use different base learning rate (determined after hyperparameter search)
LOAD_FROM=shotcol_pretrain
EXPR_NAME=finetune_scratch_${LOAD_FROM}
PYTHONPATH=${WORK_DIR} python3 ${WORK_DIR}/finetune/main.py \
	config.TRAIN.BATCH_SIZE.effective_batch_size=1024 \
	config.EXPR_NAME=${EXPR_NAME} \
	config.DISTRIBUTED.NUM_NODES=1 \
	config.DISTRIBUTED.NUM_PROC_PER_NODE=1 \
	config.TRAIN.OPTIMIZER.lr.base_lr=0.000025 \
	+config.PRETRAINED_LOAD_FROM=${LOAD_FROM}

4. Model Zoo

We provide pre-trained checkpoints trained in a self-supervised manner.
After fine-tuning with the checkpoints, the models will give scroes that are almost similar to ones shown below.

Method AP Checkpoint (pre-trained)
SimCLR (instance) 51.51 download
SimCLR (temporal) 50.05 download
SimCLR (NN) 51.17 download
BaSSL (10 epoch) 56.26 download
BaSSL (40 epoch) 57.40 download

5. Citation

If you find this code helpful for your research, please cite our paper.

@article{mun2022boundary,
  title={Boundary-aware Self-supervised Learning for Video Scene Segmentation},
  author={Mun, Jonghwan and Shin, Minchul and Han, Gunsu and
          Lee, Sangho and Ha, Sungsu and Lee, Joonseok and Kim, Eun-sol},
  journal={arXiv preprint arXiv:2201.05277},
  year={2022}
}

6. Contact for Issues

Jonghwan Mun, [email protected]
Minchul Shin, [email protected]

7. License

This project is licensed under the terms of the Apache License 2.0. Copyright 2021 Kakao Brain Corp. All Rights Reserved.

Owner
Kakao Brain
Kakao Brain Corp.
Kakao Brain
😇A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc

------ Update September 2018 ------ It's been a year since TorchMoji and DeepMoji were released. We're trying to understand how it's being used such t

Hugging Face 865 Dec 24, 2022
Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

Official PyTorch Implementation of Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition, ICCV 2021

26 Dec 07, 2022
gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks.

gym-anm is a framework for designing reinforcement learning (RL) environments that model Active Network Management (ANM) tasks in electricity distribution networks. It is built on top of the OpenAI G

Robin Henry 99 Dec 12, 2022
JupyterLite demo deployed to GitHub Pages 🚀

JupyterLite Demo JupyterLite deployed as a static site to GitHub Pages, for demo purposes. ✨ Try it in your browser ✨ ➡️ https://jupyterlite.github.io

JupyterLite 223 Jan 04, 2023
TAug :: Time Series Data Augmentation using Deep Generative Models

TAug :: Time Series Data Augmentation using Deep Generative Models Note!!! The package is under development so be careful for using in production! Fea

35 Dec 06, 2022
Tensorflow/Keras Plug-N-Play Deep Learning Models Compilation

DeepBay This project was created with the objective of compile Machine Learning Architectures created using Tensorflow or Keras. The architectures mus

Whitman Bohorquez 4 Sep 26, 2022
Educational API for 3D Vision using pose to control carton.

Educational API for 3D Vision using pose to control carton.

41 Jul 10, 2022
Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021.

UniRE Source code for "UniRE: A Unified Label Space for Entity Relation Extraction.", ACL2021. Requirements python: 3.7.6 pytorch: 1.8.1 transformers:

Wang Yijun 109 Nov 29, 2022
Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SemCo The official pytorch implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

42 Nov 14, 2022
Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" (NeurIPS'20)

IGNN Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" [paper] [supp] Prepare datasets 1 Download training dataset

Shangchen Zhou 278 Jan 03, 2023
Dynamic hair modeling from monocular videos using deep neural networks

Dynamic Hair Modeling The source code of the networks for our paper "Dynamic hair modeling from monocular videos using deep neural networks" (SIGGRAPH

53 Oct 18, 2022
Python implementation of NARS (Non-Axiomatic-Reasoning-System)

Python implementation of NARS (Non-Axiomatic-Reasoning-System)

Bowen XU 11 Dec 20, 2022
HODEmu, is both an executable and a python library that is based on Ragagnin 2021 in prep.

HODEmu HODEmu, is both an executable and a python library that is based on Ragagnin 2021 in prep. and emulates satellite abundance as a function of co

Antonio Ragagnin 1 Oct 13, 2021
The official PyTorch code for NeurIPS 2021 ML4AD Paper, "Does Thermal data make the detection systems more reliable?"

MultiModal-Collaborative (MMC) Learning Framework for integrating RGB and Thermal spectral modalities This is the official code for NeurIPS 2021 Machi

NeurAI 12 Nov 02, 2022
Agent-based model simulator for air quality and pandemic risk assessment in architectural spaces

Agent-based model simulation for air quality and pandemic risk assessment in architectural spaces. User Guide archABM is a fast and open source agent-

Vicomtech 10 Dec 05, 2022
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Håkon Hukkelås 30 Nov 18, 2022
Visualization toolkit for neural networks in PyTorch! Demo -->

FlashTorch A Python visualization toolkit, built with PyTorch, for neural networks in PyTorch. Neural networks are often described as "black box". The

Misa Ogura 692 Dec 29, 2022
This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' published at ECIR'22.

Paragraph Aggregation Retrieval Model (PARM) for Dense Document-to-Document Retrieval This repository contains the code for the paper PARM: A Paragrap

Sophia Althammer 33 Aug 26, 2022
Good Classification Measures and How to Find Them

Good Classification Measures and How to Find Them This repository contains supplementary materials for the paper "Good Classification Measures and How

Yandex Research 7 Nov 13, 2022
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 06, 2022