Official implementation of TMANet.

Related tags

Deep LearningTMANet
Overview

Temporal Memory Attention for Video Semantic Segmentation, arxiv

PWC PWC

Introduction

We propose a Temporal Memory Attention Network (TMANet) to adaptively integrate the long-range temporal relations over the video sequence based on the self-attention mechanism without exhaustive optical flow prediction. Our method achieves new state-of-the-art performances on two challenging video semantic segmentation datasets, particularly 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50. (Accepted by ICIP2021)

If this codebase is helpful for you, please consider give me a star โญ ๐Ÿ˜Š .

image

Updates

2021/1: TMANet training and evaluation code released.

2021/6: Update README.md:

  • adding some Camvid dataset download links;
  • update 'camvid_video_process.py' script.

Usage

  • Install mmseg

    • Please refer to mmsegmentation to get installation guide.
    • This repository is based on mmseg-0.7.0 and pytorch 1.6.0.
  • Clone the repository

    git clone https://github.com/wanghao9610/TMANet.git
    cd TMANet
    pip install -e .
  • Prepare the datasets

    • Download Cityscapes dataset and Camvid dataset.

    • For Camvid dataset, we need to extract frames from downloaded videos according to the following steps:

      • Download the raw video from here, in which I provide a google drive link to download.
      • Put the downloaded raw video(e.g. 0016E5.MXF, 0006R0.MXF, 0005VD.MXF, 01TP_extract.avi) to ./data/camvid/raw .
      • Download the extracted images and labels from here and split.txt file from here, untar the tar.gz file to ./data/camvid , and we will get two subdirs "./data/camvid/images" (stores the images with annotations), and "./data/camvid/labels" (stores the ground truth for semantic segmentation). Reference the following shell command:
        cd TMANet
        cd ./data/camvid
        wget https://drive.google.com/file/d/1FcVdteDSx0iJfQYX2bxov0w_j-6J7plz/view?usp=sharing
        # or first download on your PC then upload to your server.
        tar -xf camvid.tar.gz 
      • Generate image_sequence dir frame by frame from the raw videos. Reference the following shell command:
        cd TMANet
        python tools/convert_datasets/camvid_video_process.py
    • For Cityscapes dataset, we need to request the download link of 'leftImg8bit_sequence_trainvaltest.zip' from Cityscapes dataset official webpage.

    • The converted/downloaded datasets store on ./data/camvid and ./data/cityscapes path.

      File structure of video semantic segmentation dataset is as followed.

      โ”œโ”€โ”€ data                                              โ”œโ”€โ”€ data                              
      โ”‚   โ”œโ”€โ”€ cityscapes                                    โ”‚   โ”œโ”€โ”€ camvid                        
      โ”‚   โ”‚   โ”œโ”€โ”€ gtFine                                    โ”‚   โ”‚   โ”œโ”€โ”€ images                    
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ train                                 โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{img_suffix}       
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{img_suffix}                   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{img_suffix}       
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{img_suffix}                   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{img_suffix}       
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{img_suffix}                   โ”‚   โ”‚   โ”œโ”€โ”€ annotations               
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ val                                   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ train.txt             
      โ”‚   โ”‚   โ”œโ”€โ”€ leftImg8bit                               โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ val.txt               
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ train                                 โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ test.txt              
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{seg_map_suffix}               โ”‚   โ”‚   โ”œโ”€โ”€ labels                    
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{seg_map_suffix}               โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{seg_map_suffix}   
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{seg_map_suffix}               โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{seg_map_suffix}   
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ val                                   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{seg_map_suffix}   
      โ”‚   โ”‚   โ”œโ”€โ”€ leftImg8bit_sequence                      โ”‚   โ”‚   โ”œโ”€โ”€ image_sequence            
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ train                                 โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{sequence_suffix}  
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ xxx{sequence_suffix}              โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{sequence_suffix}  
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ yyy{sequence_suffix}              โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{sequence_suffix}  
      โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ zzz{sequence_suffix}              
      โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€ val                                   
      
  • Evaluation

    • Download the trained models for Cityscapes and Camvid. And put them on ./work_dirs/{config_file}
    • Run the following command(on Cityscapes):
    sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py
  • Training

    • Please download the pretrained ResNet-50 model, and put it on ./init_models .
    • Run the following command(on Cityscapes):
    sh train.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py

    Note: the above evaluation and training shell commands execute on Cityscapes, if you want to execute evaluation or training on Camvid, please replace the config file on the shell command with the config file of Camvid.

Citation

If you find TMANet is useful in your research, please consider citing:

@misc{wang2021temporal,
    title={Temporal Memory Attention for Video Semantic Segmentation}, 
    author={Hao Wang and Weining Wang and Jing Liu},
    year={2021},
    eprint={2102.08643},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Thanks mmsegmentation contribution to the community!

Owner
wanghao
wanghao
This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures using receptive field analysis (RFA) and create graph visualizations of your architecture.

ReceptiveFieldAnalysisToolbox This is RFA-Toolbox, a simple and easy-to-use library that allows you to optimize your neural network architectures usin

84 Nov 23, 2022
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

DRSAN A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution Karam Park, Jae Woong Soh, and Nam Ik Cho Environments U

4 May 10, 2022
Differentiable molecular simulation of proteins with a coarse-grained potential

Differentiable molecular simulation of proteins with a coarse-grained potential This repository contains the learned potential, simulation scripts and

UCL Bioinformatics Group 44 Dec 10, 2022
Differentiable simulation for system identification and visuomotor control

gradsim gradSim: Differentiable simulation for system identification and visuomotor control gradSim is a unified differentiable rendering and multiphy

105 Dec 18, 2022
YoloAll is a collection of yolo all versions. you you use YoloAll to test yolov3/yolov5/yolox/yolo_fastest

ๅฎ˜ๆ–น่ฎจ่ฎบ็พค QQ็พค๏ผš552703875 ๅพฎไฟก็พค๏ผš15158106211(ๅ…ˆๅŠ ไฝœ่€…ๅพฎไฟก๏ผŒๅ†้‚€่ฏทๅ…ฅ็พค) YoloAll้กน็›ฎ็ฎ€ไป‹ YoloAllๆ˜ฏไธ€ไธชๅฐ†ๅฝ“ๅ‰ไธปๆตYolo็‰ˆๆœฌ้›†ๆˆๅˆฐๅŒไธ€ไธชUI็•Œ้ขไธ‹็š„ๆŽจ็†้ข„ๆต‹ๅทฅๅ…ทใ€‚ๅฏไปฅ่ฟ…้€Ÿๅˆ‡ๆขไธๅŒ็š„yolo็‰ˆๆœฌ๏ผŒๅนถไธ”ๅฏไปฅ้’ˆๅฏนๅ›พ็‰‡๏ผŒ่ง†้ข‘๏ผŒๆ‘„ๅƒๅคด็ ๆต่ฟ›่กŒๅฎžๆ—ถๆŽจ็†๏ผŒๅฏไปฅๅพˆๆ–นไพฟ๏ผŒ็›ด่ง‚

DL-Practise 244 Jan 01, 2023
The VeriNet toolkit for verification of neural networks

VeriNet The VeriNet toolkit is a state-of-the-art sound and complete symbolic interval propagation based toolkit for verification of neural networks.

9 Dec 21, 2022
SPTAG: A library for fast approximate nearest neighbor search

SPTAG: A library for fast approximate nearest neighbor search SPTAG SPTAG (Space Partition Tree And Graph) is a library for large scale vector approxi

Microsoft 4.3k Jan 01, 2023
SEC'21: Sparse Bitmap Compression for Memory-Efficient Training onthe Edge

Training Deep Learning Models on The Edge Training on the Edge enables continuous learning from new data for deployed neural networks on memory-constr

Brown University Scale Lab 4 Nov 18, 2022
A note taker for NVDA. Allows the user to create, edit, view, manage and export notes to different formats.

Quick Notetaker add-on for NVDA The Quick Notetaker add-on is a wonderful tool which allows writing notes quickly and easily anytime and from any app

5 Dec 06, 2022
Second-Order Neural ODE Optimizer, NeurIPS 2021 spotlight

Second-order Neural ODE Optimizer (NeurIPS 2021 Spotlight) [arXiv] โœ”๏ธ faster convergence in wall-clock time | โœ”๏ธ O(1) memory cost | โœ”๏ธ better test-tim

Guan-Horng Liu 39 Oct 22, 2022
a pytorch implementation of auto-punctuation learned character by character

Learning Auto-Punctuation by Reading Engadget Articles Link to Other of my work ๐ŸŒŸ Deep Learning Notes: A collection of my notes going from basic mult

Ge Yang 137 Nov 09, 2022
Underwater image enhancement

LANet Our work proposes an adaptive learning attention network (LANet) to solve the problem of color casts and low illumination in underwater images.

LiuShiBen 7 Sep 14, 2022
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
Sequence-tagging using deep learning

Classification using Deep Learning Requirements PyTorch version = 1.9.1+cu111 Python version = 3.8.10 PyTorch-Lightning version = 1.4.9 Huggingface

Vineet Kumar 2 Dec 20, 2022
A framework for GPU based high-performance medical image processing and visualization

FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both mu

Erik Smistad 315 Dec 30, 2022
Yggdrasil - A simplistic bot designed to streamline your server experience

Ygggdrasil A simplistic bot designed to streamline your server experience. Desig

Sntx_ 1 Dec 14, 2022
Annotate datasets with a semi-trained or fully trained YOLOv5 model

YOLOv5 Auto Annotator Annotate datasets with a semi-trained or fully trained YOLOv5 model Prerequisites Ubuntu =20.04 Python =3.7 System dependencie

Akash James 3 May 14, 2022
Converts geometry node attributes to built-in attributes

Attribute Converter Simplifies converting attributes created by geometry nodes to built-in attributes like UVs or vertex colors, as a single click ope

Ivan Notaros 12 Dec 22, 2022
A texturizer that I just made. Nothing special here.

texturizer This is a little project that I did with an hour's time. It texturizes an image given a image and a texture to texturize it with. There is

1 Nov 11, 2021
The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network.

UNet-SIDE The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network. For Super Reso

TIANTIAN XU 1 Jan 13, 2022