YolactEdge: Real-time Instance Segmentation on the Edge

Overview

YolactEdge: Real-time Instance Segmentation on the Edge

██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗    ███████╗██████╗  ██████╗ ███████╗
╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ╚████╔╝ ██║   ██║██║     ███████║██║        ██║       █████╗  ██║  ██║██║  ███╗█████╗  
  ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝  
   ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║       ███████╗██████╔╝╚██████╔╝███████╗
   ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images. This is the code for our paper.

For a real-time demo and more samples, check out our demo video.

example-gif-1

example-gif-2

example-gif-3

Installation

See INSTALL.md.

Model Zoo

We provide baseline YOLACT and YolactEdge models trained on COCO and YouTube VIS (our sub-training split, with COCO joint training).

To evalute the model, put the corresponding weights file in the ./weights directory and run one of the following commands.

YouTube VIS models:

Method Backbone  mAP AGX-Xavier FPS RTX 2080 Ti FPS weights
YOLACT R-50-FPN 44.7 8.5 59.8 download | mirror
YolactEdge
(w/o TRT)
R-50-FPN 44.2 10.5 67.0 download | mirror
YolactEdge R-50-FPN 44.0 32.4 177.6 download | mirror
YOLACT R-101-FPN 47.3 5.9 42.6 download | mirror
YolactEdge
(w/o TRT)
R-101-FPN 46.9 9.5 61.2 download | mirror
YolactEdge R-101-FPN 46.2 30.8 172.7 download | mirror

COCO models:

Method    Backbone     mAP Titan Xp FPS AGX-Xavier FPS RTX 2080 Ti FPS weights
YOLACT MobileNet-V2 22.1 - 15.0 35.7 download | mirror
YolactEdge MobileNet-V2 20.8 - 35.7 161.4 download | mirror
YOLACT R-50-FPN 28.2 42.5 9.1 45.0 download | mirror
YolactEdge R-50-FPN 27.0 - 30.7 140.3 download | mirror
YOLACT R-101-FPN 29.8 33.5 6.6 36.5 download | mirror
YolactEdge R-101-FPN 29.5 - 27.3 124.8 download | mirror

Getting Started

Follow the installation instructions to set up required environment for running YolactEdge.

See instructions to evaluate and train with YolactEdge.

Colab Notebook

Try out our Colab Notebook with a live demo to learn about basic usage.

If you are interested in evaluating YolactEdge with TensorRT, we provide another Colab Notebook with TensorRT environment configuration on Colab.

Evaluation

Quantitative Results

# Convert each component of the trained model to TensorRT using the optimal settings and evaluate on the YouTube VIS validation set (our split).
python3 eval.py --trained_model=./weights/yolact_edge_vid_847_50000.pth

# Evaluate on the entire COCO validation set.
python3 eval.py --trained_model=./weights/yolact_edge_54_800000.pth

# Output a COCO JSON file for the COCO test-dev. The command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively. These files can then be submitted to the website for evaluation.
python3 eval.py --trained_model=./weights/yolact_edge_54_800000.pth --dataset=coco2017_testdev_dataset --output_coco_json

Qualitative Results

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --display

Benchmarking

# Benchmark the trained model on the COCO validation set.
# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --benchmark --max_images=1000

Notes

Inference using models trained with YOLACT

If you have a pre-trained model with YOLACT, and you want to take advantage of either TensorRT feature of YolactEdge, simply specify the --config=yolact_edge_config in command line options, and the code will automatically detect and convert the model weights to be compatible.

python3 eval.py --config=yolact_edge_config --trained_model=./weights/yolact_base_54_800000.pth

Inference without Calibration

If you want to run inference command without calibration, you can either run with FP16-only TensorRT optimization, or without TensorRT optimization with corresponding configs. Refer to data/config.py for examples of such configs.

# Evaluate YolactEdge with FP16-only TensorRT optimization with '--use_fp16_tensorrt' option (replace all INT8 optimization with FP16).
python3 eval.py --use_fp16_tensorrt --trained_model=./weights/yolact_edge_54_800000.pth

# Evaluate YolactEdge without TensorRT optimization with '--disable_tensorrt' option.
python3 eval.py --disable_tensorrt --trained_model=./weights/yolact_edge_54_800000.pth

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If video_multiframe > 1, then the trt_batch_size should be increased to match it or surpass it. 
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=0

# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4

Use the help option to see a description of all available command line arguments:

python eval.py --help

Training

Make sure to download the entire dataset using the commands above.

  • To train, grab an imagenet-pretrained model and put it in ./weights.
    • For Resnet101, download resnet101_reducedfc.pth from here.
    • For Resnet50, download resnet50-19c8e357.pth from here.
    • For MobileNetV2, download mobilenet_v2-b0353104.pth from here.
  • Run one of the training commands below.
    • Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
    • All weights are saved in the ./weights directory by default with the file name __.pth.
# Trains using the base edge config with a batch size of 8 (the default).
python train.py --config=yolact_edge_config

# Resume training yolact_edge with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_edge_config --resume=weights/yolact_edge_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Training on video dataset

# Pre-train the image based model
python train.py --config=yolact_edge_youtubevis_config

# Train the flow (warping) module
python train.py --config=yolact_edge_vid_trainflow_config --resume=./weights/yolact_edge_youtubevis_847_50000.pth

# Fine tune the network jointly
python train.py --config=yolact_edge_vid_config --resume=./weights/yolact_edge_vid_trainflow_144_100000.pth

Custom Datasets

You can also train on your own dataset by following these steps:

  • Depending on the type of your dataset, create a COCO-style (image) or YTVIS-style (video) Object Detection JSON annotation file for your dataset. The specification for this can be found here for COCO and YTVIS respectively. Note that we don't use some fields, so the following may be omitted:
    • info
    • liscense
    • Under image: license, flickr_url, coco_url, date_captured
    • categories (we use our own format for categories, see below)
  • Create a definition for your dataset under dataset_base in data/config.py (see the comments in dataset_base for an explanation of each field):
my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': 'path_to_training_images',
    'train_info':   'path_to_training_annotation',

    'valid_images': 'path_to_validation_images',
    'valid_info':   'path_to_validation_annotation',

    'has_gt': True,
    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...),

    # below is only needed for YTVIS-style video dataset.

    # whether samples all frames or key frames only.
    'use_all_frames': False,

    # the following four lines define the frame sampling strategy for the given dataset.
    'frame_offset_lb': 1,
    'frame_offset_ub': 4,
    'frame_offset_multiplier': 1,
    'all_frame_direction': 'allway',

    # 1 of K frames is annotated
    'images_per_video': 5,

    # declares a video dataset
    'is_video': True
})
  • Note that: class IDs in the annotation file should start at 1 and increase sequentially on the order of class_names. If this isn't the case for your annotation file (like in COCO), see the field label_map in dataset_base.
  • Finally, in yolact_edge_config in the same file, change the value for 'dataset' to 'my_custom_dataset' or whatever you named the config object above. Then you can use any of the training commands in the previous section.

Citation

If you use this code base in your work, please consider citing:

@article{yolactedge,
  author    = {Haotian Liu and Rafael A. Rivera Soto and Fanyi Xiao and Yong Jae Lee},
  title     = {YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS)},
  journal   = {arXiv preprint arXiv:2012.12259},
  year      = {2020},
}
@inproceedings{yolact-iccv2019,
  author    = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
  title     = {YOLACT: {Real-time} Instance Segmentation},
  booktitle = {ICCV},
  year      = {2019},
}

Contact

For questions about our paper or code, please contact Haotian Liu or Rafael A. Rivera-Soto.

Owner
Haotian Liu
Haotian Liu
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

ARES This repository contains the code for ARES (Adversarial Robustness Evaluation for Safety), a Python library for adversarial machine learning rese

Tsinghua Machine Learning Group 377 Dec 20, 2022
Few-Shot Object Detection via Association and DIscrimination

Few-Shot Object Detection via Association and DIscrimination Code release of our NeurIPS 2021 paper: Few-Shot Object Detection via Association and DIs

Cao Yuhang 49 Dec 18, 2022
A Dataset for Direct Quotation Extraction and Attribution in News Articles.

DirectQuote - A Dataset for Direct Quotation Extraction and Attribution in News Articles DirectQuote is a corpus containing 19,760 paragraphs and 10,3

THUNLP-MT 9 Sep 23, 2022
Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Single-view robot pose and joint angle estimation via render & compare Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic CVPR: Conference on C

Yann Labbé 51 Oct 14, 2022
Example how to deploy deep learning model with aiohttp.

aiohttp-demos Demos for aiohttp project. Contents Imagetagger Deep Learning Image Classifier URL shortener Toxic Comments Classifier Moderator Slack B

aio-libs 661 Jan 04, 2023
GND-Nets (Graph Neural Diffusion Networks) in TensorFlow.

GNDC For submission to IEEE TKDE. Overview Here we provide the implementation of GND-Nets (Graph Neural Diffusion Networks) in TensorFlow. The reposit

Wei Ye 3 Aug 08, 2022
SberSwap Video Swap base on deep learning

SberSwap Video Swap base on deep learning

Sber AI 431 Jan 03, 2023
Python scripts form performing stereo depth estimation using the CoEx model in ONNX.

ONNX-CoEx-Stereo-Depth-estimation Python scripts form performing stereo depth estimation using the CoEx model in ONNX. Stereo depth estimation on the

Ibai Gorordo 8 Dec 29, 2022
Consistency Regularization for Adversarial Robustness

Consistency Regularization for Adversarial Robustness Official PyTorch implementation of Consistency Regularization for Adversarial Robustness by Jiho

40 Dec 17, 2022
基于Paddle框架的fcanet复现

fcanet-Paddle 基于Paddle框架的fcanet复现 fcanet 本项目基于paddlepaddle框架复现fcanet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: frazerlin-fcanet 数据准备 本项目已挂

QuanHao Guo 7 Mar 07, 2022
Transfer-Learn is an open-source and well-documented library for Transfer Learning.

Transfer-Learn is an open-source and well-documented library for Transfer Learning. It is based on pure PyTorch with high performance and friendly API. Our code is pythonic, and the design is consist

THUML @ Tsinghua University 2.2k Jan 03, 2023
CTC segmentation python package

CTC segmentation CTC segmentation can be used to find utterances alignments within large audio files. This repository contains the ctc-segmentation py

Ludwig Kürzinger 217 Jan 04, 2023
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation

SUO-SLAM This repository hosts the code for our CVPR 2022 paper "Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation". ArXiv li

Robot Perception & Navigation Group (RPNG) 97 Jan 03, 2023
Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly Code for this paper Ultra-Data-Efficient GAN Tra

VITA 77 Oct 05, 2022
Official PyTorch implementation and pretrained models of the paper Self-Supervised Classification Network

Self-Classifier: Self-Supervised Classification Network Official PyTorch implementation and pretrained models of the paper Self-Supervised Classificat

Elad Amrani 24 Dec 21, 2022
Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir.

NetScanner.py Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir. Linux'da Kullanımı: git clone https://github.com/

4 Aug 23, 2021
Permute Me Softly: Learning Soft Permutations for Graph Representations

Permute Me Softly: Learning Soft Permutations for Graph Representations

Giannis Nikolentzos 7 Jul 10, 2022
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020, Oral)

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020 Oral) Figure: Face image editing controlled via style images and segmenta

Peihao Zhu 579 Dec 30, 2022
sense-py-AnishaBaishya created by GitHub Classroom

Compute Statistics Here we compute statistics for a bunch of numbers. This project uses the unittest framework to test functionality. Pass the tests T

1 Oct 21, 2021
Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 01, 2023