PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

Overview

PixelPick

This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

[Project page] [Paper]

Table of contents

Abstract

A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training. In this work, we show that in order to achieve a good level of segmentation performance, all you need are a few well-chosen pixel labels. We make the following contributions: (i) We investigate the novel semantic segmentation setting in which labels are supplied only at sparse pixel locations, and show that deep neural networks can use a handful of such labels to good effect; (ii) We demonstrate how to exploit this phenomena within an active learning framework, termed PixelPick, to radically reduce labelling cost, and propose an efficient “mouse-free” annotation strategy to implement our approach; (iii) We conduct extensive experiments to study the influence of annotation diversity under a fixed budget, model pretraining, model capacity and the sampling mechanism for picking pixels in this low annotation regime; (iv) We provide comparisons to the existing state of the art in semantic segmentation with active learning, and demonstrate comparable performance with up to two orders of magnitude fewer pixel annotations on the CamVid, Cityscapes and PASCAL VOC 2012 benchmarks; (v) Finally, we evaluate the efficiency of our annotation pipeline and its sensitivity to annotator error to demonstrate its practicality. Our code, models and annotation tool will be made publicly available.

Installation

Prerequisites

Our code is based on Python 3.8 and uses the following Python packages.

torch>=1.8.1
torchvision>=0.9.1
tqdm>=4.59.0
cv2>=4.5.1.48
Clone this repository
git clone https://github.com/NoelShin/PixelPick.git
cd PixelPick
Download dataset

Follow one of the instructions below to download a dataset you are interest in. Then, set the dir_dataset variable in args.py to the directory path which contains the downloaded dataset.

  • For CamVid, you need to download SegNet-Tutorial codebase as a zip file and use CamVid directory which contains images/annotations for training and test after unzipping it. You don't need to change the directory structure. [CamVid]

  • For Cityscapes, first visit the link and login to download. Once downloaded, you need to unzip it. You don't need to change the directory structure. It is worth noting that, if you set downsample variable in args.py (4 by default), it will first downsample train and val images of Cityscapes and store them within {dir_dataset}_d{downsample} folder which will be located in the same directory of dir_dataset. This is to enable a faster dataloading during training. [Cityscapes]

  • For PASCAL VOC 2012, the dataset will be automatically downloaded via torchvision.datasets.VOCSegmentation. You just need to specify which directory you want to download it with dir_dataset variable. If the automatic download fails, you can manually download through the following page (you don't need to untar VOCtrainval_11-May-2012.tar file which will be downloaded). [PASCAL VOC 2012 segmentation]

For more details about the data we used to train/validate our model, please visit datasets directory and find {camvid, cityscapes, voc}_{train, val}.txt file.

Train and validate

By default, the current code validates the model every epoch while training. To train a MobileNetv2-based DeepLabv3+ network, follow the below lines. (The pretrained MobileNetv2 will be loaded automatically.)

cd scripts
sh pixelpick-dl-cv.sh

Benchmark results

For CamVid and Cityscapes, we report the average of 5 different runs and 3 different runs for PASCAL VOC 2012. Please refer to our paper for details. ± one std of mean IoU is denoted.

CamVid
model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 20 (0.012) 50.8 ± 0.2
PixelPick MobileNetv2 40 (0.023) 53.9 ± 0.7
PixelPick MobileNetv2 60 (0.035) 55.3 ± 0.5
PixelPick MobileNetv2 80 (0.046) 55.2 ± 0.7
PixelPick MobileNetv2 100 (0.058) 55.9 ± 0.1
Fully-supervised MobileNetv2 360x480 (100) 58.2 ± 0.6
PixelPick ResNet50 20 (0.012) 59.7 ± 0.9
PixelPick ResNet50 40 (0.023) 62.3 ± 0.5
PixelPick ResNet50 60 (0.035) 64.0 ± 0.3
PixelPick ResNet50 80 (0.046) 64.4 ± 0.6
PixelPick ResNet50 100 (0.058) 65.1 ± 0.3
Fully-supervised ResNet50 360x480 (100) 67.8 ± 0.3
Cityscapes

Note that to make training time manageable, we train on the quarter resolution (256x512) of the original Cityscapes images (1024x2048).

model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 20 (0.015) 52.0 ± 0.6
PixelPick MobileNetv2 40 (0.031) 54.7 ± 0.4
PixelPick MobileNetv2 60 (0.046) 55.5 ± 0.6
PixelPick MobileNetv2 80 (0.061) 56.1 ± 0.3
PixelPick MobileNetv2 100 (0.076) 56.5 ± 0.3
Fully-supervised MobileNetv2 256x512 (100) 61.4 ± 0.5
PixelPick ResNet50 20 (0.015) 56.1 ± 0.4
PixelPick ResNet50 40 (0.031) 60.0 ± 0.3
PixelPick ResNet50 60 (0.046) 61.6 ± 0.4
PixelPick ResNet50 80 (0.061) 62.3 ± 0.4
PixelPick ResNet50 100 (0.076) 62.8 ± 0.4
Fully-supervised ResNet50 256x512 (100) 68.5 ± 0.3
PASCAL VOC 2012
model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 10 (0.009) 51.7 ± 0.2
PixelPick MobileNetv2 20 (0.017) 53.9 ± 0.8
PixelPick MobileNetv2 30 (0.026) 56.7 ± 0.3
PixelPick MobileNetv2 40 (0.034) 56.9 ± 0.7
PixelPick MobileNetv2 50 (0.043) 57.2 ± 0.3
Fully-supervised MobileNetv2 N/A (100) 57.9 ± 0.5
PixelPick ResNet50 10 (0.009) 59.7 ± 0.8
PixelPick ResNet50 20 (0.017) 65.6 ± 0.5
PixelPick ResNet50 30 (0.026) 66.4 ± 0.2
PixelPick ResNet50 40 (0.034) 67.2 ± 0.1
PixelPick ResNet50 50 (0.043) 67.4 ± 0.5
Fully-supervised ResNet50 N/A (100) 69.4 ± 0.3

Models

model dataset backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%) Download
PixelPick CamVid MobileNetv2 100 (0.058) 56.1 Link
PixelPick CamVid ResNet50 100 (0.058) TBU TBU
PixelPick Cityscapes MobileNetv2 100 (0.076) 56.8 Link
PixelPick Cityscapes ResNet50 100 (0.076) 63.3 Link
PixelPick VOC 2012 MobileNetv2 50 (0.043) 57.4 Link
PixelPick VOC 2012 ResNet50 50 (0.043) 68.0 Link

PixelPick mouse-free annotation tool

Code for the annotation tool will be made available.

Citation

To be updated.

Acknowledgements

We borrowed code for the MobileNetv2-based DeepLabv3+ network from https://github.com/Shuai-Xie/DEAL.

If you have any questions, please contact us at {gyungin, weidi, samuel}@robots.ox.ac.uk.

Owner
Gyungin Shin
Serving others
Gyungin Shin
This is the code used in the paper "Entity Embeddings of Categorical Variables".

This is the code used in the paper "Entity Embeddings of Categorical Variables". If you want to get the original version of the code used for the Kagg

Cheng Guo 845 Nov 29, 2022
A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21

ANEMONE A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21 Dependencies python==3.6.1 dgl==

Graph Analysis & Deep Learning Laboratory, GRAND 30 Dec 14, 2022
Weight estimation in CT by multi atlas techniques

maweight A Python package for multi-atlas based weight estimation for CT images, including segmentation by registration, feature extraction and model

György Kovács 0 Dec 24, 2021
Self-supervised Deep LiDAR Odometry for Robotic Applications

DeLORA: Self-supervised Deep LiDAR Odometry for Robotic Applications Overview Paper: link Video: link ICRA Presentation: link This is the correspondin

Robotic Systems Lab - Legged Robotics at ETH Zürich 181 Dec 29, 2022
Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Regression Transformer Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression . Development se

International Business Machines 27 Jan 05, 2023
🔥RandLA-Net in Tensorflow (CVPR 2020, Oral & IEEE TPAMI 2021)

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds (CVPR 2020) This is the official implementation of RandLA-Net (CVPR2020, Oral

Qingyong 1k Dec 30, 2022
Code for Active Learning at The ImageNet Scale.

Code for Active Learning at The ImageNet Scale. This repository implements many popular active learning algorithms and allows training with torch's DDP.

Zeyad Emam 47 Dec 12, 2022
This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper

Deep Continuous Clustering Introduction This is a Pytorch implementation of the DCC algorithms presented in the following paper (paper): Sohil Atul Sh

Sohil Shah 197 Nov 29, 2022
GRF: Learning a General Radiance Field for 3D Representation and Rendering

GRF: Learning a General Radiance Field for 3D Representation and Rendering [Paper] [Video] GRF: Learning a General Radiance Field for 3D Representatio

Alex Trevithick 243 Dec 29, 2022
Async API for controlling Hue Lights

Hue API Async API for controlling Hue Lights Documentation: hue-api.nirantak.com Source: github.com/nirantak/hue-api Installation This is an async cli

Nirantak Raghav 4 Nov 16, 2022
PyTorch implementation of Constrained Policy Optimization

PyTorch implementation of Constrained Policy Optimization (CPO) This repository has a simple to understand and use implementation of CPO in PyTorch. A

Sapana Chaudhary 25 Dec 08, 2022
Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

EfficientZero (NeurIPS 2021) Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021. Environments Effi

Weirui Ye 671 Jan 03, 2023
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

transformer-slt This repository gathers data and code supporting the experiments in the paper Better Sign Language Translation with STMC-Transformer.

Kayo Yin 107 Dec 27, 2022
A Pytorch implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE

SMU_pytorch A Pytorch Implementation of SMU: SMOOTH ACTIVATION FUNCTION FOR DEEP NETWORKS USING SMOOTHING MAXIMUM TECHNIQUE arXiv https://arxiv.org/ab

Fuhang 36 Dec 24, 2022
Julia package for multiway (inverse) covariance estimation.

TensorGraphicalModels TensorGraphicalModels.jl is a suite of Julia tools for estimating high-dimensional multiway (tensor-variate) covariance and inve

Wayne Wang 3 Sep 23, 2022
The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting".

IGMTF The source code and data of the paper "Instance-wise Graph-based Framework for Multivariate Time Series Forecasting". Requirements The framework

Wentao Xu 24 Dec 05, 2022
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱

Monitor deep learning model training and hardware usage from mobile. 🔥 Features Monitor running experiments from mobile phone (or laptop) Monitor har

labml.ai 1.2k Dec 25, 2022
structured-generative-modeling

This repository contains the implementation for the paper Information Theoretic StructuredGenerative Modeling, Specially thanks for the open-source co

0 Oct 11, 2021
This is the official source code of "BiCAT: Bi-Chronological Augmentation of Transformer for Sequential Recommendation".

BiCAT This is our TensorFlow implementation for the paper: "BiCAT: Sequential Recommendation with Bidirectional Chronological Augmentation of Transfor

John 15 Dec 06, 2022
LIVECell - A large-scale dataset for label-free live cell segmentation

LIVECell dataset This document contains instructions of how to access the data associated with the submitted manuscript "LIVECell - A large-scale data

Sartorius Corporate Research 112 Jan 07, 2023