TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

Last update: Jan 06, 2023

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision (CV) problems. We will do our best to keep it up to date. If you find a problem with this repository, please open an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing: code and models are available at https://github.com/lxtGH/SFSegNets

Implemented Papers

Image Classification
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNet: Deep Residual Learning for Image Recognition
- DenseNet: Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Semantic Segmentation
- DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
- PSPNet: Pyramid Scene Parsing Network
- DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Asymmetric Non-local Neural Networks for Semantic Segmentation
- Semantic Flow for Fast and Accurate Scene Parsing

Object Detection
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- YOLOv3: An Incremental Improvement
- FPN: Feature Pyramid Networks for Object Detection

Pose Estimation
- CPM: Convolutional Pose Machines
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation
- Mask R-CNN

Generative Adversarial Networks
- Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported. Install the requirements, then build the extensions in lib/exts:

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
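Before building, it can help to confirm that the environment matches the stated requirements (Python 3.x, PyTorch 1.3). The sketch below is a hypothetical helper, not part of TorchCV: check_versions and installed_torch_version are illustrative names, and treating any 1.3.x release (including local build suffixes such as "+cu100") as a match is an assumption about how strictly "PyTorch 1.3" should be read.

```python
import re
import sys


def check_versions(python_version, torch_version):
    """Return a list of problems; an empty list means the environment looks OK.

    Hypothetical helper: the required versions come from the README
    (Python 3.x, PyTorch 1.3); the matching rules are assumptions.
    """
    problems = []
    if python_version[0] != 3:
        problems.append(f"Python 3.x required, found {python_version[0]}.{python_version[1]}")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    else:
        # Compare only major.minor, so "1.3.1" and "1.3.0+cu100" both count as 1.3.
        match = re.match(r"(\d+)\.(\d+)", torch_version)
        if match is None or (int(match.group(1)), int(match.group(2))) != (1, 3):
            problems.append(f"PyTorch 1.3 expected, found {torch_version}")
    return problems


def installed_torch_version():
    """Return torch.__version__, or None if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    for problem in check_versions(tuple(sys.version_info[:2]), installed_torch_version()):
        print("WARNING:", problem)
```

Only major and minor versions are compared, so a patch release such as 1.3.1 passes while 1.13.x is correctly rejected.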

Performances with TorchCV

All results shown below fully reproduce those reported in the original papers.

Image Classification

ImageNet (Center Crop Test): 224x224

Model	Train	Test	Top-1 (%)	Top-5 (%)	BS	Iters	Scripts
ResNet50	train	val	77.54	93.59	512	300K	ResNet50
ResNet101	train	val	78.94	94.56	512	300K	ResNet101
ShuffleNetV2x0.5	train	val	60.90	82.54	1024	400K	ShuffleNetV2x0.5
ShuffleNetV2x1.0	train	val	69.71	88.91	1024	400K	ShuffleNetV2x1.0
DFNetV1	train	val	70.99	89.68	1024	400K	DFNetV1
DFNetV2	train	val	74.22	91.61	1024	400K	DFNetV2
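The Top-1 and Top-5 columns give the percentage of validation images whose ground-truth class is the highest-scoring prediction, or is among the five highest-scoring predictions. The following minimal pure-Python sketch of that metric is illustrative only: it is not TorchCV's evaluation code, and the function name top_k_accuracy is hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores.

    scores: one list of per-class scores per sample
    labels: ground-truth class index per sample
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        ranked = sorted(range(len(sample_scores)), key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # top-1 prediction: class 1
    [0.5, 0.3, 0.2],  # top-1 prediction: class 0
    [0.2, 0.3, 0.5],  # top-1 prediction: class 2
    [0.6, 0.1, 0.3],  # top-1 prediction: class 0
]
labels = [1, 1, 2, 2]
print(top_k_accuracy(scores, labels, k=1))  # 50.0
print(top_k_accuracy(scores, labels, k=2))  # 100.0
```

With k=1, two of the four samples are correct. With k=2, every true label falls within the two highest scores, so Top-k accuracy can never be lower than Top-1.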

Semantic Segmentation

Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769

Model	Backbone	Train	Test	mIoU (%)	BS	Iters	Scripts
PSPNet	3x3-Res101	train	val	78.20	8	40K	PSPNet
DeepLabV3	3x3-Res101	train	val	79.13	8	40K	DeepLabV3

ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520

Model	Backbone	Train	Test	mIoU (%)	Pixel Acc (%)	BS	Iters	Scripts
PSPNet	3x3-Res50	train	val	41.52	80.09	16	150K	PSPNet
DeepLabV3	3x3-Res50	train	val	42.16	80.36	16	150K	DeepLabV3
PSPNet	3x3-Res101	train	val	43.60	81.30	16	150K	PSPNet
DeepLabV3	3x3-Res101	train	val	44.13	81.42	16	150K	DeepLabV3
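The mIoU and pixel accuracy columns are normally derived from a single confusion matrix accumulated over the whole validation set. The per-class IoU is TP / (TP + FP + FN), mIoU averages it over classes, and pixel accuracy is the fraction of labeled pixels predicted correctly. The sketch below is a generic pure-Python illustration, not TorchCV's code. Ignoring label 255 follows the common Cityscapes convention and is an assumption here, as are the function names.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs: rows = true class, columns = predicted class.

    ignore_index=255 (unlabeled pixels) is an assumed convention.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1
    return matrix


def miou_and_pixel_acc(matrix):
    """Return (mIoU %, pixel accuracy %) from an accumulated confusion matrix."""
    n = len(matrix)
    correct = sum(matrix[c][c] for c in range(n))
    total = sum(sum(row) for row in matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # class c predicted as something else
        fp = sum(matrix[r][c] for r in range(n)) - tp   # other classes predicted as c
        if tp + fp + fn > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / (tp + fp + fn))
    return 100.0 * sum(ious) / len(ious), 100.0 * correct / total


# Flattened toy "image" of 6 pixels; the last pixel is unlabeled (255).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou, pixel_acc = miou_and_pixel_acc(confusion_matrix(pred, target, num_classes=3))
print(round(miou, 2), round(pixel_acc, 2))  # 72.22 80.0
```

Per-class IoUs here are 1/2, 2/3 and 1, and 4 of the 5 labeled pixels are correct. Pixel accuracy is therefore usually higher than mIoU, as in the ADE20K table.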

Object Detection

Pascal VOC2007/2012 (Single Scale Test): 20 Classes

Model	Backbone	Train	Test	mAP	BS	Epochs	Scripts
SSD300	VGG16	07+12_trainval	07_test	0.786	32	235	SSD300
SSD512	VGG16	07+12_trainval	07_test	0.808	32	235	SSD512
Faster R-CNN	VGG16	07_trainval	07_test	0.706	1	15	Faster R-CNN
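The mAP column is the mean over the 20 VOC classes of per-class average precision (AP), reported here as a fraction. VOC2007 test results are commonly computed with the 11-point interpolated AP. Whether TorchCV uses that variant or the all-point (area-under-curve) variant is not stated here, so the sketch below is an assumption. It also assumes that detections have already been matched to ground-truth boxes (typically at IoU >= 0.5) and sorted by descending score, and its function names are hypothetical.

```python
def precision_recall(is_true_positive, num_ground_truth):
    """Cumulative recall/precision for one class's detections, sorted by descending score.

    is_true_positive[i] is True if detection i matched a not-yet-matched
    ground-truth box; that matching step is assumed to happen upstream.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def voc07_ap(recalls, precisions):
    """11-point interpolated AP: mean of the best precision at recall >= 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for i in range(11):
        threshold = i / 10
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        ap += (max(candidates) if candidates else 0.0) / 11
    return ap


def mean_ap(per_class_ap):
    """Unweighted mean of per-class APs (20 classes for Pascal VOC)."""
    return sum(per_class_ap) / len(per_class_ap)


# One class: 4 ground-truth boxes, 4 ranked detections (hit, hit, miss, hit).
recalls, precisions = precision_recall([True, True, False, True], num_ground_truth=4)
print(round(voc07_ap(recalls, precisions), 4))  # 0.6818
```

Recall never reaches 0.8, so the last three thresholds contribute zero. This is why missed ground-truth boxes lower AP even when every detection is correct.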

Pose Estimation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

Mask R-CNN

Generative Adversarial Networks

Pix2pix
CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task; see the subdirectories of data for details. The following is an example dataset directory tree for training semantic segmentation. Public datasets can be preprocessed with the scripts in data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
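Because each image is paired with the label file that shares its name (00001.jpg or 00001.png with label/00001.png), a quick consistency check before training can catch missing files. The sketch below is a hypothetical helper, not a TorchCV script. It assumes pairing by filename stem, .jpg/.png images and .png labels, as the tree above suggests.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}  # per the tree above: images are .jpg or .png
LABEL_SUFFIX = ".png"              # labels are .png


def unpaired(image_names, label_names):
    """Return (stems of images without labels, stems of labels without images).

    Files are assumed to be paired by stem: image/00001.jpg <-> label/00001.png.
    Files with other suffixes (e.g. hidden system files) are ignored.
    """
    images = {Path(n).stem for n in image_names if Path(n).suffix.lower() in IMAGE_SUFFIXES}
    labels = {Path(n).stem for n in label_names if Path(n).suffix.lower() == LABEL_SUFFIX}
    return sorted(images - labels), sorted(labels - images)


def find_unpaired(split_dir):
    """Apply unpaired() to one split directory, e.g. Dataset/train or Dataset/val."""
    split = Path(split_dir)
    return unpaired(
        [p.name for p in (split / "image").iterdir()],
        [p.name for p in (split / "label").iterdir()],
    )
```

For example, find_unpaired("Dataset/train") returns two empty lists when every image has a label and vice versa. Keeping the pairing logic in a pure function (unpaired) makes it easy to test without touching the filesystem.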

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Resume Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh resume tag

Validate

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag

Testing

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
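To drive these phases from Python (for example, from a sweep script), the command can be assembled as an argument list. This is a hypothetical sketch: build_command, run_phase and dry_run are illustrative names, not part of TorchCV, and the phase names are the train, val and test phases used in the commands above. Passing the tag as its own list element means an empty tag still arrives as the script's second positional argument.

```python
import subprocess

SCRIPT_DIR = "scripts/seg/cityscapes/"
SCRIPT = "run_fs_pspnet_cityscapes_seg.sh"
PHASES = ("train", "val", "test")  # phases used in the commands above


def build_command(phase, tag=""):
    """Return the argument list for one phase; an empty tag is kept as "" ($2)."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}; expected one of {PHASES}")
    return ["bash", SCRIPT, phase, tag]


def run_phase(phase, tag="", dry_run=True):
    """Print the command (dry run) or execute it from SCRIPT_DIR, like `cd ... && bash ...`."""
    cmd = build_command(phase, tag)
    if dry_run:
        print(f"[dry run] cd {SCRIPT_DIR} && {' '.join(cmd)}")
        return 0
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(cmd, cwd=SCRIPT_DIR, check=True).returncode
```

Using cwd= instead of a shell "cd" avoids invoking a shell and keeps the tag from being split or expanded.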

Demos with TorchCV

Example output of VGG19-OpenPose

Owner

Donny You