A CUDA version of AdderNet

Last update: Jun 20, 2022

Overview

AdderNet training accelerated by a CUDA kernel.
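
For orientation, here is a minimal sketch of the forward operation being accelerated, assuming the standard AdderNet definition (each output is the negative L1 distance between an input patch and a filter). The function name `adder2d_reference` is hypothetical and is not taken from this repository. The backward pass, which AdderNet defines separately, is not shown.

```python
import torch
import torch.nn.functional as F


def adder2d_reference(x, weight, stride=1, padding=0):
    """Hypothetical reference forward pass: out = -sum(|patch - filter|)."""
    n, _, h, w = x.shape
    out_ch, _, kh, kw = weight.shape
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1

    # Extract sliding patches: (N, C*kh*kw, L) with L = h_out * w_out.
    patches = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
    patches = patches.transpose(1, 2)                      # (N, L, C*kh*kw)

    # Flatten each filter and repeat it across the batch dimension.
    filters = weight.reshape(out_ch, -1)                   # (out_ch, C*kh*kw)
    filters = filters.unsqueeze(0).expand(n, -1, -1)       # (N, out_ch, C*kh*kw)

    # Pairwise L1 distances between patches and filters: (N, L, out_ch).
    dist = torch.cdist(patches, filters, p=1.0)

    # AdderNet outputs the negated distance, reshaped like a conv output.
    return (-dist).transpose(1, 2).reshape(n, out_ch, h_out, w_out)


torch.manual_seed(0)
x = torch.randn(2, 3, 5, 5)
weight = torch.randn(4, 3, 3, 3)
out = adder2d_reference(x, weight, stride=1, padding=1)
print(out.shape)
```

This version materializes every patch in memory, which is one reason a dedicated CUDA kernel can be faster.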

Usage

cd adder_cuda
python setup.py install
cd ..
python main.py

Environment

PyTorch 1.10.0, CUDA 11.3

Benchmark

version	training time per batch (s)
raw	1.61
torch.cdist	1.49
cuda_unoptimized	0.4508
this work	0.3158

The CUDA version of AdderNet trains about 5× faster than the original version (0.3158 s vs. 1.61 s per batch). The cuda_unoptimized version appears to contain bugs that prevent the model from converging; its speed is listed only for comparison. The experiment trained ResNet-20 on CIFAR-10 on an RTX 2080 Ti.
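
The speedup figures can be checked directly from the benchmark numbers. The short sketch below uses only the per-batch times listed in the table; the variable names are illustrative.

```python
# Training time per batch in seconds, copied from the benchmark table.
times = {
    "raw": 1.61,
    "torch.cdist": 1.49,
    "cuda_unoptimized": 0.4508,
    "this work": 0.3158,
}

baseline = times["raw"]

# Speedup of each version relative to the raw implementation.
speedups = {name: baseline / t for name, t in times.items()}

for name, factor in speedups.items():
    print(f"{name:>18}: {factor:.2f}x")
```

Relative to the raw version, this work comes out at roughly 5.1×, and roughly 1.4× relative to cuda_unoptimized.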

Time(%)	Time	Calls	Avg	Min	Max	Name
48.57	30.4752s	3920	7.7743ms	162.70us	12.271ms	CONV_BACKWARD
34.85	21.8686s	19680	1.1112ms	5.3770us	11.827ms	_ZN2at6native27unrolled_elementwise_kernel...
7.46	4.67901s	5920	790.37us	26.529us	1.5841ms	CONV
2.24	1.40372s	3920	358.09us	31.298us	845.80us	col2im_kernel
2.10	1.31882s	36862	35.777us	1.4720us	276.24us	vectorized_elementwise_kernel
1.43	900.03ms	5920	152.03us	7.9040us	372.40us	im2col_kernel

The table above shows the time distribution of training for one epoch. If you are interested, the CUDA kernel can be optimized further.
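
To see where further optimization might pay off, the profile can be grouped by kernel type. The sketch below uses the percentages from the table; the grouping by kernel name is an assumption made for illustration, not something the profiler reports.

```python
# (kernel name, share of total time in %) copied from the profile table.
profile = [
    ("CONV_BACKWARD", 48.57),
    ("_ZN2at6native27unrolled_elementwise_kernel...", 34.85),
    ("CONV", 7.46),
    ("col2im_kernel", 2.24),
    ("vectorized_elementwise_kernel", 2.10),
    ("im2col_kernel", 1.43),
]


def group_of(name):
    """Assign a kernel to a coarse group based on its name (assumed grouping)."""
    if name.startswith("CONV"):
        return "CONV kernels"
    if "elementwise" in name:
        return "elementwise"
    if "im2col" in name or "col2im" in name:
        return "im2col/col2im"
    return "other"


totals = {}
for name, pct in profile:
    group = group_of(name)
    totals[group] = totals.get(group, 0.0) + pct

# Print groups from largest to smallest share.
for group, pct in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{group:>15}: {pct:.2f}%")
```

Under this grouping, the CONV and CONV_BACKWARD kernels account for a little over half of the listed time, and elementwise kernels for over a third, so both may be worth examining.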

Owner

LingXY
