Are Convolutional Neural Networks or Transformers more like human vision?

This repository contains the code and fine-tuned models for popular Convolutional Neural Networks (CNNs) and the recently proposed Vision Transformer (ViT), fine-tuned on an augmented ImageNet dataset. It also contains the shape/texture bias tests run on the Stylized ImageNet dataset.
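The shape/texture bias test can be summarized by a single number. The sketch below uses the common definition: among test images whose shape class and texture class conflict, it counts how often the model picks the shape class versus the texture class, and ignores predictions that match neither. This is an illustrative sketch, not this repository's code. It assumes each image carries a `shape_label` and a `texture_label`, and the function name `shape_bias` is hypothetical.

```python
from typing import Sequence


def shape_bias(
    predictions: Sequence[str],
    shape_labels: Sequence[str],
    texture_labels: Sequence[str],
) -> float:
    """Fraction of shape decisions among trials where the model chose
    either the shape class or the texture class (hypothetical helper)."""
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            # Assumption: images without a shape/texture conflict are skipped.
            continue
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # Predictions matching neither cue do not count toward either side.

    total = shape_hits + texture_hits
    if total == 0:
        raise ValueError("no shape or texture decisions to score")
    return shape_hits / total


if __name__ == "__main__":
    preds = ["cat", "elephant", "dog", "bird", "boat"]
    shapes = ["cat", "cat", "dog", "bird", "car"]
    textures = ["elephant", "elephant", "clock", "knife", "bottle"]
    print(shape_bias(preds, shapes, textures))  # 3 shape vs 1 texture decision
```

A value near 1 means the model classifies mostly by shape (as humans tend to), and a value near 0 means it relies mostly on texture.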

This work compares CNNs and ViT against humans in terms of error consistency, going beyond traditional accuracy-based metrics. Through these tests, we show that recently proposed self-attention-based Transformer models make more human-like errors than traditional CNNs.
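Error consistency asks whether two observers (for example, a human and a model) get the *same* trials right and wrong, beyond what their accuracies alone would predict. The sketch below assumes the standard Cohen's-kappa formulation. Here the observed consistency $c_{obs}$ is the fraction of trials on which both observers are correct or both are wrong. The expected consistency for independent observers with accuracies $p_a$ and $p_b$ is $c_{exp} = p_a p_b + (1 - p_a)(1 - p_b)$, and $\kappa = (c_{obs} - c_{exp}) / (1 - c_{exp})$. The function name `error_consistency` and its inputs (per-trial correctness lists) are illustrative assumptions, not this repository's API.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Cohen's kappa over per-trial correctness of two observers (hypothetical helper)."""
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed consistency: trials where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Accuracies of each observer.
    p_a = sum(correct_a) / n
    p_b = sum(correct_b) / n

    # Consistency expected by chance for independent observers with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if c_exp == 1:
        raise ValueError("kappa is undefined when expected consistency is 1")

    return (c_obs - c_exp) / (1 - c_exp)


if __name__ == "__main__":
    human = [True, True, False, False]
    print(error_consistency(human, [True, True, False, False]))  # identical errors
    print(error_consistency(human, [True, False, True, False]))  # chance-level overlap
```

Both comparisons above involve observers with identical 50% accuracy, yet kappa separates them: the first model errs on exactly the same trials as the human (kappa = 1), while the second overlaps only at chance level (kappa = 0). This is why error consistency can reveal differences that accuracy alone hides.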

Colab

You can run tests on the results directly in a Google Colaboratory notebook, without installing anything on your local machine. Click "Open in Colab" below:

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

If you use our experimental results or fine-tuned models, please cite:

@article{tuli2021cogsci,
      title={Are Convolutional Neural Networks or Transformers more like human vision?}, 
      author={Shikhar Tuli and Ishita Dasgupta and Erin Grant and Thomas L. Griffiths},
      year={2021},
      eprint={2105.07197},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Study of human inductive biases in CNNs and Transformers.
