Implementation of the HMAX model of vision in PyTorch

Overview

PyTorch implementation of HMAX

PyTorch implementation of the HMAX model that closely follows that of the MATLAB implementation of The Laboratory for Computational Cognitive Neuroscience:

http://maxlab.neuro.georgetown.edu/hmax.html

The S and C units of the HMAX model can almost be mapped directly onto TorchVision's Conv2d and MaxPool2d layers, where channels are used to store the filters for different orientations. However, HMAX also implements multiple scales, which doesn't map nicely onto the existing TorchVision functionality. Therefore, each scale has its own Conv2d layer, which are executed in parallel.

Here is a schematic overview of the network architecture:

layers consisting of units with increasing scale
S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1
 \ /   \ /   \ /   \ /   \ /   \ /   \ /   \ /
  C1    C1    C1    C1    C1    C1    C1    C1
   \     \     \    |     /     /     /     /
           ALL-TO-ALL CONNECTIVITY
   /     /     /    |     \     \     \     \
  S2    S2    S2    S2    S2    S2    S2    S2
   |     |     |     |     |     |     |     |
  C2    C2    C2    C2    C2    C2    C2    C2

Installation

This script depends on the NumPy, SciPy, PyTorch and TorchVision packages.

Clone the repository somewhere and run the example.py script:

git clone https://github.com/wmvanvliet/pytorch_hmax
python example.py

Usage

See the example.py script on how to run the model on 10 example images.

You might also like...
Pytorch implementation of
Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

This repository contains a pytorch implementation of
This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision | Project Page | Paper | This repository contains a pytorch implementation of "St

PyTorch implementation of
PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

mlp-mixer-pytorch PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021) Usage import torch from mlp_mixer

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.
Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

A PyTorch Implementation of ViT (Vision Transformer)
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Pytorch implementation of the DeepDream computer vision algorithm
Pytorch implementation of the DeepDream computer vision algorithm

deep-dream-in-pytorch Pytorch (https://github.com/pytorch/pytorch) implementation of the deep dream (https://en.wikipedia.org/wiki/DeepDream) computer

A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Unofficial PyTorch implementation of MobileViT based on paper
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Comments
  • Provide direct (not nested) path to stimuli

    Provide direct (not nested) path to stimuli

    Hi,

    great repo and effort. I really admire your courage to write HMAX in python. I have a question about loading data in, namely about this part of the code: https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L18

    I know that by default, the ImageFolder expects to have nested folders (as stated in docs or mentioned in this issue) but it's quite clumsy in this case. Eg even if you look at your example, having subfolders for each photo just doesn't look good. Would you have a way how to go around this? Any suggestion on how to provide only a path to all images and not this nested path? I was reading some discussions but haven't figured out how to implement it.


    One more question (I didn't want to open an extra issue for that), shouldn't in https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L28 be batch_size=len(images)) instead of batch_size=10 (written symbolically)?

    Thanks.

    opened by jankaWIS 5
  • Input of non-square images fails

    Input of non-square images fails

    Hi again, I was playing a bit around and discovered that it fails for non-square dimensional images, i.e. where height != width. Maybe I was looking wrong or missed something, but I haven't found it mentioned anywhere and the docs kind of suggests that it can be any height and any width. The same goes for the description of the layers (e.g. s1). In the other issue, you mentioned that

    One thing you may want to add to this transformer pipeline is a transforms.Resize followed by a transforms.CenterCrop to ensure all images end up having the same height and width

    but didn't mention why. Why is it not possible for non-square images? Is there any workaround if one doesn't want to crop? Maybe to pad like in this post*?

    To demonstrate the issue:

    import os
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    import pickle
    
    import hmax
    
    path_hmax = './'
    # Initialize the model with the universal patch set
    print('Constructing model')
    model = hmax.HMAX(os.path.join(path_hmax,'universal_patch_set.mat'))
    
    # A folder with example images
    example_images = datasets.ImageFolder(
        os.path.join(path_hmax,'example_images'),
        transform=transforms.Compose([
            transforms.Resize((400, 500)),
            transforms.CenterCrop((400, 500)),
            transforms.Grayscale(),
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x * 255),
        ])
    )
    
    # A dataloader that will run through all example images in one batch
    dataloader = DataLoader(example_images, batch_size=10)
    
    # Determine whether there is a compatible GPU available
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    
    # Run the model on the example images
    print('Running model on', device)
    model = model.to(device)
    for X, y in dataloader:
        s1, c1, s2, c2 = model.get_all_layers(X.to(device))
    
    print('[done]')
    

    will give an error in the forward function:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    [<ipython-input-7-a6bab15d9571>](https://localhost:8080/#) in <module>()
         33 model = model.to(device)
         34 for X, y in dataloader:
    ---> 35     s1, c1, s2, c2 = model.get_all_layers(X.to(device))
         36 
         37 # print('Saving output of all layers to: output.pkl')
    
    4 frames
    [/gdrive/MyDrive/Colab Notebooks/data_HMAX/pytorch_hmax/hmax.py](https://localhost:8080/#) in forward(self, c1_outputs)
        285             conv_output = conv_output.view(
        286                 -1, self.num_orientations, self.num_patches, conv_output_size,
    --> 287                 conv_output_size)
        288 
        289             # Pool over orientations
    
    RuntimeError: shape '[-1, 4, 400, 126, 126]' is invalid for input of size 203616000
    

    *Code for that:

    import torchvision.transforms.functional as F
    
    class SquarePad:
        def __call__(self, image):
            max_wh = max(image.size)
            p_left, p_top = [(max_wh - s) // 2 for s in image.size]
            p_right, p_bottom = [max_wh - (s+pad) for s, pad in zip(image.size, [p_left, p_top])]
            padding = (p_left, p_top, p_right, p_bottom)
            return F.pad(image, padding, 0, 'constant')
    
    target_image_size = (224, 224)  # as an example
    # now use it as the replacement of transforms.Pad class
    transform=transforms.Compose([
        SquarePad(),
        transforms.Resize(target_image_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    
    opened by jankaWIS 1
Releases(v0.2)
  • v0.2(Jul 7, 2022)

    For this version, I've modified the HMAX code a bit to exactly match that of the original MATLAB code of Maximilian Riesenhuber. This is a bit slower and consumes a bit more memory, as the code needs to work around some subtle differences between the MATLAB and PyTorch functions. Perhaps in the future, we could add an "optimized" model that is allowed to deviate from the reference implementation for increased efficiency, but for now I feel it is more important to follow the reference implementation to the letter.

    Major change: default C2 activation function is now 'euclidean' instead of 'gaussian'.

    Source code(tar.gz)
    Source code(zip)
  • v0.1(Jul 7, 2022)

Owner
Marijn van Vliet
Research Software Engineer.
Marijn van Vliet
SASM - simple crossplatform IDE for NASM, MASM, GAS and FASM assembly languages

SASM (SimpleASM) - простая кроссплатформенная среда разработки для языков ассемблера NASM, MASM, GAS, FASM с подсветкой синтаксиса и отладчиком. В SA

Dmitriy Manushin 5.6k Jan 06, 2023
Invertible conditional GANs for image editing

Invertible Conditional GANs This is the implementation of the IcGAN model proposed in our paper: Invertible Conditional GANs for image editing. Novemb

Guim 278 Dec 12, 2022
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

TorchRL Disclaimer This library is not officially released yet and is subject to change. The features are available before an official release so that

Meta Research 860 Jan 07, 2023
YuNetのPythonでのONNX、TensorFlow-Lite推論サンプル

YuNet-ONNX-TFLite-Sample YuNetのPythonでのONNX、TensorFlow-Lite推論サンプルです。 TensorFlow-LiteモデルはPINTO0309/PINTO_model_zoo/144_YuNetのものを使用しています。 Requirement Op

KazuhitoTakahashi 8 Nov 17, 2021
Implement A3C for Mujoco gym envs

pytorch-a3c-mujoco Disclaimer: my implementation right now is unstable (you ca refer to the learning curve below), I'm not sure if it's my problems. A

Andrew 70 Dec 12, 2022
Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation"

TriBERT This repository contains the code for the NeurIPS 2021 paper titled "TriBERT: Full-body Human-centric Audio-visual Representation Learning for

UBC Computer Vision Group 8 Aug 31, 2022
Framework web SnakeServer.

SnakeServer - Framework Web 🐍 Documentação oficial do framework SnakeServer. Conteúdo Sobre Como contribuir Enviar relatórios de segurança Pull reque

Jaedson Silva 0 Jul 21, 2022
Implementation of Multistream Transformers in Pytorch

Multistream Transformers Implementation of Multistream Transformers in Pytorch. This repository deviates slightly from the paper, where instead of usi

Phil Wang 47 Jul 26, 2022
Meta Language-Specific Layers in Multilingual Language Models

Meta Language-Specific Layers in Multilingual Language Models This repo contains the source codes for our paper On Negative Interference in Multilingu

Zirui Wang 20 Feb 13, 2022
Comp445 project - Data Communications & Computer Networks

COMP-445 Data Communications & Computer Networks Change Python version in Conda

Peng Zhao 2 Oct 03, 2022
Lua-parser-lark - An out-of-box Lua parser written in Lark

An out-of-box Lua parser written in Lark Such parser handles a relaxed version o

Taine Zhao 2 Jul 19, 2022
Pytorch implementation of Implicit Behavior Cloning.

Implicit Behavior Cloning - PyTorch (wip) Pytorch implementation of Implicit Behavior Cloning. Install conda create -n ibc python=3.8 pip install -r r

Kevin Zakka 49 Dec 25, 2022
上海交通大学全自动抢课脚本,支持准点开抢与抢课后持续捡漏两种模式。2021/06/08更新。

Welcome to Course-Bullying-in-SJTU-v3.1! 2021/6/8 紧急更新v3.1 更新说明 为了更好地保护用户隐私,将原来用户名+密码的登录方式改为微信扫二维码+cookie登录方式,不再需要配置使用pytesseract。在使用扫码登录模式时,请稍等,二维码将马

87 Sep 13, 2022
Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, and finding their unique parameters (e.g. death rate).

DINN We introduce Disease Informed Neural Networks (DINNs) — neural networks capable of learning how diseases spread, forecasting their progression, a

19 Dec 10, 2022
Read number plates with https://platerecognizer.com/

HASS-plate-recognizer Read vehicle license plates with https://platerecognizer.com/ which offers free processing of 2500 images per month. You will ne

Robin 69 Dec 30, 2022
NAS-Bench-x11 and the Power of Learning Curves

NAS-Bench-x11 NAS-Bench-x11 and the Power of Learning Curves Shen Yan, Colin White, Yash Savani, Frank Hutter. NeurIPS 2021. Surrogate NAS benchmarks

AutoML-Freiburg-Hannover 13 Nov 18, 2022
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation by Junjue Wang, Zhuo Zheng, Ailong Ma, Xiaoyan Lu, and Yanfei Zh

Payphone 8 Nov 21, 2022
Official implement of "CAT: Cross Attention in Vision Transformer".

CAT: Cross Attention in Vision Transformer This is official implement of "CAT: Cross Attention in Vision Transformer". Abstract Since Transformer has

100 Dec 15, 2022
A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Segnet is deep fully convolutional neural network architecture for semantic pixel-wise segmentation. This is implementation of http://arxiv.org/pdf/15

Pradyumna Reddy Chinthala 190 Dec 15, 2022
Code for paper PairRE: Knowledge Graph Embeddings via Paired Relation Vectors.

PairRE Code for paper PairRE: Knowledge Graph Embeddings via Paired Relation Vectors. This implementation of PairRE for Open Graph Benchmak datasets (

Alipay 65 Dec 19, 2022