This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Overview

Orientation independent Möbius CNNs





This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Background (tl;dr)

All derivations and a detailed description of the models are found in Section 5 of our paper. What follows is an informal tl;dr, summarizing the central aspects of Möbius CNNs.

Feature fields on the Möbius strip: A key characteristic of the Möbius strip is its topological twist, making it a non-orientable manifold. Convolutional weight sharing on the Möbius strip is therefore only well defined up to a reflection of kernels. To account for the ambiguity of kernel orientations, one needs to demand that the kernel responses (feature vectors) transform in a predictable way when different orientations are chosen. Mathematically, this transformation is specified by a group representation ρ of the reflection group. We implement three different feature field types, each characterized by a choice of group representation:

  • scalar fields are modeled by the trivial representation. Scalars stay invariant under reflective gauge transformations:

  • sign-flip fields transform according to the sign-flip representation of the reflection group. Reflective gauge transformations negate the single numerical coefficient of a sign-flip feature:

  • regular feature fields are associated to the regular representation. For the reflection group, this implies 2-dimensional features whose two values (channels) are swapped by gauge transformations:

Reflection steerable kernels (gauge equivariance):

Convolution kernels on the Möbius strip are parameterized maps

whose numbers of input and output channels depend on the types of feature fields between which they map. Since a reflection of a kernel should result in a corresponding transformation of its output feature field, the kernel has to obey certain symmetry constraints. Specifically, kernels have to be reflection steerable (or gauge equivariant), i.e. should satisfy:

The following table visualizes this symmetry constraint for any pair of input and output field types that we implement:

Similar equivariance constraints are imposed on biases and nonlinearities; see the paper for more details.

Isometry equivariance: Shifts of the Möbius strip along itself are isometries. After one revolution (a shift by 2π), points on the strip do not return to themselves, but end up reflected along the width of the strip:

Such reflections of patterns are explained away by the reflection equivariance of the convolution kernels. Orientation independent convolutions are therefore automatically equivariant w.r.t. the action of such isometries on feature fields. Our empirical results, shown in the table below, confirm that this theoretical guarantee holds in practice. Conventional CNNs, on the other hand, are explicitly coordinate dependent, and are therefore in particular not isometry equivariant.

Implementation

Neural network layers are implemented in nn_layers.py while the models are found in models.py. All individual layers and all models are unit tested in unit_tests.py.

Feature fields: We assume Möbius strips with a locally flat geometry, i.e. strips which can be thought of as being constructed by gluing two opposite ends of a rectangular flat stripe together in a twisted way. Feature fields are therefore discretized on a regular sampling grid on a rectangular domain of pixels. Note that this choice induces a global gauge (frame field), which is discontinuous at the cut.

In practice, a neural network operates on multiple feature fields which are stacked in the channel dimension (a direct sum). Feature spaces are therefore characterized by their feature field multiplicities. For instance, one could have 10 scalar fields, 4 sign-flip fields and 8 regular feature fields, which consume in total channels. Denoting the batch size by , a feature space is encoded by a tensor of shape .

The correct transformation law of the feature fields is guaranteed by the coordinate independence (steerability) of the network layers operating on it.

Orientation independent convolutions and bias summation: The class MobiusConv implements orientation independent convolutions and bias summations between input and output feature spaces as specified by the multiplicity constructor arguments in_fields and out_fields, respectively. Kernels are as usual discretized by a grid of size*size pixels. The steerability constraints on convolution kernels and biases are implemented by allocating a reduced number of parameters, from which the symmetric (steerable) kernels and biases are expanded during the forward pass.

Coordinate independent convolutions rely furthermore on parallel transporters of feature vectors, which are implemented as a transport padding operation. This operation pads both sides of the cut with size//2 columns of pixels which are 1) spatially reflected and 2) reflection-steered according to the field types. The stripes are furthermore zero-padded along their width.

The forward pass operates then by:

  • expanding steerable kernels and biases from their non-redundant parameter arrays
  • transport padding the input field array
  • running a conventional Euclidean convolution

As the padding added size//2 pixels around the strip, the spatial resolution of the output field agrees with that of the input field.

Orientation independent nonlinearities: Scalar fields and regular feature fields are acted on by conventional ELU nonlinearities, which are equivariant for these field types. Sign-flip fields are processed by applying ELU nonlinearities to their absolute value after summing a learnable bias parameter. To ensure that the resulting fields are again transforming according to the sign-flip representation, we multiply them subsequently with the signs of the input features. See the paper and the class EquivNonlin for more details.

Feature field pooling: The module MobiusPool implements an orientation independent pooling operation with a stride and kernel size of two pixels, thus halving the fields' spatial resolution. Scalar and regular feature fields are pooled with a conventional max pooling operation, which is for these field types coordinate independent. As the coefficients of sign-flip fields negate under gauge transformations, they are pooled based on their (gauge invariant) absolute value.

While the pooling operation is tested to be exactly gauge equivariant, its spatial subsampling interferes inevitably with its isometry equivariance. Specifically, the pooling operation is only isometry equivariant w.r.t. shifts by an even number of pixels. Note that the same issue applies to conventional Euclidean CNNs as well; see e.g. (Azulay and Weiss, 2019) or (Zhang, 2019).

Models: All models are implemented in models.py. The orientation independent models, which differ only in their field type multiplicities but agree in their total number of channels, are implemented as class MobiusGaugeCNN. We furthermore implement conventional CNN baselines, one with the same number of channels and thus more parameters (α=1) and one with the same number of parameters but less channels (α=2). Since conventional CNNs are explicitly coordinate dependent they utilize a naive padding operation (MobiusPadNaive), which performs a spatial reflection of feature maps but does not apply the unspecified gauge transformation. The following table gives an overview of the different models:

Data - Möbius MNIST

We benchmark our models on Möbius MNIST, a simple classification dataset which consists of MNIST digits that are projected on the Möbius strip. Since MNIST digits are gray-scale images, they are geometrically identified as scalar fields. The size of the training set is by default set to 12000 digits, which agrees with the rotated MNIST dataset.

There are two versions of the training and test sets which consist of centered and shifted digits. All digits in the centered datasets occur at the same location (and the same orientation) of the strip. The isometry shifted digits appear at uniformly sampled locations. Recall that shifts once around the strip lead to a reflection of the digits as visualized above. The following digits show isometry shifted digits (note the reflection at the cut):

To generate the datasets it is sufficient to call convert_mnist.py, which downloads the original MNIST dataset via torchvision and saves the Möbius MNIST datasets in data/mobius_MNIST.npz.

Results

The models can then be trained by calling, for instance,

python train.py --model mobius_regular

For more options and further model types, consult the help message: python train.py -h

The following table gives an overview of the performance of all models in two different settings, averaged over 32 runs:

The setting "shifted train digits" trains and evaluates on isometry shifted digits. To test the isometry equivariance of the models, we train them furthermore on "centered train digits", testing them then out-of-distribution on shifted digits. As one can see, the orientation independent models generalize well over these unseen variations while the conventional coordinate dependent CNNs' performance deteriorates.

Dependencies

This library is based on Python3.7. It requires the following packages:

numpy
torch>=1.1
torchvision>=0.3

Logging via tensorboard is optional.

Owner
Maurice Weiler
AI researcher with a focus on geometric and equivariant deep learning. PhD candidate under the supervision of Max Welling. Master's degree in Physics.
Maurice Weiler
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson http://www.ntu.edu.sg/home/xbresson https://github.com/xbresson https://twitter.com/xbresson

Xavier Bresson 287 Jan 04, 2023
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

64 Dec 16, 2022
Interpretation of T cell states using reference single-cell atlases

Interpretation of T cell states using reference single-cell atlases ProjecTILs is a computational method to project scRNA-seq data into reference sing

Cancer Systems Immunology Lab 139 Jan 03, 2023
SoK: Vehicle Orientation Representations for Deep Rotation Estimation

SoK: Vehicle Orientation Representations for Deep Rotation Estimation Raymond H. Tu, Siyuan Peng, Valdimir Leung, Richard Gao, Jerry Lan This is the o

FIRE Capital One Machine Learning of the University of Maryland 12 Oct 07, 2022
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 46 Sep 29, 2022
ML for NLP and Computer Vision.

Sparrow is our open-source ML product. It runs on Skipper MLOps infrastructure.

Katana ML 2 Nov 28, 2021
An implementation for `Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction`

Text2Event An implementation for Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction Please contact Yaojie Lu (@

Roger 153 Jan 07, 2023
《Rethinking Sptil Dimensions of Vision Trnsformers》(2021)

Rethinking Spatial Dimensions of Vision Transformers Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh | Paper NAVER

NAVER AI 224 Dec 27, 2022
ObjectDrawer-ToolBox: a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system

ObjectDrawer-ToolBox is a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system, Object Drawer.

77 Jan 05, 2023
Semantic segmentation task for ADE20k & cityscapse dataset, based on several models.

semantic-segmentation-tensorflow This is a Tensorflow implementation of semantic segmentation models on MIT ADE20K scene parsing dataset and Cityscape

HsuanKung Yang 83 Oct 13, 2022
This is a collection of our NAS and Vision Transformer work.

This is a collection of our NAS and Vision Transformer work.

Microsoft 828 Dec 28, 2022
Code for our paper "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", ACL 2021

SimCLS Code for our paper: "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", ACL 2021 1. How to Install Requirements

Yixin Liu 150 Dec 12, 2022
Libraries, tools and tasks created and used at DeepMind Robotics.

Libraries, tools and tasks created and used at DeepMind Robotics.

DeepMind 270 Nov 30, 2022
A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for ONNX.

sam4onnx A very simple tool to rewrite parameters such as attributes and constants for OPs in ONNX models. Simple Attribute and Constant Modifier for

Katsuya Hyodo 6 May 15, 2022
Fiddle is a Python-first configuration library particularly well suited to ML applications.

Fiddle Fiddle is a Python-first configuration library particularly well suited to ML applications. Fiddle enables deep configurability of parameters i

Google 227 Dec 26, 2022
Gradient representations in ReLU networks as similarity functions

Gradient representations in ReLU networks as similarity functions by Dániel Rácz and Bálint Daróczy. This repo contains the python code related to our

1 Oct 08, 2021
Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash through feeding it pictures or videos.

Trash-Sorter-Extraordinaire Trash Sorter Extraordinaire is a software which efficiently detects the different types of waste in a pile of random trash

Rameen Mahmood 1 Nov 07, 2021
A criticism of a recent paper on buggy image downsampling methods in popular image processing and deep learning libraries.

A criticism of a recent paper on buggy image downsampling methods in popular image processing and deep learning libraries.

70 Jul 12, 2022
A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

Keren Ye 35 Nov 20, 2022
An end-to-end machine learning web app to predict rugby scores (Pandas, SQLite, Keras, Flask, Docker)

Rugby score prediction An end-to-end machine learning web app to predict rugby scores Overview An demo project to provide a high-level overview of the

34 May 24, 2022