This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Overview

Swin Transformer for Semantic Segmentation of satellite images

This repo contains the supported code and configuration files to reproduce the semantic segmentation results of Swin Transformer. It is based on mmsegmentation. In addition, we provide pre-trained models for the semantic segmentation of satellite images into basic classes (vegetation, buildings, roads). The full description of this work is available on arXiv.

Application on the Ampli ANR project

Goal

This repo was used as part of the Ampli ANR project.

The goal was to perform semantic segmentation on satellite photos to precisely identify the species and density of the trees present in the pictures. However, because recognizing the exact tree species from satellite photos is difficult, we decided to reduce the number of classes.

Dataset sources

To train and test the model, we used data provided by IGN which concerns French departments (Hautes-Alpes in our case). The following datasets have been used to extract the different layers:

  • BD Ortho for the satellite images
  • BD Foret v2 for vegetation data
  • BD Topo for buildings and roads

Important: note that the data precision is 50 cm per pixel.

Initially, many classes were present in the dataset. We reduced their number by merging them and finally retained the following ones:

  • Dense forest
  • Sparse forest
  • Moor
  • Herbaceous formation
  • Building
  • Road

The purpose of the last two classes is twofold. First, we wanted to avoid biasing the training with false segmentations: buildings and roads are visually present in the satellite images but were initially assigned a vegetation class. Second, these classes make the segmentation more precise and help identify the different elements of the images.
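For illustration, here is how such a six-class set would typically be declared as a custom dataset in the mmsegmentation 0.x codebase this repo builds on. This is a minimal sketch, not the repo's exact code: the class name IGNDataset and the palette colors are assumptions, and the repo's own dataset definition (in the provided mmseg folder) is the reference.

# Sketch (illustrative assumptions): declaring the six classes as a
# custom dataset in mmsegmentation 0.x.
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class IGNDataset(CustomDataset):
    CLASSES = ('Dense forest', 'Sparse forest', 'Moor',
               'Herbaceous formation', 'Building', 'Road')
    # Hypothetical display colors, one RGB triplet per class
    PALETTE = [[0, 100, 0], [0, 180, 0], [150, 120, 60],
               [120, 200, 120], [200, 0, 0], [128, 128, 128]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.png', seg_map_suffix='.png', **kwargs)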

Dataset preparation

Our training and test datasets are composed of tiles prepared from IGN open data. Each tile has a resolution of 1000x1000 pixels and represents a 500 m x 500 m footprint (50 cm per pixel). We mainly used data from the Hautes-Alpes department, and we selected spatially spaced tiles to maximize diversity and to limit areas without information (unfortunately, some places lack data). A minimal tiling sketch is given after the directory tree below.

The file structure of the dataset is as follows:

├── data
│   ├── ign
│   │   ├── annotations
│   │   │   ├── training
│   │   │   │   ├── xxx.png
│   │   │   │   ├── yyy.png
│   │   │   │   ├── zzz.png
│   │   │   ├── validation
│   │   ├── images
│   │   │   ├── training
│   │   │   │   ├── xxx.png
│   │   │   │   ├── yyy.png
│   │   │   │   ├── zzz.png
│   │   │   ├── validation
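For reference, here is a minimal tiling sketch under assumed conditions (the input file names are hypothetical and the source images are presumed already rendered as large rasters): it splits an orthophoto and its rasterized annotation into 1000x1000 px tiles, matching the 500 m x 500 m footprint described above.

# Minimal tiling sketch with hypothetical input files: splits a large
# rendered image into 1000x1000 px tiles (500 m x 500 m at 50 cm/pixel).
from pathlib import Path
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # allow very large orthophotos
TILE = 1000  # pixels per tile side

def tile_image(src_path, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(src_path)
    w, h = img.size
    # Walk the image in non-overlapping TILE x TILE windows
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            img.crop((x, y, x + TILE, y + TILE)).save(
                out_dir / f"{Path(src_path).stem}_{x}_{y}.png")

tile_image("ortho_hautes_alpes.png", "data/ign/images/training")
tile_image("labels_hautes_alpes.png", "data/ign/annotations/training")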

The dataset is available for download here.

Information on the training

During the training, an ImageNet-22K pre-trained model was used (available here), and we added a weight to each class because the class distribution of the dataset was not balanced (see the config sketch after the list). The weights we used are:

  • Dense forest => 0.5
  • Sparse forest => 1.31237
  • Moor => 1.38874
  • Herbaceous formation => 1.39761
  • Building => 1.5
  • Road => 1.47807
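In mmsegmentation, such weights can be passed to the cross-entropy loss through the config file. Below is a minimal sketch; the config shipped in configs/swin/ is the authoritative version.

# Sketch: per-class weights for the cross-entropy loss in an
# mmsegmentation config, in the class order listed above.
class_weight = [0.5, 1.31237, 1.38874, 1.39761, 1.5, 1.47807]

model = dict(
    decode_head=dict(
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=1.0,
            class_weight=class_weight)))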

Main results

Backbone | Method  | Crop Size | Lr Schd | mIoU  | config | model
Swin-L   | UPerNet | 384x384   | 60K     | 54.22 | config | model

Here are some comparisons between the original segmentation and the segmentation obtained after training (Hautes-Alpes dataset):

[Figure: original segmentation (left) vs. segmentation after training (right), Hautes-Alpes dataset]

We also tested the model on satellite photos from another French department to see whether the trained model generalizes to other locations. We chose Cantal; here are a few samples of the obtained results:

[Figure: original segmentation (left) vs. segmentation after training (right), Cantal dataset]

These results show that the model can produce a segmentation even when the photos come from another department, and even when many pixels carry no information (shown in black), which is encouraging.

Limitations

As illustrated in the previous images, the results are not perfect. This is caused by inherent limits of the data used during the training phase. The two main limitations are:

  • The satellite photos and the original segmentation were not made at the same time, so the segmentation is not always accurate. For example, in the following images a zone is segmented as "dense forest" even though there are not many trees (this is why the segmentation after training, on the right, classified it as "sparse forest"):
[Figure: original segmentation (left) vs. segmentation after training (right)]
  • Sometimes there are zones without information (represented in black) in the dataset. We can ignore them during the training phase, but we then also lose some information, which is a problem. We therefore removed the tiles that had more than 50% of unidentified pixels to try to improve the training; a minimal sketch of such a filter is given below.
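A minimal sketch of such a filter, assuming the no-data pixels are stored with a dedicated label value in the annotation PNGs (0 here, which is an assumption):

# Sketch: keep a tile only if at most 50% of its annotation pixels
# are no-data (assumed to be stored as label value 0).
import numpy as np
from PIL import Image

NO_DATA = 0  # assumed label value for pixels without information

def keep_tile(annotation_path, max_nodata_ratio=0.5):
    labels = np.array(Image.open(annotation_path))
    return (labels == NO_DATA).mean() <= max_nodata_ratio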

Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Notes: During the installation, it is important to:

  • Install MMSegmentation in dev mode:
git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -e .
  • Copy the mmcv_custom and mmseg folders into the mmsegmentation folder

Inference

The pre-trained model (i.e. checkpoint file) for satellite image segmentation is available for download here.

# single-gpu testing
python tools/test.py <CONFIG_FILE> <SEG_CHECKPOINT_FILE> --eval mIoU

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --eval mIoU

# multi-gpu, multi-scale testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --aug-test --eval mIoU

Example on the Ampli ANR project:

# Evaluate checkpoint on a single GPU
python tools/test.py configs/swin/config_upernet_swin_large_patch4_window12_384x384_60k_ign.py checkpoints/ign_60k_swin_large_patch4_window12_384.pth --eval mIoU

# Display segmentation results
python tools/test.py configs/swin/config_upernet_swin_large_patch4_window12_384x384_60k_ign.py checkpoints/ign_60k_swin_large_patch4_window12_384.pth --show
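Single images can also be segmented programmatically with the mmsegmentation Python API. Here is a sketch based on the mmseg 0.x API this repo builds on; the input image path is a hypothetical example.

# Sketch: programmatic inference on a single tile (mmseg 0.x API).
from mmseg.apis import init_segmentor, inference_segmentor

config = 'configs/swin/config_upernet_swin_large_patch4_window12_384x384_60k_ign.py'
checkpoint = 'checkpoints/ign_60k_swin_large_patch4_window12_384.pth'

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo/tile.png')  # hypothetical image path
model.show_result('demo/tile.png', result, out_file='segmentation.png')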

Training

To train with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

Example on the Ampli ANR project with the ImageNet-22K pretrained model (available here):

python tools/train.py configs/swin/config_upernet_swin_large_patch4_window12_384x384_60k_ign.py --options model.pretrained="./model/swin_large_patch4_window12_384_22k.pth"

Notes:

  • use_checkpoint is used to save GPU memory. Please refer to this page for more details.
  • The default learning rate and training schedule are for 8 GPUs and 2 images per GPU.

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Citing this work

See the complete description of this work in the dedicated arXiv paper. If you use this work, please cite it:

@misc{guerin2021satellite,
      title={Satellite Image Semantic Segmentation}, 
      author={Eric Guérin and Killian Oechslin and Christian Wolf and Benoît Martinez},
      year={2021},
      eprint={2110.05812},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Object Detection: See Swin Transformer for Object Detection.

Self-Supervised Learning: See MoBY with Swin Transformer.

Video Recognition: See Video Swin Transformer.

Owner
INSA Lyon - IT Engineering