A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Last update: Jan 06, 2023

Overview

University1652-Baseline

[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍]

This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The offical paper link is at https://dl.acm.org/doi/10.1145/3394171.3413896. We collect 1652 buildings of 72 universities around the world. Thank you for your kindly attention.

Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.

Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place.

About Dataset
News
Code Features
Prerequisites
Getting Started
Citation

About Dataset

The dataset split is as follows:

Split	#imgs	#buildings	#universities
Training	50,218	701	33
Query_drone	37,855	701	39
Query_satellite	701	701	39
Query_ground	2,579	701	39
Gallery_drone	51,355	951	39
Gallery_satellite	951	951	39
Gallery_ground	2,921	793	39

More detailed file structure:

├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images 
│           ├── 0001
|           ├── 0002
|           ...
│       ├── street/                  /* street-view training images 
│       ├── satellite/               /* satellite-view training images       
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/  
│       ├── gallery_drone/  
│       ├── query_street/  
│       ├── gallery_street/ 
│       ├── query_satellite/  
│       ├── gallery_satellite/ 
│       ├── 4K_drone/

We note that there are no overlaps between 33 univeristies of training set and 39 univeristies of test set.

News

1 Dec 2021 Fix the issue due to the latest torchvision, which do not allow the empty subfolder. Note that some buildings do not have google images.

3 March 2021 GeM Pooling is added. You may use it by --pool gem.

21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is at Here.

21 August 2020 The transfer learning code for Oxford and Paris is at Here.

27 July 2020 The meta data of 1652 buildings, such as latitude and longitude, are now available at Google Driver. (You could use Google Earth Pro to open the kml file or use vim to check the value).
We also provide the spiral flight tour file at Google Driver. (You could open the kml file via Google Earth Pro to enable the flight camera).

26 July 2020 The paper is accepted by ACM Multimedia 2020.

12 July 2020 I made the baseline of triplet loss (with soft margin) on University-1652 public available at Here.

12 March 2020 I add the state-of-the-art page for geo-localization and tutorial, which will be updated soon.

Code Features

Now we have supported:

Float16 to save GPU memory based on apex
Multiple Query Evaluation
Re-Ranking
Random Erasing
ResNet/VGG-16
Visualize Training Curves
Visualize Ranking Result
Linear Warm-up

Prerequisites

Python 3.6
GPU Memory >= 8G
Numpy > 1.12.1
Pytorch 0.3+
[Optional] apex (for float16)

Getting started

Installation

Install Pytorch from http://pytorch.org/
Install Torchvision from the source

git clone https://github.com/pytorch/vision
cd vision
python setup.py install

[Optinal] You may skip it. Install apex from the source

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

Dataset & Preparation

Download [University-1652] upon request. You may use the request template.

Or download CVUSA / CVACT.

For CVUSA, I follow the training/test split in (https://github.com/Liumouliu/OriCNN).

Train & Evaluation

Train & Evaluation University-1652

python train.py --name three_view_long_share_d0.75_256_s1_google  --extra --views 3  --droprate 0.75  --share  --stride 1 --h 256  --w 256 --fp16; 
python test.py --name three_view_long_share_d0.75_256_s1_google

Default setting: Drone -> Satellite If you want to try other evaluation setting, you may change these lines at: https://github.com/layumi/University1652-Baseline/blob/master/test.py#L217-L225

Ablation Study only Satellite & Drone

python train_no_street.py --name two_view_long_no_street_share_d0.75_256_s1  --share --views 3  --droprate 0.75  --stride 1 --h 256  --w 256  --fp16; 
python test.py --name two_view_long_no_street_share_d0.75_256_s1

Set three views but set the weight of loss on street images to zero.

Train & Evaluation CVUSA

python prepare_cvusa.py
python train_cvusa.py --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256  --fp16 --batchsize 16;
python test_cvusa.py  --name usa_vgg_noshare_warm5_lr2

Trained Model

You could download the trained model at GoogleDrive or OneDrive. After download, please put model folders under ./model/.

Citation

The following paper uses and reports the result of the baseline model. You may cite it in your paper.

@article{zheng2020university,
  title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
  author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
  journal={ACM Multimedia},
  year={2020}
}

Instance loss is defined in

@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}

Related Work

Instance Loss Code
Lending Orientation to Neural Networks for Cross-view Geo-localization Code
Predicting Ground-Level Scene Layout from Aerial Imagery Code

Comments

difficulties in downloading the dataset from Google Drive - Need direct link

Hi, thank you for sharing your dataset. Living in China, it's almost impossible to download your dataset from Google Drive. It's also stop if we try to use a VPN. Can you provide a direct link to download your dataset?

Thank you

opened by jpainam 5
Results can't be reproduced

Hi @layumi , thanks for releasing the codes.

When I ran the train.py file (using the resnet model), after initializing with the pretraining model parameters and training for 119 epochs, I ran the test.py file and only got the following results: Rec[email protected]:1.29 [email protected]:4.54 [email protected]:7.43 [email protected]:7.92 AP:2.53

And when I ran the train.py file using the vgg mode, I got: Recal[email protected]:1.75 [email protected]:6.22 [email protected]:10.36 [email protected]:11.16 AP:3.39

The hyper-parameters of the above results are set by default. To get the results in the paper, do I need to modify the hyper-parameters in the code?

I use pytorch 1.1.0 and V100

opened by Anonymous-so 4
How to visualize the retrieved image?

Hello, I've been looking at your code recently. In test.py file, after extracting the features of the image, save result to pytorch__result. mat file, and then run evaluate_ gpu. py file for evaluation. I want to know how to visualize the search results and get the matching results like Figure 5 in the paper.

opened by zkangkang0 2
Question about collecting images

Hello, First of all, thank you for sharing your great work.

I'm currently doing researches with cross-view geo-localization and I want to collect image data like the University1652 dataset, so I was wondering if you could share some sample codes, or a simple tutorial about how to collect images using Google Earth Engine.

Thank you and best regards.

opened by viet2411 2

Testing Drone -> satellite with views=2 is not defined but is default settings

Hi. I trained using the tutorial readme with this command. python train.py --gpu_ids 0,2 --name ft_ResNet50 --train_all --batchsize 32 --data_dir /home/xx/datasets/University-Release/train And this is the generated yaml

DA: false
batchsize: 32
color_jitter: false
data_dir: /home/paul/datasets/University-Release/train
droprate: 0.5
erasing_p: 0
extra_Google: false
fp16: false
gpu_ids: 0,2
h: 384
lr: 0.01
moving_avg: 1.0
name: ft_ResNet50
nclasses: 701
pad: 10
pool: avg
resume: false
share: false
stride: 2
train_all: true
use_NAS: false
use_dense: false
views: 2
w: 384
warm_epoch: 0

So, for testing, i do this python test.py --gpu_ids 0 --name ft_ResNet50 --test_dir /home/xx/datasets/University-Release/test --batchsize 32 --which_epoch 119 I found out that, the views=2 and the view_index=3 in the extract_feature function. Using this code

def which_view(name):
    if 'satellite' in name:
        return 1
    elif 'street' in name:
        return 2
    elif 'drone' in name:
        return 3
    else:
        print('unknown view')
    return -1

The task is 3 -> 1 means Drone -> Satellite with views=2. But the code in the testing, doesn't consider this scenario

 for scale in ms:
    if scale != 1:
       # bicubic is only  available in pytorch>= 1.1
        input_img = nn.functional.interpolate(input_img, scale_factor=scale, mode='bilinear', align_corners=False)
        if opt.views ==2:
           if view_index == 1:
              outputs, _ = model(input_img, None) 
           elif view_index ==2:
               _, outputs = model(None, input_img) 
        elif opt.views ==3:
           if view_index == 1:
              outputs, _, _ = model(input_img, None, None)
           elif view_index ==2:
                _, outputs, _ = model(None, input_img, None)
            elif view_index ==3:
                    _, _, outputs = model(None, None, input_img)
                ff += outputs # Give error, since outputs is not defined

For views == 2, there is no views_index == 3

opened by jpainam 2

file naming: Error Path too long

Hi, I guess on a Unix/Linux system, such error might not occur. But a file naming similar to the Market-1501 dataset could have been better for Windows based systems. Here an error due to the path length in Windows systems.

opened by jpainam 1
How to use t-SNE ?

Hi, Dr. Zheng. After reading your paper, I want to use t-SNE code, could you release this t-SNE code? I find lots of t-SNE codes on github, but I can not find useful codes of using resnet network or pretrained models. Thanks a lot !!!!

opened by starstarb 1
About GNN Re-ranking training program

Hello @layumi , thank you for your work

I was trying to reproduce the result in paper "Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective" using your pytorch code, but I'm having some trouble in running the program.

The program needs "market_88_test.pkl" as input data for re-ranking process, but I don't understand how to generate it properly.

Could you give some advices on how to use this code?

Thank you and best regards.

opened by viet2411 2

Releases(v1.1)

v1.1(Dec 16, 2021)

Add supports for seven losses in one repo, including contrast, triplet, lifted, circle, arcface, cosface and sphere.
Source code(tar.gz)
Source code(zip)

Owner

Zhedong Zheng

Hi, I am a PhD student at University of Technology Sydney. My work focuses on computer vision, especially representation learning.

GitHub Repository https://arxiv.org/abs/2002.12186

Generating Band-Limited Adversarial Surfaces Using Neural Networks

Generating Band-Limited Adversarial Surfaces Using Neural Networks This is the official repository of the technical report that was published on arXiv

3 Jul 26, 2022

ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

ManimML ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

259 Jan 04, 2023

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA)

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA). Master's thesis documents. Bibliography, experiments and reports.

73 Dec 04, 2022

A minimal yet resourceful implementation of diffusion models (along with pretrained models + synthetic images for nine datasets)

65 Dec 19, 2022

A ssl analyzer which could analyzer target domain's certificate.

ssl_analyzer A ssl analyzer which could analyzer target domain's certificate. Analyze the domain name ssl certificate information according to the inp

17 Dec 12, 2022

The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

124 Dec 27, 2022

KDD CUP 2020 Automatic Graph Representation Learning: 1st Place Solution

KDD CUP 2020: AutoGraph Team: aister Members: Jianqiang Huang, Xingyuan Tang, Mingjian Chen, Jin Xu, Bohang Zheng, Yi Qi, Ke Hu, Jun Lei Team Introduc

96 May 30, 2022

Unsupervised 3D Human Mesh Recovery from Noisy Point Clouds

Unsupervised 3D Human Mesh Recovery from Noisy Point Clouds Xinxin Zuo, Sen Wang, Minglun Gong, Li Cheng Prerequisites We have tested the code on Ubun

41 Dec 12, 2022

Remote sensing change detection using PaddlePaddle

Change Detection Laboratory Developing and benchmarking deep learning-based remo

15 Sep 23, 2022

Multiview Dataset Toolkit

Multiview Dataset Toolkit Using multi-view cameras is a natural way to obtain a complete point cloud. However, there is to date only one multi-view 3D

11 Dec 22, 2022

3D-printable hand-strapped keyboard

Note: This repo has not been cleaned up and prepared for general consumption at all. This is just a dump of the project files. If there is any interes

41 Dec 31, 2022

Deep learning library for solving differential equations and more

DeepXDE Voting on whether we should have a Slack channel for discussion. DeepXDE is a library for scientific machine learning. Use DeepXDE if you need

1.4k Dec 29, 2022

Implementation of "Efficient Regional Memory Network for Video Object Segmentation" (Xie et al., CVPR 2021).

RMNet This repository contains the source code for the paper Efficient Regional Memory Network for Video Object Segmentation. Cite this work @inprocee

76 Dec 14, 2022

Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Optimum Transformers Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX runtime. Installatio

115 Dec 16, 2022

EMNLP 2021 paper The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers.

Codebase for training transformers on systematic generalization datasets. The official repository for our EMNLP 2021 paper The Devil is in the Detail:

57 Nov 21, 2022

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

SSC-GAN_repo Pytorch implementation for 'Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation'.PDF SSC-GAN:Sem

4 Aug 28, 2022

Official Code Implementation of the paper : XAI for Transformers: Better Explanations through Conservative Propagation

Official Code Implementation of The Paper : XAI for Transformers: Better Explanations through Conservative Propagation For the SST-2 and IMDB expermin

23 Dec 30, 2022

This game was designed to encourage young people not to gamble on lotteries, as the probablity of correctly guessing the number is infinitesimal!

Lottery Simulator 2022 for Web Launch Application Developed by John Seong in Ontario. This game was designed to encourage young people not to gamble o

2 Sep 02, 2022

Hierarchical probabilistic 3D U-Net, with attention mechanisms (—𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 𝘜-𝘕𝘦𝘵, 𝘚𝘌𝘙𝘦𝘴𝘕𝘦𝘵) and a nested decoder structure with deep supervision (—𝘜𝘕𝘦𝘵++).

Hierarchical probabilistic 3D U-Net, with attention mechanisms (—𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 𝘜-𝘕𝘦𝘵, 𝘚𝘌𝘙𝘦𝘴𝘕𝘦𝘵) and a nested decoder structure with deep supervision (—𝘜𝘕𝘦𝘵++). Built in TensorFlow 2.5. Configured for vox

32 Dec 08, 2022

Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation.

SAFA: Structure Aware Face Animation (3DV2021) Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation. Getting Started

122 Dec 23, 2022