Google Landmark Recogntion and Retrieval 2021 Solutions

Overview

Google Landmark Recogntion and Retrieval 2021 Solutions

In this repository you can find solution and code for Google Landmark Recognition 2021 and Google Landmark Retrieval 2021 competitions (both in top-100).

Brief Summary

My solution is based on the latest modeling from the previous competition and strong post-processing based on re-ranking and using side models like detectors. I used single RTX 3080, EfficientNet B0 and only competition data for training.

Model and loss function

I used the same model and loss as the winner team of the previous competition as a base. Since I had only single RTX 3080, I hadn't enough time to experiment with that and change it. The only things I managed to test is Subcenter ArcMarginProduct as the last block of model and ArcFaceLossAdaptiveMargin loss function, which has been used by the 2nd place team in the previous year. Both those things gave me a signifact score boost (around 4% on CV and 5% on LB).

Setting up the training and validation

Optimizing and scheduling

Optimizer - Ranger (lr=0.003)
Scheduler - CosineAnnealingLR (T_max=12) + 1 epoch Warm-Up

Training stages

I found the best perfomance in training for 15 epochs and 5 stages:

  1. (1-3) - Resize to image size, Horizontal Flip
  2. (4-6) - Resize to bigger image size, Random Crop to image size, Horizontal Flip
  3. (7-9) - Resize to bigger image size, Random Crop to image size, Horizontal Flip, Coarse Dropout with one big square (CutMix)
  4. (10-12) - Resize to bigger image size, Random Crop to image size, Horizontal Flip, FMix, CutMix, MixUp
  5. (13-15) - Resize to bigger image size, Random Crop to image size, Horizontal Flip

I used default Normalization on all the epochs.

Validation scheme

Since I hadn't enough hardware, this became my first competition where I wasn't able to use a K-fold validation, but at least I saw stable CV and CV/LB correlation at the previous competitions, so I used simple stratified train-test split in 0.8, 0.2 ratio. I also oversampled all the samples up to 5 for each class.

Inference and Post-Processing:

  1. Change class to non-landmark if it was predicted more than 20 times .
  2. Using pretrained YoloV5 for detecting non-landmark images. All classes are used, boxes with confidence < 0.5 are dropped. If total area of boxes is greater than total_image_area / 2.7, the sample is marked as non-landmark. I tried to use YoloV5 for cleaning the train dataset as well, but it only decreased a score.
  3. Tuned post-processing from this paper, based on the cosine similarity between train and test images to non-landmark ones.
  4. Higher image size for extracting embeddings on inference.
  5. Also using public train dataset as an external data for extracting embeddings.

Didn't work for me

  • Knowledge Distillation
  • Resnet architectures (on average they were worse than effnets)
  • Adding an external non-landmark class to training from 2019 test dataset
  • Train binary non-landmark classifier

Transfer Learning on the full dataset and Label Smoothing should be useful here, but I didn't have time to test it.

Owner
Vadim Timakin
17 y.o Machine Learning Engineer | Kaggle Competitions Expert | ML/DL/CV | PyTorch
Vadim Timakin
Noether Networks: meta-learning useful conserved quantities

Noether Networks: meta-learning useful conserved quantities This repository contains the code necessary to reproduce experiments from "Noether Network

Dylan Doblar 33 Nov 23, 2022
Generate pixel-style avatars with python.

face2pixel Generate pixel-style avatars with python. Run: Clone the project: git clone https://github.com/theodorecooper/face2pixel install requiremen

Theodore Cooper 2 May 11, 2022
A tool to prepare websites grabbed with wget for local viewing.

makelocal A tool to prepare websites grabbed with wget for local viewing. exapmples After fetching xkcd.com with: wget -r -no-remove-listing -r -N --p

5 Apr 23, 2022
SPT_LSA_ViT - Implementation for Visual Transformer for Small-size Datasets

Vision Transformer for Small-Size Datasets Seung Hoon Lee and Seunghyun Lee and Byung Cheol Song | Paper Inha University Abstract Recently, the Vision

Lee SeungHoon 87 Jan 01, 2023
An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

An Ensemble of CNN (Python 3.5.1 Tensorflow 1.3 numpy 1.13)

0 May 06, 2022
HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

: We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the w

Yuval Nirkin 182 Dec 14, 2022
An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.

CPC_audio This code implements the Contrast Predictive Coding algorithm on audio data, as described in the paper Unsupervised Pretraining Transfers we

Meta Research 283 Dec 30, 2022
deep learning for image processing including classification and object-detection etc.

深度学习在图像处理中的应用教程 前言 本教程是对本人研究生期间的研究内容进行整理总结,总结的同时也希望能够帮助更多的小伙伴。后期如果有学习到新的知识也会与大家一起分享。 本教程会以视频的方式进行分享,教学流程如下: 1)介绍网络的结构与创新点 2)使用Pytorch进行网络的搭建与训练 3)使用Te

WuZhe 13.6k Jan 04, 2023
A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.

3d-pose-baseline This is the code for the paper Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. A simple yet effective baseline for 3

Julieta Martinez 1.3k Jan 03, 2023
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)

Vision Transformer for Fast and Efficient Scene Text Recognition (ICDAR 2021) ViTSTR is a simple single-stage model that uses a pre-trained Vision Tra

Rowel Atienza 198 Dec 27, 2022
DeLiGAN - This project is an implementation of the Generative Adversarial Network

This project is an implementation of the Generative Adversarial Network proposed in our CVPR 2017 paper - DeLiGAN : Generative Adversarial Net

Video Analytics Lab -- IISc 110 Sep 13, 2022
Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Kai Zhang 1.2k Dec 29, 2022
A PyTorch implementation of "Capsule Graph Neural Network" (ICLR 2019).

CapsGNN ⠀⠀ A PyTorch implementation of Capsule Graph Neural Network (ICLR 2019). Abstract The high-quality node embeddings learned from the Graph Neur

Benedek Rozemberczki 1.2k Jan 02, 2023
Structure-Preserving Deraining with Residue Channel Prior Guidance (ICCV2021)

SPDNet Structure-Preserving Deraining with Residue Channel Prior Guidance (ICCV2021) Requirements Linux Platform NVIDIA GPU + CUDA CuDNN PyTorch == 0.

41 Dec 12, 2022
[ICCV 2021] Official PyTorch implementation for Deep Relational Metric Learning.

Ranking Models in Unlabeled New Environments Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch 1.7.0 + torchivision 0.8.1

Borui Zhang 39 Dec 10, 2022
[arXiv] What-If Motion Prediction for Autonomous Driving ❓🚗💨

WIMP - What If Motion Predictor Reference PyTorch Implementation for What If Motion Prediction [PDF] [Dynamic Visualizations] Setup Requirements The W

William Qi 96 Dec 29, 2022
This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression This repository contains the code for the paper in EM

Chenhe Dong 2 Mar 24, 2022
MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieva

Introduction This is the source code of our TCSVT 2021 paper "MARS: Learning Modality-Agnostic Representation for Scalable Cross-media Retrieval". Ple

7 Aug 24, 2022
This's an implementation of deepmind Visual Interaction Networks paper using pytorch

Visual-Interaction-Networks An implementation of Deepmind visual interaction networks in Pytorch. Introduction For the purpose of understanding the ch

Mahmoud Gamal Salem 166 Dec 06, 2022
Implementation of CVPR 2020 Dual Super-Resolution Learning for Semantic Segmentation

Dual super-resolution learning for semantic segmentation 2021-01-02 Subpixel Update Happy new year! The 2020-12-29 update of SISR with subpixel conv p

Sam 79 Nov 24, 2022