Unofficial implementation of MUSIQ (Multi-Scale Image Quality Transformer)

Last update: Jan 02, 2023

Overview

MUSIQ: Multi-Scale Image Quality Transformer

Unofficial PyTorch implementation of the paper "MUSIQ: Multi-Scale Image Quality Transformer" (paper link: https://arxiv.org/abs/2108.05997)

This code doesn't exactly match what the paper describes.

It only works on the KonIQ-10k dataset, or on any database whose images have a resolution of 1024 (width) x 768 (height).
Instead of the 5-layer ResNet used as the backbone network in the paper, we use ResNet50 pretrained on the ImageNet database.
To train on other databases, Earth Mover's Distance (EMD) loss still needs to be implemented.
We additionally use a ranking loss to improve performance (we will upload the training code including the ranking loss later).
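As an illustration of the missing piece mentioned above, here is a minimal sketch of an EMD loss between two discrete score distributions. It assumes the commonly used form that compares cumulative distributions with exponent r = 2; this is not code from this repository, and the function name `emd_loss` is hypothetical. It is written in plain Python for clarity, so a training version would need to be ported to PyTorch tensors.

```python
from itertools import accumulate


def emd_loss(pred, target, r=2):
    """EMD between two discrete distributions over ordered score buckets.

    Assumption: both inputs are sequences of probabilities that sum to 1
    and use the same bucket order (for example, scores 1 to 10).
    """
    if len(pred) != len(target):
        raise ValueError("distributions must have the same number of buckets")
    # Compare cumulative distributions so that bucket order matters.
    cdf_pred = list(accumulate(pred))
    cdf_target = list(accumulate(target))
    mean_diff = sum(abs(p - t) ** r for p, t in zip(cdf_pred, cdf_target)) / len(pred)
    return mean_diff ** (1.0 / r)
```

Unlike a plain per-bucket error, this penalizes predictions more the further their probability mass sits from the target's mass along the score axis.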

The environment settings are described below. (I cannot guarantee that it works in other environments.)

PyTorch=1.7.1 (with cuda 11.0)
einops=0.3.0
numpy=1.18.3
cv2=4.2.0
scipy=1.4.1
json=2.0.9
tqdm=4.45.0

Train & Validation

First, download the weights of ResNet50 pretrained on the ImageNet database.

Download the weights from this website (https://download.pytorch.org/models/resnet50-0676ba61.pth)
Rename the .pth file to "resnet50.pth" and put it in the "model" folder
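The download and rename step can also be scripted. The sketch below uses only the Python standard library; the helper name `fetch_resnet50_weights` is hypothetical and not part of this repository, and it assumes the "model" folder sits in the directory the script is run from.

```python
import os
import urllib.request

RESNET50_URL = "https://download.pytorch.org/models/resnet50-0676ba61.pth"


def fetch_resnet50_weights(url=RESNET50_URL, model_dir="model", filename="resnet50.pth"):
    """Download the weights and store them as <model_dir>/<filename>."""
    os.makedirs(model_dir, exist_ok=True)
    dest = os.path.join(model_dir, filename)
    # Skip the download if the file is already in place.
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `fetch_resnet50_weights()` with the defaults should leave the file at "model/resnet50.pth", which is the name and location the step above asks for.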

Second, download the KonIQ-10k dataset.

Download the database from this website (http://database.mmsp-kn.de/koniq-10k-database.html)
Set the database path in "train.py" (it is the "db_path" variable in "train.py")
Check that "koniq-10k.txt" is in the "IQA_list" folder
The "koniq-10k.txt" file contains [scene number / image name / ground truth score] for each image
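To sanity-check the list file before training, something like the following sketch can be used. It assumes each line holds the three fields separated by whitespace and that the scene number is an integer; the actual delimiter and types in "koniq-10k.txt" may differ, and `parse_iqa_list` is a hypothetical helper, not a function from this repository.

```python
def parse_iqa_list(lines):
    """Parse lines of the assumed form '<scene number> <image name> <score>'."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        scene, name, score = parts
        records.append((int(scene), name, float(score)))
    return records


# Example usage (path taken from the instructions above):
# with open("IQA_list/koniq-10k.txt") as f:
#     records = parse_iqa_list(f)
#     print(len(records), "images listed")
```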

After these settings, run "train.py" to start training and validation.

python3 train.py (execution code)
This code runs on a single GPU. To train on multiple GPUs, you need to modify the code.
All options are defined in "train.py"; change the "config" variable there to adjust them.

Below is the validation performance on the KonIQ-10k database (I'm still training the model, so the results will be updated later)

SRCC: 0.9023 / PLCC: 0.9232 (after training 105 epochs)
If the code were implemented exactly as described in the paper, the performance could be further improved
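For readers unfamiliar with the two metrics, SRCC is the Spearman rank correlation and PLCC is the Pearson linear correlation between predicted and ground-truth scores. The sketch below computes both in plain Python to show what is being measured; it is not the evaluation code of this repository, and in practice `scipy.stats.spearmanr` and `scipy.stats.pearsonr` are the usual choice.

```python
import math


def pearson(x, y):
    """PLCC: linear correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of the scores."""
    return pearson(average_ranks(x), average_ranks(y))
```

SRCC only rewards getting the ordering of images right, while PLCC also rewards a linear relationship between predicted and ground-truth values, which is why the two numbers can differ.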

Inference

First, specify the following variables in "inference.py"

dirname: root folder of test images
checkpoint: checkpoint file (trained on KonIQ-10k dataset)
result_score_txt: text file in which the inference scores will be saved

After these settings, run "inference.py" to perform inference.

python3 inference.py (execution code)

Acknowledgements

We referred to the following website to implement the transformer (https://paul-hyun.github.io/transformer-01/)
