TRIQ implementation

TF-Keras implementation of TRIQ as described in Transformer for Image Quality Assessment.

Installation

  1. Clone this repository.
  2. Install the required Python packages. The code was developed in PyCharm with Python 3.7. The requirements.txt file was generated by PyCharm; the code should also run with the latest versions of the packages.

Training a model

An example of training TRIQ can be found in train/train_triq.py. An argument parser could be used, but the authors prefer a dictionary in which the parameters are defined; it is easy to convert the script to take command-line arguments. In principle, the following parameters can be defined:

args = {}
args['multi_gpu'] = 0 # gpu setting, set to 1 for using multiple GPUs
args['gpu'] = 0  # If having multiple GPUs, specify which GPU to use

args['result_folder'] = r'..\databases\experiments' # Define result path
args['n_quality_levels'] = 5  # Choose between 1 (MOS prediction) and 5 (distribution prediction)

args['transformer_params'] = [2, 32, 8, 64]

args['train_folders'] = [  # Define folders containing training images
    r'..\databases\train\koniq_normal',
    r'..\databases\train\koniq_small',
    r'..\databases\train\live'
]
args['val_folders'] = [  # Define folders containing testing images
    r'..\databases\val\koniq_normal',
    r'..\databases\val\koniq_small',
    r'..\databases\val\live'
]
args['koniq_mos_file'] = r'..\databases\koniq10k_images_scores.csv'  # MOS (distribution of scores) file for KonIQ database
args['live_mos_file'] = r'..\databases\live_mos.csv'   # MOS (standard deviation of scores) file for LIVE-wild database

args['backbone'] = 'resnet50' # Choose from ['resnet50', 'vgg16']
args['weights'] = r'...\pretrained_weights\resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'  # Define the path of ImageNet pretrained weights
args['initial_epoch'] = 0  # Define initial epoch for use in fine-tune

args['lr_base'] = 1e-4 / 2  # Define the base learning rate in warmup and rate decay approach
args['lr_schedule'] = True  # Choose between True and False, indicating if learning rate schedule should be used or not
args['batch_size'] = 32  # Batch size, should choose to fit in the GPU memory
args['epochs'] = 120  # Maximal epoch number, can set early stop in the callback or not

args['image_aug'] = True # Choose between True and False, indicating if image augmentation should be used or not
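
As a rough illustration of the conversion to command-line arguments mentioned above, the sketch below builds the same kind of dictionary with argparse. Only a subset of the parameters is covered, the flag names simply mirror the dictionary keys, and the function name parse_args is an assumption for this example; it is not part of the repository.

```python
import argparse


def parse_args(argv=None):
    """Build an args dictionary from command-line options (illustrative subset)."""
    parser = argparse.ArgumentParser(description='Train TRIQ (sketch)')
    parser.add_argument('--gpu', type=int, default=0)
    parser.add_argument('--n_quality_levels', type=int, choices=[1, 5], default=5)
    parser.add_argument('--backbone', choices=['resnet50', 'vgg16'], default='resnet50')
    parser.add_argument('--train_folders', nargs='+', default=[])
    parser.add_argument('--lr_base', type=float, default=1e-4 / 2)
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--epochs', type=int, default=120)
    # vars() turns the Namespace into a plain dict with the same keys
    return vars(parser.parse_args(argv))


# Passing an explicit list keeps the example independent of sys.argv
args = parse_args(['--backbone', 'vgg16', '--train_folders', 'a', 'b'])
print(args['backbone'], args['train_folders'], args['batch_size'])
```

The resulting dictionary can be used in the same way as the hand-written one, so the rest of a training script would not need to change.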

Predict image quality using the trained model

After TRIQ has been trained and the weights have been stored in an h5 file, it can be used to predict the quality of images of arbitrary size:

    args = {}
    args['n_quality_levels'] = 5
    args['backbone'] = 'resnet50'
    args['weights'] = r'..\TRIQ.h5'
    model = create_triq_model(n_quality_levels=args['n_quality_levels'],
                              backbone=args['backbone'])
    model.load_weights(args['weights'])

Then use ModelEvaluation to predict the quality of an image set.
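
With n_quality_levels set to 5 the model predicts a score distribution rather than a single MOS. A minimal sketch of turning such a distribution into one score is given below. It assumes the output holds one weight per quality scale 1 to 5; the helper name distribution_to_mos is hypothetical, and this is not necessarily how ModelEvaluation does it.

```python
def distribution_to_mos(probabilities, scales=(1, 2, 3, 4, 5)):
    """Return the expected quality score of a predicted distribution."""
    if len(probabilities) != len(scales):
        raise ValueError('one probability per quality scale is required')
    total = sum(probabilities)
    if total <= 0:
        raise ValueError('probabilities must sum to a positive value')
    # Weighted mean of the scale values; dividing by total tolerates
    # outputs that are not perfectly normalized
    return sum(p * s for p, s in zip(probabilities, scales)) / total


example_mos = distribution_to_mos([0.0, 0.0, 0.25, 0.5, 0.25])
print(example_mos)
```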

In the "examples" folder, an example script examples\image_quality_prediction.py is provided to use the trained weights to predict quality of example images. In the "train" folder, an example script train\validation.py is provided to use the trained weights to predict quality of images in folders.

A potential issue is an image shape mismatch. For example, if an image is too large, line 146 in transformer_iqa.py should be changed to increase the pooling size, e.g. to self.pooling_small = MaxPool2D(pool_size=(4, 4)) or even larger.
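
To get a feeling for how the pooling size affects large images, the sketch below estimates the size of the pooled feature grid. It rests on assumptions not stated in this README: the backbone downsamples by a factor of 32 overall (rounding up), the pooling layer uses a stride equal to its pool size with no padding, and the pooled grid is flattened into the sequence fed to the transformer. The helper name feature_grid is hypothetical.

```python
import math


def feature_grid(height, width, pool_size=2, backbone_stride=32):
    """Estimate (rows, cols, positions) of the pooled backbone feature map."""
    # Assumed backbone output size: input size divided by 32, rounded up
    rows = math.ceil(height / backbone_stride)
    cols = math.ceil(width / backbone_stride)
    # Pooling without padding drops incomplete windows, hence floor division
    rows //= pool_size
    cols //= pool_size
    return rows, cols, rows * cols


for pool in (2, 4):
    print(pool, feature_grid(1536, 2048, pool_size=pool))
```

Under these assumptions, doubling the pooling size cuts the number of positions by roughly a factor of four, which is why a larger pool_size helps with very large images.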

Prepare datasets for model training

This work uses two publicly available databases: KonIQ-10k ("KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment" by V. Hosu, H. Lin, T. Sziranyi, and D. Saupe) and LIVE-wild ("Massive online crowdsourced study of subjective and objective picture quality" by D. Ghadiyaram and A. C. Bovik).

  1. The two databases were merged and then split into training and testing sets. Please see the README in databases for details.

  2. Make MOS files (note: do NOT include a header line):

    For database with score distribution available, the MOS file is like this (koniq format):

        image path, voter number of quality scale 1, voter number of quality scale 2, voter number of quality scale 3, voter number of quality scale 4, voter number of quality scale 5, MOS or Z-score
        10004473376.jpg,0,0,25,73,7,3.828571429
        10007357496.jpg,0,3,45,47,1,3.479166667
        10007903636.jpg,1,0,20,73,2,3.78125
        10009096245.jpg,0,0,21,75,13,3.926605505
    

    For database with standard deviation available, the MOS file is like this (live format):

        image path, standard deviation, MOS or Z-score
        t1.bmp,18.3762,63.9634
        t2.bmp,13.6514,25.3353
        t3.bmp,18.9246,48.9366
        t4.bmp,18.2414,35.8863
    

    The format of MOS file ('koniq' or 'live') and the format of MOS or Z-score ('mos' or 'z_score') should also be specified in misc/imageset_handler/get_image_scores.

  3. Specify the folders containing the training and testing images in the train script train/train_triq.py.

  4. Pretrained ImageNet weights can be downloaded (see the README in .\pretrained_weights) and pointed to in the train script.
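
The two MOS file formats described in step 2 can be read with a few lines of Python. The sketch below is only an illustration using the sample rows shown above; the function names are hypothetical, and it is not the code in misc/imageset_handler. It also shows that, for the koniq format, the last column equals the vote-weighted mean of the quality scales.

```python
import csv
import io

KONIQ_SAMPLE = """10004473376.jpg,0,0,25,73,7,3.828571429
10007357496.jpg,0,3,45,47,1,3.479166667
"""
LIVE_SAMPLE = """t1.bmp,18.3762,63.9634
t2.bmp,13.6514,25.3353
"""


def read_koniq(text):
    """Map image name -> (votes for scales 1..5, MOS) for the koniq format."""
    scores = {}
    for row in csv.reader(io.StringIO(text)):
        if row:
            scores[row[0]] = ([int(v) for v in row[1:6]], float(row[6]))
    return scores


def read_live(text):
    """Map image name -> (standard deviation, MOS) for the live format."""
    scores = {}
    for row in csv.reader(io.StringIO(text)):
        if row:
            scores[row[0]] = (float(row[1]), float(row[2]))
    return scores


def mos_from_votes(votes):
    """Vote-weighted mean of the quality scales 1..len(votes)."""
    return sum(scale * n for scale, n in enumerate(votes, start=1)) / sum(votes)


koniq = read_koniq(KONIQ_SAMPLE)
live = read_live(LIVE_SAMPLE)
votes, mos = koniq['10004473376.jpg']
print(votes, mos, mos_from_votes(votes))
print(live['t1.bmp'])
```

For a real file, the same functions can be applied to the text returned by open(path).read().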

Trained TRIQ weights

TRIQ has been trained on KonIQ-10k and LIVE-wild databases, and the weights file can be downloaded here.

State-of-the-art models

Three other models are also included in this work. Their original implementations are used, and they can be found below.

Koncept512 ("KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment")

SGDNet ("SGDNet: An end-to-end saliency-guided deep neural network for no-reference image quality assessment")

CaHDC ("End-to-end blind image quality prediction with cascaded deep neural network")

Comparison results

We have conducted several experiments to evaluate the performance of TRIQ. Please see results.pdf for detailed results.

Error report

If errors or exceptions are encountered, please first check all the paths. If the problem persists after fixing any path issues, please report it in Issues.

FAQ

  • To be added

ViT (Vision Transformer) for IQA

This work is heavily inspired by ViT ("An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"). The module vit_iqa contains an implementation of ViT for IQA, which mainly follows the implementation of ViT-PyTorch. Pretrained ViT weights can be downloaded here.

Owner
Junyong You