Estimating Example Difficulty using Variance of Gradients

Last update: Dec 26, 2022

Overview

This repository contains source code necessary to reproduce some of the main results in the paper "Estimating Example Difficulty using Variance of Gradients" (arXiv:2008.11600).

If you use this software, please consider citing:

@article{agarwal2020estimating, 
title={Estimating Example Difficulty using Variance of Gradients},
author={Agarwal, Chirag and Hooker, Sara},
journal={arXiv preprint arXiv:2008.11600},
year={2020}
}

1. Setup

Installing software

This repository is built using a combination of TensorFlow and PyTorch. Install the required libraries from both requirements files:

pip install -r ./requirements_tf.txt
pip install -r ./requirements_pytorch.txt

2. Usage

Toy experiment

toy_script.py runs the toy dataset experiment. You can analyze the training or testing data at different stages of training (early, middle, and late) using the --split and --mode flags. The --vog_cal flag selects which version of the VOG score to visualize: the raw score, the class-normalized score, or the absolute class-normalized score.
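To make the three --vog_cal variants concrete, here is a minimal stdlib-only sketch. It assumes that the raw VOG score is the per-feature standard deviation of the input gradients across K snapshots, averaged over features, and that class normalization means standardizing each score with the mean and standard deviation of its class. The function names vog_score and class_normalize are hypothetical and are not taken from toy_script.py.

```python
import math
from collections import defaultdict


def vog_score(snapshot_grads):
    """Raw VOG score for one example (assumed definition).

    snapshot_grads holds K lists, one per checkpoint, each containing the
    gradient of the example's true-class logit w.r.t. every input feature.
    """
    k = len(snapshot_grads)
    n = len(snapshot_grads[0])
    per_feature = []
    for p in range(n):
        values = [g[p] for g in snapshot_grads]
        mu = sum(values) / k
        # Population standard deviation of this feature's gradient over the K checkpoints.
        per_feature.append(math.sqrt(sum((v - mu) ** 2 for v in values) / k))
    # Average the per-feature variability into one score per example.
    return sum(per_feature) / n


def class_normalize(scores, labels, absolute=False):
    """Standardize each score with its class mean and std (assumed scheme)."""
    by_class = defaultdict(list)
    for s, y in zip(scores, labels):
        by_class[y].append(s)
    stats = {}
    for y, vals in by_class.items():
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
        stats[y] = (mu, sd if sd > 0 else 1.0)  # avoid dividing by zero
    out = []
    for s, y in zip(scores, labels):
        mu, sd = stats[y]
        z = (s - mu) / sd
        out.append(abs(z) if absolute else z)
    return out


if __name__ == "__main__":
    stable = [[0.5, -0.25]] * 3                     # identical gradients at every checkpoint
    noisy = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]   # first feature flips sign
    raw = [vog_score(stable), vog_score(noisy)]
    print("raw:", raw)
    print("class-normalized:", class_normalize(raw + [0.2, 0.6], [0, 0, 1, 1]))
```

An example whose gradients never change gets a raw score of zero, while one whose gradients fluctuate between checkpoints scores higher; class normalization then makes scores comparable across classes with different typical gradient magnitudes.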

Examples

Running the following command generates the toy dataset decision boundary figure, together with a plot relating each point's perpendicular distance from the decision boundary to its VOG score:

python3 toy_script.py --split test --mode early --vog_cal normalize

The respective figures are:

Left: Visualization of the toy dataset decision boundary with the testing data points. The multilayer perceptron (MLP) model achieves 100% training accuracy. Right: Scatter plot of the Variance of Gradients (VOG) score of each testing data point against its perpendicular distance from the decision boundary, showing that higher scores correspond to the most challenging examples (those closest to the decision boundary).
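The right-hand panel relates each point's VOG score to its perpendicular distance from the decision boundary. Below is a minimal sketch of that analysis. For simplicity it assumes a linear boundary w·x + b = 0 rather than the MLP's learned boundary, and it summarizes the relation with a Pearson correlation; both choices, and the helper names, are ours and not taken from toy_script.py.

```python
import math


def distance_to_line(point, w, b):
    """Perpendicular distance from a 2-D point to the line w.x + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical boundary x + y = 0 with hand-picked points and VOG scores.
    w, b = (1.0, 1.0), 0.0
    points = [(0.1, 0.0), (1.0, 0.5), (2.0, 2.0)]
    vog = [0.9, 0.4, 0.1]  # illustrative values, not results from the paper
    dists = [distance_to_line(p, w, b) for p in points]
    # A negative coefficient means points closer to the boundary score higher.
    print(pearson(dists, vog))
```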

ImageNet

The main scripts for the ImageNet experiments are in the ./imagenet/ folder.

Before calculating the VOG scores, you need to compute the gradients of the images listed in ./scripts/train.txt/ at each model snapshot and store them. For demonstration purposes, we share the model weights from the late stage of training, i.e. steps 30024, 31275, and 32000. For example, to store the gradients for the ImageNet training set (stored in /imagenet_dir/train) at snapshot 32000, run the shell script train_get_gradients.sh as follows:

source train_get_gradients.sh 32000 ./imagenet/train_results/ 9 ./scripts/train.txt/
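Because VOG is computed over several snapshots, the command above has to be repeated for each shared late-stage checkpoint. The sketch below prints one command per step, assuming the remaining arguments stay the same across snapshots. The meaning of the argument 9 is not documented here, so it is simply passed through; the helper name gradient_commands is hypothetical.

```python
# Late-stage snapshots whose weights are shared with the repository.
LATE_STEPS = [30024, 31275, 32000]


def gradient_commands(steps, out_dir="./imagenet/train_results/",
                      extra_arg="9", image_list="./scripts/train.txt/"):
    """Build one train_get_gradients.sh invocation per snapshot step.

    The argument order mirrors the single-snapshot example; reusing the
    same out_dir, extra_arg, and image_list for every step is an assumption.
    """
    return [
        f"source train_get_gradients.sh {step} {out_dir} {extra_arg} {image_list}"
        for step in steps
    ]


if __name__ == "__main__":
    for cmd in gradient_commands(LATE_STEPS):
        print(cmd)
```

Since source is a shell builtin, the printed lines are meant to be run in a bash session rather than passed directly to a subprocess call.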

For this repository, we have generated the gradients of 100 random images at the late stage of training and stored the results in ./imagenet/train_results/. To compute the error rate at different VOG deciles, run train_visualize_grad.py:

python train_visualize_grad.py
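One way to produce a decile breakdown is to rank examples by VOG, split them into ten roughly equal groups, and measure the fraction of misclassified examples in each. Below is a minimal sketch with hypothetical inputs (per-example scores, true labels, and predicted labels); train_visualize_grad.py may bin or aggregate differently.

```python
def error_rate_by_decile(scores, labels, preds, n_bins=10):
    """Error rate within each VOG decile (index 0 = lowest scores).

    Examples are ranked by score and split into n_bins groups of (nearly)
    equal size; ties keep their original order because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for d in range(n_bins):
        bin_idx = order[d * n // n_bins:(d + 1) * n // n_bins]
        if not bin_idx:
            rates.append(float("nan"))  # fewer examples than bins
            continue
        errors = sum(1 for i in bin_idx if preds[i] != labels[i])
        rates.append(errors / len(bin_idx))
    return rates


if __name__ == "__main__":
    # Hypothetical data: 20 examples, the two highest-VOG ones misclassified.
    scores = [i / 20 for i in range(20)]
    labels = [0] * 20
    preds = [0] * 18 + [1, 1]
    print(error_rate_by_decile(scores, labels, preds))
```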

Analyzing the VOG scores of a particular class in the late training stage (e.g. magpie and pop bottle, shown below) reveals two distinct groups of images. In this work, we hypothesize that examples a model has difficulty learning (images on the right) exhibit higher variance in gradient updates over the course of training. In contrast, the gradient updates for relatively easy examples are expected to stabilize early in training and converge to a narrow range of values.
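For reference, one way to write this idea down (our reading of the method, with notation chosen here): let S_{t,p} be the gradient of the true-class pre-softmax logit with respect to input pixel p at checkpoint t, for K checkpoints and N pixels.

```latex
\mu_p = \frac{1}{K}\sum_{t=1}^{K} S_{t,p}, \qquad
\mathrm{VOG}_p = \sqrt{\frac{1}{K}\sum_{t=1}^{K}\left(S_{t,p} - \mu_p\right)^2}, \qquad
\mathrm{VOG} = \frac{1}{N}\sum_{p=1}^{N}\mathrm{VOG}_p
```

An example whose gradients settle early has S_{t,p} close to \mu_p at every checkpoint and therefore a small VOG; an example whose gradients keep changing has a large one.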

Each 5×5 grid shows the 25 ImageNet training-set images with the lowest (left column) and highest (right column) VOG scores for the classes magpie and pop bottle, with their predicted labels below each image. Training-set images with higher VOG scores (right column) tend to feature zoomed-in views with atypical color schemes and vantage points.
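Selecting the images for such a grid amounts to ranking one class's examples by VOG and taking both ends. Below is a minimal sketch, assuming per-example scores and labels are already available as lists; the function name extreme_examples is hypothetical.

```python
def extreme_examples(scores, labels, target_class, k=25):
    """Return (lowest, highest) example indices by VOG for one class.

    lowest lists the k lowest-scoring indices in ascending order of score;
    highest lists the k highest-scoring indices in descending order.
    """
    idx = [i for i, y in enumerate(labels) if y == target_class]
    ranked = sorted(idx, key=lambda i: scores[i])
    return ranked[:k], ranked[::-1][:k]


if __name__ == "__main__":
    # Hypothetical scores for five images, four of which belong to class 1.
    scores = [0.5, 0.1, 0.9, 0.3, 0.7]
    labels = [1, 1, 1, 0, 1]
    print(extreme_examples(scores, labels, target_class=1, k=2))
```

With k=25, the two returned index lists would fill the left and right 5×5 grids respectively.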

3. Licenses

Note that the code in this repository is licensed under the MIT License, but the pre-trained models used by the code have their own licenses. Please check them carefully before use.

4. Questions?

If you have questions or suggestions, please feel free to email us or open a GitHub issue.

Owner

Chirag Agarwal
