Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

Overview

Show, Edit and Tell: A Framework for Editing Image Captions | arXiv

This repository contains the source code for Show, Edit and Tell: A Framework for Editing Image Captions, to appear at CVPR 2020.

Requirements

  • Python 3.6 or 3.7
  • PyTorch 1.2

For evaluation, you also need:

An argument parser is currently not supported. We will add support for it soon.

Pretrained Models

You can download the pretrained models from here. Place them in the eval folder.

Download and Prepare Features

In this work, we use 36 fixed bottom-up features per image. If you wish to use the adaptive features (10-100 per image), please refer to the adaptive_features folder in this repository and follow the instructions.

First, download the fixed features from here and unzip the file. Place the unzipped folder in the bottom-up_features folder.

Next, run this command:

python bottom-up_features/tsv.py

This command will create the following files:

  • An HDF5 file containing the bottom-up image features for the train and val splits, 36 per image for each split, stored as an (I, 36, 2048) tensor, where I is the number of images in the split.
  • PKL files that map the training and validation image IDs to their indices in the HDF5 dataset created above (a loading sketch follows this list).
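To sanity-check the dumped features, a loading sketch like the one below can be used. The HDF5 file name and dataset key are assumptions here (check bottom-up_features/tsv.py for the names it actually writes); the PKL file names are the ones referenced in the next section.

import pickle
import h5py

# Minimal sanity-check sketch; 'train36.hdf5' and the 'image_features' key are
# assumed names -- see bottom-up_features/tsv.py for the ones it actually writes.
with open('bottom-up_features/train36_imgid2idx.pkl', 'rb') as f:
    imgid2idx = pickle.load(f)             # COCO image id -> row index in the HDF5 tensor

with h5py.File('bottom-up_features/train36.hdf5', 'r') as h:
    features = h['image_features']         # shape (I, 36, 2048)
    image_id = next(iter(imgid2idx))
    feats = features[imgid2idx[image_id]]  # (36, 2048) features for a single image
    print(image_id, feats.shape)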

Download/Prepare Caption Data

You can either download all the related caption data files from here or create them yourself. The folder contains the following files (a quick inspection sketch follows the list):

  • WORDMAP_coco: maps the words to indices
  • CAPUTIL: stores the information about the existing captions in a dictionary organized as follows: {"COCO_image_name": {"caption": "existing caption to be edited", "encoded_previous_caption": an encoded list of the words, "previous_caption_length": a list containing the length of the caption, "image_ids": the COCO image id}}
  • CAPTIONS: the encoded ground-truth captions (a list with number_images x 5 lists; for example, the Karpathy split has 113,287 training images, therefore there are 566,435 lists for the training split)
  • CAPLENS: the lengths of the ground-truth captions (a list with number_images x 5 values)
  • NAMES: the COCO image names, in the same order as CAPTIONS
  • GENOME_DETS: the splits and image IDs for loading the images in accordance with the features file created above
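The sketch below shows one way to inspect these files. The exact file names and the serialization format (JSON is assumed here) may differ; check preprocess_caps.py and preprocess_existing_caps.py for the names they actually write.

import json

# Minimal inspection sketch; file names and JSON serialization are assumptions.
with open('caption data/WORDMAP_coco.json') as f:
    word_map = json.load(f)                   # word -> index

with open('caption data/CAPTIONS_train.json') as f:
    captions = json.load(f)                   # number_images x 5 encoded captions
with open('caption data/CAPLENS_train.json') as f:
    caplens = json.load(f)                    # one length per caption
assert len(captions) == len(caplens)          # e.g. 566,435 entries for the training split

with open('caption data/CAPUTIL_val.json') as f:
    caputil = json.load(f)                    # {image_name: {"caption": ..., "image_ids": ...}}
name, entry = next(iter(caputil.items()))
print(len(word_map), len(captions), name, entry['caption'])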

If you'd like to create the caption data yourself, download Karpathy's training, validation, and test splits. This zip file contains the captions; the split file format is sketched below. Place the file in the caption data folder. You should also have the PKL files created in the 'Download and Prepare Features' section above: train36_imgid2idx.pkl and val36_imgid2idx.pkl.
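For reference, this is how the standard Karpathy split file is organized (dataset_coco.json is the conventional name inside the downloaded zip; verify it against what preprocess_caps.py expects): each image entry carries its split and its tokenized captions.

import json
from collections import Counter

# Small sketch of the Karpathy split format, assuming the conventional dataset_coco.json.
with open('caption data/dataset_coco.json') as f:
    karpathy = json.load(f)

print(Counter(img['split'] for img in karpathy['images']))   # train / restval / val / test
first = karpathy['images'][0]
print(first['filename'], [' '.join(s['tokens']) for s in first['sentences'][:2]])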

Next, run:

python preprocess_caps.py

This will dump all the files into the caption data folder.

Next, download the existing captions to be edited and organize them in a list of dictionaries, each in the following format: {"image_id": COCO_image_id, "caption": "caption to be edited", "file_name": "split\\COCO_image_name"}. For example: {"image_id": 522418, "caption": "a woman cutting a cake with a knife", "file_name": "val2014\\COCO_val2014_000000522418.jpg"}. In our work, we use the captions produced by AoANet. A short sketch of this format follows.
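A minimal sketch of assembling and saving such a list is shown below; the output file name is illustrative, so check preprocess_existing_caps.py for the path it actually reads.

import json

# Expected input format for the captions to be edited; the output path is illustrative.
existing_caps = [
    {
        "image_id": 522418,
        "caption": "a woman cutting a cake with a knife",
        "file_name": "val2014\\COCO_val2014_000000522418.jpg",
    },
    # ... one dictionary per image, e.g. taken from AoANet's output
]

with open('caption data/existing_caps_val.json', 'w') as f:
    json.dump(existing_caps, f)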

Next, run:

python preprocess_existing_caps.py

This will dump all the existing caption files into the caption data folder.

Prepare/Download Sequence-Level Training Data

Download the RL data for sequence-level training (used for computing metric scores) from here.

Alternatively, you may prepare the data yourself:

Run the following command:

python preprocess_rl.py

This will dump two files, used for computing metric scores, into the data folder.

Training and Validation

XE training stage:

For training DCNet, run:

python dcnet.py

For optimizing DCNet with MSE, run:

python dcnet_with_mse.py

For training editnet, run:

python editnet.py

CIDEr-D Optimization stage:

For training DCNet, run:

python dcnet_rl.py

For training editnet, run:

python editnet_rl.py

Evaluation

Refer to the eval folder for instructions. All the generated captions and scores from our model can be found in the outputs folder.

                      BLEU-1   BLEU-4   CIDEr   SPICE
Cross-Entropy Loss     77.9     38.0    1.200    21.2
CIDEr Optimization     80.6     39.2    1.289    22.6

Citation

@InProceedings{Sammani_2020_CVPR,
  author    = {Sammani, Fawaz and Melas-Kyriazi, Luke},
  title     = {Show, Edit and Tell: A Framework for Editing Image Captions},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

References

Our code is mainly based on the self-critical sequence training and Show, Attend and Tell implementations. We thank both authors.
