PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Last update: Nov 17, 2022

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

This is the PyTorch implementation of our paper:

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020

[arXiv] [GitHub] [Project]

10-min YouTube Video

How to start

Clone the repo recursively:

git clone --recursive [email protected]:chihyaoma/cyclical-visual-captioning.git

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Installation

The proposed cyclical method can be applied directly to image and video captioning tasks.

Currently, installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in anet-video-captioning.

Acknowledgments

Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic helps on the figures.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ma2020learning,
    title={Learning to Generate Grounded Image Captions without Localization Supervision},
    author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2020},
    url={https://arxiv.org/abs/1906.00283},
}

PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Related tags

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

10-min YouTube Video

How to start

Installation

Acknowledgments

Citation

Owner

Chih-Yao Ma

This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

This repository focus on Image Captioning & Video Captioning & Seq-to-Seq Learning & NLP

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

This project is for a Twitter bot that monitors a bird feeder in my backyard. Any detected birds are identified and posted to Twitter.

Molecular Sets (MOSES): A benchmarking platform for molecular generation models

An image classification app boilerplate to serve your deep learning models asap!

Code for the paper "Can Active Learning Preemptively Mitigate Fairness Issues?" presented at RAI 2021.

A platform for intelligent agent learning based on a 3D open-world FPS game developed by Inspir.AI.

Forecasting with Gradient Boosted Time Series Decomposition

Easily pull telemetry data and create beautiful visualizations for analysis.

Keeper for Ricochet Protocol, implemented with Apache Airflow

Boston House Prediction Valuation Tool

Official PyTorch implementation of the paper "Graph-based Generative Face Anonymisation with Pose Preservation" in ICIAP 2021

Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

Code repo for "FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation" (ICCV 2021)

A toolkit for developing and comparing reinforcement learning algorithms.

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

It's like Shape Editor in Maya but works with skeletons (transforms).