Cross-media Structured Common Space for Multimedia Event Extraction (ACL2020)

Last update: Nov 21, 2022

Overview
Requirements
Data
Quickstart
Citation

Overview

This repository contains the code for the paper Cross-media Structured Common Space for Multimedia Event Extraction (ACL 2020).

Requirements

Install the environment for each component from its requirements.txt:

pip install -r requirements.txt
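Since each component is installed from its own requirements.txt, it can help to list every such file before installing. The following is a minimal sketch, assuming the files live somewhere under the repository root; the helper name find_requirements is illustrative and not part of the repository.

```python
from pathlib import Path


def find_requirements(repo_root):
    """Return every requirements.txt found under repo_root, sorted by path."""
    return sorted(Path(repo_root).rglob("requirements.txt"))
```

Calling find_requirements(".") from the repository root returns the paths to pass to pip install -r, one component at a time.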

Data

Situation Recognition (Visual Event Extraction Data)

We downloaded the situation recognition data from imSitu. The preprocessed data is in PreprcessedSR.

ACE (Text Event Extraction Data)

We preprocessed ACE following JMEE. The preprocessing script is dataflow/preprocess_ace_JMEE.py, and a sample of the data format is in sample.json. For licensing reasons, the ACE 2005 dataset is only accessible to holders of the LDC2006T06 license. To obtain the processed data, please email me with proof that you hold the license.
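The layout of sample.json is not spelled out here, so a quick way to get oriented is to print its keys and value types rather than guess at field names. The sketch below assumes only that the file is valid JSON; the function names describe and describe_file are illustrative.

```python
import json


def describe(obj, depth=0, max_depth=2):
    """Return lines summarising the keys and value types of a JSON value."""
    lines = []
    indent = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only the first element is inspected; the rest are assumed to be alike.
        lines.append(f"{indent}[0]: {type(obj[0]).__name__}")
        if depth < max_depth:
            lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines


def describe_file(path):
    """Load a JSON file and summarise its structure."""
    with open(path, encoding="utf-8") as f:
        return describe(json.load(f))
```

Running print("\n".join(describe_file("sample.json"))) shows the top levels of the structure, which can then be compared against the output of the preprocessing script.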

Voice of America Image-Caption Pairs

We crawled VOA image-caption pairs to train the common space. The image-caption pairs and images can be downloaded using the URLs (we share image URLs instead of the downloaded images due to licensing issues). Preprocessing includes object detection on the images and parsing of the caption sentences. The preprocessed data is in PreprocessedVOA.
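Because only image URLs are shared, the images have to be fetched locally before use. The following is a minimal sketch of such a download step, assuming the URLs are already available as a list of strings; how they are stored in the released files is not specified here, and the names local_name and download_images are illustrative.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Derive a file name from the last path segment of the URL."""
    return os.path.basename(urlparse(url).path)


def download_images(urls, out_dir, fetch=None):
    """Download each URL into out_dir, skipping files that already exist.

    fetch(url) must return the image bytes. It defaults to a plain urllib
    request and can be replaced, for example in tests.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(out_dir, local_name(url))
        if os.path.exists(target):
            # Already downloaded (or another URL shares this file name).
            continue
        data = fetch(url)  # fetch before opening, so failures leave no empty file
        with open(target, "wb") as f:
            f.write(data)
        saved.append(target)
    return saved
```

Skipping existing files makes the step safe to re-run after an interrupted download. URLs that share a final path segment would collide under this naming scheme, so a real crawl may need a different file-naming rule.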

M2E2 (Multimedia Event Extraction Benchmark)

The images and text articles are in m2e2_rawdata, and annotations are in m2e2_annotation.

Vocabulary

Preprocessed vocabulary is in PreprocessedVocab.

Quickstart

Training

There are two variants for parsing images into situation graphs: one parses images into a role-driven attention graph, and the other parses images into object graphs.

(1) attention-graph based version:

sh scripts/train/train_joint_att.sh

(2) object-graph based version:

sh scripts/train/train_joint_obj.sh

Please specify the data paths datadir and glovedir in the scripts.

Testing

(1) attention-graph based version:

sh test_joint.sh

(2) object-graph based version:

sh test_joint_object.sh

Please specify the data paths datadir and glovedir, and the model paths checkpoint_sr, checkpoint_sr_params, checkpoint_ee, and checkpoint_ee_params, in the scripts.
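A wrong path in one of these settings typically surfaces only after the script has started, so it can be worth checking them up front. The following is a minimal sketch, assuming each setting is a filesystem path; the helper missing_paths and the example values are hypothetical and not part of the repository.

```python
import os


def missing_paths(paths):
    """Return the names of settings whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


if __name__ == "__main__":
    # Placeholder values; replace them with the paths set in the scripts.
    settings = {
        "datadir": "/path/to/data",
        "glovedir": "/path/to/glove",
        "checkpoint_sr": "/path/to/checkpoint_sr",
        "checkpoint_ee": "/path/to/checkpoint_ee",
    }
    for name in missing_paths(settings):
        print(f"{name} does not exist: {settings[name]}")
```

The same check works for training by passing only datadir and glovedir.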

Citation

Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang. 2020. Cross-media Structured Common Space for Multimedia Event Extraction. Proceedings of The 58th Annual Meeting of the Association for Computational Linguistics.

@inproceedings{li2020multimediaevent,
    title={Cross-media Structured Common Space for Multimedia Event Extraction},
    author={Manling Li and Alireza Zareian and Qi Zeng and Spencer Whitehead and Di Lu and Heng Ji and Shih-Fu Chang},
    booktitle={Proceedings of The 58th Annual Meeting of the Association for Computational Linguistics},
    year={2020}
}
