A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

Related tags

Deep LearningWSSGG
Overview

README.md shall be finished soon.

WSSGG

0 Overview

Our model uses the image's paired caption as weak supervision to learn the entities in the image and the relations among them. At inference time, it generates scene graphs without help from texts. To learn our model, we first allow context information to propagate on the text graph to enrich the entity word embeddings (Sec. 3.1). We found this enrichment provides better localization of the visual objects. Then, we optimize a text-query-guided attention model (Sec. 3.2) to provide the image-level entity prediction and associate the text entities with visual regions best describing them. We use the joint probability to choose boxes associated with both subject and object (Sec. 3.3), then use the top scoring boxes to learn better grounding (Sec. 3.4). Finally, we use an RNN (Sec. 3.5) to capture the vision-language common-sense and refine our predictions.

1 Installation

git clone "https://github.com/yekeren/WSSGG.git" && cd "WSSGG"

We use Tensorflow 1.5 and Python 3.6.4. To continue, please ensure that at least the correct Python version is installed. requirements.txt defines the list of python packages we installed. Simply run pip install -r requirements.txt to install these packages after setting up python. Next, run protoc protos/*.proto --python_out=. to compile the required protobuf protocol files, which are used for storing configurations.

pip install -r requirements.txt
protoc protos/*.proto --python_out=.

1.1 Faster-RCNN

Our Faster-RCNN implementation relies on the Tensorflow object detection API. Users can use git clone "https://github.com/tensorflow/models.git" "tensorflow_models" && ln -s "tensorflow_models/research/object_detection" to set up. Also, don't forget to using protoc to compire the protos used by the detection API.

The specific Faster-RCNN model we use is faster_rcnn_inception_resnet_v2_atrous_lowproposals_oidv2 to keep it the same as the VSPNet. More information is in Tensorflow object detection zoo.

git clone "https://github.com/tensorflow/models.git" "tensorflow_models" 
ln -s "tensorflow_models/research/object_detection"
cd tensorflow_models/research/; protoc object_detection/protos/*.proto --python_out=.; cd -

mkdir -p "zoo"
wget -P "zoo" "http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz"
tar xzvf zoo/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz -C "zoo"

1.2 Language Parser

Though we indicate the dependency on spacy in requirements.txt, we still need to run python -m spacy download en for English. Then, we checkout the tool at SceneGraphParser by running git clone "https://github.com/vacancy/SceneGraphParser.git" && ln -s "SceneGraphParser/sng_parser"

python -m spacy download en
git clone "https://github.com/vacancy/SceneGraphParser.git"
ln -s "SceneGraphParser/sng_parser"

1.3 GloVe Embeddings

We use the pre-trained 300-D GloVe embeddings.

wget -P "zoo" "http://nlp.stanford.edu/data/glove.6B.zip"
unzip "zoo/glove.6B.zip" -d "zoo"

python "dataset-tools/export_glove_words_and_embeddings.py" \
  --glove_file "zoo/glove.6B.300d.txt" \
  --output_vocabulary_file "zoo/glove_word_tokens.txt" \
  --output_vocabulary_word_embedding_file "zoo/glove_word_vectors.npy"

2 Settings

To avoid the time-consuming Faster RCNN processes in 2.1 and 2.2, users can directly download the features we provided at the following URLs. Then, the scripts create_vg_settings.sh and create_coco_setting.sh will check the existense of the Faster-RCNN features and skip the processs if they are provided. Please note that in the following table, we assume the directory for holding the VG and COCO data to be vg-gt-cap and coco-cap.

Name URLs Please extract to directory
VG Faster-RCNN features https://storage.googleapis.com/weakly-supervised-scene-graphs-generation/vg_frcnn_proposals.zip vg-gt-cap/frcnn_proposals/
COCO Faster-RCNN features https://storage.googleapis.com/weakly-supervised-scene-graphs-generation/coco_frcnn_proposals.zip coco-cap/frcnn_proposals/

2.1 VG-GT-Graph and VG-Cap-Graph

Typing sh dataset-tools/create_vg_settings.sh "vg-gt-cap" will generate VG-related files under the folder "vg-gt-cap" (for both VG-GT-Graph and VG-Cap-Graph settings). Basically, it will download the datasets and launch the following programs under the dataset-tools directory.

Name Desc.
create_vg_frcnn_proposals.py Extract VG visual proposals using Faster-RCNN
create_vg_text_graphs.py Extract VG text graphs using Language Parser
create_vg_vocabulary Gather the VG vocabulary
create_vg_gt_graph_tf_record.py Generate TF record files for the VG-GT-Graph setting
create_vg_cap_graph_tf_record.py Generate TF record files for the VG-Cap-Graph setting

2.2 COCO-Cap-Graph

Typing sh dataset-tools/create_coco_settings.sh "coco-cap" "vg-gt-cap" will generate COCO-related files under the folder "coco-cap" (for COCO-Cap-Graph setting). Basically, it will download the datasets and launch the following programs under the dataset-tools directory. Please note that the "vg-gt-cap" directory should be created in that we need to get the split information (either Zareian et al. or Xu et al.).

Name Desc.
create_coco_frcnn_proposals.py Extract COCO visual proposals using Faster-RCNN
create_coco_text_graphs.py Extract COCO text graphs using Language Parser
create_coco_vocabulary Gather the COCO vocabulary
create_coco_cap_graph_tf_record.py Generate TF record files for the COCO-Cap-Graph setting

3 Training and Evaluation

Multi-GPUs (5 GPUs in our case) training cost less than 2.5 hours to train a single model, while single-GPU strategy requires more than 8 hours.

3.1 Multi-GPUs training

We use TF distributed training to train the models shown in our paper. For example, the following command shall create and train a model specified by the proto config file configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt, and save the trained model to a directory named "logs/base_phr_ite_seq". In train.sh, we create 1 ps, 1, chief, 3 workers, and 1 evaluator. The 6 instances are distributed on 5 GPUS (4 for training and 1 for evaluation).

sh train.sh \
  "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  "logs/base_phr_ite_seq"

3.2 Single-GPU training

Our model can also be trained using single GPU strategy such as follow. However, we would suggest to half the learning rate or explore for better other hyper-parameters.

python "modeling/trainer_main.py" \
  --pipeline_proto "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  --model_dir ""logs/base_phr_ite_seq""

3.3 Performance on test set

During the training process, there is an evaluator measuring the model's performance on the validation set and save the best model checkpoint. Finally, we use the following command to evaluate the saved model's performance on the test set. This evaluation process will last for 2-3 hours depends on the post-process parameters (e.g., see here). Currently, there are many kinds of stuff written in pure python, which we would later optimize to utilize GPU better to reduce the final evaluation time.

python "modeling/trainer_main.py" \
  --pipeline_proto "configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt" \
  --model_dir ""logs/base_phr_ite_seq"" \
  --job test

3.4 Primary configs and implementations

Take configs/GT-Graph-Zareian/base_phr_ite_seq.pbtxt as an example, the following configs control the model's behavior.

Name Desc. Impl.
linguistic_options Specify the phrasal context modeling, remove the section to disable it. models/cap2sg_linguistic.py
grounding_options Specify the grounding options. models/cap2sg_grounding.py
detection_options Specify the WSOD model, num_iterations to control the iterative process. models/cap2sg_detection.py
relation_options Specify the relation detection modeling. models/cap2sg_relation.py
common_sense_options Specify the sequential context modeling, remove the section to disable it. models/cap2sg_common_sense.py

4 Visualization

Please see cap2sg.ipynb.

5 Reference

If you find this project helps, please cite our CVPR2021 paper :)

@InProceedings{Ye_2021_CVPR,
  author = {Ye, Keren and Kovashka, Adriana},
  title = {Linguistic Structures as Weak Supervision for Visual Scene Graph Generation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2021}
}

Also, please take a look at our old work in ICCV2019.

@InProceedings{Ye_2019_ICCV,
  author = {Ye, Keren and Zhang, Mingda and Kovashka, Adriana and Li, Wei and Qin, Danfeng and Berent, Jesse},
  title = {Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2019}
}
Owner
Keren Ye
Ph.D. student at the University of Pittsburgh. I am interested in both Computer Vision and Natural Language Processing.
Keren Ye
This implements one of result networks from Large-scale evolution of image classifiers

Exotic structured image classifier This implements one of result networks from Large-scale evolution of image classifiers by Esteban Real, et. al. Req

54 Nov 25, 2022
PyTorch Implementation of Backbone of PicoDet

PicoDet-Backbone PyTorch Implementation of Backbone of PicoDet Original Implementation is implemented on PaddlePaddle. Example picodet_l_backbone = ES

Yonghye Kwon 7 Jul 12, 2022
COCO Style Dataset Generator GUI

A simple GUI-based COCO-style JSON Polygon masks' annotation tool to facilitate quick and efficient crowd-sourced generation of annotation masks and bounding boxes. Optionally, one could choose to us

Hans Krupakar 142 Dec 09, 2022
TAUFE: Task-Agnostic Undesirable Feature DeactivationUsing Out-of-Distribution Data

A deep neural network (DNN) has achieved great success in many machine learning tasks by virtue of its high expressive power. However, its prediction can be easily biased to undesirable features, whi

KAIST Data Mining Lab 8 Dec 07, 2022
Implementation of average- and worst-case robust flatness measures for adversarial training.

Relating Adversarially Robust Generalization to Flat Minima This repository contains code corresponding to the MLSys'21 paper: D. Stutz, M. Hein, B. S

David Stutz 13 Nov 27, 2022
Robust & Reliable Route Recommendation on Road Networks

NeuroMLR: Robust & Reliable Route Recommendation on Road Networks This repository is the official implementation of NeuroMLR: Robust & Reliable Route

4 Dec 20, 2022
95.47% on CIFAR10 with PyTorch

Train CIFAR10 with PyTorch I'm playing with PyTorch on the CIFAR10 dataset. Prerequisites Python 3.6+ PyTorch 1.0+ Training # Start training with: py

5k Dec 30, 2022
PyTorch code accompanying our paper on Maximum Entropy Generators for Energy-Based Models

Maximum Entropy Generators for Energy-Based Models All experiments have tensorboard visualizations for samples / density / train curves etc. To run th

Rithesh Kumar 135 Oct 27, 2022
Config files for my GitHub profile.

Canalyst Candas Data Science Library Name Canalyst Candas Description Built by a former PM / analyst to give anyone with a little bit of Python knowle

Canalyst Candas 13 Jun 24, 2022
A deep learning tabular classification architecture inspired by TabTransformer with integrated gated multilayer perceptron.

The GatedTabTransformer. A deep learning tabular classification architecture inspired by TabTransformer with integrated gated multilayer perceptron. C

Radi Cho 60 Dec 15, 2022
Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system

Recommender-Systems Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system So the data

Yash Kumar 0 Jan 20, 2022
DLFlow is a deep learning framework.

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

DiDi 152 Oct 27, 2022
Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation

Pytorch Implementation of Auto-Compressing Subset Pruning for Semantic Image Segmentation Introduction ACoSP is an online pruning algorithm that compr

Merantix 8 Dec 07, 2022
Baseline and template code for node21 detection track

Nodule Detection Algorithm This codebase implements a baseline model, Faster R-CNN, for the nodule detection track in NODE21. It contains all necessar

node21challenge 11 Jan 15, 2022
RoadMap and preparation material for Machine Learning and Data Science - From beginner to expert.

ML-and-DataScience-preparation This repository has the goal to create a learning and preparation roadMap for Machine Learning Engineers and Data Scien

33 Dec 29, 2022
TensorFlow implementation of "Variational Inference with Normalizing Flows"

[TensorFlow 2] Variational Inference with Normalizing Flows TensorFlow implementation of "Variational Inference with Normalizing Flows" [1] Concept Co

YeongHyeon Park 7 Jun 08, 2022
Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN in PyTorch Official implementation of StyleCariGAN:Caricature Generation via StyleGAN Feature Map Modulation in PyTorch Requirements PyTo

PeterZhouSZ 49 Oct 31, 2022
Official code for paper "Optimization for Oriented Object Detection via Representation Invariance Loss".

Optimization for Oriented Object Detection via Representation Invariance Loss By Qi Ming, Zhiqiang Zhou, Lingjuan Miao, Xue Yang, and Yunpeng Dong. Th

ming71 56 Nov 28, 2022
PyTorch implementation of MulMON

MulMON This repository contains a PyTorch implementation of the paper: Learning Object-Centric Representations of Multi-object Scenes from Multiple Vi

NanboLi 16 Nov 03, 2022
Code for the upcoming CVPR 2021 paper

The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel J. Brostow and Michael

Niantic Labs 496 Dec 30, 2022