The Submission for SIMMC 2.0 Challenge 2021


Requirements

Preprocessing

  1. Download data
  • Download the data provided by the challenge organizers and put it in the data folder.
  • Unzip the data files.
  2. Image saving
  • Preprocess the image files in advance. The preprocessed result uses the image name as the key and the visual features as the value (see the sketch after the commands below).
python3 image_preprocessor.py
python3 image_preprocessor_final.py
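
A minimal sketch of consuming the preprocessed result described above, assuming the scripts write a pickled dictionary; the file name image_features.pkl is a placeholder, not necessarily the actual output path:

import pickle

# Load the preprocessed visual features (file name is an assumption;
# check the output path used by image_preprocessor.py).
with open("image_features.pkl", "rb") as f:
    image_features = pickle.load(f)

# Each entry maps an image name to its precomputed visual representation.
for image_name, visual in list(image_features.items())[:3]:
    print(image_name, getattr(visual, "shape", type(visual)))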

Step 1 (ITM)

First, the model is post-trained with image-to-text matching (ITM): the image is an individual object and the text is that object's visual metadata. The code is provided in the ITM folder.
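
As a rough illustration of this post-training objective (not the exact code in the ITM folder), the model classifies whether an object image and a visual-metadata text belong together; a minimal PyTorch sketch, with the fused-feature dimension and all names assumed:

import torch
import torch.nn as nn

class ITMHead(nn.Module):
    """Binary matched / not-matched classifier over a fused image-text representation."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 2)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.classifier(fused)  # logits of shape [batch, 2]

# Hypothetical fused features for (object image, visual metadata) pairs.
fused = torch.randn(8, 768)
labels = torch.randint(0, 2, (8,))  # 1 = matched pair, 0 = mismatched pair
loss = nn.CrossEntropyLoss()(ITMHead(768)(fused), labels)
loss.backward()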

Step 2 (BTM)

Second, pretraining is performed so that the background representation of the image can be used in the subtasks. As in ITM, the model is trained to match an image and a text, but here the image is the background of the dialog and the text is the entire dialog context. The code is provided in the BTM folder.
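
The difference from ITM lies in how the pairs are formed: a positive pair is a dialog's background image with that dialog's full context, and a negative pair swaps in the background of a different dialog. A minimal sketch of such pair construction; the field names are assumptions, not the repository's actual data format:

import random

def build_btm_pairs(dialogs):
    """Return (background_image, dialog_context, label) tuples for matching."""
    pairs = []
    for d in dialogs:
        # Positive pair: the dialog context with its own background image.
        pairs.append((d["background_image"], d["context"], 1))
        # Negative pair: the same context with a background from another dialog.
        other = random.choice([x for x in dialogs if x is not d])
        pairs.append((other["background_image"], d["context"], 0))
    return pairs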

Step 3

This is the training process for each subtask. You can train the model in each of the subtask folders (sub1, sub2_1, sub2_2, sub2_3, sub2_4, sub4).

Model

All models can be downloaded from the following link

model.pt is the model used to evaluate devtest, and its results are saved in the dstc10-simmc-entry folder. model_final.pt is the model used to evaluate teststd, and its results are saved in the dstc10-simmc-final-entry folder. However, because training of model_final.pt was not completed within the challenge period, we used model.pt for inference on the teststd data in subtask 2.
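
A minimal sketch of loading one of these checkpoints for inference, assuming they are standard PyTorch checkpoints; build_model is a hypothetical placeholder for the model construction code in the corresponding subtask folder:

import torch

# model.pt       -> devtest results  (dstc10-simmc-entry)
# model_final.pt -> teststd results  (dstc10-simmc-final-entry)
state_dict = torch.load("model.pt", map_location="cpu")

# model = build_model()            # hypothetical: use the subtask's model class
# model.load_state_dict(state_dict)
# model.eval()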

Evaluation

We evaluate with the scripts provided by the SIMMC challenge organizers:

$ python tools/disambiguator_evaluation.py \
    --pred_file="{PATH_TO_PRED_FILE}" \
    --test_file="{PATH_TO_TEST_FILE}"

(line-by-line evaluation)
$ python -m gpt2_dst.scripts.evaluate \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}

(Or, dialog level evaluation)
$ python -m utils.evaluate_dst \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}

$ python tools/response_evaluation.py \
    --data_json_path={PATH_TO_GOLD_RESPONSES} \
    --model_response_path={PATH_TO_MODEL_RESPONSES} \
    --single_round_evaluation

$ python tools/retrieval_evaluation.py \
    --retrieval_json_path={PATH_TO_GROUNDTRUTH_RETRIEVAL} \
    --model_score_path={PATH_TO_MODEL_CANDIDATE_SCORES} \
    --single_round_evaluation

DevTest Results

Subtask #1: Multimodal Disambiguation

Test Method Accuracy
GPT2 from CO(Challenge Organizer) 73.9
Ours 92.28

Subtask #2: Multimodal Coreference Resolution

Test Method       Object F1
GPT2 from CO      0.366
Ours-1 (sub2_1)   0.595
Ours-2 (sub2_2)   0.604
Ours-3 (sub2_3)   0.607
Ours-4 (sub2_4)   0.608

Subtask #3: Multimodal Dialog State Tracking

No Training/Testing

Subtask #4: Multimodal Dialog Response Generation

Generation

Baseline             BLEU
GPT2 from CO         0.192
MTN-SIMMC2 from CO   0.217
Ours                 0.285

Retrieval

No Training/Testing
