Grammar Induction using a Template Tree Approach

Related tags

Deep Learninggitta
Overview

Gitta

Gitta ("Grammar Induction using a Template Tree Approach") is a method for inducing context-free grammars. It performs particularly well on datasets that have latent templates, e.g. forum topics, writing prompts and output from template-based text generators. The found context-free grammars can easily be converted into grammars for use in grammar languages such as Tracery & Babbly.

Demo

A demo for Gitta can be found & executed on Google Colaboratory.

Example

dataset = [
    "I like cats and dogs",
    "I like bananas and geese",
    "I like geese and cats",
    "bananas are not supposed to be in a salad",
    "geese are not supposed to be in the zoo",
]
induced_grammar = grammar_induction.induce_grammar_using_template_trees(
    dataset,
    relative_similarity_threshold=0.1,
)
print(induced_grammar)
print(induced_grammar.generate_all())

Outputs as grammar:

{
    "origin": [
        "<B> are not supposed to be in <C>",
        "I like <B> and <B>"
    ],
    "B": [
        "bananas",
        "cats",
        "dogs",
        "geese"
    ],
    "C": [
        "a salad",
        "the zoo"
    ]
}

Which in turn generates all these texts:

{"dogs are not supposed to be in the zoo",
"cats are not supposed to be in a salad",
"I like geese and cats",
"cats are not supposed to be in the zoo", 
bananas are not supposed to be in a salad",
"I like dogs and dogs",
"bananas are not supposed to be in the zoo",
"I like dogs and bananas",
"geese are not supposed to be in the zoo",
"geese are not supposed to be in a salad",
"I like cats and dogs",
"I like dogs and geese",
"I like cats and bananas",
"I like bananas and dogs",
"I like bananas and bananas",
"I like cats and geese",
"I like geese and dogs",
"I like dogs and cats",
"I like geese and bananas",
"I like bananas and geese",
"dogs are not supposed to be in a salad",
"I like cats and cats",
"I like geese and geese",
"I like bananas and cats"}

Performance

We tested out this grammar induction algorithm on Twitterbots using the Tracery grammar modelling tool. Gitta only saw either 25, 50 or 100 example generations, and had to introduce a grammar that could generate similar texts. Every setting was run 5 times, and the median number of in-language texts (generations that were also produced by the original grammar) and not in-language texts (texts that the induced grammar generated, but not the original grammar). The median number of production rules is also included, to show its generalisation performance.

Grammar 25 examples 50 examples 100 examples
Name # generations size in lang not in lang size in lang not in lang size in lang not in lang size
botdoesnot 380292 363 648 0 64 2420 0 115 1596 4 179
BotSpill 43452 249 75 0 32 150 0 62 324 0 126
coldteabot 448 24 39 0 38 149 19 63 388 9 78
hometapingkills 4080 138 440 0 48 1184 3240 76 2536 7481 106
InstallingJava 390096 95 437 230 72 2019 1910 146 1156 3399 228
pumpkinspiceit 6781 6885 25 0 26 50 0 54 100 8 110
SkoolDetention 224 35 132 0 31 210 29 41 224 29 49
soundesignquery 15360 168 256 179 52 76 2 83 217 94 152
whatkilledme 4192 132 418 0 45 1178 0 74 2646 0 108
Whinge_Bot 450805 870 3092 6 80 16300 748 131 59210 1710 222

Credits & Paper citation

If you like this work, consider following me on Twitter. If use this work in an academic context, please consider citing the following paper:

@article{winters2020gitta,
    title={Discovering Textual Structures: Generative Grammar Induction using Template Trees},
    author={Winters, Thomas and De Raedt, Luc},
    journal={Proceedings of the 11th International Conference on Computational Creativity},
    pages = {177-180},
    year={2020},
    publisher={Association for Computational Creativity}
}

Or APA style:

Winters, T., & De Raedt, L. (2020). Discovering Textual Structures: Generative Grammar Induction using Template Trees. Proceedings of the 11th International Conference on Computational Creativity.
Owner
Thomas Winters
PhD Researcher in Creative Artificial Intelligence @ KU Leuven.
Thomas Winters
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022
Imagededup - 😎 Finding duplicate images made easy

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

idealo 4.3k Jan 07, 2023
SWA Object Detection

SWA Object Detection This project hosts the scripts for training SWA object detectors, as presented in our paper: @article{zhang2020swa, title={SWA

237 Nov 28, 2022
MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc.

MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc. ⭐⭐⭐⭐⭐

568 Jan 04, 2023
Trustworthy AI related projects

Trustworthy AI This repository aims to include trustworthy AI related projects from Huawei Noah's Ark Lab. Current projects include: Causal Structure

HUAWEI Noah's Ark Lab 589 Dec 30, 2022
PolyTrack: Tracking with Bounding Polygons

PolyTrack: Tracking with Bounding Polygons Abstract In this paper, we present a novel method called PolyTrack for fast multi-object tracking and segme

Gaspar Faure 13 Sep 15, 2022
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow

Do you want a RL agent nicely moving on Atari? Rainbow is all you need! This is a step-by-step tutorial from DQN to Rainbow. Every chapter contains bo

Jinwoo Park (Curt) 1.4k Dec 29, 2022
Efficiently computes derivatives of numpy code.

Note: Autograd is still being maintained but is no longer actively developed. The main developers (Dougal Maclaurin, David Duvenaud, Matt Johnson, and

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton 6.1k Jan 08, 2023
Training, generation, and analysis code for Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics

Location-Aware Generative Adversarial Networks (LAGAN) for Physics Synthesis This repository contains all the code used in L. de Oliveira (@lukedeo),

Deep Learning for HEP 57 Oct 22, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios"

TinyWeaklyIsolationForest This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised a

2 Mar 21, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022
Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir.

NetScanner.py Ağ tarayıcı.Gönderdiği paketler ile ağa bağlı olan cihazların IP adreslerini gösterir. Linux'da Kullanımı: git clone https://github.com/

4 Aug 23, 2021
FreeSOLO for unsupervised instance segmentation, CVPR 2022

FreeSOLO: Learning to Segment Objects without Annotations This project hosts the code for implementing the FreeSOLO algorithm for unsupervised instanc

NVIDIA Research Projects 253 Jan 02, 2023
Official implementation of "Dynamic Anchor Learning for Arbitrary-Oriented Object Detection" (AAAI2021).

DAL This project hosts the official implementation for our AAAI 2021 paper: Dynamic Anchor Learning for Arbitrary-Oriented Object Detection [arxiv] [c

ming71 215 Nov 28, 2022
Image Segmentation and Object Detection in Pytorch

Image Segmentation and Object Detection in Pytorch Pytorch-Segmentation-Detection is a library for image segmentation and object detection with report

Daniil Pakhomov 732 Dec 10, 2022
Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks"

LUNAR Official Implementation of "LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks" Adam Goodge, Bryan Hooi, Ng See Kiong and

Adam Goodge 25 Dec 28, 2022
Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion

CSF Code of Classification Saliency-Based Rule for Visible and Infrared Image Fusion Tips: For testing: CUDA_VISIBLE_DEVICES=0 python main.py For trai

Han Xu 14 Oct 31, 2022
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

DeCLIP Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. Our paper is available in arxiv Updates ** Ou

Sense-GVT 470 Dec 30, 2022
A Python framework for developing parallelized Computational Fluid Dynamics software to solve the hyperbolic 2D Euler equations on distributed, multi-block structured grids.

pyHype: Computational Fluid Dynamics in Python pyHype is a Python framework for developing parallelized Computational Fluid Dynamics software to solve

Mohamed Khalil 21 Nov 22, 2022