TCube generates rich and fluent narratives that describe the characteristics, trends, and anomalies of any time-series data (domain-agnostic) using the transfer learning capabilities of pretrained language models (PLMs).

Last update: Oct 31, 2021

Overview

TCube: Domain-Agnostic Neural Time series Narration

This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time series Narration" (to appear in IEEE ICDM 2021).

The PLMs used in this effort (T5, BART, and GPT-2) are implemented using the HuggingFace library (https://huggingface.co/) and fine-tuned on the WebNLG v3 (https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0) and DART (https://arxiv.org/abs/2007.02871) datasets.

Clones of both datasets are available under /Finetune PLMs/Datasets in this repository.

The PLMs fine-tuned on WebNLG/DART could not be uploaded because of the 1 GB limit of Git LFS. However, this repository includes ready-made scripts (detailed below) for conveniently fine-tuning these models.

The entire repository is based on Python 3.6, and the results are visualized in IPython notebooks.

Dependencies

Interactive Environments

notebook
ipywidgets==7.5.1

Deep Learning Frameworks

torch==1.7.1 (suited to your CUDA version)
pytorch-lightning==0.9.0
transformers==3.1.0

NLP Toolkits

sentencepiece==0.1.91
nltk

Scientific Computing, Data Manipulation, and Visualizations

numpy
scipy
scikit-learn
matplotlib
pandas
pwlf

Evaluation

rouge-score
textstat
lexical_diversity
language-tool-python

Misc

xlrd
tqdm
cython

Please install the Python packages listed above, at the specified versions where given, in a separate virtual environment.
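As a quick sanity check of the environment, the pinned versions listed above can be compared against what is actually installed. The following is a minimal sketch, not part of this repository: the helper names `installed_version` and `check_pins` are hypothetical, and the `PINNED` table simply copies the pins from the dependency list. It falls back to `pkg_resources` on Python 3.6, where `importlib.metadata` is not available.

```python
# Pins copied from the dependency list above; unpinned packages are not checked.
PINNED = {
    "ipywidgets": "7.5.1",
    "torch": "1.7.1",
    "pytorch-lightning": "0.9.0",
    "transformers": "3.1.0",
    "sentencepiece": "0.1.91",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Return (name, wanted, found) for every pin that is missing or mismatched."""
    problems = []
    for name, wanted in pins.items():
        found = installed_version(name)
        # Drop a local suffix such as "+cu110" that CUDA builds of torch carry.
        if found is None or found.split("+")[0] != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in check_pins(PINNED):
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

An empty output means every pinned package is installed at the expected version.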

Data-Preprocessing Scripts

Under /Finetune PLMs in this repository there are two scripts for pre-processing the WebNLG and DART datasets:

preprocess_webnlg.py
preprocess_dart.py

These scripts read the original datasets from /Finetune PLMs/Datasets/WebNLGv3 and /Finetune PLMs/Datasets/DART and write CSV files to /Finetune PLMs/Datasets, splitting each dataset into train, dev, and test sets in the format required by our PLMs.
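To illustrate the kind of transformation such a pre-processing step performs, here is a minimal sketch that flattens (subject, predicate, object) triples into a single source string and writes source/target pairs to CSV. It is an assumption-laden illustration, not the logic of preprocess_webnlg.py or preprocess_dart.py: the `<H>`/`<R>`/`<T>` markers, the `source`/`target` column names, the function names, and the sample record are all hypothetical.

```python
import csv
import io


def linearize(triples):
    """Flatten (subject, predicate, object) triples into one string.

    The <H>/<R>/<T> markers are a hypothetical convention chosen for this sketch.
    """
    return " ".join("<H> {} <R> {} <T> {}".format(s, p, o) for s, p, o in triples)


def write_pairs(examples, fh):
    """Write (triples, reference_text) examples as CSV rows with a header."""
    writer = csv.writer(fh)
    writer.writerow(["source", "target"])  # hypothetical column names
    for triples, text in examples:
        writer.writerow([linearize(triples), text])


# Illustrative record only; not taken from the dataset files in this repository.
examples = [
    ([("Alan_Bean", "occupation", "Test_pilot")], "Alan Bean was a test pilot."),
]

buffer = io.StringIO()
write_pairs(examples, buffer)
print(buffer.getvalue())
```

In practice the same writer would be called once per split (train, dev, test) with a real file handle instead of the in-memory buffer.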

Fine-tuning Scripts

Under /Finetune PLMs in this repository there are three scripts for fine-tuning T5, BART, and GPT-2:

finetuneT5.py
finetuneBART.py
finetuneGPT2.py

Visualization and Evaluation Notebooks

The root directory contains 10 notebooks. For descriptions of the time-series datasets used:

Datatsets.ipynb

For comparisons of segmentation and regime-change detection algorithms:

Error Determination.ipynb
Regime Detection.ipynb
Segmentation.ipynb
Trend Detection Plot.ipynb
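For readers unfamiliar with the idea of segmenting a series and labeling the trend of each segment, the sketch below shows a deliberately simple stand-in: it cuts the series into fixed-size windows and labels each window by the sign of its least-squares slope. This is not one of the algorithms compared in the notebooks above; the function names, the fixed `window` parameter, and the `tol` threshold are assumptions made for illustration.

```python
def slope(ys):
    """Least-squares slope of ys against the indices 0..len(ys)-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def label_trends(series, window, tol=1e-9):
    """Split series into fixed-size windows and label each by slope sign.

    Returns a list of (start_index, end_index, trend) tuples, where trend is
    "increasing", "decreasing", or "flat" (absolute slope within tol).
    """
    segments = []
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        m = slope(chunk)
        if m > tol:
            trend = "increasing"
        elif m < -tol:
            trend = "decreasing"
        else:
            trend = "flat"
        segments.append((start, start + len(chunk) - 1, trend))
    return segments


for segment in label_trends([1, 2, 3, 3, 3, 3, 5, 4, 3], window=3):
    print(segment)
```

Fixed windows ignore where the behavior of the series actually changes, which is the problem that dedicated segmentation and regime-change detection methods address.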

For the evaluation of the TCube framework on respective time-series datasets:

T3-COVID.ipynb
T3-DOTS.ipynb
T3-Pollution.ipynb
T3-Population.ipynb
T3-Temperature.ipynb

Citation and Contact

If any part of this code repository or the TCube framework is used in your work, please cite our paper. Thanks!

Contact: Mandar Sharma ([email protected]), First Author.
