Deep learning for NLP crash course at ABBYY.

Last update: Dec 18, 2022

Overview

Deep NLP Course at ABBYY

I'm gradually updating and translating the notebooks right now. Stay tuned.

Materials

Week 1: Introduction

Sentiment analysis on the IMDB movie review dataset: a short overview of classical machine learning for NLP + an indecently brief intro to Keras.

Russian version:

Updated English version:

Week 2: Word Embeddings: Part 1

Meet the word embeddings: an unsupervised method to capture some fun relationships between words.
Phrase similarity with a word embedding model + word-based machine translation without parallel data (with MUSE word embeddings).
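
As a rough illustration of phrase similarity with word embeddings, here is a minimal sketch that averages word vectors and compares phrases by cosine similarity. The tiny `TOY_VECTORS` table and the function names are made up for this example; the notebooks may use a different approach and real pretrained embeddings.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.0],
    "movie": [0.1, 0.9, 0.1],
    "film": [0.2, 0.8, 0.1],
    "terrible": [-0.8, 0.1, 0.1],
}


def phrase_vector(phrase, vectors):
    """Average the vectors of the known words in a phrase."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in phrase")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def phrase_similarity(a, b, vectors=TOY_VECTORS):
    """Cosine similarity between the averaged vectors of two phrases."""
    return cosine(phrase_vector(a, vectors), phrase_vector(b, vectors))
```

With these toy vectors, `phrase_similarity("good movie", "great film")` comes out higher than `phrase_similarity("good movie", "terrible film")`.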

Russian version:

Updated English version:

Week 3: Word Embeddings: Part 2

Introduction to PyTorch. Implementation of a toy linear regression in pure NumPy and PyTorch. Implementations of CBoW, skip-gram, negative sampling and structured Word2vec models.
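
To make the difference between skip-gram and CBoW concrete, the sketch below builds training examples for both from a token list. It covers only data preparation, not the models themselves; the function names and the `window` parameter are assumptions for this example rather than code from the notebooks.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each (center, context) pair is one training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBoW: the whole context window is used to predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples
```

Skip-gram predicts each context word from the center word, so one position yields several examples; CBoW predicts the center word from the whole context, so one position yields a single example.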

Russian version:

Updated English version:

Week 4: Convolutional Neural Networks

Introduction to convolutional networks. Relations between convolutions and n-grams. Simple surname detector on character-level convolutions + fun visualizations.
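
One way to see the relation between convolutions and n-grams: a convolution of width n over characters looks at every character n-gram in turn. The sketch below is a simplified illustration with hypothetical helper names. It scores each window by how many characters match a fixed pattern, which is what a filter holding the one-hot encoding of that pattern would compute over one-hot inputs.

```python
def char_ngrams(text, n):
    """All character n-grams of a string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_detector(text, pattern):
    """Slide a window of len(pattern) over text and count matching characters.

    Equivalent to a 1D convolution over one-hot characters whose filter
    weights are the one-hot encoding of `pattern`.
    """
    n = len(pattern)
    return [
        sum(1 for a, b in zip(text[i:i + n], pattern) if a == b)
        for i in range(len(text) - n + 1)
    ]


def max_over_time(scores):
    """Max pooling over positions: did the pattern fire anywhere?"""
    return max(scores)
```

A learned filter plays the same role, except that its weights are trained rather than fixed to one n-gram, so it can respond to several similar n-grams at once.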

Russian version:

Updated English version:

Week 5: RNNs: Part 1

RNNs for text classification. Simple RNN implementation + memorization test. Surname detector in a multilingual setup: character-level LSTM classifier.

Russian version:

Updated English version:

Week 6: RNNs: Part 2

RNNs for sequence labelling. Part-of-speech tagger implementations based on word embeddings and character-level word embeddings.

Russian version:

Week 7: Language Models: Part 1

Character-level language model for generating Russian troll tweets: a fixed-window model via convolutions and an RNN model.
Simple conditional language model: surname generation given the source language.
Plus the Toxic Comment Classification Challenge, to apply your skills to a real-world problem.

Russian version:

Week 8: Language Models: Part 2

Word-level language model for poetry generation. Toy examples of transfer learning and multi-task learning applied to language models.

Russian version:

Week 9: Seq2seq

Seq2seq for machine translation and image captioning. Byte-pair encoding, beam search and other useful stuff for machine translation.
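
For readers new to byte-pair encoding, here is a minimal sketch of learning merges: repeatedly find the most frequent adjacent symbol pair and fuse it into one symbol. The function names and the `</w>` end-of-word marker are choices made for this example; it is not the notebook's implementation and leaves out applying merges to new text.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab
```

Calling `learn_bpe({"ab": 3, "abc": 2}, 1)` merges `a` and `b` first, since that pair is the most frequent one in this tiny vocabulary.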

Russian version:

Week 10: Seq2seq with Attention

Seq2seq with attention for machine translation and image captioning.
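
As a small illustration of the attention idea, the sketch below implements plain dot-product attention for a single query: score each key against the query, turn the scores into weights with a softmax, and return the weighted sum of the values. This is one common variant, chosen here for brevity; the course notebooks may use a different scoring function, and the names are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, keys, values):
    """Dot-product attention for one query vector.

    Returns the context vector and the attention weights.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights
```

In a seq2seq model the query would typically be the current decoder state, with the encoder states serving as keys and values, so each decoding step can look at a different part of the source.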

Russian version:

Week 11: Transformers & Text Summarization

Implementation of the Transformer model for text summarization. Discussion of Pointer-Generator Networks for text summarization.

Russian version:

Week 12: Dialogue Systems: Part 1

Goal-oriented dialogue systems. Implementation of a multi-task model: intent classifier and token tagger for the dialogue manager.

Russian version:

Week 13: Dialogue Systems: Part 2

General conversation dialogue systems and DSSMs. Implementation of a question answering model on the SQuAD dataset and a chit-chat model on the OpenSubtitles dataset.

Russian version:

Week 14: Pretrained Models

Pretrained models for various tasks: Universal Sentence Encoder for sentence similarity, ELMo for sequence tagging (with a bit of CRF), BERT for SWAG (reasoning about possible continuations).

Russian version:

Final Presentation

NLP Summary: a summary of cool stuff that did and didn't appear in the course.


Owner

Dan Anastasyev
