Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Last update: Dec 06, 2022

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley).

Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook	Description
1.words/EvaluateTokenizationForSentiment	The impact of tokenization choices on sentiment classification.
1.words/ExploreTokenization	Different methods for tokenizing texts (whitespace, NLTK, spacy, regex)
1.words/TokenizePrintedBooks	Design a better tokenizer for printed books
1.words/Text_Complexity	Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests	Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors	Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries	Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath	Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity	Explore the distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings	Explore word embeddings using Gensim
4.embeddings/Semaxis	Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold)
4.embeddings/BERT	Explore the basics of token representations in BERT and use it to find token nearest neighbors
4.embeddings/SequenceEmbeddings	Use sequence embeddings to find TV episode summaries most similar to a short description
5.eda/WordSenseClustering	Infer distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans	Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
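
To give a flavor of the measures named in 1.words/Text_Complexity, here is a minimal sketch of type-token ratio and Flesch-Kincaid Grade Level. It is not taken from the course notebooks: the regex tokenizer, the punctuation-based sentence splitter, and the vowel-group syllable counter are all simplifying assumptions, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased word tokens from a simple regex (an assumption, not the course tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    # Assumption: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(t) for t in tokens)
    return 0.39 * len(tokens) / len(sentences) + 11.8 * syllables / len(tokens) - 15.59


sample = "The cat sat on the mat. The dog ran."
print(type_token_ratio(tokenize(sample)))
print(flesch_kincaid_grade(sample))
```

Because type-token ratio falls as texts get longer, it is only comparable across texts of similar length. The syllable heuristic miscounts words with a silent "e" (for example "made"), so a pronunciation dictionary would give more faithful grade-level scores.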

Owner

David Bamman
