Natural Language Processing

Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022.

What is the course about?

The course introduces the basics of natural language processing for analyzing unstructured, user-generated content. It is aimed at beginners to NLP, but basic knowledge of Python and some familiarity with data science techniques will be helpful.

Topics covered include:

text preprocessing in Python,
collecting your own data from Twitter and Reddit,
content analysis,
text embeddings, and
supervised learning with text data.
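
To give a flavor of the first topic, here is a minimal sketch of text preprocessing on short user-generated posts. It is not taken from the course notebooks: the `preprocess` function, the `STOP_WORDS` set, and the sample posts are all hypothetical, and it uses only the Python standard library, whereas the notebooks may rely on dedicated NLP libraries.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only (an assumption,
# not the list used in the course notebooks).
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "in", "this"}


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    tokens = re.findall(r"[a-z0-9']+", text)   # keep runs of letters, digits, apostrophes
    return [t for t in tokens if t not in STOP_WORDS]


# Made-up sample posts standing in for user-generated content.
posts = [
    "Loving the new phone! Battery life is great https://example.com",
    "@support the battery died in a day. Not great.",
]

tokenized = [preprocess(p) for p in posts]

# Count how often each token appears across all posts.
counts = Counter(tok for doc in tokenized for tok in doc)
print(tokenized)
print(counts.most_common(2))
```

Cleaning steps like these usually come before content analysis, embeddings, or supervised learning, because they reduce noise such as links and user handles that carry little meaning on their own.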

What materials are available here?

The slides will be posted on the course BlackBoard page. They mostly serve as a high-level introduction to the examples and exercises (in Colab notebooks), which are linked from the slides themselves. Copies of the Colab notebooks can also be found in the /colab folder in this repository.

Can I work through the material on my own?

If you didn't attend the class, you can certainly work through the materials on your own: the Colab notebooks are designed to be readable and doable for individuals working at their own pace. The slides posted on BlackBoard will guide you through the content. The notebooks are intended to be worked through in order. Each one has examples to view and one or two practice exercises to complete.

Acknowledgements

I would like to acknowledge Steve Wilson at Oakland University for making his DS3 workshop materials publicly available under an MIT license.
