Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

Overview

Applied Natural Language Processing in the Enterprise

This is the companion repo for Applied Natural Language Processing in the Enterprise, an O'Reilly Media publication by Ankur A. Patel and Ajay Uppili Arasanipalai. Here, you will find all the source code from the book, published here on GitHub for your convenience.

Follow the steps below to get started with setting up your environment and running the code examples.

Setup

To install all the required libraries and dependencies, run the following command:

pip install nlpbook

However, the recommended approach is to use conda, a cross-platform, language-agnostic package manager that automatically handles dependency conflicts.

If you have not already, install the Miniforge distribution of Python 3.8 based on your OS. If you are on Windows, you can choose the Anaconda distribution of Python 3.8 instead of the Miniforge distribution, if you wish to.

Once conda is installed, run the following command:

conda install -c nlpbook nlpbook

Alternatively, if you'd like to keep your environment for this book isolated from the rest of your system (which we highly recommend), run the following commands:

conda create -n nlpbook
conda activate nlpbook
conda install -c nlpbook nlpbook

Then run conda activate nlpbook every time you want to return to your environment. To exit the environment, run conda deactivate.

Next, install the spaCy models.

python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_trf

Setup Environment Directly

If you're interested in setting up an environment to quickly get up and running with the code for this book, run the following commands from the root of this repo (please see the "Getting the Code" section below on how to set up the repo first).

conda env create --file environment.yml
conda activate nlpbook

You can also grab all the dependacies via pip:

pip install -r requirements.txt

Getting the Code

All publicly released code is in this repository. The simplest way to get started is via Git:

git clone https://github.com/nlpbook/nlpbook.git

If you're on Windows or another platform that doesn't already have git installed, you may need to obtain a Git client.

If you want a specific version to match the copy of the book you have (this can occasionally change), you can find previous versions on the releases page.

Getting the Data

Next, download data from AWS S3 (the data files are too large to store and access on Github).

aws s3 cp s3://applied-nlp-book/data/ data --recursive --no-sign-request
aws s3 cp s3://applied-nlp-book/models/ag_dataset/ models/ag_dataset --recursive --no-sign-request

How This Repo is Organized

Each chapter in the book has a corresponding notebook in the root of this project repository. They are named chXX.ipynb for the chapter XX. The appendices are named apXX.ipynb.

Note: This repo only contains the code for the chapters, not the actual text in the book. For the complete text, please purchase a copy of the book. Chapters 1, 2, and 3 have been open-sourced, courtesy of O'Reilly and the authors.

Once you'd navigated to the nlpbook project directory, you can lauch a Jupyter client such as Jupyter Lab, Jupyter Notebooks, or VS Code to view and run the notebooks.

Contributions and Errata

We welcome any suggestions, feedback, and errata from readers. If you notice anything that seems off in the book or could use improvement, we've love to hear from you. Feel free to submit an issue here on GitHub or on our errata page.

Copyright Notice

This material is made available by the Creative Commons Attribution-Noncommercial-No Derivatives 4.0 International Public License.

Note: You are free to use the code in accordance with the MIT license, but you are not allowed to redistribute or sell any of the text presented in chapters 1, 2, and 3, which have been open-sourced for the benefit of the community. Please consider purchasing a copy of the book if you are interested in reading the text that accompanies the code presented in this repo.

You might also like...
💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

Basic Utilities for PyTorch Natural Language Processing (NLP)

Basic Utilities for PyTorch Natural Language Processing (NLP) PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. tor

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow.  This is part of the CASL project: http://casl-project.ai/
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

DELTA is a deep learning based natural language and speech processing platform.
DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest researc

Comments
  • Download failed for train_prepared.csv

    Download failed for train_prepared.csv

    download failed: s3://applied-nlp-book/data/ag_dataset/prepared/train_prepared.csv to data/train_prepared.csv An error occurred (AccessDenied) when calling the GetObject operation: Access Denied

    opened by sharma-ji 2
  • Chapter 05: data contains no attribute

    Chapter 05: data contains no attribute "Field"

    In chapter 05 when setting up the fields for training an Embedding on IMDB data you propose:

    TEXT = data.Field(lower=True, include_lengths=True, \
    batch_first=False, tokenize='spacy')
    LABEL = data.LabelField()
    

    However, data has not been defined yet. The module data imported from torchtext.__all__ does not contain an attribute Field. In the sources of torchtext I couldn't find it either.

    Can you advise or define data ?

    My Python version: 1.9.0 My Torchtext version: 0.10.0

    opened by iNLyze 1
  • No 'data' folder in Ch. 1

    No 'data' folder in Ch. 1

    Hello,

    I purchased your book and started reading Ch.1. Great book so far. I tried to emulate what is written in your book and ipynb. But there is no folder "data" that can retrieve Jeopardy questions. I guess this kind of incompleteness will not be the last even though I am reading your first chapter. Could you run your notebooks in a new environment and check what is missing? Thank you in advance. It would be an option to make your notebooks run in Colab. Then, you can write a setup file at the beginning of each chapter and users won't have issues running the scripts.

    opened by knslee07 1
Releases(v1.0.0)
  • v1.0.0(May 29, 2021)

    This is the initial public release of the source code for "Applied Natural Language Processing in the Enterprise" by Ankur A. Patel and Ajay Uppili Arasanipalai.

    Source code(tar.gz)
    Source code(zip)
Owner
Applied Natural Language Processing in the Enterprise
An O'Reilly Media book by Ankur A. Patel and Ajay Uppili Arasanipalai
Applied Natural Language Processing in the Enterprise
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

Weijia Chen 25.6k Jan 06, 2023
Saptak Bhoumik 14 May 24, 2022
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

7 Nov 02, 2022
Deduplication is the task to combine different representations of the same real world entity.

Deduplication is the task to combine different representations of the same real world entity. This package implements deduplication using active learning. Active learning allows for rapid training wi

63 Nov 17, 2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which

Clova AI Research 94 Dec 30, 2022
Making text a first-class citizen in TensorFlow.

TensorFlow Text - Text processing in Tensorflow IMPORTANT: When installing TF Text with pip install, please note the version of TensorFlow you are run

1k Dec 26, 2022
text to speech toolkit. 好用的中文语音合成工具箱,包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。 安装 pip install -U ttskit 注意 可能需另外安装的依赖包:torch,版本要求torch=1.6.0,=1.7.1,根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

KDD 483 Jan 04, 2023
PyTranslator é simultaneamente um editor e tradutor de texto com diversos recursos e interface feito com coração e 100% em Python

PyTranslator O Que é e para que serve o PyTranslator? PyTranslator é simultaneamente um editor e tradutor de texto em com interface gráfica que usa a

Elizeu Barbosa Abreu 1 May 12, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
Text Classification Using LSTM

Text classification is the task of assigning a set of predefined categories to free text. Text classifiers can be used to organize, structure, and categorize pretty much anything. For example, new ar

KrishArul26 3 Jan 03, 2023
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Jan 03, 2023
Voilà turns Jupyter notebooks into standalone web applications

Rendering of live Jupyter notebooks with interactive widgets. Introduction Voilà turns Jupyter notebooks into standalone web applications. Unlike the

Voilà Dashboards 4.5k Jan 03, 2023
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 50 Dec 21, 2022
Super easy library for BERT based NLP models

Fast-Bert New - Learning Rate Finder for Text Classification Training (borrowed with thanks from https://github.com/davidtvs/pytorch-lr-finder) Suppor

Utterworks 1.8k Dec 27, 2022
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"

Zhiyu Chen 114 Dec 29, 2022
This repository contains helper functions which can help you generate additional data points depending on your NLP task.

NLP Albumentations For Data Augmentation This repository contains helper functions which can help you generate additional data points depending on you

Aflah 6 May 22, 2022
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 09, 2022
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023
Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vecto

3.2k Dec 30, 2022