The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

The data comes from the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) taking an English oral proficiency interview test. Each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency) based on this test.

The goal is to build a machine learning algorithm for predicting the SST score of each participant based on their transcript.

Steps:

1 - Pre-process the dataset: extract the participant transcript (all tags). Inside the participant transcript, you can remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example you can try to use GloVe instead of BoW).
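
Steps 1 to 4 can be sketched in plain Python as follows. This is only a minimal illustration, not the code of this repository. It assumes that participant turns are wrapped in `<B>...</B>` tags (check this against the corpus markup), and it uses a small hand-written multinomial naive Bayes classifier as a stand-in for whichever classifier you choose. All function and class names here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: participant turns are wrapped in <B>...</B>; adapt to the real markup.
PARTICIPANT_RE = re.compile(r"<B>(.*?)</B>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z']+")


def extract_words(raw):
    """Step 1: keep participant turns, strip nested tags, return lowercase words."""
    turns = PARTICIPANT_RE.findall(raw)
    text = TAG_RE.sub(" ", " ".join(turns))
    return WORD_RE.findall(text.lower())


def bag_of_words(words):
    """Step 2: a BoW vector stored sparsely as a word -> count mapping."""
    return Counter(words)


class NaiveBayes:
    """Step 3: multinomial naive Bayes with add-one smoothing."""

    def fit(self, bags, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, bag):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for word, k in bag.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += k * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(y_true, y_pred, labels):
    """Step 4: accuracy and a confusion matrix (rows = true, columns = predicted)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, matrix
```

To use it, call `extract_words` on each transcript file, turn the result into a bag with `bag_of_words`, fit `NaiveBayes` on a training split with the SST scores as labels, and pass the predictions on a held-out split to `evaluate`. The returned matrix can then be plotted with the plotting library of your choice.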
