Problem: Given a nepali news find the category of the news

Overview

Classification of category of nepali news catorgory using different algorithms

Problem: Multiclass Classification

Approaches:

  • TFIDF for vectorization and different algorithms on top of it
  • Custom trained Doc2Vec model on nepali text coorpus for vectorization and different ML algorithms on top of it
  • Pretrained FastText nepali Word2Vec embedding for vectorization and different algorithms on top of it
  • Custom trained nepali word2vec embedding for vectorization and different ML and DL algorithms on top it
  • Applying nepaliBERT for vectorization and different ML and DL algorithms on top of it
  • Finetuning nepaliBERT model on custom data for multiclass classification
Owner
pudasainishushant
Machine Learning Engineer with specialization in Natural Language Processing
pudasainishushant
Top2Vec is an algorithm for topic modeling and semantic search.

Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.

Dimo Angelov 2.4k Jan 06, 2023
硕士期间自学的NLP子任务,供学习参考

NLP_Chinese_down_stream_task 自学的NLP子任务,供学习参考 任务1 :短文本分类 (1).数据集:THUCNews中文文本数据集(10分类) (2).模型:BERT+FC/LSTM,Pytorch实现 (3).使用方法: 预训练模型使用的是中文BERT-WWM, 下载地

12 May 31, 2022
Research code for the paper "Fine-tuning wav2vec2 for speaker recognition"

Fine-tuning wav2vec2 for speaker recognition This is the code used to run the experiments in https://arxiv.org/abs/2109.15053. Detailed logs of each t

Nik 103 Dec 26, 2022
A python script that will use hydra to get user and password to login to ssh, ftp, and telnet

Hydra-Auto-Hack A python script that will use hydra to get user and password to login to ssh, ftp, and telnet Project Description This python script w

2 Jan 16, 2022
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing

PhoNLP is a multi-task learning model for joint part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP prod

VinAI Research 109 Dec 02, 2022
A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

Yu-Hsiang Huang 7.1k Jan 05, 2023
Train and use generative text models in a few lines of code.

blather Train and use generative text models in a few lines of code. To see blather in action check out the colab notebook! Installation Use the packa

Dan Carroll 16 Nov 07, 2022
Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

This Repository contains a sample code for Tacotron 2, WaveGlow with multi-speaker, emotion embeddings together with a script for data preprocessing.

Ivan Didur 106 Jan 01, 2023
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 03, 2023
SDL: Synthetic Document Layout dataset

SDL is the project that synthesizes document images. It facilitates multiple-level labeling on document images and can generate in multiple languages.

Sơn Nguyễn 0 Oct 07, 2021
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

A Deep Learning NLP/NLU library by Intel® AI Lab Overview | Models | Installation | Examples | Documentation | Tutorials | Contributing NLP Architect

Intel Labs 2.9k Dec 31, 2022
A flask application to predict the speech emotion of any .wav file.

This is a speech emotion recognition app. It will allow you to train a modular MLP model with the RAVDESS dataset, and then use that model with a flask application to predict the speech emotion of an

Aryan Vijaywargia 2 Dec 15, 2021
TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech

TFPNER TFPNER: Exploration on the Named Entity Recognition of Token Fused with Part-of-Speech Named entity recognition (NER), which aims at identifyin

1 Feb 07, 2022
This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

EleutherAI 42 Dec 13, 2022
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Phil Wang 2.3k Jan 01, 2023
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
Script and models for clustering LAION-400m CLIP embeddings.

clustering-laion400m Script and models for clustering LAION-400m CLIP embeddings. Models were fit on the first million or so image embeddings. A subje

Peter Baylies 22 Oct 04, 2022
UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

Microsoft 281 Dec 15, 2022
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing Trankit is a light-weight Transformer-based Pyth

652 Jan 06, 2023