Anomaly Detection

Anomaly detection for time-series data

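1. Anomaly detection with Kernel Density Estimation

Takes as input the train data and test data located at train_data_path and test_data_path; both are CSV files that include timestamp information.
Fits a kernel density estimation model on the train data to estimate the distribution of normal data.
Computes an anomaly score for each time point of the test data from the estimated distribution, and saves the scores to save_root_path as a CSV file and a plot.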

python kde.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/kde'
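
The scoring idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in kde.py: it assumes a one-dimensional signal, a Gaussian kernel, and a hand-picked bandwidth, and the function names `gaussian_kde_log_density` and `kde_anomaly_scores` are hypothetical. The anomaly score is taken to be the negative log-density, so values that are rare under the train distribution score higher.

```python
import math


def gaussian_kde_log_density(train, x, bandwidth):
    """Log of a Gaussian kernel density estimate at x, built from 1-D train values."""
    n = len(train)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)
    # A tiny floor avoids log(0) for points far away from every train value.
    return math.log(max(total, 1e-300) / norm)


def kde_anomaly_scores(train, test, bandwidth=0.5):
    """Anomaly score = negative log-density (an assumption of this sketch)."""
    return [-gaussian_kde_log_density(train, x, bandwidth) for x in test]


# Toy data: train values cluster around 1.0, the last test value is far away.
train = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.02, 0.98]
test = [1.0, 1.2, 3.0]
scores = kde_anomaly_scores(train, test)
```

In this toy run the scores increase with the distance of each test value from the train cluster. In practice the bandwidth controls how smooth the estimated distribution is and would need tuning for real data.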

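2. Anomaly detection with Local Outlier Factor

Takes as input the train data and test data located at train_data_path and test_data_path; both are CSV files that include timestamp information.
Fits a Local Outlier Factor model on the train data to estimate the density of normal data from the n_neighbors nearest neighbors.
Computes an anomaly score for each time point of the test data from the estimated density, and saves the scores to save_root_path as a CSV file and a plot.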

python lof.py --train_data_path='./data/nasa_bearing_train.csv' \
              --test_data_path='./data/nasa_bearing_test.csv' \
              --save_root_path='./result/lof' \
              --n_neighbors=5
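
To make the role of n_neighbors concrete, the following is a rough from-scratch sketch of Local Outlier Factor scoring of test points against train data. It is not the code in lof.py; the helper names (`_dist`, `_knn`, `lof_scores`) and the toy grid data are assumptions made for illustration. A score near 1 means the test point is about as dense as its neighbors, and larger values mean it lies in a sparser region.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of point within data."""
    pairs = [(_dist(point, q), j) for j, q in enumerate(data) if j != skip]
    return sorted(pairs)[:k]


def lof_scores(train, test, n_neighbors=5):
    # Neighbors and k-distance of every train point (excluding the point itself).
    train_knn = [_knn(p, train, n_neighbors, skip=i) for i, p in enumerate(train)]
    k_dist = [nbrs[-1][0] for nbrs in train_knn]

    def lrd(nbrs):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(d, k_dist[j]) for d, j in nbrs]
        return len(reach) / max(sum(reach), 1e-12)

    train_lrd = [lrd(nbrs) for nbrs in train_knn]
    scores = []
    for p in test:
        nbrs = _knn(p, train, n_neighbors)
        mean_neighbor_lrd = sum(train_lrd[j] for _, j in nbrs) / len(nbrs)
        # LOF: how much denser the neighbors are than the point itself.
        scores.append(mean_neighbor_lrd / lrd(nbrs))
    return scores


# Toy data: a 4x4 grid as "normal" train data, one central and one distant test point.
train = [(float(i), float(j)) for i in range(4) for j in range(4)]
test = [(1.5, 1.5), (10.0, 10.0)]
scores = lof_scores(train, test, n_neighbors=5)
```

A small n_neighbors makes the density estimate very local and sensitive to noise, while a large value smooths it out; the sketch needs more train points than n_neighbors.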

3. Anomaly detection with Isolation Forest

Takes as input the train data and test data located at train_data_path and test_data_path; both are CSV files that include timestamp information.
Fits an isolation forest model on the train data.
Uses the train data as the reference set to compute an anomaly score for each time point of the test data, and saves the scores to save_root_path as a CSV file and a plot.

python iforest.py --train_data_path='./data/nasa_bearing_train.csv' \
                  --test_data_path='./data/nasa_bearing_test.csv' \
                  --save_root_path='./result/iforest'
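
The following is a simplified sketch of how an isolation forest built on train data can score test points. It is not the code in iforest.py: the function names, the default tree count and subsample size, and the synthetic Gaussian data are all assumptions for illustration, and the sketch expects at least two train points. Points that random splits isolate after only a few steps get scores closer to 1.

```python
import math
import random


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common convention for n == 2
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(data, depth, max_depth, rng):
    if depth >= max_depth or len(data) <= 1:
        return ("leaf", len(data))
    dim = rng.randrange(len(data[0]))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return ("leaf", len(data))
    split = rng.uniform(lo, hi)
    left = [p for p in data if p[dim] < split]
    right = [p for p in data if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(point, tree, depth=0):
    if tree[0] == "leaf":
        # Add the expected remaining depth for the points left in this leaf.
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return _path_length(point, child, depth + 1)


def iforest_scores(train, test, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(train))
    max_depth = math.ceil(math.log2(size))
    trees = [_build(rng.sample(train, size), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for p in test:
        mean_path = sum(_path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / _c(size)))  # short paths give scores near 1
    return scores


# Toy data: 2-D Gaussian train data, one central and one distant test point.
data_rng = random.Random(42)
train = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
test = [(0.0, 0.0), (6.0, 6.0)]
scores = iforest_scores(train, test)
```

Because the trees are built only from train data, a test point far outside the train range follows the same short path as the most extreme train points, which is what pushes its score up.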

4. Anomaly detection with Spectral Residual

Detects anomalies within each window interval using the configured window size and score window size.
The score window size must be set larger than the window size.

python spectral.py --window=24 \
                   --score_window=100
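
As a rough illustration of the two parameters, the sketch below implements a basic spectral residual score in plain Python. It is not the code in spectral.py. It assumes that `window` is the width of the moving average applied to the log-amplitude spectrum and that `score_window` is the width of the moving average used to normalize the saliency map; both interpretations, the function names, and the synthetic series are assumptions of this sketch. A naive DFT is used to keep it dependency-free, so it only suits short series.

```python
import cmath
import math


def _dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def _trailing_mean(values, window):
    """Mean of up to `window` most recent values at each position."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def spectral_residual_scores(series, window=24, score_window=100):
    spectrum = _dft(series)
    log_amp = [math.log(abs(c) + 1e-8) for c in spectrum]
    # Spectral residual: log amplitude minus its local average.
    residual = [a - m for a, m in zip(log_amp, _trailing_mean(log_amp, window))]
    # Keep the original phase, replace the amplitude with exp(residual).
    modified = [cmath.rect(math.exp(r), cmath.phase(c))
                for r, c in zip(residual, spectrum)]
    saliency = [abs(c) for c in _dft(modified, inverse=True)]
    # Score: relative deviation of the saliency from its recent local average.
    local = _trailing_mean(saliency, score_window)
    return [(s - m) / (m + 1e-8) for s, m in zip(saliency, local)]


# Toy data: a clean sine wave with a single spike injected at index 150.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
series[150] += 10.0
scores = spectral_residual_scores(series, window=24, score_window=100)
peak_index = max(range(len(scores)), key=scores.__getitem__)
```

In this toy series the highest score falls on the injected spike, since the periodic component is largely suppressed in the saliency map while the spike is preserved.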

Anomaly Detection: a preprocessing module for anomaly detection


Owner

CLUST-consortium
