MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift

Last update: Dec 02, 2022

Overview

MemStream

Implementation of MemStream: Memory-Based Anomaly Detection in Multi-Aspect Streams with Concept Drift. Siddharth Bhatia, Arjit Jain, Shivin Srivastava, Kenji Kawaguchi, Bryan Hooi.

MemStream detects anomalies in a multi-aspect data stream and outputs an anomaly score for each record. It is a memory-augmented feature extractor that allows for quick retraining, gives a theoretical bound on the memory size for effective drift handling, is robust to memory poisoning, and outperforms 11 state-of-the-art streaming anomaly detection baselines.

After an initial training of the feature extractor on a small subset of normal data, MemStream processes each record in two steps: (i) it outputs an anomaly score for the record by querying the memory for the K nearest neighbours of the record encoding and calculating a discounted distance, and (ii) it updates the memory, in a FIFO manner, if the anomaly score is within the update threshold β.
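The two steps above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the class name `MemoryScorer`, the L1 distance, the discount factor `gamma`, and the exact weighting of the K distances are assumptions made for the example, and the records are treated as already encoded (the feature extractor is left out).

```python
from collections import deque


def l1_distance(a, b):
    """L1 distance between two equal-length vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class MemoryScorer:
    """Hypothetical sketch of the score-then-update loop."""

    def __init__(self, memlen, beta, k=3, gamma=0.5):
        # deque(maxlen=...) drops the oldest entry when full, i.e. FIFO.
        self.memory = deque(maxlen=memlen)
        self.beta = beta    # update threshold
        self.k = k          # number of nearest neighbours to query
        self.gamma = gamma  # discount applied to farther neighbours (assumed)

    def seed(self, encodings):
        """Fill the memory with encodings of known-normal records."""
        for z in encodings:
            self.memory.append(list(z))

    def score(self, z):
        """Discounted average distance to the K nearest memory entries."""
        nearest = sorted(l1_distance(m, z) for m in self.memory)[: self.k]
        weights = [self.gamma ** i for i in range(len(nearest))]
        return sum(w * d for w, d in zip(weights, nearest)) / sum(weights)

    def process(self, z):
        """Step (i): score the record. Step (ii): update memory if score <= beta."""
        s = self.score(z)
        if s <= self.beta:
            self.memory.append(list(z))
        return s


scorer = MemoryScorer(memlen=4, beta=1.0, k=2, gamma=0.5)
scorer.seed([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
normal_score = scorer.process([0.05, 0.05])   # close to memory, gets stored
outlier_score = scorer.process([5.0, 5.0])    # far from memory, not stored
```

Because only low-scoring records enter the memory, a record that looks anomalous cannot displace the stored normal examples, while gradual drift in the normal data is absorbed as old entries are evicted.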

Demo

KDDCUP99: Run python3 memstream.py --dataset KDD --beta 1 --memlen 256
NSL-KDD: Run python3 memstream.py --dataset NSL --beta 0.1 --memlen 2048
UNSW-NB 15: Run python3 memstream.py --dataset UNSW --beta 0.1 --memlen 2048
CICIDS-DoS: Run python3 memstream.py --dataset DOS --beta 0.1 --memlen 2048
SYN: Run python3 memstream-syn.py --dataset SYN --beta 1 --memlen 16
Ionosphere: Run python3 memstream.py --dataset ionosphere --beta 0.001 --memlen 4
Cardiotocography: Run python3 memstream.py --dataset cardio --beta 1 --memlen 64
Statlog Landsat Satellite: Run python3 memstream.py --dataset statlog --beta 0.01 --memlen 32
Satimage-2: Run python3 memstream.py --dataset satimage-2 --beta 10 --memlen 256
Mammography: Run python3 memstream.py --dataset mammography --beta 0.1 --memlen 128
Pima Indians Diabetes: Run python3 memstream.py --dataset pima --beta 0.001 --memlen 64
Covertype: Run python3 memstream.py --dataset cover --beta 0.0001 --memlen 2048

Command line options

--dataset: The dataset to be used for training. Choices: 'NSL', 'KDD', 'UNSW', 'DOS'. (default: 'NSL')
--beta: The memory update threshold β to be used. (default: 0.1)
--memlen: The size of the Memory Module. (default: 2048)
--dev: PyTorch device to be used for training, such as "cpu" or "cuda:0". (default: 'cuda:0')
--lr: Learning rate (default: 0.01)
--epochs: Number of epochs (default: 5000)
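
For reference, the options listed above could be declared with `argparse` roughly as follows. This is a reconstruction from the list, not the parser in `memstream.py`: the argument types and the `build_parser` helper are assumptions, and `--dataset` is left unrestricted here because the Demo commands pass values beyond the four listed choices.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="MemStream options (sketch)")
    parser.add_argument("--dataset", default="NSL", help="dataset name")
    parser.add_argument("--beta", type=float, default=0.1, help="update threshold")
    parser.add_argument("--memlen", type=int, default=2048, help="memory size")
    parser.add_argument("--dev", default="cuda:0", help="PyTorch device string")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--epochs", type=int, default=5000, help="number of epochs")
    return parser


# Parse the KDDCUP99 demo arguments; unspecified options fall back to defaults.
args = build_parser().parse_args(["--dataset", "KDD", "--beta", "1", "--memlen", "256"])
```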

Input file format

MemStream expects the input multi-aspect record stream to be stored in a comma-separated file.
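
A comma-separated stream of this kind can be read one record at a time with the standard library. The sketch below assumes every field is numeric and that there is no header row; the README does not say how labels are stored, so they are not handled here. The `read_records` helper and the sample values are made up for illustration.

```python
import csv
import io


def read_records(handle):
    """Yield each non-empty line as a list of floats (assumes numeric fields)."""
    for row in csv.reader(handle):
        if row:  # skip blank lines
            yield [float(value) for value in row]


# In-memory stand-in for a data file; each line is one multi-aspect record.
sample = io.StringIO("0.1,0.2,0.3\n1.5,0.0,2.0\n")
records = list(read_records(sample))
```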

Datasets

Processed Datasets can be downloaded from here. Please unzip and place the files in the data folder of the repository.

Environment

This code has been tested on Debian GNU/Linux 9 with a 12GB Nvidia GeForce RTX 2080 Ti GPU, CUDA Version 10.2 and PyTorch 1.5.


Owner

Stream-AD
