Python Implementation of Scalable In-Memory Updatable Bitmap Indexing

Last update: Jun 28, 2022

Overview

PyUpBit

CS490 Large Scale Data Analytics — Implementation of Updatable Compressed Bitmap Indexing
Paper

Table of Contents

About The Project
Usage
Contact
Acknowledgements

About The Project

Bitmap indexes are common data structures in database implementations because of their fast read performance. They are often used in applications that need equality and selective range queries. Essentially, they store one bit-vector for each value in the domain of each attribute to keep track of large-scale data files. However, the main drawback of bitmap indexes is the cost of encoding and decoding their bit-vectors.

The current state-of-the-art update-optimized bitmap index, update conscious bitmaps (UCB), supports extremely efficient deletes and improves update speed by treating an update as a delete followed by an insert. UCB uses an additional bit-vector, called the existence bit-vector, to keep track of whether each row is still valid. All bits of the existence bit-vector are initialized to 1, which marks the data in every row as valid. To delete a value, the corresponding bit in the existence bit-vector is set to 0, which invalidates any data associated with that row. This makes deletes very efficient. An update is then performed as a delete followed by an insert at the end of the bit-vector.

However, update conscious bitmaps do not scale well as data grows. As more data is updated and inserted, run time increases significantly. Because updates are out-of-place and increase the size of the vectors, read queries become increasingly expensive and time-consuming. Furthermore, as the number of updates and deletes grows, the bit-vector becomes less and less compressible. This brings us to Updatable Bitmaps (UpBit).
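As a reference point for the comparison, the update conscious bitmap behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code in this repository: the class name `UCBIndex` and its methods are hypothetical, and the bit-vectors are plain uncompressed Python lists, whereas a real index would compress them.

```python
class UCBIndex:
    """Toy update conscious bitmap (hypothetical names, uncompressed lists)."""

    def __init__(self, values):
        self.bitmaps = {}    # value -> list of 0/1 bits, one bit per row
        self.existence = []  # existence bit-vector: 1 = valid row, 0 = invalidated
        for value in values:
            self.insert(value)

    def insert(self, value):
        """Append a new row holding `value` and return its row id."""
        row = len(self.existence)
        if value not in self.bitmaps:
            # A value seen for the first time gets zeros for all earlier rows.
            self.bitmaps[value] = [0] * row
        for v, bits in self.bitmaps.items():
            bits.append(1 if v == value else 0)
        self.existence.append(1)
        return row

    def delete(self, row):
        """Invalidate a row by clearing its existence bit; value bits stay as-is."""
        self.existence[row] = 0

    def update(self, row, value):
        """Out-of-place update: delete the old row, then append a new one."""
        self.delete(row)
        return self.insert(value)

    def query(self, value):
        """Return row ids whose value bit and existence bit are both 1."""
        bits = self.bitmaps.get(value, [])
        return [i for i, (b, e) in enumerate(zip(bits, self.existence)) if b and e]


idx = UCBIndex(["a", "b", "a"])
new_row = idx.update(0, "b")  # row 0 is invalidated, "b" is appended as a new row
print(idx.query("a"), idx.query("b"), len(idx.existence))
```

Note how every update makes all bit-vectors one bit longer, and every query has to be combined with the existence bit-vector. This is the growth that the scaling discussion above refers to.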
In the paper UpBit: Scalable In-Memory Updatable Bitmap Indexing, Manos Athanassoulis, Zheng Yan, and Stratos Idreos present a bitmap structure that improves the write performance of bitmap indexes without sacrificing read performance. The main differentiating point of UpBit is that it keeps an update bit-vector for every value in the domain of an attribute, which keeps track of updated values. Based on this paper, we implemented UpBit and compared it against our own implementation of update conscious bitmaps to test the performance of both methods.
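The idea of one update bit-vector per value can be illustrated with a minimal sketch. It assumes, following our reading of the paper, that the current bit of a row is the XOR of its value bit and its update bit, so an update only flips bits in the update bit-vectors and leaves the value bit-vectors untouched. The class name `UpBitIndex`, its methods, and the use of Python integers as uncompressed bitsets are illustrative assumptions and do not reflect the layout of this repository; the sketch also omits the compression and the row-lookup optimizations described in the paper.

```python
class UpBitIndex:
    """Toy UpBit-style index (hypothetical names, Python ints as bitsets)."""

    def __init__(self, values):
        self.n_rows = len(values)
        self.vb = {}  # value -> value bit-vector, fixed after construction
        self.ub = {}  # value -> update bit-vector, flipped on updates
        for row, value in enumerate(values):
            self.vb[value] = self.vb.get(value, 0) | (1 << row)
        for value in self.vb:
            self.ub[value] = 0

    def _current(self, value):
        # Assumption: the current state is value bits XOR update bits.
        return self.vb.get(value, 0) ^ self.ub.get(value, 0)

    def get_value(self, row):
        """Find the value currently stored in `row` (linear scan over values)."""
        for value in self.vb:
            if (self._current(value) >> row) & 1:
                return value
        return None

    def update(self, row, new_value):
        """In-place update: flip the row's bit in the old and new update vectors."""
        old_value = self.get_value(row)
        if old_value == new_value:
            return
        if old_value is not None:
            self.ub[old_value] ^= 1 << row
        self.vb.setdefault(new_value, 0)  # unseen values start with empty value bits
        self.ub[new_value] = self.ub.get(new_value, 0) ^ (1 << row)

    def query(self, value):
        """Return the row ids that currently hold `value`."""
        bits = self._current(value)
        return [r for r in range(self.n_rows) if (bits >> r) & 1]


idx = UpBitIndex(["a", "b", "a"])
idx.update(0, "b")  # row 0 changes from "a" to "b" without growing any vector
print(idx.query("a"), idx.query("b"))
```

Unlike the delete-then-append approach, the number of rows stays the same after an update, which is why reads do not slow down as updates accumulate in this sketch.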

Usage

We used PyCharm to run our tests. The /ucb and /upbit directories contain the algorithms, and /tests contains the testing scripts. The rest of the files handle compression, which reduces memory usage, as well as creating and visualizing data.

Contact

Daniel Park - @h1yung - [email protected]

Acknowledgements

Original Paper
Winston Chen
Gregory Chininis
Daniel Hooks
Michael Lee

Owner

Hyeong Kyun (Daniel) Park
