A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Last update: Sep 07, 2022

Overview

Realtime Financial Market Data Visualization and Analysis

Introduction

This repository contains my real-time stock data pipeline project. All of the code is written in Python. In this project I use several data engineering frameworks to build a financial data processing and visualization platform on Apache Kafka, Apache Cassandra, and Bokeh. Kafka streams real-time stock prices and market news, Cassandra stores historical and real-time stock data, and Bokeh renders the visualizations in a web browser. I also wrote a web crawler that scrapes companies' financial statements and basic information from Yahoo Finance, and experimented with several economic data APIs.

Architecture

The web page currently has 3 tabs:

Stock: Streaming & Fundamental
- Candlestick plot for a single stock, with basic company and financial information
- Real-time S&P 500 price during trading hours (fake data during non-trading hours)
Stock: Comparison
- Prices of 2 user-selected stocks, with their statistical summary and correlation
- 5-, 10-, and 30-day moving averages of the adjusted close price
Economy
- Geomap of various economic data by state
- 4 nationwide economic indicators for comparison
- The most recent market news
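
To make the Comparison tab's statistics concrete, here is a minimal sketch of a simple moving average and a Pearson correlation in plain Python. It is an illustration only, not the project's actual code: the function names `moving_average` and `pearson` are hypothetical, the sample prices are made up, and the source does not say whether the correlation is computed on prices or on returns (prices are assumed here).

```python
import math
from collections import deque


def moving_average(prices, window):
    """Simple moving average; entries are None until `window` prices are seen."""
    out = []
    buf = deque(maxlen=window)
    total = 0.0
    for price in prices:
        if len(buf) == window:
            total -= buf[0]  # drop the value that is about to leave the window
        buf.append(price)
        total += price
        out.append(total / window if len(buf) == window else None)
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up adjusted close prices, for illustration only.
adj_close = [1, 2, 3, 4, 5]
print(moving_average(adj_close, 3))  # [None, None, 2.0, 3.0, 4.0]
```

Calling `moving_average` with windows of 5, 10, and 30 would give the three curves listed above; in practice a dataframe library's rolling-window functions would likely replace this hand-written loop.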

Here is the architecture of the platform.

How Stock Data is Streamed via Kafka to Cassandra:
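
As a rough illustration of this flow, the sketch below shows one way a price tick could be serialized on the producer side and turned into CQL bind parameters on the consumer side. Everything specific in it is an assumption rather than the project's actual design: the topic name `stock-ticks`, the `stocks.ticks` table and its columns, the JSON message format, and the helper names. To stay runnable without a broker or database, the Kafka and Cassandra calls are passed in as plain callables.

```python
import json
from datetime import datetime, timezone

TOPIC = "stock-ticks"  # hypothetical topic name

# Hypothetical table: stocks.ticks(symbol text, ts timestamp, price double, volume bigint)
INSERT_CQL = (
    "INSERT INTO stocks.ticks (symbol, ts, price, volume) "
    "VALUES (%s, %s, %s, %s)"
)


def encode_tick(symbol, price, volume, ts):
    """Producer side: serialize one tick to JSON bytes (a Kafka message value)."""
    payload = {
        "symbol": symbol,
        "price": price,
        "volume": volume,
        "ts": ts.isoformat(),
    }
    return json.dumps(payload).encode("utf-8")


def decode_tick(raw):
    """Consumer side: turn a message value into bind parameters for INSERT_CQL."""
    data = json.loads(raw.decode("utf-8"))
    return (
        data["symbol"],
        datetime.fromisoformat(data["ts"]),
        float(data["price"]),
        int(data["volume"]),
    )


def store_messages(messages, execute):
    """Write each message via `execute(cql, params)`; return the row count."""
    count = 0
    for raw in messages:
        execute(INSERT_CQL, decode_tick(raw))
        count += 1
    return count


# In-memory stand-ins: a list plays the Kafka topic, another the Cassandra table.
topic_buffer = [
    encode_tick("AAPL", 155.96, 1000, datetime(2022, 9, 7, 14, 30, tzinfo=timezone.utc))
]
stored_rows = []
store_messages(topic_buffer, lambda cql, params: stored_rows.append(params))
```

With real services, the producer would hand `encode_tick(...)` to something like kafka-python's `KafkaProducer.send(TOPIC, value=...)`, and `execute` would be a cassandra-driver `Session.execute`, which accepts `%s` placeholders with a parameter tuple.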

Please check each tab's screenshot:

Tab 1:

Tab 2:

Tab 3:
