Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detailed blog post published on Towards Data Science.

Overview

time-series-kafka-demo

img

Mock stream producer for time series data using Kafka.

I walk through this tutorial and others here on GitHub and on my Medium blog. Here is a friend link for open access to the article on Towards Data Science: Make a mock “real-time” data stream with Python and Kafka. I'll always add friend links on my GitHub tutorials for free Medium access if you don't have a paid Medium membership (referral link).

If you find any of this useful, I always appreciate contributions to my Saturday morning fancy coffee fund!

This repo demos how to convert a csv file of timestamped data into a real-time stream useful for testing streaming analytics. An example input file with random time series data and a script for generating the file are included in the data directory.

The producer and consumer Python scripts use Confluent's Kafka client for Python, which is installed in the Docker image built with the accompanying Dockerfile, if you choose to use it.

Requires Docker and Docker Compose.

Usage

Clone repo and cd into directory.

git clone https://github.com/mtpatter/time-series-kafka-demo.git
cd time-series-kafka-demo

Start the Kafka broker

docker compose up --build

Build a Docker image (optionally, for the producer and consumer)

From the main root directory:

docker build -t "kafkacsv" .

If you want to use Docker for the python scripts, this should now work:

docker run -it --rm kafkacsv python bin/sendStream.py -h

Start a consumer

To start a consumer for printing all messages in real-time from the stream "my-stream":

python bin/processStream.py my-stream

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/processStream.py my-stream

Produce a time series stream

Send time series from data/data.csv to topic “my-stream”, and speed it up by a factor of 10.

python bin/sendStream.py data/data.csv my-stream --speed 10

or with Docker:

docker run -it --rm \
      -v $PWD:/home \
      --network=host \
      kafkacsv python bin/sendStream.py data/data.csv my-stream --speed 10

Shut down and clean up

Stop the consumer with Return and Ctrl+C.

Shutdown Kafka broker system:

docker compose down
Speed up Sphinx builds by selectively removing toctrees from some pages

Remove toctrees from Sphinx pages Improve your Sphinx build time by selectively removing TocTree objects from pages. This is useful if your documentat

Executable Books 8 Jan 04, 2023
OpenAPI (f.k.a Swagger) Specification code generator. Supports C#, PowerShell, Go, Java, Node.js, TypeScript, Python

AutoRest The AutoRest tool generates client libraries for accessing RESTful web services. Input to AutoRest is a spec that describes the REST API usin

Microsoft Azure 4.1k Jan 06, 2023
Spin-off Notice: the modules and functions used by our research notebooks have been refactored into another repository

Fecon235 - Notebooks for financial economics. Keywords: Jupyter notebook pandas Federal Reserve FRED Ferbus GDP CPI PCE inflation unemployment wage income debt Case-Shiller housing asset portfolio eq

Adriano 825 Dec 27, 2022
Make posters from Markdown files.

MkPosters Create posters using Markdown. Supports icons, admonitions, and LaTeX mathematics. At the moment it is restricted to the specific layout of

Patrick Kidger 243 Dec 20, 2022
A complete kickstart devcontainer repository for python3

A complete kickstart devcontainer repository for python3

Viktor Freiman 3 Dec 23, 2022
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom

[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술

6 Jun 10, 2022
Leetcode Practice

LeetCode Practice Description This is my LeetCode Practice. Visit LeetCode Website for detailed question description. The code in this repository has

Leo Hsieh 75 Dec 27, 2022
Data-Scrapping SEO - the project uses various data scrapping and Google autocompletes API tools to provide relevant points of different keywords so that search engines can be optimized

Data-Scrapping SEO - the project uses various data scrapping and Google autocompletes API tools to provide relevant points of different keywords so that search engines can be optimized; as this infor

Vibhav Kumar Dixit 2 Jul 18, 2022
Count the number of lines of code in a directory, minus the irrelevant stuff

countloc Simple library to count the lines of code in a directory (excluding stuff like node_modules) Simply just run: countloc node_modules args to

Anish 4 Feb 14, 2022
VSCode extension that generates docstrings for python files

VSCode Python Docstring Generator Visual Studio Code extension to quickly generate docstrings for python functions. Features Quickly generate a docstr

Nils Werner 506 Jan 03, 2023
Autolookup GUI Plugin for Plover

Word Tray for Plover Word Tray is a GUI plugin that automatically looks up efficient outlines for words that start with the current input, much like a

Kathy 3 Jun 08, 2022
Swagger Documentation Generator for Django REST Framework: deprecated

Django REST Swagger: deprecated (2019-06-04) This project is no longer being maintained. Please consider drf-yasg as an alternative/successor. I haven

Marc Gibbons 2.6k Jan 03, 2023
Fully typesafe, Rust-like Result and Option types for Python

safetywrap Fully typesafe, Rust-inspired wrapper types for Python values Summary This library provides two main wrappers: Result and Option. These typ

Matthew Planchard 32 Dec 25, 2022
Netbox Dns is a netbox plugin for managing zone, nameserver and record inventory.

Netbox DNS Netbox Dns is a netbox plugin for managing zone, nameserver and record inventory. Features Manage zones (domains) you have. Manage nameserv

Aurora Research Lab 155 Jan 06, 2023
An ongoing curated list of OS X best applications, libraries, frameworks and tools to help developers set up their macOS Laptop.

macOS Development Setup Welcome to MacOS Local Development & Setup. An ongoing curated list of OS X best applications, libraries, frameworks and tools

Paul Veillard 3 Apr 03, 2022
A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Shahid Yousuf 1 Jan 22, 2022
The sarge package provides a wrapper for subprocess which provides command pipeline functionality.

Overview The sarge package provides a wrapper for subprocess which provides command pipeline functionality. This package leverages subprocess to provi

Vinay Sajip 14 Dec 18, 2022
CoderByte | Practice, Tutorials & Interview Preparation Solutions|

CoderByte | Practice, Tutorials & Interview Preparation Solutions This repository consists of solutions to CoderByte practice, tutorials, and intervie

Eda AYDIN 6 Aug 09, 2022
A module filled with many useful functions and modules in various subjects.

Usefulpy Check out the Usefulpy site Usefulpy site is not always up to date Download and Import download and install with with pip download usefulpyth

Austin Garcia 1 Dec 28, 2021
📖 Generate markdown API documentation from Google-style Python docstring. The lazy alternative to Sphinx.

lazydocs Generate markdown API documentation for Google-style Python docstring. Getting Started • Features • Documentation • Support • Contribution •

Machine Learning Tooling 118 Dec 31, 2022