Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

OpenSource bot for control groups ...

⭕️ کمک به افراد برای اداره هرچه فان تره گروه 📟 همه گروه های بزرگ نیاز به یه بات خفن دارن تا از گروه مراقبت کنه این بات کارش همینه سعی کرده فیچر خیلی

Mehran Alam Beigi 2 Nov 26, 2021
Scrapes an instagram user's photos and videos

Instagram Scraper instagram-scraper is a command-line application written in Python that scrapes and downloads an instagram user's photos and videos.

7.3k Nov 18, 2022
Discord bot that plays cricket with the user

CricBot Table of content Commands Installation Game rules License Commands S.No Command Use 1. cric Open the home window. This command is not necessa

Raveesh Yadav 1 Nov 19, 2021
Script for polybar to display and control media(not only Spotify) using DBus.

polybar-now-playing Script for polybar to display and control media(not only Spotify) using DBus Python script to display and control current playing

Dope Wizard 48 Dec 31, 2022
A simple discord bot named atticus that sends you the timetable of your classes upon request

A simple discord bot named atticus that sends you the timetable of your classes upon request. Soon, it would you ping you before classes too!

Samhitha 3 Oct 13, 2022
Student-Management-System-in-Python - Student Management System in Python

Student-Management-System-in-Python Student Management System in Python

G.Niruthian 3 Jan 01, 2022
A repository of publicly verifiable token Sale contracts

Token-Sale-Plutus-Contract A repository of publicly verifiable token sale and royalty contracts. This will be the storage solution since it is easily

Logical Mechanism 29 Aug 18, 2022
Plazmix API wrapper for Python

An optimised, easy to use Plazmix API wrapper written in Python

Someone 2 Nov 16, 2021
bot for hearthstone mercenaries

Hearthstone-Mercenaries-game-bot - prevention: Bot is not ready and now on the development stage estimated release date - 21.10.21 The main idea of th

Andrew Efimov 59 Dec 12, 2022
Quadrirrotor UFABC - ROS/Gazebo

QuadROS_UFABC - Repositório utilizado durante minha dissertação de mestrado para simular sistemas de controle e estimação para navegação de um quadrirrotor utilizando visão computacional.

Mateus Ribeiro 1 Dec 13, 2022
Prime Mega is a modular bot running on python3 with autobots theme and have a lot features.

PRIME MEGA Prime Mega is a modular bot running on python3 with autobots theme and have a lot features. Easiest Way To Deploy On Heroku This Bot is Cre

『TØNIC』 乂 ₭ILLΣR 45 Dec 15, 2022
Spore Api

SporeApi Spore Api Simple example: import asyncio from spore_api.client import SporeClient async def main() - None: async with SporeClient() a

LEv145 16 Aug 02, 2022
Automatically load stolen cookies from ChromePass

AutoCookie - Automatically loading stolen cookies from ChromePass View Demo · Report Bug · Request Feature Table of Contents About the Project Getting

darkArp 21 Oct 11, 2022
Telegram Music Bot for YouTube/SoundCloud/Mixcloud

Telegram Music Bot Telegram Music Bot for YouTube/SoundCloud/Mixcloud This bot downloads and sends the audio when someone send a YouTube/SoundCloud/Mi

Calls Music 76 Jan 02, 2023
Bot para automatizacao de registros no Vacivida para o COVID19

VACIBOT v.06 - Bot para automatizacao de registros no Vacivida para o COVID19 by Victor Fragoso - Prefeitura Municipal de Santo André Email:

Prefeitura de Santo André 22 Sep 19, 2022
WebhookHub - A discord WebHook Manager with much more features coming soon

WebhookHub A discord WebHook Manager with much more features coming soon This is

5 Feb 19, 2022
Cord Python API Client

Cord Python API Client The data programming platform for AI 💻 Features Minimal low-level Python client that allows you to interact with Cord's API Su

Cord 52 Nov 25, 2022
WhatsApp Multi Device Client

WhatsApp Multi Device Client

23 Nov 18, 2022
Discord Remote Administration Tool

Discord Remote Administration Tool

Rdimo 82 Aug 15, 2022
Just a python library to make reddit post caching easier

Reddist Just a python library to make reddit post caching easier. Caching Options In Memory Caching Redis Caching Pickle Caching Usage Installation: D

Samrid Pandit 3 Jan 16, 2022