Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Практическая работа 6 - Документирование кода

Практическая работа №6 ПСП – правильная скобочная последовательность – последовательность из открывающих «(« и закрывающих «)» круглых скобок. Програм

0 Apr 14, 2022
Leakvertise is a Python open-source project which aims to bypass these fucking annoying captchas and ads from linkvertise, easily

Leakvertise Leakvertise is a Python open-source project which aims to bypass these fucking annoying captchas and ads from linkvertise, easily. You can

Quatrecentquatre 9 Oct 06, 2022
An API that uses NLP and AI to let you predict possible diseases and symptoms based on a prompt of what you're feeling.

Disease detection API for MediSearch An API that uses NLP and AI to let you predict possible diseases and symptoms based on a prompt of what you're fe

Sebastian Ponce 1 Jan 15, 2022
Experiment to find the best time to look for an appointment at the Berlin Bürgeramt

Bürgeramt appointment experiment Checks Berlin.de for free Anmeldung appointments every X minutes, then analyses the results. How to use Run get-page.

Nicolas Bouliane 42 Jan 02, 2023
My attempt at weaponizing Discord.

MayorbotC2 This is my Discord C2 bot. There are many like it, but this one is mine. MayorbotC2 is a project I absolutely forgot about until I was pilf

Joe Helle 19 May 16, 2022
A Python library for the Buildkite API

PyBuildkite A Python library and client for the Buildkite API. Usage To get the package, execute: pip install pybuildkite Then set up an instance of

Peter Yasi 29 Nov 30, 2022
Asynchronous Python Wrapper for the GoFile API

Asynchronous Python Wrapper for the GoFile API

Gautam Kumar 22 Aug 04, 2022
A telegram to pyrogram json bot

Pyrogram-Json-Bot A telegram to pyrogram json bot Please fork this repository don't import code Made with Python3 (C) @FayasNoushad Copyright permissi

Fayas Noushad 11 Dec 20, 2022
Coinbase Listing Sniper

Coinbase Listing Sniper Script that listens to the @CoinbaseAssets twitter to find information about new Coinbase listings, and automatically buys 100

4 Oct 26, 2022
Обертка для мини-игры "рабы" на python

Slaves API Библиотека для игры Рабы на Python. Большая просьба Поставьте звездочку на репозиторий. Это много для меня значит. Версии Т.к. разработчики

Zdorov Philipp 13 Mar 31, 2021
Roblox-Account-Gen - A simple account generator not using paid solving services

Roblox Account Generator Star this if it helped to spread awareness! No 2captcha

x 1 Feb 17, 2022
Ghost-toolbox - Ghost's official Discord raid tool

Ghost Toolbox Ghost's official Discord raid tool. How to use Clone this repo int

G H Ø S T 10 Oct 31, 2022
Mazda Connected Service API wrapper based on pymazda and Flask.

Mazda Connected Service Relay Mazda Connected Service API wrapper based on pymazda and Flask. Usage Make POST calls to https://mymazda.herokuapp.com/{

Alan Chen 10 Jan 05, 2023
Asca - Antiscam Discord Bot With Python

asca Antiscam Discord Bot Asca moderates scammers and deletes scam messages Opti

11 Nov 01, 2022
A Python interface to AFL, allowing for easy injection of testcases and other functionality.

Fuzzer This module provides a Python wrapper for interacting with AFL (American Fuzzy Lop: http://lcamtuf.coredump.cx/afl/). It supports starting an A

Shellphish 614 Dec 26, 2022
A ideia é fornecer uma base ampla de questões do ENEM como uma api REST

base10 "A ideia é fornecer uma base ampla de questões do ENEM como uma api REST" TODO Documentar a api com apifairy Criar testes Criar crawler para si

Wadson Garbes 4 Apr 24, 2022
𝗖𝝠𝝦𝝩𝝠𝝞𝝥 𝝦𝗥𝝞𝗖𝝽°™️ 🇱🇰 Is An All In One Media Inline Bot Made For Inline Your Media Effectively With Some Advance Security Tools♥️

𝗖𝝠𝝦𝝩𝝠𝝞𝝥 𝝦𝗥𝝞𝗖𝝽° ™️ 🇱🇰 𝗙𝗘𝝠𝝩𝗨𝗥𝗘𝗦 Auto Filter IMDB Admin Commands Broadcast Index IMDB Search Inline Search Random Pics Ids & User I

Kɪꜱᴀʀᴀ Pᴇꜱᴀɴᴊɪᴛʜ Pᴇʀᴇʀᴀ 〄 13 Jun 21, 2022
Ig-Crackv2 - Crack Instagram Version 2.9

★★ Information ★★ ★★Menu Special Crack Melalui Pengikut Crack Melalui Mengikuti

Risky [ Zero Tow ] 11 Aug 30, 2022
Tglogging - A python package to send your app logs to a telegram chat in realtime

Telegram Logger A simple python package to send your app logs to a telegram chat

SUBIN 60 Dec 27, 2022
Easy to use phishing tool with 63 website templates. Author is not responsible for any misuse.

PyPhisher [+] Created By KasRoudra [+] Description : Ultimate phishing tool in python. Includes popular websites like facebook, twitter, instagram, gi

KasRoudra 1.1k Jan 01, 2023