Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Kakatua discord music bot

Donate Ayo donasi! Lokal Internasional Ucapan Terima Kasih Tentu saja, donatur Bunga dan talent-talent h!mawari. Semoga rezeki teman-teman semakin lan

1 Oct 30, 2021
use python script to fix vmp dump api in ida

FixVmpDump use python script to fix vmp dump api in ida. support x86 and x64. details in my blog: https://blog.csdn.net/yan_star/article/details/11279

97 Nov 02, 2022
Telegram Vc Video Player Bot

Telegram Video Player Bot Telegram bot project for streaming video on telegram video chat, powered by tgcalls and pyrogram Deploy to Heroku πŸ‘¨β€πŸ”§ The

Dihan Official 11 Dec 25, 2022
API para realizar parser de frases

NLP API Simple api to parse and apply some preprocessing steps in portuguses phrases (pt_BR) This api uses the great FastAPI and spaCy packages! Usage

⟠ Rodolfo De Nadai 1 Dec 28, 2021
A decentralized messaging daemon built on top of the Kademlia routing protocol.

parakeet-message A decentralized messaging daemon built on top of the Kademlia routing protocol. Now that you are done laughing... pictures what is it

Jonathan Abbott 3 Apr 23, 2022
This tool is created by Shahzain and is one of the best self bots out there!

Shahzain SelfBot This tool is created by Shahzain and is one of the best self bots out there! Features Token Destroyer! Server Nuker(50-100 Bans Per S

Shahzain 6 Apr 02, 2022
Simple-nft-tutorial - A simple tutorial on making nft/memecoins on algorand

nft/memecoin Tutorial on Algorand Let's make a simple NFT/memecoin on the Algora

2 Feb 05, 2022
a discord bot coded in Python which shows news based on the term searched by the user

Noah Miller v1.0 a discord bot coded in Python which shows news based on the term searched by the user Add the bot to your server About This is a disc

klevr 3 Nov 08, 2021
The world's first public V2ray manager Telegram bot

πŸ“Œ DarkV2ray-Manager-Bot 0.1 UPDATE 11/11/2021 Telegram bot v2ray Test user expired date data limit paylode && sni usage user on/off heroku bot hostin

@Dk_king_offcial 1 Nov 11, 2021
C Y B Ξ R UserBot is a project that simplifies the use of Telegram. All rights reserved.

C Y B Ξ R USΞRBOT πŸ‡¦πŸ‡Ώ C Y B Ξ R UserBot is a project that simplifies the use of Telegram. All rights reserved. Automatic Setup Android: open Termux p

C Y B Ξ R 0 Sep 20, 2022
Community-based extensions for the python-telegram-bot library.

Community-based extensions for the python-telegram-bot library. Table of contents Introduction Installing Getting help Contributing License Introducti

74 Dec 24, 2022
Get notifications in your Discord server of any software releases from Apple.

Apple Releases Get notifications in your Discord server of any software releases from Apple. Running To locally host your own instance, create a Disco

adam 17 Oct 22, 2022
A python library created to make life easier for Telegram API Developers.

opentele A python library created to make life easier for Telegram API Developers. Read the documentation Features Convert Telegram Desktop tdata sess

103 Jan 02, 2023
This repository provides a set functions to extract paragraphs from AWS Textract responses.

extract-paragraphs-with-aws-textract Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository p

Juan Anzola 3 Jan 26, 2022
Recommended AWS CDK project structure for Python applications

Recommended AWS CDK project structure for Python applications The project implements a user management backend component that uses Amazon API Gateway,

AWS Samples 110 Jan 06, 2023
Want to play What Would Rather on your Server? Invite the bot now!😏

What is this Bot? πŸ‘€ What You Would Rather? is a Guessing game where you guess one thing. Long Description short Take this example: You typed r!rather

δΈ‚γ„šδΉˆδΉ™γƒ„ 2 Nov 17, 2021
TG-Url-Uploader-Bot - Telegram RoBot to Upload Links

MW-URL-Uploader Bot Telegram RoBot to Upload Links. Features: πŸ‘‰ Only Auth Users

Aadhi 3 Jun 27, 2022
MusicBot is the original Discord music bot written for Python 3.5+, using the discord.py library

The original MusicBot for Discord (formerly SexualRhinoceros/MusicBot)

Just Some Bots 2.9k Jan 02, 2023
Petpy is an easy-to-use and convenient Python wrapper for the Petfinder API.

Petpy is an easy-to-use and convenient Python wrapper for the Petfinder API. Includes methods for parsing output JSON into pandas DataFrames for easier data analysis

Aaron Schlegel 27 Nov 19, 2022
πŸ“¦ Opensource Python wrapper for Hiven's REST and WebSocket API

hiven.py πŸ“¦ Opensource Python wrapper for Hiven's REST and WebSocket API Installation pip install -U hiven.py Usage hiven.py is currently under devel

Kevin Thomas 3 Sep 03, 2021