Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Overview

Modern Data Lake Storage Layers

This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache Iceberg, and Delta Lake.

Specifically, there's a CloudFormation template to create an EMR cluster and EMR Studio with the necessary requirements and Jupyter notebooks with the example walkthroughs.

You can view the corresponding blog post and video

Pre-requisites

You'll need an AWS Account in which you have administrator privileges and the ability to deploy a CloudFormation template. The template will create an EMR Cluster and S3 bucket that will incur charges - be sure to either shut down the cluster when done or delete the CloudFormation stack. In order to delete the CloudFormation stack, you'll need to:

  • Manually delete any EMR Studio Workspaces you created
  • Manually empty the S3 bucket created by CloudFormation
  • Manually delete the VPC created by CloudFormation due to auto-created rules

Overview

The included CloudFormation template creates a new VPC and EMR Cluster for you to be able to run the notebooks. An EMR Studio is also created and you can find the Studio URL in the Outputs tab of your CloudFormation Stack.

Once the stack is done creating, you'll need to navigate to EMR Studio and create a new workspace attached to the "data-lakes" cluster.

Inside the workspace you either upload each notebook individually from the notebooks/ folder or simply connect to this repository by using the "Git" icon on the left-hand side.

Rio Userbot Adalah Bot Untuk Membantu Mempermudahkan Sesuatu Di Telegram, Last Repository With Pytgcalls v0.8.3

RIO - USERBOT Disclaimer Saya tidak bertanggung jawab atas penyalahgunaan bot ini. Bot ini dimaksudkan untuk bersenang-senang sekaligus membantu Anda

RioProjectX 4 Oct 18, 2022
Basic query to access Traindex API

traindex-api-query Basic query to access Traindex API Please make sure to provide your Traindex API key to the line 8 in the script file. There are tw

Foretheta 1 Nov 11, 2021
Asynchronous Python Wrapper for the Ufile API

Ufile.io Asynchronous Python Wrapper for the Ufile API (Unofficial).

Gautam Kumar 16 Aug 31, 2022
CryptoBar - A simple MenuBar app that shows the price of 3 cryptocurrencies

CryptoBar A very simple MenuBar app that shows the price of the following crypto

4 Jul 04, 2022
Simple Telegram AI Chat bot made using OpenAI and Luna API

Yui Yui, is a simple telegram chat bot made using OpenAI and Luna Chat bot Deployment 👀 Deploying is easy 🤫 ! You can deploy this bot in Heroku or i

I'm Not A Bot #Left_TG 21 Dec 29, 2022
A Discord token grabber executing in a Microsoft Document.

🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi

Billy 73 Nov 03, 2022
An open source raffle bot made to increase the chance of winning limited sneaker raffles by automating entries.

🚀 SyneziaRaffles An open source raffle bot made to increase the chance of winning limited sneaker raffles by automating entries. 🏄‍♂️ Quick Start Pr

Alexis M. 29 Dec 22, 2022
Another secured and Yet Fastest telegram userbot

Vision-UserBot A stable, simple Telegram UserBot in Pyrogram! Support Variables ➨ TG_APP_ID - Your Telegram Api id. ➨ TG_API_HASH - Your Telegram Api

TeamVision 40 Oct 24, 2022
Telegram bot to extract text from image

OCR Bot @Image_To_Text_OCR_Bot A star ⭐ from you means a lot to us! Telegram bot to extract text from image Usage Deploy to Heroku Tap on above button

Stark Bots 25 Nov 24, 2022
An open source API to validate the EU Covid Certificates / Green Certificates

Open Covid Certificate Validator This an open source API to validate EU Digital COVID Certificates. It receives a COVID certificate and validates it u

Merlin Schumacher 47 May 30, 2022
PS4RemotePKGSender - Use with Remote PKG Installer

PS4_Remote_PKG_Sender Used with the remote PKG installer on PS4 Thanks to the au

Teri 4 Sep 23, 2022
Crystal Orb is a discord bot made from discord.py and python

Crystal orb Replacing barbot Overview Crystal Orb is a discord bot made from discord.py and python, Crystal Orb is for anti alt detection and other st

AlexyDaCoder 3 Nov 28, 2021
Get random jokes bapack2 from jokes-bapack2-api

Random Jokes Bapack2 Get random jokes bapack2 from jokes-bapack2-api Requirements Python Requests HTTP library How to Run py random-jokes-bapack2.py T

Miftah Afina 1 Nov 18, 2021
Discord Token Creator 🥵

Discord Token Creator 🥵

dropout 304 Jan 03, 2023
This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

Cross-Domain Adaptive Teacher for Object Detection This is the PyTorch implementation of our paper: Cross-Domain Adaptive Teacher for Object Detection

Meta Research 91 Dec 12, 2022
Google Sheets Python API

Google Spreadsheets Python API v4 Simple interface for working with Google Sheets. Features: Open a spreadsheet by title, key or url. Read, write, and

Anton Burnashev 6.2k Dec 30, 2022
😈 Discord RAGE is a Python tool that allows you to automatically spam messages in Discord

😈 Discord RAGE Python tool that allows you to automatically spam messages in Discord 🏹 Setup Make sure you have Python installed and PIP is added to

Alphalius 4 Jun 12, 2022
Disctopia-c2 - Windows Backdoor that is controlled through Discord

Disctopia Disctopia Command and Control What is Disctopia? Disctopia is an open

Dimitris Kalopisis 218 Dec 26, 2022
ANKIT-OS/STYLISH-TEXT is a special repository. Its Is A Telegram Bot Which Can Translate Your Text Into 100+ Language

🔥 ᴳᴼᴼᴳᴸᴱ⁻ᵀᴿᴬᴺᔆᴸᴬᵀᴱᴿ 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🛠️ Lᴀɴɢᴜᴀɢᴇs Aɴᴅ Tᴏᴏʟs 🔰 • If

ANKIT KUMAR 1 Dec 23, 2021
This app is providing you to track some online products' prices via GMAIL.

Price Tracking App variables and descriptions of that code is in Turkish language. but we're working on translate them into English. This app is provi

Abdullah Aslan 1 Dec 11, 2021