Free Data Engineering course!

Overview

Data Engineering Zoomcamp

Syllabus

Taking the course

Self-paced mode

All the materials of the course are freely available, so you can take the course at your own pace

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check FAQ if you have problems
  • If you can't find a solution to your problem in FAQ, ask for help in Slack

2022 Cohort

Asking for help in Slack

The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering channel.

To make discussions in Slack more organized:

Syllabus

Week 1: Introduction & Prerequisites

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

Week 2: Data ingestion

  • Data Lake
  • Workflow orchestration
  • Setting up Airflow locally
  • Ingesting data to GCP with Airflow
  • Ingesting data to local Postgres with Airflow
  • Moving data from AWS to GCP (Transfer service)
  • Homework

More details

Week 3: Data Warehouse

  • Data Warehouse
  • BigQuery
  • Partitoning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning

More details

Week 4: Analytics engineering

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualising the data with google data studio and metabase

More details

Week 5: Batch processing

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

Week 6: Streaming

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

Week 7, 8 & 9: Project

Putting everything we learned to practice

  • Week 7 and 8: working on your own project
  • Week 9: reviewing your peers

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Airflow: Pipeline Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Prerequisites

To get most out of this course, you should feel comfortable with coding and command line, and know the basics of SQL. Prior experience with Python will be helpful, but you can pick Python relatively fast if you have experience with other programming languages.

Prior experience with data engineering is not required.

Instructors

Tools

For this course you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

See Week 1 for more details about installing these tools

FAQ

  • Q: I registered, but haven't received a confirmation email. Is it normal? A: Yes, it's normal. It's not automated. But you will receive an email eventually
  • Q: At what time of the day will it happen? A: Office hours will happen on Mondays at 17:00 CET. But everything will be recorded, so you can watch it whenever it's convenient for you
  • Q: Will there be a certificate? A: Yes, if you complete the project
  • Q: I'm 100% not sure I'll be able to attend. Can I still sign up? A: Yes, please do! You'll receive all the updates and then you can watch the course at your own pace.
  • Q: Do you plan to run a ML engineering course as well? A: Glad you asked. We do :)
  • Q: I'm stuck! I've got a technical question! A: Ask on Slack! And check out the student FAQ; many common issues have been answered already. If your issue is solved, please add how you solved it to the document. Thanks!

Our friends

Big thanks to other communities for helping us spread the word about the course:

Check them out - they are cool!

Owner
DataTalksClub
The place to talk about data
DataTalksClub
Find all solutions to SUBSET-SUM, including negative, positive, and repeating numbers

subsetsum The subsetsum Python module can enumerate all combinations within a list of integers which sums to a specific value. It works for both negat

Trevor Phillips 9 May 27, 2022
The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of the first few terms.

Blancmange-curve The blancmange curve can be visually built up out of triangle wave functions if the infinite sum is approximated by finite sums of th

Shankar Mahadevan L 1 Nov 30, 2021
A free and powerful system for awareness and research of the American judicial system.

CourtListener Started in 2009, CourtListener.com is the main initiative of Free Law Project. The goal of CourtListener.com is to provide high quality

Free Law Project 332 Dec 25, 2022
This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols.

Regex-Reverser This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols. Screenshots Usage/Examples py

x19 0 Jun 02, 2022
A lighweight screen color picker tool

tkpick A lighweigt screen color picker tool Availability Only GNU/Linux 🐧 Installing Install via pip (No auto-update): [sudo] pip install tkpick Usa

Adil Gürbüz 7 Aug 30, 2021
A general purpose low level programming language written in Python.

A general purpose low level programming language written in Python. Basal is an easy mid level programming language compiling to C. It has an easy syntax, similar to Python, Rust etc.

Snm Logic 6 Mar 30, 2022
This alerts you when the avalanche score a goal

This alerts you when the avalanche score a goal

Davis Burrill 1 Jan 15, 2022
ROS Foxy + Raspi + Adafruit BNO055

ROS Foxy + Raspi + Adafruit BNO055

Ar-Ray 3 Nov 04, 2022
Analyzes crypto candles over a set time period and then trades based on winning patterns found

patternstrade Analyzes crypto candles over a set time period and then trades based on winning patterns found. Heavily customizable. Warning: This was

ConnorCreate 14 May 29, 2022
Fried Chicken Programming Language

Fried-Chicken Fried Chicken Programming Language How To Run Once downloaded and opened, choose any file for code. Any file extensions work. Just make

Attachment Studios 9 Jul 11, 2022
Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries.

Zeus Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries. Any flight worldwide, at your fing

DeVickey 1 Oct 22, 2021
🚀 emojimash 🚀 is a programming language with ALL THE EMOJI

🚀 emojimash 🚀 is a programming language with ALL THE EMOJI

Python Whiz 256 1 Oct 26, 2021
DownTime-Score is a Small project aimed to Monitor the performance and the availabillity of a variety of the Vital and Critical Moroccan Web Portals

DownTime-Score DownTime-Score is a Small project aimed to Monitor the performance and the availabillity of a variety of the Vital and Critical Morocca

adnane-tebbaa 5 Apr 30, 2022
Easily Generate Revolut Business Cards

RevBusinessCardGen Easily Generate Revolut Business Cards Prerequisites Before you begin, ensure you have met the following requirements: You have ins

Younes™ 35 Dec 14, 2022
In this project we will be using OpenCV to virtually drag a rectangle and drop it at a different location. It will be further used for Virtual Mouse.

Virtual Drag & Drog using OpenCV In this project we will be using OpenCV to virtually drag a rectangle and drop it at a different location. It will be

Hassan Shahzad 5 Sep 27, 2021
Sigma coding youtube - This is a collection of all the code that can be found on my YouTube channel Sigma Coding.

Sigma Coding Tutorials & Resources YouTube • Facebook Support Sigma Coding Patreon • GitHub Sponsor • Shop Amazon Table of Contents Overview Topics Re

Alex Reed 927 Jan 08, 2023
A joke conlang with minimal semantics

SyntaxLang Reserved Defined Words Word Function fo Terminates a noun phrase or verb phrase tu Converts an adjective block or sentence to a noun to Ter

Leo Treloar 1 Dec 07, 2021
Ikaros is a free financial library built in pure python that can be used to get information for single stocks, generate signals and build prortfolios

Ikaros is a free financial library built in pure python that can be used to get information for single stocks, generate signals and build prortfolios

Salma Saidane 64 Sep 28, 2022
Fuzz introspector for python

Fuzz introspector High-level goals: Show fuzzing-relevant data about each function in a given project Show reachability of fuzzer(s) Integrate seamles

14 Mar 25, 2022
Repositório para estudo do airflow

airflow-101 Repositório para estudo do airflow Docker criado baseado no tutorial Exemplo de API da pokeapi Para executar clone o repo execute as confi

Gabriel (Gabu) Bellon 1 Nov 23, 2021