whylogs Workshop

Code from the whylogs workshop held at DataTalks.Club on 29 March 2022

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, we’ll learn how to set up a system for monitoring your data pipelines, ensuring data quality and detecting changes in your data.

Without data monitoring, it’s impossible to guarantee to your stakeholders that the data that they are using for their analytics and machine learning use cases is trustworthy. By setting up a data observability system, you’ll be able to get visibility into the health of your data pipelines, thus building your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs — the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
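To make the idea of data logging more concrete, here is a minimal plain-Python sketch of the kind of per-column summary a profiling library records for each batch, plus a simple check that compares two batches. This is not the whylogs API: the function names `profile_batch` and `null_fraction_alerts`, the sample records, and the `tolerance` threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def profile_batch(rows):
    """Summarize a batch of dict records into per-column statistics.

    Hypothetical helper for illustration; not part of whylogs.
    A column that is absent from a row is simply not counted for that row.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(name, {"count": 0, "nulls": 0, "values": []})
            stats["count"] += 1
            if value is None:
                stats["nulls"] += 1
            elif isinstance(value, (int, float)):
                stats["values"].append(value)

    profile = {}
    for name, stats in columns.items():
        values = stats["values"]
        profile[name] = {
            "count": stats["count"],
            "null_fraction": stats["nulls"] / stats["count"],
            "mean": mean(values) if values else None,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return profile


def null_fraction_alerts(reference, current, tolerance=0.1):
    """Return columns whose null fraction rose by more than `tolerance`."""
    alerts = []
    for name, ref in reference.items():
        cur = current.get(name)
        if cur is None:
            alerts.append(name)  # the column disappeared entirely
        elif cur["null_fraction"] - ref["null_fraction"] > tolerance:
            alerts.append(name)
    return alerts


# Made-up sample batches: today's batch has a missing price.
yesterday = [{"price": 10.0, "qty": 1}, {"price": 12.0, "qty": 3}]
today = [{"price": None, "qty": 2}, {"price": 11.0, "qty": 4}]

ref = profile_batch(yesterday)
cur = profile_batch(today)
print(null_fraction_alerts(ref, cur))
```

The point of the sketch is that only the small summary has to be stored and compared, not the raw records, which is what makes this style of monitoring practical for both batch and streaming pipelines.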

Code

This repository contains files that are needed for the workshop:

ccloud_lib.py - helper for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - code for producing events to Kafka
requirements.txt - all the dependencies for the workshop
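As a rough illustration of how a credentials file like this can be consumed, the sketch below reads `key=value` lines into a dictionary. It assumes the file uses the properties-style layout common for Kafka client configuration, which this README does not confirm; `read_properties` is a hypothetical helper (not necessarily what `ccloud_lib.py` does), and the values written to the temporary file are placeholders.

```python
import tempfile
from pathlib import Path


def read_properties(path):
    """Parse `key=value` lines into a dict, skipping blanks and # comments.

    Hypothetical helper; assumes a properties-style file format.
    """
    conf = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


# Demonstrate with a throwaway file holding placeholder values only.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "confluent_credentials.txt"
    sample.write_text(
        "# placeholder values, never commit real credentials\n"
        "bootstrap.servers=HOST:9092\n"
        "sasl.username=API_KEY\n"
    )
    config = read_properties(sample)

print(sorted(config))
```

A dictionary of this shape is what Kafka client libraries typically accept as configuration. Keeping the real file out of version control (for example by listing it in `.gitignore`) avoids committing the credentials by accident.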

Confluent Cloud

For this workshop, you'll need:

An account in Deepnote
An account in Confluent Cloud (instructions)


