A workshop on data visualization in Python with notebooks and exercises for following along.

Overview

Beyond the Basics: Data Visualization in Python

Binder Nbviewer View slides in browser

The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.

Workshop Outline

This is a workshop on data visualization in Python first delivered at ODSC West 2021 and subsequently at ODSC East 2022 and PyCon Italia 2022. It's divided into the following sections:

Section 1: Getting Started With Matplotlib

We will begin by familiarizing ourselves with Matplotlib. Moving beyond the default options, we will explore how to customize various aspects of our visualizations. By the end of this section, you will be able to generate plots using the Matplotlib API directly, as well as customize the plots that libraries like pandas and Seaborn create for you.

Section 2: Moving Beyond Static Visualizations

Static visualizations are limited in how much information they can show. To move beyond these limitations, we can create animated and/or interactive visualizations. Animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines). Interactivity makes it possible to explore the data visually by hiding and displaying information based on user interest. In this section, we will focus on creating animated visualizations using Matplotlib before moving on to create interactive visualizations in the next section.

Section 3: Building Interactive Visualizations for Data Exploration

When exploring our data, interactive visualizations can provide the most value. Without having to create multiple iterations of the same plot, we can use mouse actions (e.g., click, hover, zoom, etc.) to explore different aspects and subsets of the data. In this section, we will learn how to use a few of the libraries in the HoloViz ecosystem to create interactive visualizations for exploring our data utilizing the Bokeh backend.


Prerequisites

You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself using notebooks in JupyterLab and additional functionality in JupyterLab. In addition, a basic understanding of pandas will be beneficial, but is not required; reviewing the first section of my pandas workshop will be sufficient.


Setup Instructions

  1. Install Anaconda/Miniconda. Note that you can use this Binder environment instead if you don't want to install anything on your machine.

  2. Fork this repository:

    location of fork button in GitHub

  3. Clone your forked repository:

    location of clone button in GitHub

  4. Create and activate a conda virtual environment (on Windows, these commands should be run in Anaconda Prompt):

    $ cd python-data-viz-workshop
    ~/python-data-viz-workshop$ conda install mamba -n base -c conda-forge
    ~/python-data-viz-workshop$ mamba env create --file environment.yml
    ~/python-data-viz-workshop$ conda activate data_viz_workshop
    (data_viz_workshop) ~/python-data-viz-workshop$
  5. Launch JupyterLab:

    (data_viz_workshop) ~/python-data-viz-workshop$ jupyter lab
  6. Navigate to the 0-check_your_env.ipynb notebook in the notebooks/ folder:

    open 0-check_your_env.ipynb

  7. Run the notebook to confirm everything is set up properly:

    check env


About the Author

Stefanie Molin (@stefmolin) is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Related Content

All examples herein were developed exclusively for this workshop. Hands-On Data Analysis with Pandas contains additional examples and exercises, as does this blog post and this workshop on pandas.

Owner
Stefanie Molin
Developer | Data Scientist | Author of "Hands-On Data Analysis with Pandas" | occasional hacker
Stefanie Molin
This is a learning tool and exploration app made using the Dash interactive Python framework developed by Plotly

Support Vector Machine (SVM) Explorer This app has been moved here. This repo is likely outdated and will not be updated. This is a learning tool and

Plotly 150 Nov 03, 2022
Certificate generating and sending system written in Python.

Certificate Generator & Sender How to use git clone https://github.com/saadhaxxan/Certificate-Generator-Sender.git cd Certificate-Generator-Sender Add

Saad Hassan 11 Dec 01, 2022
Lime: Explaining the predictions of any machine learning classifier

lime This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predict

Marco Tulio Correia Ribeiro 10.3k Dec 29, 2022
Scientific measurement library for instruments, experiments, and live-plotting

PyMeasure scientific package PyMeasure makes scientific measurements easy to set up and run. The package contains a repository of instrument classes a

PyMeasure 445 Jan 04, 2023
The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

Computational Intelligence Group 125 Dec 24, 2022
A Jupyter - Leaflet.js bridge

ipyleaflet A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook. Usage Selecting a basemap for a leaflet map: Loading a geojso

Jupyter Widgets 1.3k Dec 27, 2022
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews

hvPlot A high-level plotting API for the PyData ecosystem built on HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it?

HoloViz 694 Jan 04, 2023
Frbmclust - Clusterize FRB profiles using hierarchical clustering, plot corresponding parameters distributions

frbmclust Getting Started Clusterize FRB profiles using hierarchical clustering,

3 May 06, 2022
Data visualization using matplotlib

Data visualization using matplotlib project instructions Top 5 Most Common Coffee Origins In this visualization I used data from Ankur Chavda on Kaggl

13 Oct 27, 2021
1900-2016 Olympic Data Analysis in Python by plotting different graphs

🔥 Olympics Data Analysis 🔥 In Data Science field, there is a big topic before creating a model for future prediction is Data Analysis. We can find o

Sayan Roy 1 Feb 06, 2022
Python script to generate a visualization of various sorting algorithms, image or video.

sorting_algo_visualizer Python script to generate a visualization of various sorting algorithms, image or video.

146 Nov 12, 2022
This GitHub Repository contains Data Analysis projects that I have completed so far! While most of th project are focused on Data Analysis, some of them are also put here to show off other skills that I have learned.

Welcome to my Data Analysis projects page! This GitHub Repository contains Data Analysis projects that I have completed so far! While most of th proje

Kyle Dini 1 Jan 31, 2022
A python script editor for napari based on PyQode.

napari-script-editor A python script editor for napari based on PyQode. This napari plugin was generated with Cookiecutter using with @napari's cookie

Robert Haase 9 Sep 20, 2022
Focus on Algorithm Design, Not on Data Wrangling

The dataTap Python library is the primary interface for using dataTap's rich data management tools. Create datasets, stream annotations, and analyze model performance all with one library.

Zensors 37 Nov 25, 2022
Multi-class confusion matrix library in Python

Table of contents Overview Installation Usage Document Try PyCM in Your Browser Issues & Bug Reports Todo Outputs Dependencies Contribution References

Sepand Haghighi 1.3k Dec 31, 2022
Political elections, appointment, analysis and visualization in Python

Political elections, appointment, analysis and visualization in Python poli-sci-kit is a Python package for political science appointment and election

Andrew Tavis McAllister 9 Dec 01, 2022
An intuitive library to add plotting functionality to scikit-learn objects.

Welcome to Scikit-plot Single line functions for detailed visualizations The quickest and easiest way to go from analysis... ...to this. Scikit-plot i

Reiichiro Nakano 2.3k Dec 31, 2022
A simple, fast, extensible python library for data validation.

Validr A simple, fast, extensible python library for data validation. Simple and readable schema 10X faster than jsonschema, 40X faster than schematic

kk 209 Sep 19, 2022
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

PyVista 1.6k Jan 08, 2023
Joyplots in Python with matplotlib & pandas :chart_with_upwards_trend:

JoyPy JoyPy is a one-function Python package based on matplotlib + pandas with a single purpose: drawing joyplots (a.k.a. ridgeline plots). The code f

Leonardo Taccari 462 Jan 02, 2023