Flexible HDF5 saving/loading and other data science tools from the University of Chicago

Last update: Dec 10, 2022

Overview


deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago. This repository also hosts a Deep Learning blog:

http://deepdish.io

Installation

pip install deepdish

Alternatively (if you have conda with the conda-forge channel):

conda install -c conda-forge deepdish

Main feature

The primary feature of deepdish is its ability to save and load all kinds of data as HDF5. It can save any Python data structure with the same ease of use as pickling or numpy.save, and on top of that it offers:

Interoperability between languages (HDF5 is a popular standard)
Easy inspection of the contents from the command line (using h5ls or our specialized tool ddls)
Highly compressed storage (thanks to a PyTables backend)
Native support for scipy sparse matrices and pandas DataFrame, Series and Panel
Ability to partially read files, even slices of arrays

An example:

import numpy as np
import deepdish as dd

d = {
    'foo': np.ones((10, 20)),
    'sub': {
        'bar': 'a string',
        'baz': 1.23,
    },
}
dd.io.save('test.h5', d)

This can be reconstructed using dd.io.load('test.h5'), or inspected through the command line using either a standard tool:

$ h5ls test.h5
foo                      Dataset {10, 20}
sub                      Group

Or, better yet, our custom tool ddls (or python -m deepdish.io.ls):

$ ddls test.h5
/foo                       array (10, 20) [float64]
/sub                       dict
/sub/bar                   'a string' (8) [unicode]
/sub/baz                   1.23 [float64]
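To see how the nested dictionary lines up with the paths that ddls prints, here is a small stdlib-only sketch that walks the dict depth-first and yields one path per node: dicts become groups, everything else becomes a leaf. It only illustrates the mapping and is not deepdish's own code; the function name hdf5_paths is hypothetical, and a nested list stands in for the NumPy array so the snippet runs without dependencies.

```python
from typing import Any, Iterator, Tuple


def hdf5_paths(obj: dict, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, kind) pairs for a nested dict, depth-first.

    Hypothetical helper for illustration: nested dicts map to HDF5 groups
    (reported as "dict", like ddls does); any other value is a leaf and is
    reported by its Python type name.
    """
    for key, value in obj.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield path, "dict"
            # Recurse so children get paths like /sub/bar
            yield from hdf5_paths(value, path)
        else:
            yield path, type(value).__name__


d = {
    "foo": [[1.0] * 20 for _ in range(10)],  # stand-in for np.ones((10, 20))
    "sub": {"bar": "a string", "baz": 1.23},
}

for path, kind in hdf5_paths(d):
    print(f"{path:<26} {kind}")
```

The paths come out in the same order as the ddls listing (/foo, /sub, /sub/bar, /sub/baz) because Python dicts preserve insertion order.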

Documentation

http://deepdish.readthedocs.io/


Owner

UChicago - Department of Computer Science
