Python reader for Linked Data in HDF5 files

Last update: May 17, 2022

Overview

`h5ld`: HDF5 Linked Data

Linked Data are becoming more popular for user-created metadata in HDF5 files. This Python package provides readers for HDF5-based formats that carry such metadata. The entire linked data content is read in one operation and made available as an rdflib graph object.

Currently supported:

Allotrope Data Format (ADF)

Installation

pip install git+https://github.com/HDFGroup/h5ld@{LABEL}

where {LABEL} is either master or a tag label.

Requirements:

Python >= 3.7
h5py >= 3.3.0
rdflib >= 5.0.0

License

This software is open source. See this file for details.

Quick Start

This package can be used either as a command-line tool or programmatically. On the command line, the package dumps the linked data of an input HDF5 file into any of several popular RDF formats supported by the rdflib package. For example:

python -m h5ld -f json-ld -o output.json INPUT.h5

will dump the input file's RDF data to the file output.json in the JSON-LD format. Omitting the output file prints the same content instead, so it can be ingested by another command-line tool. A full description is available from:

python -m h5ld --help
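
When the command-line tool is driven from another Python program, the invocation shown earlier can be wrapped with `subprocess`. The following is a minimal sketch, not part of h5ld: the helper names `build_h5ld_command` and `read_json_ld` are hypothetical, and it assumes that running without `-o` prints the JSON-LD document to standard output.

```python
import json
import subprocess
import sys


def build_h5ld_command(input_path, rdf_format="json-ld", output_path=None):
    """Assemble the `python -m h5ld` command line (hypothetical helper)."""
    cmd = [sys.executable, "-m", "h5ld", "-f", rdf_format]
    if output_path is not None:
        # -o names the output file; without it the content is printed.
        cmd += ["-o", str(output_path)]
    cmd.append(str(input_path))
    return cmd


def read_json_ld(input_path):
    """Run h5ld on one file and parse the JSON-LD it prints (hypothetical helper)."""
    result = subprocess.run(
        build_h5ld_command(input_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if h5ld exits with an error
    )
    # Assumption: standard output holds exactly one JSON-LD document.
    return json.loads(result.stdout)
```

Building the command as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.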

There is also a programmatic interface for integration into Python applications. Each h5ld reader will provide the following methods and attributes:

File format name.

print(f"Input file format is: {reader.name}")

Short name (usually an acronym) of the file format.

print(f"File format acronym: {reader.short_name}")

Check if the reader is the right choice for the input file.

import h5py

with h5py.File("input.h5", mode="r") as f:
    if reader.verify_format(f):
        pass  # Do something...
    else:
        print("Sorry but not the right h5ld reader.")

Check if there is linked data content in the input HDF5 file. Optionally, print an appropriate description of the data.
```
with h5py.File("input.h5", mode="r") as f:
    reader.check_ld(f, report=True)
```

Read linked data and export it to a destination in the requested RDF format.

with h5py.File("input.h5", mode="r") as f:
    reader(f).dump_ld("output.json", format="json-ld")

Read linked data and return either an rdflib.Graph or rdflib.ConjunctiveGraph object.

with h5py.File("input.h5", mode="r") as f:
    graph = reader(f).get_ld()

A Python dictionary with the reader's namespace prefixes and their IRIs.

with h5py.File("input.h5", mode="r") as f:
    rdr = reader(f)
    namespaces = rdr.namespaces
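
A prefix-to-IRI dictionary of this kind is typically used to shorten or expand identifiers. The following is a minimal sketch with a made-up dictionary: the prefixes and IRIs are placeholders rather than the values an h5ld reader reports, the helpers `expand` and `compact` are hypothetical, and the dictionary values are assumed to behave like strings.

```python
# Placeholder mapping in the shape described above: prefix -> IRI.
namespaces = {
    "ex": "http://example.org/vocab#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie, namespaces):
    """Turn 'prefix:local' into a full IRI (hypothetical helper)."""
    prefix, _, local = curie.partition(":")
    return str(namespaces[prefix]) + local


def compact(iri, namespaces):
    """Turn a full IRI into 'prefix:local' when a known namespace matches."""
    # Try the longest namespace first so nested namespaces resolve correctly.
    for prefix, base in sorted(namespaces.items(), key=lambda kv: -len(str(kv[1]))):
        base = str(base)
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri  # no matching namespace: leave the IRI unchanged


print(expand("rdfs:label", namespaces))
print(compact("http://example.org/vocab#Sample", namespaces))
```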

Owner

The HDF Group

