Python package for machine learning for healthcare using a OMOP common data model

Overview

omop-learn

What is omop-learn?

This library was developed in order to facilitate rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database. omop-learn supports the easy definition of predictive clinical tasks, featurizations of OMOP data, and cohorts of relevance. We further provide methods using sparse tensor implementations to rapidly manipulate the collected features in the rawest form possible, allowing for dynamic transformations of the data.

Two machine-learning models are included with the library. First, a windowed linear model, which uses various backwards-facing windows to aggregate features over different timescales, then feeds these features into a regularized logistic regression model. This model was inspired by the work of Razavian et. al. '15, and despite its simplicity is often competitive with state-of-the-art algorithms. We also include SARD (Self-Attention with Reverse Distillation), a novel deep-learning algorithm that uses self-attention to allow medical events to contextualize themselves using other events in a patient's timeline. SARD also makes use of reverse distillation, a training technique we introduce that effectively initializes a deep model using a high-performing linear proxy, in this case the windowed linear model described above -- for the details of this method and the SARD architecture, please see our paper Kodialam et al. AAAI '21.

Documentation

For a more detailed summary of omop-learn's data collection pipeline, and for documentation of functions, please see the full documentation for this repo, which also describes the process of creating one's own cohorts, predictive tasks, and features.

Dependencies

The following libraries are necessary to run omop-learn:

  • numpy
  • sqlalchemy
  • pandas
  • torch
  • sklearn
  • matplotlib
  • ipywidgets
  • IPython.display
  • gensim.models
  • scipy.sparse
  • sparse

Note that sparse is the PyData Sparse library, documented here

Running omop-learn

We provide several example notebooks, which all use an example task of predicting mortality over a six-month window for patients over the age of 70.

  • End of Life Linear Model Example.ipynb and End of Life Deep Model Example.ipynb run the windowed linear and deep SARD models respectively -- note that your machine must be able to access a GPU in order to run the deep models.
  • End of Life Linear Model Example (With Nontemporal Features).ipynb demonstrates how to add nontemporal features.
  • End of Life Linear Model Ancestors Example.ipynb demonstrates how to add feature ancestors.
  • End of Life Linear Model Example More Prediction Times.ipynb uses a larger dataset with predictions from any date within a time range.

To run the models, first set up the file config.py with connection information for your Postgres server containing an OMOP CDM database. Then, simply run through the cells of the notebook in order. Further documentation of the exact steps taken to define a task, collect data, and run a predictive model are embedded within the notebooks.

Contributors and Acknowledgements

Omop-learn was written by Rohan Kodialam and Jake Marcus, with additional contributions by Rebecca Boiarsky, Ike Lage, and Shannon Hwang.

This package was developed as part of a collaboration with Independence Blue Cross and would not have been possible without the advice and support of Aaron Smith-McLallen, Ravi Chawla, Kyle Armstrong, Luogang Wei, and Jim Denyer.

Owner
Sontag Lab
Machine learning algorithms and applications to health care.
Sontag Lab
Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated

Santosh 640 Dec 31, 2022
Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Credit Card Fraud Detection, used the credit card fraud dataset from Kaggle

Sean Zahller 1 Feb 04, 2022
Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification

Temporal Alignment Prediction for Supervised Representation Learning and Few-Shot Sequence Classification Introduction. This package includes the pyth

5 Dec 06, 2022
Laporan Proyek Machine Learning - Azhar Rizki Zulma

Laporan Proyek Machine Learning - Azhar Rizki Zulma Project Overview Domain proyek yang dipilih dalam proyek machine learning ini adalah mengenai hibu

Azhar Rizki Zulma 6 Mar 12, 2022
Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft 366 Jan 03, 2023
Machine Learning Course with Python:

A Machine Learning Course with Python Table of Contents Download Free Deep Learning Resource Guide Slack Group Introduction Motivation Machine Learnin

Instill AI 6.9k Jan 03, 2023
💀mummify: a version control tool for machine learning

mummify is a version control tool for machine learning. It's simple, fast, and designed for model prototyping.

Max Humber 43 Jul 09, 2022
Python package for stacking (machine learning technique)

vecstack Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API Convenient wa

Igor Ivanov 671 Dec 25, 2022
Management of exclusive GPU access for distributed machine learning workloads

TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting

Paweł Rościszewski 131 Dec 12, 2022
XManager: A framework for managing machine learning experiments 🧑‍🔬

XManager is a platform for packaging, running and keeping track of machine learning experiments. It currently enables one to launch experiments locally or on Google Cloud Platform (GCP). Interaction

DeepMind 620 Dec 27, 2022
This is a curated list of medical data for machine learning

Medical Data for Machine Learning This is a curated list of medical data for machine learning. This list is provided for informational purposes only,

Andrew L. Beam 5.4k Dec 26, 2022
Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Felix Daudi 1 Jan 06, 2022
TIANCHI Purchase Redemption Forecast Challenge

TIANCHI Purchase Redemption Forecast Challenge

Haorui HE 4 Aug 26, 2022
Covid-polygraph - a set of Machine Learning-driven fact-checking tools

Covid-polygraph, a set of Machine Learning-driven fact-checking tools that aim to address the issue of misleading information related to COVID-19.

1 Apr 22, 2022
Simple Machine Learning Tool Kit

Getting started smltk (Simple Machine Learning Tool Kit) package is implemented for helping your work during data preparation testing your model The g

Alessandra Bilardi 1 Dec 30, 2021
CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

ZhihuiYangCS 8 Jun 07, 2022
Responsible AI Workshop: a series of tutorials & walkthroughs to illustrate how put responsible AI into practice

Responsible AI Workshop Responsible innovation is top of mind. As such, the tech industry as well as a growing number of organizations of all kinds in

Microsoft 9 Sep 14, 2022
Tools for mathematical optimization region

Tools for mathematical optimization region

林景 15 Nov 30, 2022
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Machine Learning Notebooks, 3rd edition This project aims at teaching you the fundamentals of Machine Learning in python. It contains the example code

Aurélien Geron 1.6k Jan 05, 2023
PySurvival is an open source python package for Survival Analysis modeling

PySurvival What is Pysurvival ? PySurvival is an open source python package for Survival Analysis modeling - the modeling concept used to analyze or p

Square 265 Dec 27, 2022