To be a next-generation DL-based phenotype prediction from genome mutations.

Last update: Jan 11, 2022

Related tags

Overview

Sequence -----------+--> 3D_structure --> 3D_module --+                                      +--> ?
|                   |                                 |                                      +--> ?
|                   |                                 +--> Joint_module --> Hierarchical_CLF +--> ?
|                   |                                 |                                      +--> ?
+-> NLP_embeddings -+-------> Embedding_module -------+                                      +--> ?

ClynMut: Predicting the Clynical Relevance of Genome Mutations (wip)

To be a next-generation DL-based phenotype prediction from genome mutations. Will use sota NLP and structural techniques.

Planned modules will likely be:

3D learning module
NLP embeddings
Joint module + Hierarchical classification

The main idea is for the model to learn the prediction in an end-to-end fashion.

Install

$ pip install clynmut

Example Usage:

import torch
from clynmut import *

hier_graph = {"class": "all", 
              "children": [
                {"class": "effect_1", "children": [
                  {"class": "effect_12", "children": []},
                  {"class": "effect_13", "children": []}
                ]},
                {"class": "effect_2", "children": []},
                {"class": "effect_3", "children": []},
              ]}

model = MutPredict(
    seq_embedd_dim = 512,
    struct_embedd_dim = 256, 
    seq_reason_dim = 512, 
    struct_reason_dim = 256,
    hier_graph = hier_graph,
    dropout = 0.0,
    use_msa = False,
    device = None)

seqs = ["AFTQRWHDLKEIMNIDALTWER",
        "GHITSMNWILWVYGFLE"]

pred_dicts = model(seqs, pred_format="dict")

Important topics:

3D structure learning

There are a couple architectures that can be used here. I've been working on 2 of them, which are likely to be used here:

GVP
E(n)-gnn

Hierarchical classification

A simple custom helper class has been developed for it.

Testing

$ python setup.py test

Datasets:

This package will use the awesome work by Jonathan King at this repository.

To install

$ pip install git+https://github.com/jonathanking/sidechainnet.git

$ git clone https://github.com/jonathanking/sidechainnet.git
$ cd sidechainnet && pip install -e .

Citations:

@article{pejaver_urresti_lugo-martinez_pagel_lin_nam_mort_cooper_sebat_iakoucheva et al._2020,
    title={Inferring the molecular and phenotypic impact of amino acid variants with MutPred2},
    volume={11},
    DOI={10.1038/s41467-020-19669-x},
    number={1},
    journal={Nature Communications},
    author={Pejaver, Vikas and Urresti, Jorge and Lugo-Martinez, Jose and Pagel, Kymberleigh A. and Lin, Guan Ning and Nam, Hyun-Jun and Mort, Matthew and Cooper, David N. and Sebat, Jonathan and Iakoucheva, Lilia M. et al.},
    year={2020}

@article{rehmat_farooq_kumar_ul hussain_naveed_2020, 
    title={Predicting the pathogenicity of protein coding mutations using Natural Language Processing},
    DOI={10.1109/embc44109.2020.9175781},
    journal={2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)},
    author={Rehmat, Naeem and Farooq, Hammad and Kumar, Sanjay and ul Hussain, Sibt and Naveed, Hammad},
    year={2020}

@article{pagel_antaki_lian_mort_cooper_sebat_iakoucheva_mooney_radivojac_2019,
    title={Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome},
    volume={15},
    DOI={10.1371/journal.pcbi.1007112},
    number={6},
    journal={PLOS Computational Biology},
    author={Pagel, Kymberleigh A. and Antaki, Danny and Lian, AoJie and Mort, Matthew and Cooper, David N. and Sebat, Jonathan and Iakoucheva, Lilia M. and Mooney, Sean D. and Radivojac, Predrag},
    year={2019},
    pages={e1007112}

You might also like...

An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

VizSeq is a Python toolkit for visual analysis on text generation tasks like machine translation, summarization, image captioning, speech translation

409 Oct 28, 2022

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

211 Dec 28, 2022

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training Code and model from our AAAI 2021 paper

83 Jan 9, 2023

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

2.1k Feb 17, 2021

An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

VizSeq is a Python toolkit for visual analysis on text generation tasks like machine translation, summarization, image captioning, speech translation

310 Feb 1, 2021

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX. This package is still in alpha stag

137 Feb 1, 2021

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

138 Dec 30, 2022

Python generation script for BitBirds

BitBirds generation script Intro This is published under MIT license, which means you can do whatever you want with it - entirely at your own risk. Pl

286 Dec 6, 2022

TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretrained models, tools for measuring dataset quality and already used in 20+ languages for products and research projects.

6.5k Jan 8, 2023

Comments

TO DO LIST
[x] Add embeddings functionality

[ ] Add 3d structure module (likely-to-be GVP/... based)

[x] Add classifier

[x] Hierarchical classification helper based on differentiability

[x] End-to-end code

[ ] data collection

[ ] data formatting

[ ] Run featurization for all data points (esm1b + af2 structs)

[ ] Perform a sample training

[ ] Perform sample evaluation

[ ] Iterate - improve

[ ] ...

[ ] idk, will see as we go
opened by hypnopump 0

Releases(0.0.2)

0.0.2(Mar 16, 2021)
Adds main architecture

Adds main helper functions

After some tests and fixes, and trials, it should be ready.

Source code(tar.gz)
Source code(zip)
0.0.01(Mar 12, 2021)

Initial infra. Statement of purpose and wip
Source code(tar.gz)
Source code(zip)

To be a next-generation DL-based phenotype prediction from genome mutations.

Related tags

Overview

ClynMut: Predicting the Clynical Relevance of Genome Mutations (wip)

Install

Example Usage:

Important topics:

3D structure learning

Hierarchical classification

Testing

Datasets:

Citations:

You might also like...

An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Python generation script for BitBirds

TTS is a library for advanced Text-to-Speech generation.

Comments

TO DO LIST

Releases(0.0.2)

0.0.2(Mar 16, 2021)

0.0.01(Mar 12, 2021)

Owner

Eric Alcaide

A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approaches for achieving this in this repo.

easySpeech is an open-source Python wrapper for google speech to text API that doesn't require PyAudio(So you especially windows user don't have to deal with the errors while installing PyAudio) and also works with hugging face transformers

Crowd sourced training data for Rasa NLU models

基于百度的语音识别，用python实现，pyaudio+pyqt

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Pytorch implementation of Tacotron

Pytorch implementation of winner from VQA Chllange Workshop in CVPR'17

A simple word search made in python

Python implementation of TextRank for phrase extraction and summarization of text documents

Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Grover is a model for Neural Fake News -- both generation and detectio

结巴中文分词

A minimal code for fairseq vq-wav2vec model inference.

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

A library for finding knowledge neurons in pretrained transformer models.

Twitter-NLP-Analysis - Twitter Natural Language Processing Analysis

RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and rearranging captions and pictures. Unlike other versions of the model we use BERT for text encoder and SWIN transformer for image encoder.

A PyTorch-based model pruning toolkit for pre-trained language models

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.