ProteinBERT - Pytorch (wip)

Overview

Implementation of ProteinBERT in Pytorch.

Original Repository

Install

$ pip install protein-bert-pytorch

Usage

import torch
from protein_bert_pytorch import ProteinBERT

model = ProteinBERT(
    num_tokens = 21,            # size of the sequence vocabulary (amino acids plus special tokens)
    num_annotation = 8943,      # number of binary annotations (GO terms in the paper)
    dim = 512,                  # dimension of the local (per-residue) representations
    dim_global = 256,           # dimension of the global (whole-sequence) representation
    depth = 6,                  # number of layers
    narrow_conv_kernel = 9,     # kernel size of the narrow convolution branch
    wide_conv_kernel = 9,       # kernel size of the wide, dilated convolution branch
    wide_conv_dilation = 5,     # dilation of the wide convolution branch
    attn_heads = 8,             # number of global attention heads
    attn_dim_head = 64          # dimension per attention head
)

seq = torch.randint(0, 21, (2, 2048))               # batch of token ids
mask = torch.ones(2, 2048).bool()                   # attend to every position
annotation = torch.randint(0, 2, (2, 8943)).float() # random binary annotation targets (randint's high bound is exclusive)

seq_logits, annotation_logits = model(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)
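
The README stops at the forward pass; the sketch below is a hedged example of the two pretraining objectives described in the ProteinBERT paper, namely cross-entropy over the sequence logits and binary cross-entropy over the annotation logits. The equal loss weighting is an assumption, not a value taken from this repository.

import torch.nn.functional as F

# sequence objective: cross-entropy between the predicted token distributions
# and the input tokens; cross_entropy expects (batch, num_classes, seq_len),
# hence the transpose
seq_loss = F.cross_entropy(seq_logits.transpose(1, 2), seq)

# annotation objective: binary cross-entropy against the multi-hot annotation targets
annotation_loss = F.binary_cross_entropy_with_logits(annotation_logits, annotation)

loss = seq_loss + annotation_loss  # equal weighting is an assumption
loss.backward()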

Citations

@article{Brandes2021.05.24.445464,
    author      = {Brandes, Nadav and Ofer, Dan and Peleg, Yam and Rappoport, Nadav and Linial, Michal},
    title       = {ProteinBERT: A universal deep-learning model of protein sequence and function},
    year        = {2021},
    doi         = {10.1101/2021.05.24.445464},
    publisher   = {Cold Spring Harbor Laboratory},
    URL         = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464},
    eprint      = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464.full.pdf},
    journal     = {bioRxiv}
}

Comments
  • bugFix: x and y not on the same device when Learner is trained on GPU

    When running:

    seq        = torch.randint(0, 21, (2, 2048)).cuda()
    annotation = torch.randint(0, 1, (2, 8943)).float().cuda()
    mask       = torch.ones(2, 2048).bool().cuda()
    
    learner.cuda()
    
    loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)
    
    

    OUTPUT

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-2-60892e498570> in <module>
          4 learner.cuda()
          5 
    ----> 6 loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)
    
    ~/data/.conda/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
        887             result = self._slow_forward(*input, **kwargs)
        888         else:
    --> 889             result = self.forward(*input, **kwargs)
        890         for hook in itertools.chain(
        891                 _global_forward_hooks.values(),
    
    /mnt/5280b/wwang/proteinbert/protein_bert_pytorch.py in forward(self, seq, annotation, mask)
        365 
        366         for token_id in self.exclude_token_ids:
    --> 367             random_replace_token_prob_mask = random_replace_token_prob_mask & (random_tokens != token_id)  # make sure you never substitute a token with an excluded token type (pad, start, end)
        368 
        369         # noise sequence
    
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
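
    A plausible fix, sketched here from the traceback rather than taken from an actual patch: the randomly sampled replacement tokens are allocated on the CPU, so they should be created on the input's device before being compared against GPU-resident tensors. The attribute name below is hypothetical.

    # hypothetical one-line fix inside the pretraining wrapper's forward pass:
    # sample the replacement tokens directly on the device of the input sequence
    random_tokens = torch.randint(0, self.num_tokens, seq.shape, device = seq.device)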
    
    opened by wilmerwang 0
  • How to use this BERT version with the pretrained model?

    Hi guys, thanks for the great work. I'm trying to use this PyTorch version of protein-bert with the pretrained model 'ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl', but have no clue at all. Could you please give some suggestions? Thank you so much!
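
    For context: that checkpoint comes from the original TensorFlow/Keras implementation, so its weights cannot be loaded into this PyTorch model directly and would have to be ported by hand. A minimal inspection sketch, assuming the file is a plain pickle (its exact contents are an assumption, and unpickling may require the original protein_bert package on the path):

    import pickle

    # after downloading the checkpoint, inspect what the archive actually contains
    with open('epoch_92400_sample_23500000.pkl', 'rb') as f:
        contents = pickle.load(f)

    print(type(contents))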

    opened by Y-H-Joe 1
Owner

Phil Wang
Working with Attention