HiFi DeepVariant + WhatsHap workflow

Workflow steps

align HiFi reads to reference with pbmm2
call small variants with DeepVariant, using two-pass method (DeepVariant ➡️ WhatsHap phase ➡️ WhatsHap haplotag ➡️ DeepVariant)
phase small variants with WhatsHap
haplotag aligned BAMs with WhatsHap and merge
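
The two-pass order described above can be sketched as a dry-run command plan. This is only an illustration, not the workflow's actual Snakemake rules: the function name `two_pass_plan`, the output file names, and the use of a single input BAM are assumptions. It also assumes the `run_deepvariant` wrapper with the PACBIO model, and it omits compression, indexing, and any haplotype-aware options for the second DeepVariant pass.

```python
import shlex
from typing import List


def two_pass_plan(reference: str, reads: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) the commands for a two-pass DeepVariant call.

    All output file names are hypothetical; indexing steps are omitted.
    """
    aligned = f"{prefix}.aligned.bam"
    pass1_vcf = f"{prefix}.pass1.vcf.gz"
    phased_vcf = f"{prefix}.pass1.phased.vcf.gz"
    haplotagged = f"{prefix}.haplotagged.bam"
    pass2_vcf = f"{prefix}.pass2.vcf.gz"

    def deepvariant(bam: str, vcf: str) -> List[str]:
        # Assumes the run_deepvariant wrapper is on PATH (e.g. inside a container).
        return [
            "run_deepvariant",
            "--model_type=PACBIO",
            f"--ref={reference}",
            f"--reads={bam}",
            f"--output_vcf={vcf}",
        ]

    return [
        # 1. Align HiFi reads to the reference.
        ["pbmm2", "align", reference, reads, aligned, "--preset", "CCS", "--sort"],
        # 2. First DeepVariant pass on the unphased alignments.
        deepvariant(aligned, pass1_vcf),
        # 3. Phase the first-pass calls.
        ["whatshap", "phase", "--output", phased_vcf,
         "--reference", reference, pass1_vcf, aligned],
        # 4. Tag reads with their haplotype.
        ["whatshap", "haplotag", "--output", haplotagged,
         "--reference", reference, phased_vcf, aligned],
        # 5. Second DeepVariant pass on the haplotagged alignments.
        deepvariant(haplotagged, pass2_vcf),
    ]


if __name__ == "__main__":
    for argv in two_pass_plan("reference/reference.fasta", "reads.bam", "sample1"):
        print(shlex.join(argv))
```

Printing the plan rather than executing it keeps the sketch safe to run on a machine without the tools installed.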

Directory structure within basedir

.
├── cluster_logs  # slurm stderr/stdout logs
├── reference
│   ├── reference.chr_lengths.txt  # cut -f1,2 reference.fasta.fai > reference.chr_lengths.txt
│   ├── reference.fasta
│   └── reference.fasta.fai
├── samples
│   └── <sample_id>  # sample_id regex: r'[A-Za-z0-9_-]+'
│       ├── whatshap/  # phased small variants; merged haplotagged alignments
│       ├── logs/  # per-rule stdout/stderr logs
│       ├── aligned/  # intermediate
│       ├── deepvariant/  # intermediate
│       ├── deepvariant_intermediate/  # intermediate
│       └── whatshap_intermediate/  # intermediate
├── smrtcells
│   ├── done  # move folders from smrtcells/ready to smrtcells/done to prevent re-processing
│   └── ready
│       └── <sample_id>  # uBAMs or FASTQs per sample
│                        # filename regex: r'(m\d{5}[Ue]?_\d{6}_\d{6}).(ccs|hifi_reads).bam' or r'(m\d{5}[Ue]?_\d{6}_\d{6}).fastq.gz'
└── workflow  # clone of this repo
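
Before submitting a job, it can help to check names against the patterns noted in the tree above. The following is a minimal sketch, not code from the workflow: the helper names `is_valid_sample_id` and `movie_name` are hypothetical, and it assumes the patterns must match the whole name and that the dots in the file extensions are literal.

```python
import re
from typing import Optional

# Patterns adapted from the directory-structure comments.
# Assumption: full-string matches, with literal dots in the extensions.
SAMPLE_ID = re.compile(r"[A-Za-z0-9_-]+")
MOVIE = r"m\d{5}[Ue]?_\d{6}_\d{6}"
UBAM = re.compile("(" + MOVIE + r")\.(ccs|hifi_reads)\.bam")
FASTQ = re.compile("(" + MOVIE + r")\.fastq\.gz")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if the whole string matches the sample_id pattern."""
    return SAMPLE_ID.fullmatch(sample_id) is not None


def movie_name(filename: str) -> Optional[str]:
    """Return the movie prefix of a uBAM or FASTQ file name, or None if it does not match."""
    for pattern in (UBAM, FASTQ):
        match = pattern.fullmatch(filename)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(is_valid_sample_id("HG002_rep-1"))
    print(movie_name("m64012_190920_173625.ccs.bam"))
    print(movie_name("m64012_190920_173625.subreads.bam"))
```

Files for which `movie_name` returns `None` would presumably be ignored by the workflow, so a check like this can catch misnamed inputs in `smrtcells/ready` early.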

To run the pipeline

$ conda create \
    --channel bioconda \
    --channel conda-forge \
    --prefix ./conda_env \
    python=3 snakemake mamba lockfile

$ conda activate ./conda_env

$ sbatch workflow/run_snakemake.sh <sample_id>

Owner

William Rowell
