A set of tools to analyse the output from TraDIS analyses

Overview

QuaTradis (Quadram TraDis)

Introduction

The QuaTradis pipeline provides software utilities for the processing, mapping, and analysis of transposon insertion sequencing data. The pipeline was designed with the data from the TraDIS sequencing protocol in mind, but should work with a variety of transposon insertion sequencing protocols as long as they produce data in the expected format.

For more information on the TraDIS method, see http://bioinformatics.oxfordjournals.org/content/32/7/1109 and http://genome.cshlp.org/content/19/12/2308.

Installation

QuaTradis has the following dependencies:

Required dependencies

  • bwa
  • smalt
  • samtools
  • tabix
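Because these are external command-line programs rather than Python packages, one quick way to confirm they are installed is to look for them on the PATH. The following is a minimal sketch, assuming each tool's executable has the same name as its entry in the list above. The function name find_missing_tools is hypothetical and is not part of QuaTradis.

```python
import shutil

# Assumption: each executable is named exactly like the dependency listed above.
REQUIRED_TOOLS = ("bwa", "smalt", "samtools", "tabix")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools were found on the PATH.")
```

An empty result only shows that the executables can be found. It says nothing about whether the installed versions are suitable.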

There are a number of ways to install QuaTradis and details are provided below. If you encounter an issue when installing QuaTradis please contact your local system administrator.

Bioconda

Install conda and enable the bioconda channel.

conda install -c bioconda quatradis=xxx

Docker

QuaTradis can be run in a Docker container. First install Docker, then pull the QuaTradis image from Docker Hub:

docker pull quadraminstitute/quatradis

To run QuaTradis, use a command like the following, substituting your own directories. This example assumes your files are stored in /home/ubuntu/data:

docker run --rm -it -v /home/ubuntu/data:/data quadraminstitute/quatradis bacteria_tradis -h
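If the container is launched from a script, the same command can be assembled programmatically. This is a rough sketch that only rebuilds the command shown above for a different host directory. The helper name docker_command is hypothetical, and the flags are exactly those from the example, nothing more.

```python
import os
import shlex

# Image name taken from the pull command in this README.
IMAGE = "quadraminstitute/quatradis"


def docker_command(host_data_dir, tool_args):
    """Build the `docker run` argument list for a QuaTradis tool.

    host_data_dir is mounted at /data inside the container, as in the
    example above. It is made absolute because Docker treats a relative
    name in -v as a named volume rather than a host directory.
    """
    host_dir = os.path.abspath(os.path.expanduser(host_data_dir))
    return ["docker", "run", "--rm", "-it", "-v", f"{host_dir}:/data", IMAGE, *tool_args]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(docker_command("/home/ubuntu/data", ["bacteria_tradis", "-h"])))
```

The resulting list can be passed to subprocess.run as-is. Note that the -it flags expect an interactive terminal.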

Running the tests

The tests can be run with pytest from the tests directory. Alternatively, you can use the make target from the top-level directory:

make test

Usage

QuaTradis provides functionality to:

  • detect TraDIS tags in a BAM file
  • add the tags to the reads
  • filter reads in a FastQ file containing a user defined tag
  • remove tags
  • map to a reference genome
  • create an insertion site plot file

The functions are available as standalone scripts or as Python modules.

Scripts

Executable scripts to carry out most of the listed functions are available in the bin directory:

  • check_tradis_tags - Prints 1 if tags are present in the alignment file, and 0 if not.
  • add_tradis_tags - Generates a BAM file with tags added to the read strings.
  • filter_tradis_tags - Creates a fastq file containing the reads that match the supplied tag.
  • remove_tradis_tags - Creates a fastq file containing reads with the supplied tag removed from the sequences.
  • tradis_plot - Creates a gzipped insertion site plot.
  • bacteria_tradis - Runs the complete analysis, starting from fastq files. Produces mapped BAM files and plot files for each file in the given file list, plus a statistical summary of all files. Note that the -f option expects a text file containing a list of fastq files, one per line. This script can be run with or without supplying tags.

Note that default parameters are for comparative experiments, and will need to be modified for gene essentiality studies.

A help menu for each script can be accessed by running the script with "--help".
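The file list expected by the -f option of bacteria_tradis (a text file with one fastq file per line) can be written by hand or generated. Below is a minimal sketch of the latter. The function name write_fastq_list and the file name patterns are assumptions made for illustration and are not part of QuaTradis.

```python
from pathlib import Path

# Assumption: these patterns cover the FASTQ file names in use; adjust as needed.
FASTQ_PATTERNS = ("*.fastq", "*.fq", "*.fastq.gz", "*.fq.gz")


def write_fastq_list(fastq_dir, list_path, patterns=FASTQ_PATTERNS):
    """Write the FASTQ files found in fastq_dir to list_path, one path per line.

    Returns the sorted list of absolute paths that were written.
    """
    directory = Path(fastq_dir)
    found = sorted(
        {path.resolve() for pattern in patterns for path in directory.glob(pattern)}
    )
    with open(list_path, "w") as handle:
        for path in found:
            handle.write(f"{path}\n")
    return found
```

For example, write_fastq_list("reads", "fastqs.txt") would list every matching file directly inside the reads directory. Subdirectories are not searched.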

Analysis Scripts

Three scripts in the bin directory perform basic analysis of TraDIS results:

  • tradis_gene_insert_sites - Takes a genome annotation in EMBL format along with plot files produced by bacteria_tradis and generates tab-delimited files containing gene-wise annotations of insert sites and read counts.
  • tradis_essentiality.R - Takes a single tab-delimited file from tradis_gene_insert_sites to produce calls of gene essentiality. Also produces a number of diagnostic plots.
  • tradis_comparison.R - Takes tab files to compare two growth conditions using edgeR. This analysis requires experimental replicates.
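Before passing plot files to these scripts, it can be useful to sanity-check them. The sketch below summarises a gzipped plot file under a loudly stated assumption that is not confirmed by this README: each line corresponds to one genome position and holds whitespace-separated integer read counts (for example, one column per strand). The function name summarise_plot is hypothetical.

```python
import gzip


def summarise_plot(plot_path):
    """Summarise a gzipped insertion site plot file.

    Assumed format (not confirmed by the README): one line per genome
    position, each holding whitespace-separated integer read counts.

    Returns (positions, insertion_sites, total_reads), where
    insertion_sites counts the positions with at least one read.
    """
    positions = 0
    insertion_sites = 0
    total_reads = 0
    with gzip.open(plot_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            reads_here = sum(int(field) for field in fields)
            positions += 1
            total_reads += reads_here
            if reads_here > 0:
                insertion_sites += 1
    return positions, insertion_sites, total_reads
```

If the real files deviate from the assumed layout (for instance, by including a header line), the int conversion will raise a ValueError rather than silently producing wrong numbers.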

License

QuaTradis is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page or email [email protected]

Citation

If you use this software please cite:

"The TraDIS toolkit: sequencing and analysis for dense transposon mutant libraries", Barquist L, Mayho M, Cummins C, Cain AK, Boinett CJ, Page AJ, Langridge G, Quail MA, Keane JA, Parkhill J. Bioinformatics. 2016 Apr 1;32(7):1109-11. doi: 10.1093/bioinformatics/btw022. Epub 2016 Jan 21.

Comments
  • fix channel order in readme

    Channel order is important for bioconda to work correctly: the conda-forge channel has to come first (which means higher priority when specified on the command line with -c). That might be why some users are getting pysam issues requiring a workaround.

    FYI might also want to consider suggesting --strict-channel-priority, see the new bioconda docs.

    opened by daler 1
  • Fixes for albatradis compatibility

    Fixing name of analysis output files for consumption by albatradis.

    Fixing mistake when creating gene names during insertion site analysis. Underscores in the name should not have been ignored.

    opened by maplesond 0
  • requirements.txt should not list bgzip

    A followup to the discussion on the Bioconda PR: The requirements.txt file that you are using should not list bgzip. Names in requirements.txt refer to packages on PyPI, so if you list bgzip, you actually pull in a Python package named bgzip (that is meant to be used via import bgzip from within Python). It will not give you the bgzip binary that your project actually seems to want.

    You cannot list non-Python dependencies in requirements.txt so you can only list that dependency in the Conda recipe.

    opened by marcelm 0
  • Fixing problems running the job in docker.

    The issue was that the mapping stage outputs files to the current working directory which may not have user permissions. The fix is to make sure mapping logs are output to the same place as all other output files.

    opened by maplesond 0
  • Nextflow pipeline to replace bacteria_tradis, and implementation of tradis_gene_insert_sites

    Adding nextflow to handle processing of multiple fastq files (similar to bacteria_tradis).

    Add the tradis_gene_insert_sites script, and associated functions under isp_analyse. There are still some very small differences between this and the old BioTradis script in terms of ins_index and ins_count, which I still need to investigate.

    Renamed and refactored a few things.

    Added a few scripts to get closer to feature parity with old BioTradis.

    Tidied up README.

    opened by maplesond 0
  • problem with running tradis pipeline multiple

    Hello,

    When I try to run the following command using quatradis:

    tradis pipeline multiple -v -n 12 -o quatradis_out fastqs_filtered_sizecut_all.txt genome.fa

    this error appears:

    Traceback (most recent call last):
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 293, in <module>
        main()
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 285, in main
        args.func(args)
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/bin/tradis", line 202, in run_multiple_pipeline
        tradis.run_multi_tradis(args.fastqs, args.reference,
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/site-packages/quatradis/tradis.py", line 142, in run_multi_tradis
        pipeline = find_pipeline_file()
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/site-packages/quatradis/tradis.py", line 101, in find_pipeline_file
        if os.path.exists(exe_path):
      File "/home/jang/anaconda3/envs/mamba/envs/albatradis/lib/python3.9/genericpath.py", line 19, in exists
        os.stat(path)
    TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

    What am I doing wrong?

    The same input files work smoothly in bacteria_tradis.

    Bests, Jan

    opened by gaworj 1
Owner
Quadram Institute Bioscience