Toolchest provides APIs for scientific and bioinformatic data analysis.

Overview

Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key stored in the TOOLCHEST_KEY environment variable.

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY") # or a file path containing the key

Contact Toolchest if:

  • you need a key
  • you’ve forgotten your key
  • the key is producing authentication errors.

Documentation & User Guide available at Read the Docs

Comments
  • Enable paired reads for `kraken2`

    Enable paired reads for `kraken2`

    Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

    Adds/removes --paired to tool_args as necessary.

    opened by bcai2 3
  • v0.4.0

    v0.4.0

    • Add Poetry, remove Twine

    • Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

    Note: CircleCI will be failing because v0.4.0 already exists on test PyPI. That is to be expected, because I already bumped it to v0.4.0 when testing.

    opened by lebovic 3
  • S3 chaining

    S3 chaining

    Adds:

    • Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables
    • S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool
    • the ability to skip download of any tool's output, by setting output_path=None (set to None by default)
    opened by lebovic 2
  • Polish tool_arg handling, add more STAR args

    Polish tool_arg handling, add more STAR args

    Adds:

    • More STAR args
    • Add multiple levels of tool_arg handling (whitelist, dangerlist, blacklist)
    • Error on unknown or blacklisted args
    • Reduce complexity (validation and parallelization for now) if a dangerous argument is passed

    Requires:

    • https://github.com/trytoolchest/toolchest-worker-node/pull/24
    • https://github.com/trytoolchest/toolchest-api/pull/22

    This does not fix:

    • Bigger disk/memory/etc requirements for larger files where args trigger reduced complexity / no parallelization
    opened by lebovic 2
  • STAR whitelist options

    STAR whitelist options

    • Adds basic whitelist options for STAR.

    • Adds support for tags with variable amounts of arguments. Adds the --quantMode tag for STAR.

    (This should be merged in after the kraken2 paired read commit.)

    opened by bcai2 2
  • feat: centrifuge base

    feat: centrifuge base

    • Adds the centrifuge tool.
    • Adds docs.
    • Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
    opened by bcai2 1
  • fix: upload/download tracker bugfixes

    fix: upload/download tracker bugfixes

    • Refactors the tracking printed statements into a pythonic print call with string formatting.
    • Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)
    • Adds integration test dirs to .gitignore.
    opened by bcai2 1
  • fix: remove pysam due to multiple issues

    fix: remove pysam due to multiple issues

    Pysam has caused multiple issues as a package and STAR parallelization is not currently used so this pr fully removes pysam as a dependency. Either a different library or custom sam file merging code is planned to be implemented later so parallelization framework is remaining in the code for now.

    opened by jherr-dev 1
  • feat: add preliminary alphafold support

    feat: add preliminary alphafold support

    Adds basic support for running AlphaFold via Toolchest. Code needs to be cleaned up and better documented. Currently limited to 1 input fasta.

    use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing is done. Integration will come with reduced dbs since full dbs take 45 minutes to an hour to run even on simple input.

    opened by jherr-dev 1
  • feat: support async execution

    feat: support async execution

    Adds:

    • Support for async execution

    See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example on how to run Toolchest with async execution.

    opened by lebovic 1
  • fix: set default version number

    fix: set default version number

    Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).

    Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?

    opened by bcai2 1
Releases(v0.11.3)
Owner
Toolchest
Toolchest
Gathering data of likes on Tinder within the past 7 days

tinder_likes_data Gathering data of Likes Sent on Tinder within the past 7 days. Versions November 25th, 2021 - Functionality to get the name and age

Alex Carter 12 Jan 05, 2023
A crude Hy handle on Pandas library

Quickstart Hyenas is a curde Hy handle written on top of Pandas API to allow for more elegant access to data-scientist's powerhouse that is Pandas. In

Peter Výboch 4 Sep 05, 2022
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages

aCe - Data-Centric Parallel Programming Decoupling domain science from performance optimization. DaCe is a parallel programming framework that takes c

SPCL 330 Dec 30, 2022
CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images.

cleanX CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological

Candace Makeda Moore, MD 20 Jan 05, 2023
This module is used to create Convolutional AutoEncoders for Variational Data Assimilation

VarDACAE This module is used to create Convolutional AutoEncoders for Variational Data Assimilation. A user can define, create and train an AE for Dat

Julian Mack 23 Dec 16, 2022
Repositori untuk menyimpan material Long Course STMKGxHMGI tentang Geophysical Python for Seismic Data Analysis

Long Course "Geophysical Python for Seismic Data Analysis" Instruktur: Dr.rer.nat. Wiwit Suryanto, M.Si Dipersiapkan oleh: Anang Sahroni Waktu: Sesi 1

Anang Sahroni 0 Dec 04, 2021
Extract data from a wide range of Internet sources into a pandas DataFrame.

pandas-datareader Up to date remote data access for pandas, works for multiple versions of pandas. Installation Install using pip pip install pandas-d

Python for Data 2.5k Jan 09, 2023
Port of dplyr and other related R packages in python, using pipda.

Unlike other similar packages in python that just mimic the piping syntax, datar follows the API designs from the original packages as much as possible, and is tested thoroughly with the cases from t

179 Dec 21, 2022
Building house price data pipelines with Apache Beam and Spark on GCP

This project contains the process from building a web crawler to extract the raw data of house price to create ETL pipelines using Google Could Platform services.

1 Nov 22, 2021
Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

Brady Law 2 Dec 01, 2021
Get mutations in cluster by querying from LAPIS API

Cluster Mutation Script Get mutations appearing within user-defined clusters. Usage Clusters are defined in the clusters dict in main.py: clusters = {

neherlab 1 Oct 22, 2021
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022
Elasticsearch tool for easily collecting and batch inserting Python data and pandas DataFrames

ElasticBatch Elasticsearch buffer for collecting and batch inserting Python data and pandas DataFrames Overview ElasticBatch makes it easy to efficien

Dan Kaslovsky 21 Mar 16, 2022
Stochastic Gradient Trees implementation in Python

Stochastic Gradient Trees - Python Stochastic Gradient Trees1 by Henry Gouk, Bernhard Pfahringer, and Eibe Frank implementation in Python. Based on th

John Koumentis 2 Nov 18, 2022
Feature Detection Based Template Matching

Feature Detection Based Template Matching The classification of the photos was made using the OpenCv template Matching method. Installation Use the pa

Muhammet Erem 2 Nov 18, 2021
AptaMat is a simple script which aims to measure differences between DNA or RNA secondary structures.

AptaMAT Purpose AptaMat is a simple script which aims to measure differences between DNA or RNA secondary structures. The method is based on the compa

GEC UTC 3 Nov 03, 2022
An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

largeVis This is an implementation of the largeVis algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates: A very fast algori

336 May 25, 2022
Describing statistical models in Python using symbolic formulas

Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design mat

Python for Data 866 Dec 16, 2022
Full ELT process on GCP environment.

Rent Houses Germany - GCP Pipeline Project: The goal of the project is to extract data about house rentals in Germany, store, process and analyze it u

Felipe Demenech Vasconcelos 2 Jan 20, 2022