Python script to download (TCR) genes from IMGT/GENE-DB

Related tags

DownloaderIMGTgeneDL
Overview

IMGTgeneDL

0.1.0

Jamie Heather | CCR @ MGH | 2021

This script provides an alternative way to access TCR and IG genes stored in IMGT/GENE-DB. It's primarily designed for downloading human/mouse TCRs, but it's readily adaptable to other species/loci.

Usage

This script is tested on python >= 3.6, not requiring any non-standard packages.

Download specific gene types

The primary way this script is intended to be used is to tell it the species, loci, and sequence types that you want to download. This will be downloaded and saved to a file named with the date the IMGT release used, and details of the combination of parameters searched for (unless overriden with the -o / --out_path flag).

Species

The script must be run on single species at a time, given via the -s / --species flag as a full genus species with a '+' symbol in place of the space. E.g.:

  • -s Homo+sapiens
  • -s Mus+musculus

Note that it doesn't seem that the IMGT URL interface will accept either genus or species alone, and it can be particular about formatting so maintaining proper case is advised. However it seems to download sub-species (e.g.\ searching for Mus+musculus will return all covered strains).

Loci

This script is currently configured to download the four common TCR loci:

  • A / TRA / alpha
  • B / TRB / beta
  • G / TRG / gamma
  • D / TRD / delta

These must be provided to the script using the -L / --loci flag, giving it the desired loci as a string of characters, e.g. -L AB to just download alpha and beta sequences, or -L G to just download gamma. Alternatively -L TR will simply download all four chains' sequences (equivalent to -L ABGD).

Sequence types

This script is designed to help in the aid in the analysis of typical expressed repertoires, and thus is configured by default to download the relevant parts of the loci that end up involved in expressed transcripts. The download of each is achieved using specific flags:

  • -l / --get_l: download leader sequences
  • -v / --get_v: download V sequences
  • -d / --get_d: download D sequences
  • -j / --get_j: download J sequences
  • -c / --get_c: download constant region sequences

Note that these can be combined, e.g. -vdj will just download the V, D, and J gene sequences. Alternatively users can apply the -r / --get_all_regions flag to just download all of these regions (equivalent to -lvdjc).

Examples

The following is the basic command to download all relevant human sequences for all chains:

python3 IMGTgeneDL.py -s Homo+sapiens -L TR -r

While this is a command to just download delta chain J genes from mice:

python3 -i IMGTgeneDL.py -s Mus+musculus -j -L D

Download whole database

If no locus and gene type flags are used, or if the -a / --get_all flag is used, then the script will just download the whole of GENE-DB - all species, all genes, all loci. By default this downloads the ungapped nucleotide file, with all pseudogenes, and saves this to a file named with the date and the IMGT release used. This can be changed using the following flags:

  • -gap / --gapped: downloads the gapped FASTA instead of the ungapped
  • -ifp / --in_frame_p: downloads the 'inframeP' FASTA instead of the 'allP'
  • -o / --out_path: as above, sets the path to a specific file if you don't wish to use the automatic names or save in the same directory

Notes

Downloaded regions

The architecture of the TCR loci differs a little between the genes and across species, which the IMGT nomenclature has specific terms to cope with. However the URL based searching this script does requires provision exact exon names for each species. This script assumes generic defaults, but these can be overriden by providing specific details in the tab-delimited region-overrides.tsv file. This allows users to override which fields are downloaded, or download additional fields, by adding an entry for the relevant gene/species combination and filling in the final 'Field(s)' comma-delimited field with the IMGT labels to be downloaded.

The most relevant place this comes in to place is in the constant regions, which have differing numbers and names of exons. The relevant differences for these loci/species are that exon 4 of the alpha and delta chains is an UTR, while gamma chains lack a fourth exon and have duplicated exon 2 variants. If users wish to run the script to download specific sequences including constant regions for species other than humans or mice they will need to edit this document appropriately first.

The other default IMGT labels downloaded are:

  • L-PART1+L-PART2 for leader sequences
  • V-/D-/J-REGION for V/D/J genes

IMGT FASTA headers

The IMGT header FASTA fields (as reported in the output of GENE-DB) are:

The FASTA header contains 15 fields separated by '|':

1. IMGT/LIGM-DB accession number(s)
2. IMGT gene and allele name
3. species
4. IMGT allele functionality
5. exon(s), region name(s), or extracted label(s)
6. start and end positions in the IMGT/LIGM-DB accession number(s)
7. number of nucleotides in the IMGT/LIGM-DB accession number(s)
8. codon start, or 'NR' (not relevant) for non coding labels
9. +n: number of nucleotides (nt) added in 5' compared to the corresponding label extracted from IMGT/LIGM-DB
10. +n or -n: number of nucleotides (nt) added or removed in 3' compared to the corresponding label extracted from IMGT/LIGM-DB
11. +n, -n, and/or nS: number of added, deleted, and/or substituted nucleotides to correct sequencing errors, or 'not corrected' if non corrected sequencing errors
12. number of amino acids (AA): this field indicates that the sequence is in amino acids
13. number of characters in the sequence: nt (or AA)+IMGT gaps=total
14. partial (if it is)
15. reverse complementary (if it is)
Disclaimer

I am not affiliated with IMGT, and this tool is only shared as a way to increase the utility of their platform. Please TCR responsibly.

Owner
Jamie Heather
Postdoc research working in cancer immunology at MGH.
Jamie Heather
Python script to automate youtube-dl downloads

Automated Download Tool !! Project status I am writing a new version of this program, which will solve several errors. The new version only supports G

Devil64-Dev 21 Sep 22, 2022
抖音批量下载助手

抖音批量下载助手

HuangSK 303 Jan 05, 2023
A simple Python program which uses youtube-dl for downloading YouTube videos as mp3 files.

yt-mp3 converter This is a simple Python program which uses youtube-dl for downloading YouTube videos as mp3 files. This program is for you if you are

nostalgicnerdpenguin 1 Oct 24, 2021
Download songs and playlists from Spotify for free!

spotify-to-mp3-converter You can basically understand the process with just this image but for clarity, these are the steps. Before using the exe down

2 Jan 25, 2022
A simple kemono.party downloader using python.

kemono-dl This is a simple kemono.party downloader. How to use Install python Download source code from releases and extract it Then install requireme

318 Dec 27, 2022
Download all games from a public Itch.io Game Jam

Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori

Dragoon Aethis 19 Dec 07, 2022
Bulk Downloader for Reddit

saveddit is a bulk media downloader for reddit pip3 install saveddit Setting up authorization Register an application with Reddit Write down your clie

Pranav 136 Jan 03, 2023
Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

open(LARGE) Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included. Motivation Oftentimes, especi

András Schmelczer 2 Jan 30, 2022
Youtube video downloader and info extractor for python.

tube_dl Tube_dl is a Simple Youtube video downloader for Python. A Modular approach to bypass and download Youtube Videos and Playlist from Youtube us

Shekhar Chander 16 Jul 09, 2022
📺 YouTube Song Downloader Bot For Telegram 🔮

📺 YouTube Song Downloader Bot For Telegram 🔮 Powerd By TamilBots.

Tamil Bots 146 Dec 31, 2022
A web app for downloading Facebook comments as a csv file

Facebook Comment Downloader A small web app for downloading comments from a public facebook page post. Comment downloading from https://github.com/min

WSDOT 23 Jan 04, 2023
Newsemble is an API that provides easy access to the current news for programmatic analysis

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.

Rishabh 43 Dec 16, 2022
😷 Dowload dos documentos da CPI da Pandemia

A CPI da Pandemia recebeu milhares de documentos públicos, todos disponibilizados no site do Senado Federal.

Eduardo Cuducos 98 Sep 23, 2022
This is Yt Downloader. Coded with Python (my first repository)

Get Started Download & install Python first before using this software. Download Python Installing Python and Pytube Library (IMPORTANT) Installing Py

Qi 2 Oct 25, 2021
File Downloader

File Downloader Watches a file containing download links and runs a command to download them. The link file is in form of: # comment DOWNLOAD_LINK

Pouriya 1 Jan 08, 2022
A Fast as F*** Downloader

FAFD A Fast as F*** Downloader Github Usages You'll want to use a URL like this: https://github.com/RPowell-C/FAFD/raw/main/FAFD.py It's easier DONT F

1 Jan 19, 2022
A simple GUI video downloader built off of the python module 'yt-dlp'

Simple-Youtube-DL-Gui Supported Operating Systems Windows 7 (x64), Windows 8 (x64), and Windows 10 (x64) How to use Main Gui Extract program from arch

12 Dec 30, 2022
Utility for downloading works from AO3 (Archive Of Our Own)

froyo A small graphical application for batch downloading works from Archive Of Our Own (AO3). Curate a fic repo of your own today :) Features Batch d

flux 24 Dec 09, 2022
A Unit3D Mass Release Downloader

Unit3DMassDL A Unit3D Mass Release Downloader. Currently supports Aither. Installation Ensure Python 3 is installed in your system. Run the following

2 Apr 11, 2022
Download YouTube videos/music and images in MP4, JPG with this tool.

ABOUT THE TOOL Download YouTube videos, music and images in MP4, JPG with this tool, with an easy to understand interface. This tool works with both,

TrollSkull 5 Jan 02, 2023