Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Overview

MeConcord

  • MeConcord is a method used to investigate local read-level DNA methylation patterns for intermediately methylated regions with bisulfite sequencing data.
  • Intermediately methylated regions occupy a significant fraction of the whole genome and are markedly associated with epigenetic regulations or cell-type deconvolution of bulk data. However, these regions show distinct methylation patterns corresponding to different biological mechanisms. Although there have been some metrics developed for investigating these regions, the poor perfor-mance in antagonizing noises limits the utility for distinguishing distinct methylation patterns.
  • We proposed a method, MeConcord, with two metrics measuring local methylation concordance across reads and CpGs, respectively, with Hamming distance. MeConcord showed the most robust performance in distinguishing distinct methylation patterns (identical, uniform, and disor-dered) compared with other metrics.

Installation

  • MeConcord is implemented by Python and compatible with both Python 2 and Python 3.
  • Modules of python are required:pysam(if the input is .bam files), pandas,numpy, scipy,multiprocessing.
  • The scripts could be downloaded and used directly with command python *.py -i ....

Usage

Input

MeConcord currently only accept the output(.bam or converted to .sam) of Bismark (https://github.com/FelixKrueger/Bismark/blob/master/README.md)

Run

1.Obtaining CpG positions across genome

Usage: python pre_cpg_pos.py -i hg38.fa -o ./cpg_pos/

  • i, The path to reference sequences (.fa);
  • o, The path that you want to deposit the positions of CpG sites, each chromosome has a seperate file;
  • h, Help information

2.Converting mapped Bam, Sam, Sam.gz files from Bismark to methylation recordings read-by-read

Usage: python s1_bamToMeRecord.py -i test.bam -o test -c 0

  • i, The path to input files (.bam or .sam or .sam.gz);
  • o, Output prefix;
  • c, Clipping read ends with such base number (defalut 0); can be used when sequencing quality of read ends is not good. such as -c 5 to remove 5 bases from the both ends of the reads.
  • h, Help information

3.Spliting the big MeRecord files into small files of each chromosome to redude memory requirements in the next step

Usage: python s2_RecordSplit.py -i ./test_ReadsMethyAndMuts.txt -o ./test -g chr1,chr2,chr3,chr4,chr5

  • i, The path to s1 output. ( end with _ReadsMethyAndMuts.txt);
  • o, Output prefix;
  • g, Chromosomes used; (default chromsome 1-22); chromosomes shoud be seperated by comma;
  • h, Help information

4. Calculating concordance metrics (NRC, NCC and P-values)

Usage: python s3_RecordToMeConcord.py -p 4 -i ./test -o ./test -r ./region.bed -c ./cpgpos/ -b 150 -m 600 -z 0 -g chr1,chr2,chr3

  • i, The path to s2_RecordSplit.py output, with prefixed file name;
  • p, Threads used for parallel computation; default is 4;
  • o, Output prefix;
  • r, The files with genomic regions for computation, chrom, start, end seperated by tab;
  • c, Cpg position folder, output of pre_cpg_pos.py;
  • b, Bin size (default 150bp);
  • z, Whether is the genomic file based on 0; 0 (default) or 1; output is same to input bins; if -r is a bed file, -z should be 1;
  • g, Chromosomes used; (default chromsome 1-22); chromosomes shoud be seperated by comma;
  • m, Maximum of fragement length in sequencing library(default 600bp for paired-end reads). if there are single-end reads,m should be set as the length of reads, if not sure, default will work for most cases;

5. Methylation recordings to methylation matrix (optional)

Usage: python s4_RecordToMeMatrix.py -i ./test -o ./test -r ./p1.bed -c ./cpgpos/ -m 600 -z 0 -g chr1,chr2

  • i, The path to s2_RecordSplit.py output, with prefixed file name;
  • o, Output prefix;
  • r, The files with genomic regions for computation, chrom, start, end seperated by tab;
  • c, Cpg position folder, output of pre_cpg_pos.py;
  • z, Whether is the genomic file based on 0; 0 (default) or 1; output is same to input bins; if -r is a bed file, -z should be 1;
  • g, Chromosomes used; (default chromsome 1-22); chromosomes shoud be seperated by comma;
  • m, Maximum of reads length (default 600bp for paired-end reads). if there are single-end reads,m should be set length of reads, if not sure, default will work for most cases;

6. Visualization of methylation matrix (optional)

Usage: visualization_Matlab.m

  • Open this script and edit

    • path_to_matrix as the path you deposit the MeMatrix;
    • path_to_cpgPos as the path you deposit CpG positions of the genome, which is the result of pre_cpg_pos.py;
    • name as the name of MeMatrix, for example 'test_chr1_1287967_1288117';
  • Output: two lollipop plots, one without considering distance between CpGs, one considering distance between CpGs.

    • unmethylated CpGs are labeled as light blue
    • CpGs without signal are labeled as grey
    • methylated CpGs are labeled as dark red

Test for an example

  • STEP 1 python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.bam -o ./test/test -c 2 or python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.sam -o ./test/test -c 2 if there is no pysam module on Windows

    • The error that Could not retrieve index file for './test/GM12878_chr1_1286017_1294783.bam' doesn't affect the results.
    • Please check if there is an output in test folder, test_ReadsMethyAndMuts.txt. If yes, it works.
  • STEP 2 python s2_RecordSplit.py -i ./test/test_ReadsMethyAndMuts.txt -o ./test/test -g chr1

    • Please check if there is an output in test folder, test_ReadsMethyAndMuts_chr1.txt. If yes, it works.
  • STEP 3 python s3_RecordToMeConcord.py -p 1 -i ./test/test -o ./test/test -r ./test/tmp1.bed -c ./test/ -b 150 -m 600 -z 1 -g chr1

    • Please check if there is an output in test folder, test_MeConcord.txt. If yes, it works.
  • STEP 4 python s4_RecordToMeMatrix.py -i ./test/test -o ./test/test -r ./test/tmp2.bed -c ./test/ -m 600 -z 1 -g chr1

    • Please check if there is two output files in test folder, test_chr1_1287967_1288117_me.txt; test_chr1_1287967_1288117_unme.txt. If yes, it works.
Owner
omics tools,especially for DNA methylation
Projeto para ajudar no aprendizado da linguagem Pyhon

Economize Este projeto tem o intuito de criar desáfios para a codificação em Python, fazendo com que haja um maior entendimento da linguagem em seu to

Lucas Cunha Rodrigues 1 Dec 16, 2021
IEEE ITU bunyesinde komitelere verilen Python3 egitiminin dokumanlastirilmis versiyonlari bu repository altinda tutulmaktadir.

IEEE ITU Python Egitimi Nasil Faydalanmaliyim? Dersleri izledikten sonra dokumanlardaki kodlari yorum satirlari isaretlerini kaldirarak deneyebilirsin

İTÜ IEEE Student Branch 47 Sep 04, 2022
JimShapedCoding Python Crash Course 2021

Python CRASH Course by JimShapedCoding - Click Here to Start! This Repository includes the code and MORE exercises on each section of the entire cours

Jim Erg 64 Dec 23, 2022
HogwartsRegister - A Hogwarts Register With Python

A Hogwarts Register Installation download code git clone https://github.com/haor

0 Feb 12, 2022
A VirtualBox manager with interactive mode

A VirtualBox manager with interactive mode

Luis Gerardo 1 Nov 21, 2021
Different steganography methods with examples and my own small image database

literally-the-most-useless-project [Different steganography methods with examples and my own small image database] This project currently contains thr

Kamyishka 1 Dec 09, 2022
A simple code for processing images to local binary pattern.

This figure is gotten from this link https://link.springer.com/chapter/10.1007/978-3-030-01449-0_24 LBP-Local-Binary-Pattern A simple code for process

Happy N. Monday 3 Feb 15, 2022
Multiple GNOME terminals in one window

Terminator by Chris Jones [email protected] and others. Description Terminator was

GNOME Terminator 1.5k Jan 01, 2023
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

Scalene: a high-performance CPU, GPU and memory profiler for Python by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene community Slack Ab

PLASMA @ UMass 7k Dec 30, 2022
Lock a program and kills it indefinitely if it is started.

Kill By Lock Lock a program and kills it indefinitely if it is started. How start it? It' simple, you just have to double-click on the python file (.p

1 Jan 12, 2022
Python implementation of the Learning Time-Series Shapelets method, that learns a shapelet-based time-series classifier with gradient descent.

shaplets Python implementation of the Learning Time-Series Shapelets method by Josif Grabocka et al., that learns a shapelet-based time-series classif

Mohamed Haseeb 187 Dec 14, 2022
Packaging tools for shanty services.

parcel Packaging tools for shanty services. What? Services are docker containers deployed by shanty on a hosting appliance. Each service consists of t

0 Jan 20, 2022
Abilian Core: an enterprise application development platform based on the Flask micro-framework, the SQLAlchemy ORM

About Abilian Core is an enterprise application development platform based on the Flask micro-framework, the SQLAlchemy ORM, good intentions and best

Abilian open source projects 47 Apr 14, 2022
Irrigation Component V4 providing support for a custom card

Irrigation Component V4 This release sees the delivery of a custom card https://github.com/petergridge/irrigation_card to render the program options s

12 Oct 28, 2022
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

5 Jun 19, 2021
My Analysis of the VC4 Assembly Code from the RPI4

My Analysis of the VC4 Assembly Code from the RPI4

Nicholas Starke 31 Jul 13, 2022
Python requirements.txt Guesser

Python-Requirements-Guesser ⚠️ This is alpha quality software. Work in progress Attempt to guess requirements.txt modules versions based on Git histor

Jerome 9 May 24, 2022
NYCU(NCTU)-差勤-助教

NCTU-TA-fill 填寫 差勤-助教時數 有沒有覺得在差勤系統填助教時數有點浪費生命? 今天有個懶鬼浪費好多時間幫大家寫了code 只要填好的必要的資料,就可以讓電腦自動幫你完成差勤助教的時數填寫喔! https://pt-attendance.nctu.edu.tw/verify/userL

14 Dec 21, 2021
Analyzes crypto candles over a set time period and then trades based on winning patterns found

patternstrade Analyzes crypto candles over a set time period and then trades based on winning patterns found. Heavily customizable. Warning: This was

ConnorCreate 14 May 29, 2022
Make dbt docs and Apache Superset talk to one another

dbt-superset-lineage Make dbt docs and Apache Superset talk to one another Why do I need something like this? Odds are rather high that you use dbt to

Slido 81 Jan 06, 2023