AP1 Transcription Factor Binding Site Prediction

In this project, we built machine learning models to predict and classify the binding sites of the AP1 transcription factor in the human genome.

Experiments such as ChIP-seq can identify a list of DNA regions bound by a given transcription factor. Combined with a computational scan for AP1's position weight matrix, this can be used to identify sites that are occupied by AP1 in the cell type and conditions in which the experiments were performed.
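As a rough illustration of the position weight matrix scan mentioned above, the following sketch scores every window of a sequence against a log-odds matrix. It is not taken from the project's scripts: the count matrix is made up to resemble the AP1 consensus TGA(C/G)TCA, the uniform background and pseudocount are assumptions, and the function names are hypothetical. Only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 7-bp motif resembling TGA(C/G)TCA.
# The counts are invented for illustration; a real matrix would come
# from a motif database or from the experimental data.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 9, "G": 9, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
BACKGROUND = 0.25  # assumed uniform base frequencies


def to_log_odds(counts, pseudocount=1.0):
    """Convert per-position counts into log2 odds against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (start, score) for every forward-strand window of motif width."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring (start, score) pair, or None if no window fits."""
    hits = scan(sequence, pwm)
    return max(hits, key=lambda hit: hit[1]) if hits else None


pwm = to_log_odds(COUNTS)
print(best_hit("GGGGTGACTCAGGGG", pwm))
```

In practice, candidate sites would be the windows whose score exceeds a chosen threshold, restricted to regions reported by the ChIP-seq experiment.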

The project involved: (1) identifying a set of bound and unbound DNA sequences for a given TF based on existing experimental data; (2) calculating the DNA physical properties of each sequence; and (3) training a machine learning classifier to distinguish between bound and unbound sites.
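To make step (2) concrete, here is a minimal sketch of turning a DNA sequence into a fixed-length numeric feature vector. It uses GC content and dinucleotide frequencies as a simple stand-in; the physical properties actually computed in this project may differ, and the function names are hypothetical.

```python
from itertools import product

# All 16 dinucleotides in a fixed order, so feature vectors are comparable.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def dinucleotide_frequencies(seq):
    """Relative frequency of each dinucleotide, ignoring pairs with non-ACGT bases."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def featurize(seq):
    """Concatenate GC content and dinucleotide frequencies into one vector."""
    return [gc_content(seq)] + dinucleotide_frequencies(seq)


features = featurize("TGACTCA")
print(len(features))
```

Each bound or unbound sequence would be passed through a function like `featurize` to build the rows of the training matrix.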

Using sequencePreProcessing.py and motifPreProccessing.py, we pre-processed the local DNA shape and motif sequence data.

In machineLearningClassifers.py, we built and trained the classifiers using this dataset.
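The sketch below shows one way a bound-versus-unbound classifier could be trained. It is not the code in machineLearningClassifers.py: logistic regression trained by batch gradient descent is used here only as a simple, dependency-free stand-in for whichever models the project trained, and the two-feature toy dataset is invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            p = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
            error = p - label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (bound) if the predicted probability is at least 0.5, else 0."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data, made up for illustration: label 1 = bound, label 0 = unbound.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.8, 0.9], [0.9, 0.7], [0.85, 0.8]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict(weights, bias, [0.12, 0.25]), predict(weights, bias, [0.88, 0.75]))
```

With real data, the feature rows would come from the pre-processing scripts, and performance should be measured on held-out sequences rather than on the training set.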

In Using Machine Learning to Predict AP1 TF Binding Sites.pdf, we present our results and analysis.

