Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Morpy Bot Linux - Morpy Bot Linux With Python

Morpy_Bot_Linux Guide to using the robot : 🔸 Lsmod = to identify admins and st

2 Jan 20, 2022
An unofficial Python wrapper for the 'Binance exchange REST API'

Welcome to binex_f v0.1.0 many interfaces are heavily used by myself in product environment, the websocket is reliable (re)connected. Latest version:

DeepLn 2 Jan 05, 2022
A bot i made for a dead com server lol it gets updated daily etc

6ix-Bot-Source A bot i made for a dead com server lol it gets updated daily etc For The UserAgent CMD https://developers.whatismybrowser.com/ thats a

Swiper 9 Mar 10, 2022
Using Streamlit to build a simple UI on top of the OpenSea API

OpenSea API Explorer Using Streamlit to build a simple UI on top of the OpenSea API. 🤝 Contributing Contributions, issues and feature requests are we

Gavin Capriola 1 Jan 04, 2022
BroBot's files, code and tester.

README - BroBOT Made by Rohan Chaturvedi [email protected] DISCLAIMER: Th

1 Jan 09, 2022
A discord token nuker With loads of options that will screw an account up real bad

A discord token nuker With loads of options that will screw an account up real bad, also has inbuilt massreport, GroupChat Spammer and Token/Password/Creditcard grabber and so much more!

XPTGR 0 Aug 07, 2022
Python library wrapping and enhancing the Invenio RDM REST API.

Iridium The metal Iridium is used to refine and enhance metal alloys. Similarly, this package provides an enhanced coating around the Invenio RDM APIs

Materials Data Science and Informatics 2 Mar 29, 2022
A Discord token grabber executing in a Microsoft Document.

🦊 Rage 🦊 Rage is a tool written in Python3 allowing you to inject a Python3 complete Discord token grabber (Riot) script in a Microsoft Document usi

Billy 73 Nov 03, 2022
Library for working with QIWI API.

Library for working with QIWI API.

qxtony 2 Apr 26, 2022
A Telegram bot for playing Trop Bon Cadavre.

Trop Bon Cadavre A Telegram bot for playing Trop Bon Cadavre (english: very good corpse), a game freely adapted from the french Cadavre exquis. A game

4 Oct 26, 2021
A cs:go cheat/hack made in Python3.

Atomic 💖 Cheat for cs:go written in Python. Features. Glow Esp No Flash Bunny Hop Third Person To-Do. It is prefered to start the cheat when you are

Sofia 6 Feb 12, 2022
A collection of discord tools I've made.

Discord A collection of discord tools i've made. What's in here? Basically every discord related project i've worked on can be found here, i'll try an

?? ?? ?? 6 Nov 13, 2021
An example of a chatbot with a number-based menu that can be used as a starting point for a project.

NumMenu Bot NumMenu Bot is an example chatbot showing a way to design a number-based menu assistant with Rasa. This type of bot is very useful on plat

Derguene 19 Nov 14, 2022
Python Business Transactions Library - ContractsPY

Python Business Transactions Library - ContractsPY Declare and define business transactions in Python. Use the contracts library to validate business

Arzu Huseynov 7 Jun 21, 2022
Kyura-Userbot: a modular Telegram userbot that runs in Python3 with a sqlalchemy database

Kyura-Userbot Telegram Kyura-Userbot adalah userbot Telegram modular yang berjal

Kyura 17 Oct 29, 2022
Discord Unverified Token Gen

Discord-Unverified-Token-Gen This is a token gen that was made in an hour and just generates unverified tokens, most will be locked. Usage: in cmd jus

Aran 2 Oct 23, 2022
A simple Python TDLib wrapper

Telegram Forwarder App Description pywtdlib (Python Wrapper TDLib) is a simple synchronous Python wrapper that makes you easy to create new Python Tel

Álvaro Fernández 2 Jan 04, 2023
This is a music bot for discord written in python

this is a music bot for discord written in python, it is designed for educational use ONLY, I do not take any responsibility for uses outside of educational use

5 Dec 24, 2021
🪣 Bitbucket Server PAT Generator

🪣 Bitbucket Server PAT Generator 🤝 Introduction Bitbucket Server (nee Stash) can hand out Personal Access Tokens (PAT) to be used in-place of user+p

reecetech 2 May 03, 2022
Simple Discord bot for snekbox (sandboxed Python code execution), self-host or use a global instance

snakeboxed Simple Discord bot for snekbox (sandboxed Python code execution), self-host or use a global instance

0 Jun 25, 2022