Data pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling through a pipeline built with Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The complete process is described in an interactive app, found in the script app.py. See the section on running the scripts for how to launch the app.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Run the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

```
(data-pipes) streamlit run app.py
```

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, for example a task TareaX defined in the script tareas.py, run:

```
PYTHONPATH=. luigi --module tareas TareaX --local-scheduler
```

You can extend the code and add any tasks you want!

Owner

Rodolfo Ferro