LCG T-TEST USING EUCLIDEAN METHOD

Overview

LCG T-TEST USING EUCLIDEAN METHOD


Advanced Analytics and Growth Marketing Telkomsel


  • Project Supervisor : Rizli Anshari, General Manager of AAGM Telkomsel
  • Writer : Azka Rohbiya Ramadani, Muhammad Gilang, Demi Lazuardi

This project has been created for statistical usage, purposing for determining ATL takers and nontakers using LCG ttest and Euclidean Method, especially for internal business case in Telkomsel.

Background


In offering digital product, bussiness analyst must have considered what is the most suitable criterias of customers who have potential to buy the product. As an illustration, targetting gamers for games product campaign is better decision rather than targetting random customers without knowing customers behaviour. However, bussiness wouldn't make all the gamers as the campaign target, otherwise, market team deliberately wouldn't offer to several gamers randomly as comparison, called as Control Group. Therefore, it enables the team in measuring how success the campaign is.

After the campaign, there are product takers who are expected comes from campaign target. Additionaly, nontakers group is expected comes from Control Group and non-target customers. The analysis problems emerge afterwards, since nontakers might come from outside target. For this reason, our team made statistical technique in python algorithm to determine Control Group by applying euclideanmethod-combined t-test, in order for comparing takers and nontakers behaviour before the campaign, in this case we're focusing on revenue before. As a result, it enables bussiness analyst evaluating how success the campaign is.

Installation Guide


This algorith has been uploaded to pypi.org. Therefore, in order to get package, you can easely download using the following command

pip3 install lcgeuclideanmethod

Requirements


This python requires related package more importantly python_requires='>=3.1', so that package can be install Make sure the other packages meet the requirements below

  • pandas>=1.1.5,
  • numpy>=1.18.5,
  • scipy>=1.2.0,
  • matplotlib>=3.1.0,
  • statsmodels>=0.8.0

Usage Guide


1. EuclideanMethod

  • Input:
    • df_takers : dataframe takers containing two columns, customers and revenue before campaign
    • df_nontakers : dataframe nontakers containing two columns, customers and revenue before campaign
  • Output:
    • summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc.
    • df_result : subprocess table to find p-value from random nontakers
    • df_tukey : main result containing category customers category, based on summary calculation
    • tukey : tukey HSD evaluation, readmore Tukey HSD

sample code

from lcgttest.lcgttest import EuclideanMethod
import pandas as pd

# where you put takers and nontakers file
df_takers = pd.read_csv('takers.csv')
df_nontakers = pd.read_csv('nontakers.csv')

model = EuclideanMethod(df_takers, df_nontakers)
model.run()

# output
print(model.summary)
print(model.df_result)
print(model.df_tukey)
print(model.tukey)

2. MapEuclideanMethod

This is like map function in python

  • Input:
    • arr_df_takers : dataframe takers but in array form
    • arr_df_nontakers : dataframe nontakers but in array form
    • labels : labels of both takers and nontakers in array form
  • Output:
    • df_summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc in dataframe form.
    • dict_df_result : subprocess table to find p-value from random nontakers in dicttionary type.
    • dict_df_tukey : main result containing category customers category, based on summary calculation in dicttionary type.
    • dict_tukey : tukey HSD evaluation, readmore Tukey HSD in dicttionary type.

sample code

from lcgttest.lcgttest import MapEuclideanMethod
import pandas as pd
import numpy as np

# where you put takers and nontakers file
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_takers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA','campaignB','campaignC']

model2 = MapEuclideanMethod(arr_df_takers, arr_df_nontakers, label = labels )

# output
print(model.df_summary)
print(model.dict_df_result)
print(model.dict_df_tukey)
print(model.dict_tukey)

3. EuclideanMethodAscDesc

This is run twice MapEuclideanMethod ascending and descending (technique to randomize the nontakers samples)

  • Input:
    • arr_df_takers : dataframe takers but in array form
    • arr_df_nontakers : dataframe nontakers but in array form
    • labels : labels of both takers and nontakers in array form
  • Output:
    • df_summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc in dataframe form.
    • dict_df_result : subprocess table to find p-value from random nontakers in dicttionary type.
    • dict_df_tukey : main result containing category customers category, based on summary calculation in dicttionary type.
    • dict_tukey : tukey HSD evaluation, readmore Tukey HSD in dicttionary type.

sample code

from lcgttest.lcgttest import EuclideanMethodAscDesc
import pandas as pd
import numpy as np

# where you put takers and nontakers file
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_takers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA','campaignB','campaignC']

model3 = EuclideanMethodAscDesc(arr_df_takers, arr_df_nontakers, label = labels )

# output
print(model3.df_summary)
print(model3.dict_df_result)
print(model3.dict_df_tukey)
print(model3.dict_tukey)

# additional input
print(model3.df_asc_desc_avg)
Prithivida 690 Jan 04, 2023
Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline

Twitter-News-Summarizer Twitter bot that uses NLP models to summarize news articles referenced in a user's twitter timeline 1.) Extracts all tweets fr

Rohit Govindan 1 Jan 27, 2022
What are the best Systems? New Perspectives on NLP Benchmarking

What are the best Systems? New Perspectives on NLP Benchmarking In Machine Learning, a benchmark refers to an ensemble of datasets associated with one

Pierre Colombo 12 Nov 03, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Antlr Project 13.6k Jan 05, 2023
CorNet Correlation Networks for Extreme Multi-label Text Classification

CorNet Correlation Networks for Extreme Multi-label Text Classification Prerequisites python==3.6.3 pytorch==1.2.0 torchgpipe==0.0.5 click==7.0 ruamel

Guangxu Xun 38 Dec 31, 2022
Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.

Accurately generate all possible forms of an English word Word forms can accurately generate all possible forms of an English word. It can conjugate v

Dibya Chakravorty 570 Dec 31, 2022
Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

2 Dec 29, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Mesh TensorFlow: Model Parallelism Made Easier

Mesh TensorFlow - Model Parallelism Made Easier Introduction Mesh TensorFlow (mtf) is a language for distributed deep learning, capable of specifying

1.3k Dec 26, 2022
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

Explosion 75 Dec 19, 2022
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration This repo contains only model Implementation of Zero-Shot Text-to-Speech for Text

Rishikesh (ऋषिकेश) 33 Sep 22, 2022
Athena is an open-source implementation of end-to-end speech processing engine.

Athena is an open-source implementation of end-to-end speech processing engine. Our vision is to empower both industrial application and academic research on end-to-end models for speech processing.

Ke Technologies 34 Sep 08, 2022
CDLA: A Chinese document layout analysis (CDLA) dataset

CDLA: A Chinese document layout analysis (CDLA) dataset 介绍 CDLA是一个中文文档版面分析数据集,面向中文文献类(论文)场景。包含以下10个label: 正文 标题 图片 图片标题 表格 表格标题 页眉 页脚 注释 公式 Text Title

buptlihang 84 Dec 28, 2022
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
Deep Learning Topics with Computer Vision & NLP

Deep learning Udacity Course Deep Learning Topics with Computer Vision & NLP for the AWS Machine Learning Engineer Nanodegree Program Tasks are mostly

Simona Mircheva 1 Jan 20, 2022
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 03, 2023
🏖 Easy training and deployment of seq2seq models.

Headliner Headliner is a sequence modeling library that eases the training and in particular, the deployment of custom sequence models for both resear

Axel Springer Ideas Engineering GmbH 231 Nov 18, 2022
Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR

Speech_38_ru_commands Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR Программа умеет распознавать 38 ключевы

Andrey 9 May 05, 2022
Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

Franck Dernoncourt 1.6k Dec 27, 2022