[email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo" /> [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo">

This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Overview

Subreddit Analysis

This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by the Will?: Themes in online discussions of Fitness".

Installation and Requirements

You need to use Python 3.9, R 4.1.0 and git basically to run the scripts provided in this repo. For Ubuntu, to install essential dependencies:

sudo apt update
sudo apt install git python3.9 python3-pip
pip3 install virtualenv

Now clone this repo:

git clone https://github.com/gchochla/subreddit-analysis
cd subreddit-analysis

Create and activate a python environment to download the python requirements for the scripts:

~/.local/bin/virtualenv .venv
source .venv/bin/activate
pip install .

Usage

  1. Download a subreddit into a JSON that preserves the hierarchical structure of the posts by running:
python subreddit_analysis/subreddit_forest.py -r <SUBREDDIT_NAME>

where <SUBREDDIT_NAME> is the name of the subreddit after r/. You can also limit the number of submissions returned by setting -l <LIMIT>. The result can be found in the file <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json.

  1. Transform this JSON to a rectangle (CSV), you can use:
python subreddit_analysis/json_forest_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.csv.

  1. To have a background corpus for control, you can download posts from the redditors that have posted in your desired subreddit from other subreddits:
python subreddit_analysis/user_baseline.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json -pl 200

where -pl specifies the number of posts per redditor to fetch (before filtering the desired subreddit). The file is saved as <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

  1. Transform that as well to a CSV:
python subreddit_analysis/json_baseline_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.csv.

  1. Create a folder, <ROOT>, move the subreddit CSV to it, and create another folder inside it named dictionaries that includes a file (note: the filename -- with a possible extensions -- will be used as the header of the loading) per distributed dictionary with space-separated words:
positive joy happy excited
  1. Tokenize CSVs using the r_scripts.

  2. Compute each post's loadings and write it into the CSV:

python subreddit_analysis/submission_loadings.py -d <ROOT> -doc <CSV_FILENAME>

where <CSV_FILENAME> is relative to <ROOT>.

  1. If annotations are available, which should be in a CSV with (at least) a column for the labels themselves and the ID of the post with a post_id header, you can use these to design a data-driven distributed dictionary. You can first train an RNN to create another annotation file with a predicted label for each post with:
python subreddit_analysis/rnn.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANNOTATION_CSV> --label_column <LABEL_HEADER_1> <LABEL_HEADER_2> ... <LABEL_HEADER_N> --out_filename <NEW_ANNOTATION_CSV>

where you can provide multiple labels for multitasking, thought the model provides predictions only for the first specified label for now. Finally, if annotations are ordinal, you can get learned coefficients from Ridge Regression for each word in the vocabulary of all posts (in descending order of importance) using a tf-idf model to represent each document using:

python subreddit_analysis/bow_model.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANY_ANNOTATION_CSV> --label_column <LABEL_HEADER> --out_filename <IMPORTANCE_CSV>
  1. Run analyses using r_scripts.
Owner
Georgios Chochlakis
ML researcher; CS PhD student @ Uni of Southern California
Georgios Chochlakis
Pytorch implementation for reproducing StackGAN_v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

StackGAN-v2 StackGAN-v1: Tensorflow implementation StackGAN-v1: Pytorch implementation Inception score evaluation Pytorch implementation for reproduci

Han Zhang 809 Dec 16, 2022
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

SLM: Structural Language Models of Code This is an official implementation of the model described in: "Structural Language Models of Code" [PDF] To ap

73 Nov 06, 2022
Official Pytorch implementation of Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference (ICLR 2022)

The Official Implementation of CLIB (Continual Learning for i-Blurry) Online Continual Learning on Class Incremental Blurry Task Configuration with An

NAVER AI 34 Oct 26, 2022
Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification Code release for The Devil is in the Channels: Mutual-Channel

PRIS-CV: Computer Vision Group 230 Dec 31, 2022
IDRLnet, a Python toolbox for modeling and solving problems through Physics-Informed Neural Network (PINN) systematically.

IDRLnet IDRLnet is a machine learning library on top of PyTorch. Use IDRLnet if you need a machine learning library that solves both forward and inver

IDRL 105 Dec 17, 2022
Large-scale Hyperspectral Image Clustering Using Contrastive Learning, CIKM 21 Workshop

Spectral-spatial contrastive clustering (SSCC) Yaoming Cai, Yan Liu, Zijia Zhang, Zhihua Cai, and Xiaobo Liu, Large-scale Hyperspectral Image Clusteri

Yaoming Cai 4 Nov 02, 2022
Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language This repository contains the code, model, and deployment config

16 Oct 23, 2022
Open source hardware and software platform to build a small scale self driving car.

Donkeycar is minimalist and modular self driving library for Python. It is developed for hobbyists and students with a focus on allowing fast experimentation and easy community contributions.

Autorope 2.4k Jan 04, 2023
Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm.

REDQ source code Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm. Paper link: https://arxiv.org/abs/2101.05

109 Dec 16, 2022
PyTorch implementations of algorithms for density estimation

pytorch-flows A PyTorch implementations of Masked Autoregressive Flow and some other invertible transformations from Glow: Generative Flow with Invert

Ilya Kostrikov 546 Dec 05, 2022
Boostcamp AI Tech 3rd / Basic Paper reading w.r.t Embedding

Boostcamp AI Tech 3rd : Basic Paper Reading w.r.t Embedding TL;DR 1992년부터 2018년도까지 이루어진 word/sentence embedding의 중요한 줄기를 이루는 기초 논문 스터디를 진행하고자 합니다. 논

Soyeon Kim 14 Nov 14, 2022
PyTorch implementation for "Mining Latent Structures with Contrastive Modality Fusion for Multimedia Recommendation"

MIRCO PyTorch implementation for paper: Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation Dependencies Python 3.

Big Data and Multi-modal Computing Group, CRIPAC 9 Dec 08, 2022
An implementation of EWC with PyTorch

EWC.pytorch An implementation of Elastic Weight Consolidation (EWC), proposed in James Kirkpatrick et al. Overcoming catastrophic forgetting in neural

Ryuichiro Hataya 166 Dec 22, 2022
Adaptive Dropblock Enhanced GenerativeAdversarial Networks for Hyperspectral Image Classification

This repo holds the codes of our paper: Adaptive Dropblock Enhanced GenerativeAdversarial Networks for Hyperspectral Image Classification, which is ac

Feng Gao 17 Dec 28, 2022
The official repository for BaMBNet

BaMBNet-Pytorch Paper

Junjun Jiang 18 Dec 04, 2022
a Lightweight library for sequential learning agents, including reinforcement learning

SaLinA: SaLinA - A Flexible and Simple Library for Learning Sequential Agents (including Reinforcement Learning) TL;DR salina is a lightweight library

Facebook Research 405 Dec 17, 2022
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
GANsformer: Generative Adversarial Transformers Drew A

GANformer: Generative Adversarial Transformers Drew A. Hudson* & C. Lawrence Zitnick Update: We released the new GANformer2 paper! *I wish to thank Ch

Drew Arad Hudson 1.2k Jan 02, 2023
Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique

AOS: Airborne Optical Sectioning Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique that employs manned or unmanned airc

JKU Linz, Institute of Computer Graphics 39 Dec 09, 2022
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

Sky Computing Introduction Sky Computing is a load-balanced framework for federated learning model parallelism. It adaptively allocate model layers to

HPC-AI Tech 72 Dec 27, 2022