A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
Isobot is originally made by notsniped. This is a remix of iso.bot by archisha.

iso6.9-1.2beta iso.bot is originally made by notsniped#0002. This is a remix of iso.bot by αrchιshα#5518. iso6.9 is a Discord bot written in Python an

Kamilla Youver 3 Jan 11, 2022
AminoSpamKilla - Spam bot for amino that uses multiprocessing module

AminoSpamKilla Spam bot for amino that uses multiprocessing module Pydroid Open

4 Jun 27, 2022
Matrix trivia bot with python

Matrix-trivia-bot Getting started See SETUP.md for how to setup and run the template project. Project structure A reference of each file included in t

1 Nov 16, 2021
Clash of Clans v6.253 private server written in python

cocps Clash of Clans v6.253 private server written in python how2play download server files download Patched APK run Main.py and play Authors Patched

5 Aug 28, 2022
Automatic generation of crypto-arts based on image layers

NFT Generator Автоматическая генерация крипто-артов на основе слоев изображения. Установка pip3 install -r requirements.txt rm -rf result/* Как это ра

Zproger 31 Dec 29, 2022
Ivan Telegram Userbot with python

Riviani Ramadhan Ivan-Ubot Pada Dasarnya Ivan-Ubot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy. Berbasis Paper

1 Oct 29, 2021
(unofficial) Googletrans: Free and Unlimited Google translate API for Python. Translates totally free of charge.

Googletrans Googletrans is a free and unlimited python library that implemented Google Translate API. This uses the Google Translate Ajax API to make

Suhun Han 3.2k Jan 04, 2023
DongTai API SDK For Python

DongTai-SDK-Python Quick start You need a config file config.json { "DongTai":{ "token":"your token", "url":"http://127.0.0.1:90"

huoxian 50 Nov 24, 2022
this is a telegram bot repository, that can stream video on telegram group video chat.

VIDEO STREAM BOT telegram bot project for streaming video on telegram video chat, powered by tgcalls and pyrogram 🛠 Commands: /vstream (reply to vide

levina 319 Aug 15, 2022
You can share your Chegg account for answers using this bot with your friends without getting your account blocked/flagged

Chegg-Answer-Bot You can share your Chegg account for answers using this bot with your friends without getting your account blocked/flagged Reuirement

Ammey Saini 27 Dec 24, 2022
Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy

Ice-Userbot Telegram Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy. Berbasis Paperplane dan ProjectB

6 Apr 29, 2022
A modern, easy to use, feature-rich, and async ready API wrapper improved and revived from original discord.py.

A Python API wrapper that is improved and revived from the original discord.py

Orion 19 Nov 06, 2021
Full-featured Python interface for the Slack API

This repository is archived and will not receive any updates It's time to say goodbye. I'm archiving Slacker. It's been getting harder to find time to

Oktay Sancak 1.6k Dec 13, 2022
GBSLocalLauncher - A script to compose ENV file for Local Compose

GBSLocalLauncher This is a script to compose ENV file for Local Compose. It crea

2 Jan 27, 2022
💰 Import your ING Germany bank statements via FinTS into YNAB.

Import your ING Germany bank statements via FinTS into YNAB. Setup Before setting this up, please register your FinTS product – it's free and takes on

Arne Bahlo 23 Jan 21, 2022
Discord bot written in discord.py

Orion Discord bot written in discord.py Installation Installation of code is supported for macOS only currently First open the terminal. If incase you

Zeus 3 May 19, 2022
Python SDK for accessing the Hanko Authentication API

Hanko Authentication SDK for Python This package is maintained by Hanko. Contents Introduction Documentation Installation Usage Prerequisites Create a

Hanko.io 3 Mar 08, 2022
IBD Style Relative Strength Percentile Ranking of Stocks (i.e. 0-100 Score).

relative-strength IBD Style Relative Strength Percentile Ranking of Stocks (i.e. 0-100 Score). I also made a TradingView indicator, but it cannot give

57 Jan 06, 2023
This is a Discord script that will provide a QR Code to your scholars for Axie Infinity.

DiscordQRCodeBot This is a Discord script that will provide a QR Code to your Axie Infinity scholars. Setup Run Ubuntu on AWS ec2 instance Dowloads al

ZracheSs | xyZ 24 Oct 05, 2022
Discord RPC for Notion written in Python

Discord RPC for Notion This is a program that allows you to add your Notion workspace activities to your Discord profile. This project is currently un

Thuliumitation 1 Feb 10, 2022