A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
Aqui está disponível GRATUITAMENTE, um bot de discord feito em python, saiba que, terá que criar seu bot como aplicação, e utilizar seu próprio token, e lembrando, é um bot básico, não se utiliza Cogs nem slash commands nele!

BotDiscordPython Aqui está disponível GRATUITAMENTE, um bot de discord feito em python, saiba que, terá que criar seu bot como aplicação, e utilizar s

Matheus Muguet 4 Feb 05, 2022
ANKIT-OS/TG-SESSION-HACK-BOT: A Special Repository.Telegram Bot Which Can Hack The Victim By Using That Victim Session

🔰 ᵀᴱᴸᴱᴳᴿᴬᴹ ᴴᴬᶜᴷ ᴮᴼᵀ 🔰 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🛠️ Lᴀɴɢᴜᴀɢᴇs Aɴᴅ Tᴏᴏʟs 🔰 • If

ANKIT KUMAR 2 Dec 24, 2021
A feishu bot daily push arxiv latest articles.

arxiv-feishu-bot We develop A simple feishu bot script daily pushes arxiv latest articles. His effect is as follows: Of course, you can also use other

huchi 6 Apr 06, 2022
DB-Drive-CSV - This is app is can be used to access CSV file as JSON from Google Drive.

DB Drive CSV This is app is can be used to access CSV file as JSON from Google Drive. How To Use Create file/ upload file to Google Drive There's 2 fi

Hartawan Bahari M. 5 Oct 20, 2022
Kodi script for proper Australian weather data

Kodi Oz Weather weather.ozweather Script for Kodi for high quality Australian weather data sourced directly from the BOM. Available from the Kodi offi

Jeremy Daalder 5 Nov 24, 2022
An EmbedBuilder in Python for discord.py embeds. Pip Module.

Discord.py-MaxEmbeds An EmbedBuilder for Discord bots in Python. You need discord.py to use this module. Installation Step 1 First you have to install

Max Tischberger 6 Jan 13, 2022
Telegram Bot to learn English by words and more.. ( in Arabic )

Get the mp3 files Extract the mp3.rar on the same file that bot.py on install requirements pip install -r requirements.txt #Then enter you bot token

Plugin 10 Feb 19, 2022
D(HE)ater is a security tool can perform DoS attack by enforcing the DHE key exchange.

D(HE)ater D(HE)ater is an attacking tool based on CPU heating in that it forces the ephemeral variant of Diffie-Hellman key exchange (DHE) in given cr

Balasys 138 Dec 15, 2022
Discord raiding tool. Made in python 3.9

XSpammer Discord raiding tool with 20 features. YT Showcase Requirements/Installation Python 3.7+ [https://python.org] Run setup.bat to install the es

Tiie 6 Oct 24, 2022
Automated AWS account hardening with AWS Control Tower and AWS Step Functions

Automate activities in Control Tower provisioned AWS accounts Table of contents Introduction Architecture Prerequisites Tools and services Usage Clean

AWS Samples 20 Dec 07, 2022
AKShare is an elegant and simple financial data interface library for Python, built for human beings

AKShare is an elegant and simple financial data interface library for Python, built for human beings

AKFamily 5.8k Dec 30, 2022
Dashboard to monitor the performance of your Binance Futures account

futuresboard A python based scraper and dashboard to monitor the performance of your Binance Futures account. Note: A local sqlite3 database config/fu

86 Dec 29, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Dec 07, 2021
🦈 Blahaj is a discord bot that shares random images of himself on discord.

🦈 Blahaj Bot Blahaj is a discord bot that shares random images of himself on discord. ⚙️ Developer's Guide To use the bot, follow along the steps Hea

Atul Anand 3 Oct 21, 2022
Pybt: a BaoTa panel python sdk

About Pybt is a BaoTa panel python sdk. Pybt 是一个宝塔面板API的Python版本sdk封装库。 公司很多服务器都装了宝塔面板,通过宝塔来部署、安装、维护一些服务,服务器的数量上以后,导致了维护的不方便,这个时候就想使用宝塔提供的API来开发一个运维平台

Adam Zhang 9 Dec 05, 2022
A simple script that can be used to track real time that user was online in telegram

TG_OnlineTracker A simple script that can be used to track real time that user was online in telegram Join @DaisySupport_Official 🎵 for help 🏃‍♂️ Ea

Inuka Asith 15 Oct 23, 2022
Projeto sobre BioInformática - MoA (mecanismos de ação)

Projeto: MoA no Paredawn Projeto sobre Bioinformatica - Mecanismos de Ação (MoA) MODELO PREDITIVO PARA PREVER O ATIVAMENTO DO MOA E MODELO PARA PREVER

Junior Torres 36 Feb 15, 2022
Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy

Ice-Userbot Telegram Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy. Berbasis Paperplane dan ProjectB

6 Apr 29, 2022
Display relevant information for the amazing Banano coin.

Display relevant information for the amazing Banano coin. It'll also show your current [email 

Ron Talman 4 Aug 14, 2022
A pdisk uploader bot written in Python

Pdisk Uploader Bot 🔥 Upload on Pdisk by Url, File and also by direct forward post from other channel... Features Post to Post Conversion Url Upload D

Paritosh Kumar 33 Oct 21, 2022