antiwork reddit data set

code and data to generate the antiwork dataset

Project Summary

This repository contains the code and data used to generate the antiwork dataset:

Config software

Harvesting bare posts

Download the basic metadata of posts using the Pushshift API.
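A minimal sketch of the harvesting step, not the repository's actual script. It assumes the historical public Pushshift submission search endpoint and its `subreddit`, `size`, and `before` parameters; access to that endpoint may be restricted today. The names `PUSHSHIFT_URL`, `HEADER_FIELDS`, `build_query`, `extract_headers`, and `fetch_page` are hypothetical, as is the choice of header fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the historical public Pushshift submission search endpoint.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"

# Hypothetical choice of "bare" header fields; the repository may keep others.
HEADER_FIELDS = ("id", "created_utc", "title", "author")


def build_query(subreddit, before=None, size=100):
    """Return the request URL for one page of submissions."""
    params = {"subreddit": subreddit, "size": size}
    if before is not None:
        # Only return posts created before this Unix timestamp.
        params["before"] = before
    return PUSHSHIFT_URL + "?" + urlencode(params)


def extract_headers(payload):
    """Keep only the header fields of each post in a decoded response."""
    return [
        {name: post.get(name) for name in HEADER_FIELDS}
        for post in payload.get("data", [])
    ]


def fetch_page(subreddit, before=None, size=100):
    """Download one page of post headers (performs a network request)."""
    with urlopen(build_query(subreddit, before, size), timeout=30) as resp:
        return extract_headers(json.load(resp))
```

To walk back through a subreddit, one could call `fetch_page` repeatedly and pass the smallest `created_utc` of the previous page as `before`.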

Enrich data with freshest API info

Uses the post_id of each harvested header to fetch the full API data with PRAW and saves the result as a pickle file.
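The enrichment step could look roughly like the sketch below. It is an illustration under assumptions, not the repository's code: `FIELDS`, `enrich`, `save_pickle`, and `load_pickle` are hypothetical names, and the set of attributes kept is a guess. The `reddit` argument is expected to behave like a `praw.Reddit` instance, whose `submission(id=...)` method returns a lazily loaded submission.

```python
import pickle

# Hypothetical selection of attributes to keep from each submission.
FIELDS = ("id", "title", "selftext", "score", "num_comments", "created_utc")


def enrich(reddit, post_ids):
    """Fetch each submission by id and copy selected attributes into a dict.

    `reddit` should behave like a praw.Reddit instance, e.g. one created with
    praw.Reddit(client_id=..., client_secret=..., user_agent=...).
    """
    records = []
    for post_id in post_ids:
        submission = reddit.submission(id=post_id)
        record = {name: getattr(submission, name, None) for name in FIELDS}
        # The author is an object (or None for deleted accounts); store its name.
        author = getattr(submission, "author", None)
        record["author"] = str(author) if author is not None else None
        records.append(record)
    return records


def save_pickle(records, path):
    """Write the enriched records to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(records, fh)


def load_pickle(path):
    """Read the enriched records back from a pickle file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Storing plain dictionaries instead of the PRAW objects themselves keeps the pickle file readable without a Reddit session; whether the repository does this is not stated.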

Build Working CSV files

The working CSV files are built in the Master_Builder.ipynb notebook.
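The contents of Master_Builder.ipynb are not shown here, so the following is only a sketch of how a pickle file of post records might be flattened into a CSV with the standard library. The function name `build_csv` and the column list passed to it are hypothetical, and the records are assumed to be a list of dictionaries.

```python
import csv
import pickle


def build_csv(pickle_path, csv_path, columns):
    """Write the pickled list of record dicts to a CSV file.

    Keys not listed in `columns` are ignored; missing keys become empty cells.
    Returns the number of data rows written.
    """
    with open(pickle_path, "rb") as fh:
        records = pickle.load(fh)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

A call such as `build_csv("posts.pkl", "posts.csv", ["id", "title", "score"])` would then produce one row per post; both file names are placeholders.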

Features

Example data generated

Header data set

Example PRAW item fetched

Sample of 1000 records as CSV

Analysis Notebooks

The data is built locally. Once that is done, the analysis is completed in Google Colab using the following notebooks: