Reddit Post Dataset
The Reddit Sentiment Analysis Data Pipeline is designed to collect live comments from Reddit using the Reddit API, pass them through Kafka ...

Dataset containing Reddit posts and comments from various subreddits.

Buy social media datasets from platforms such as Facebook, Instagram, TikTok, YouTube, and Reddit.

A total of 948,169 subreddits are included, the list of subreddits included in the ...

It encompasses posts and comments from 948,169 individual subreddits, each from its inception until October 2018.

Each Corpus contains posts and comments from an individual subreddit from its inception until Oct 2018. This dataset is organized into individual corpora for each subreddit, ...

Spatial problem: Suitability of new locations for your favorite chain store. Use OpenStreetMap for the data.

I also appreciate it if you ...

The dataset consists of 3,848,330 posts with an average length ...

This dataset contains raw data from the Reddit subreddit r/unpopularopinion, collected on June 5, 2025.

It includes 100 recent ...

It offers valuable insight into the discussions, popular topics, and ...

In this post, we will develop a tool in Python to collect publicly available Reddit posts from any (public) subreddit(s), including their ...

This project explores a dataset of Reddit posts to uncover insights into user engagement, popular topics, and trends across various subreddits.

By performing data cleaning, exploratory data ...

I define ...

The Reddit dataset is a graph dataset built from Reddit posts made in September 2014. In this case, the node label is the community, or "subreddit", that a post belongs to.

We have compiled a Reddit post and comment dataset for your analysis.

I want to create a directory of extreme and absurd datasets as a side project and would love to help you in return for ideas.

Have you tried toying around with GDELT or ...

This corpus contains preprocessed posts from the Reddit dataset.

I am currently doing a massive analysis of Reddit's entire publicly available comment dataset. The dataset is ~1.7 billion JSON objects complete with ...

The Reddit dataset contains tuples of user name, a subreddit where the user makes a comment to a thread, and a timestamp for the interaction, split into sessions manually.

Given the changes to the Reddit API, is there any way I could scrape the entire historical data of a subreddit? Or would some sort of web scraping be necessary? I found Reddit's API to be quite ...

Reddit posts collected from nineteen top subreddits.

Learn how to scrape Reddit for social data types from subreddits, posts, and user pages using plain HTTP requests and bypass ...

Reddit content can be leveraged for testing or training natural language processing models such as content moderation or sentiment ...

In this article, I'm going to show you how to use Pushshift to scrape a large amount of Reddit data and create a dataset.

This dataset contains a collection of Reddit post submissions from various Machine Learning and Data Science subreddits.

We're on a journey to advance and democratize artificial intelligence through open source and open science.

It aims to contain all climate change discussion on Reddit in a set of CSV files - hopefully helping bridge real world ...

Research ideas: finding correlations between different types of datasets; determining which datasets are most popular on Reddit; analyzing the ...

The author provides a positive outlook on the potential uses of Reddit data, proposing several innovative applications that could be developed from the collected datasets, such as an oracle.

In addition to monthly dumps, Pushshift provides computational ...

DataIsBeautiful is for visualizations that effectively convey information. Aesthetics are an important part of information visualization, but pretty pictures are not the sole aim of this ...

Hundreds of millions of social media public ...

Pushshift's Reddit dataset is updated in real-time, and includes historical data back to Reddit's inception.

How do I fetch all data (posts, comments, etc.) for a specific date range (e.g. from 1 Dec 2022 to 10 Jan 2023)? Is the new API dataset complete with all the December posts?
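For the date-range question, one possible approach is to filter records locally once they have been downloaded. The following is a minimal sketch, assuming each record is a dict with a `created_utc` field holding a Unix timestamp in seconds (the name Reddit's API uses for creation time). The function `filter_by_date` and the sample records are hypothetical, and the sketch says nothing about whether a given dump is complete.

```python
from datetime import datetime, timezone


def filter_by_date(records, start, end):
    """Return records whose created_utc lies in the half-open range [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    return [r for r in records if lo <= r["created_utc"] < hi]


# Range from the question: 1 Dec 2022 through 10 Jan 2023 (UTC).
# The end bound is exclusive, so it is set to midnight on 11 Jan.
START = datetime(2022, 12, 1, tzinfo=timezone.utc)
END = datetime(2023, 1, 11, tzinfo=timezone.utc)


def ts(*args):
    """Helper for the example: build a UTC Unix timestamp."""
    return datetime(*args, tzinfo=timezone.utc).timestamp()


# Hypothetical sample records standing in for downloaded posts or comments.
sample = [
    {"id": "a", "created_utc": ts(2022, 11, 30, 23, 59)},  # just before the range
    {"id": "b", "created_utc": ts(2022, 12, 25)},          # inside
    {"id": "c", "created_utc": ts(2023, 1, 10, 18, 0)},    # last day, inside
    {"id": "d", "created_utc": ts(2023, 1, 11)},           # equals END, excluded
]

kept = [r["id"] for r in filter_by_date(sample, START, END)]
print(kept)  # ['b', 'c']
```

Using timezone-aware datetimes avoids the local-time offset that a naive `datetime(...).timestamp()` call would introduce.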