
RedditDataEngineering

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

How to download and set up RedditDataEngineering

Open a terminal and run the command:
git clone https://github.com/airscholar/RedditDataEngineering.git
git clone creates a local copy of the RedditDataEngineering repository. You pass it a repository URL.
Git supports several network protocols, each with its own URL format.

You can also download RedditDataEngineering as a ZIP archive: https://github.com/airscholar/RedditDataEngineering/archive/master.zip

Or clone RedditDataEngineering over SSH:
git@github.com:airscholar/RedditDataEngineering.git
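
The three download options above can be sketched as a small shell helper. This is only an illustration, not part of the project: the function names repo_url and fetch_repo and the method labels https, ssh, and zip are hypothetical, the SSH address assumes GitHub's standard git@github.com form, and the ZIP path assumes curl and unzip are installed.

```shell
#!/bin/sh
# Repository coordinates, taken from the URLs listed above.
OWNER="airscholar"
REPO="RedditDataEngineering"

# repo_url METHOD: print the URL for the chosen download method.
# (Hypothetical helper; the method labels are illustrative.)
repo_url() {
    case "$1" in
        https) echo "https://github.com/${OWNER}/${REPO}.git" ;;
        ssh)   echo "git@github.com:${OWNER}/${REPO}.git" ;;
        zip)   echo "https://github.com/${OWNER}/${REPO}/archive/master.zip" ;;
        *)     echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# fetch_repo METHOD: download the repository into the current directory.
# (Hypothetical helper; nothing is downloaded until it is called.)
fetch_repo() {
    url=$(repo_url "$1") || return 1
    case "$1" in
        # ZIP: follow redirects, save the archive, then unpack it.
        zip) curl -L -o "${REPO}.zip" "$url" && unzip -q "${REPO}.zip" ;;
        # HTTPS or SSH: let git clone handle the transfer.
        *)   git clone "$url" ;;
    esac
}
```

For example, running fetch_repo https clones over HTTPS, while fetch_repo ssh requires an SSH key registered with your GitHub account.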

If you have problems with RedditDataEngineering

You can open an issue on the RedditDataEngineering issue tracker: https://github.com/airscholar/RedditDataEngineering/issues