SrcLog.com

airscholar

👤 Developer

3 repositories on SrcLog

3 repos, 580 stars, 271 forks, 580 watchers

Repositories (3)

airscholar/e2e-data-engineering (Python)

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache ZooKeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

323 stars, 147 forks

airscholar/RedditDataEngineering (Python)

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

213 stars, 94 forks

airscholar/RealtimeStreamingEngineering (Python)

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage: data acquisition, processing, sentiment analysis with ChatGPT, producing to a Kafka topic, and connecting to Elasticsearch.

44 stars, 30 forks