This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data acquisition, processing, sentiment analysis with ChatGPT, production to kafka topic and connection to elasticsearch.
What is the airscholar/RealtimeStreamingEngineering GitHub project? Description: "This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data acquisition, processing, sentiment analysis with ChatGPT, production to kafka topic and connection to elasticsearch.". Written in Python. Explain what it does, its main use cases, key features, and who would benefit from using it.
Question is copied to clipboard — paste it after the AI opens.
Clone via HTTPS
Clone via SSH
Download ZIP
Download master.zipReport bugs or request features on the RealtimeStreamingEngineering issue tracker:
Open GitHub Issues