aws-pdf-textract-pipeline

Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK and TypeScript.

Topics: lambda, aws, serverless

166 Stars

20 Forks

166 Watchers

TypeScript Language

MIT License

100 SrcLog Score

Cost to Build

$92.2K

Market Value

$167.5K

Growth over time (chart of stars, forks, and watchers; 14 data points, 2021-05-01 to 2026-04-01)

How to clone aws-pdf-textract-pipeline

Clone via HTTPS

git clone https://github.com/aeksco/aws-pdf-textract-pipeline.git

Clone via SSH

git clone git@github.com:aeksco/aws-pdf-textract-pipeline.git

Found an issue?

Report bugs or request features on the aws-pdf-textract-pipeline issue tracker: https://github.com/aeksco/aws-pdf-textract-pipeline/issues

Similar to aws-pdf-textract-pipeline

serverless, data-science-ipython-notebooks, localstack, apex, aws-cli, awesome-aws, up, chalice, aws-sdk-js, aws-sdk-go, aws-shell, awless, dev-setup, saws, boto3, kubespray, serverless-application-model, caprover, amplify-js, awesome-kubernetes, practicalnode, aws-sam-cli, go-cloud, claudia, security_monkey, aws-sdk-ruby, aws-sdk-java, ice, empire, docker-curriculum