textract

dbashford

A Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF, and more.

Topics: nodejs

1.7k Stars

198 Forks

41 Watchers

Language: HTML

License: MIT

100 SrcLog Score

Cost to Build: $261.4K

Market Value: $1.01M

Growth over time (stars, forks, watchers): 4 data points, 2026-04-08 to 2026-07-22.

How to clone textract

Clone via HTTPS

git clone https://github.com/dbashford/textract.git

Clone via SSH

git clone git@github.com:dbashford/textract.git

Download ZIP

Download master.zip

Found an issue?

Report bugs or request features on the textract issue tracker on GitHub.

Similar to textract

freeCodeCamp, electron, node, axios, meteor, express, freecodecamp.cn, pm2, awesome-nodejs, hackathon-starter, hexo, 30-seconds-of-code, standard, react-starter-kit, webtorrent, nativefier, mocha, nodebestpractices, ava, node-lessons, keystone-classic, mysql, N-blog, sheetjs, date-fns, smartcrop.js, pkg, best-resume-ever, wp-calypso, svgo