unstructured

Unstructured-IO

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

12.3k Stars
1k Forks
12.3k Watchers
HTML Language
apache-2.0 License
Cost to Build: $7.82M
Market Value: $40.45M

Growth over time

[Chart: stars, forks, and watchers over time; 5 data points, 2023-02-01 to 2025-08-01]

How to clone unstructured

Clone via HTTPS

git clone https://github.com/Unstructured-IO/unstructured.git

Clone via SSH

git clone git@github.com:Unstructured-IO/unstructured.git

Download ZIP

Download master.zip
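
The two clone addresses above follow GitHub's usual HTTPS and SSH URL patterns. As a minimal sketch, the helper below builds either form from an owner and repository name. The function name `github_clone_url` is hypothetical and not part of the project; it only prints URLs and downloads nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): print a GitHub clone URL.
# Usage: github_clone_url <https|ssh> <owner> <repo>
github_clone_url() {
    case "$1" in
        https) printf 'https://github.com/%s/%s.git\n' "$2" "$3" ;;
        ssh)   printf 'git@github.com:%s/%s.git\n' "$2" "$3" ;;
        *)     printf 'unknown protocol: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Print both URLs for this repository; nothing is downloaded here.
github_clone_url https Unstructured-IO unstructured
github_clone_url ssh Unstructured-IO unstructured
```

To clone, pass the result to git, for example `git clone --depth 1 "$(github_clone_url https Unstructured-IO unstructured)"`. The `--depth 1` option requests a shallow clone with only the latest commit, which may be enough if the full history is not needed.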

Found an issue?

Report bugs or request features on the unstructured issue tracker:

Open GitHub Issues